Do Threads Remain the Best Concurrency Abstraction?
Posted: Fri May 11, 2007 10:02 pm
In my humble opinion, no.
In the beginning, we used processes -- address spaces, protection domains and units of concurrent execution packaged into one. Then we separated the concurrent execution from the address space and protection domain for performance reasons, yielding threads.
But processes and threads come with costs, chiefly elaborate scheduling systems. No two operating systems have yet employed exactly the same scheduler, and researchers invent newer, "better" scheduling systems all the time. They have to, because no scheduler schedules perfectly for fairness, throughput and interactivity all at once. It doesn't help that we often lay upon our schedulers the responsibility of knowing the status of I/O operations and performing various kinds of optimized inter-process communication (such as synchronous message-passing with time-slice donation).
Finally, processes/threads lack elegance. Because such systems either have or simulate parallelism, operations on processes/threads must fall into one of two groups: those valid for suspended threads and those valid for running threads. However, the operating system must often try to present the set of all threads as homogeneous, with a single fixed set of operations valid at all times.
We have used kernel-level processes and threads for decades because it seemed the most obvious way to run a time-sharing system and even worked for truly parallel systems, and damn programmatic or theoretical elegance.
However, higher-level programming languages that continue to grow in popularity for their expressiveness suggest another possible solution. Time-sharing, after all, works by interrupting the running program and saving its run-time state (such as program counter and registers) into a data structure (often the kernel-level stack). Theoretically speaking, that saved state constitutes a one-shot continuation of the interrupted program. Indeed, switching threads in a standard system involves saving the continuation of the old thread into that thread's data structure, and then loading and throwing to a new continuation from the new thread.
And if a suspended thread becomes a saved continuation, what does one call the running thread? I propose that we create entities called virtual processors that abstract over a logical processor (i.e., whatever the CPU reports as a processor) and the data structures describing the currently-running program. Short routines can convert between saved continuations and the contents of a logical processor by saving and loading machine state. Also, note that the kernel can create any number of virtual processors to represent any number of logical machine processors while adding only optional complexity to the kernel-level scheduler (see below, sorry).
We can also pass a value through on each continuation invocation -- possibly the calling continuation, à la Scheme. After that, we just need an operation to construct continuations of arbitrary states from scratch, and one to copy continuations so as to simulate multi-shot continuations.
Scheme programmers have demonstrated time and time again, after all, that one can use continuations to build a threading system in about a page of code in which each thread acts as a coroutine of the scheduler. Operating-system developers haven't used this idea because we haven't yet found a way to implement a preemptive concurrency system using continuations.
To implement concurrency, we add a type of object called the event. An event contains any number of saved continuations as handlers, and supports an operation to "fire" or "trigger" it. When something fires an event, the kernel dequeues all of its handlers and hands them to a scheduler that only needs to decide which kinds of event-handler continuations can preempt which kinds of running virtual processors. When the scheduler decides to preempt, it simply invokes ThrowCC(preempting continuation) and possibly checks the return value for errors. Handing the handler the preempted continuation allows the handler to function as a scheduler.
Note that events can serve as a convenient abstraction over machine interrupts. Making them do so gives us a very convenient translation from machine-level preemption to preemption in our concurrency model.
However, with only the operations given so far, adding a continuation as an event handler would run far too slowly for real use. Instead, we introduce an operation HandleEvent that adds the current continuation to the given event's set of handlers and then invokes a given continuation with a given value. This primitive can run much more efficiently.
So we end up with these primitive operations:
Throw(continuation, value) --> Never returns. Throw destroys the calling continuation and invokes the given continuation, passing it the given value.
ThrowCC(continuation) --> ThrowCC saves the current virtual-processor contents as a continuation and throws to the given continuation, passing the saved continuation as the value. If the saved continuation is ever invoked, ThrowCC() returns the value passed to Throw() for that invocation.
Construct(machine state) --> Returns a saved continuation of the given machine state.
Copy(continuation) --> Returns a saved continuation whose machine state is identical to that of the given continuation. (The kernel copies continuations only on request, rather than on every Throw.)
Fire(event) --> Returns no value, and returns at all only if the computation current at the call ever runs again. Fire triggers the given event, which invokes the kernel scheduler over the event's handlers and the current computation.
HandleEvent(event, continuation, value) --> Adds the current continuation to the given event's set of handlers, then invokes the given continuation with the given value. Returns the value passed to the handler when it runs (read: the preempted continuation).
These objects and operations, implemented on a single-core system, will allow the creation of nearly-arbitrary concurrency systems in user-space, removing unneeded complexity from the kernel.
Finally, a single additional primitive operation brings this concurrency system into the truly parallel world of multi-core and multi-processor machines. Since a continuation cannot terminate itself without throwing to another continuation, a virtual processor, once started, can run forever. One only needs a primitive to start an inactive virtual processor with a given continuation.
[i]StartProcessor(virtual processor, continuation, value) --> StartProcessor returns a success/error code indicating whether the given virtual processor was inactive. If it was inactive, StartProcessor performs the internal kernel and machine operations that make the corresponding logical processor start computing with the given continuation, passing it the given value.[/i]
I hope this sorta-kinda essay on threads, continuations, and concurrency has provided everyone with some new, interesting seeds of thought. Flames about my opinion on threads, however, can go to Gehenna.