Do Threads Remain the Best Concurrency Abstraction?

Discussions on more advanced topics such as monolithic vs micro-kernels, transactional memory models, and paging vs segmentation should go here. Use this forum to expand and improve the wiki!
Crazed123
Member
Posts: 248
Joined: Thu Oct 21, 2004 11:00 pm

Post by Crazed123 »

speal wrote:I've been trying to follow this discussion, since the idea of an alternative to processes/threads is an appealing one to me. I wasn't familiar with continuations before, but I've tried to figure them out with some reading. Maybe I've misunderstood, but it seems to me that there's nothing inherently concurrent about a continuation. My assumption was that the behavior is much like a function call, except that instead of returning a single value, the continuation yields an intermediate value and can be resumed later to return another intermediate value. Is this correct?
Calling a continuation is resuming a continuation. Call/CC is a system call that reifies the current computation as a continuation and passes it to a given function.
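A loose analogy in Python, using generators as one-shot continuations: resuming the generator *is* calling the continuation. This is only a sketch — real call/cc (as in Scheme) reifies the whole control stack, and every name below is illustrative.

```python
# Generators as one-shot continuations: send() resumes the suspended
# computation, which is exactly what "calling" a continuation means.

def computation():
    x = yield "suspended at step 1"   # suspension point: the rest of the
    y = yield f"got {x}, step 2"      # function body is the continuation
    return x + y

k = computation()            # reify the computation (not yet running)
print(next(k))               # start it; runs until the first suspension
print(k.send(10))            # "calling" the continuation resumes it
try:
    k.send(32)               # resume again; the computation finishes
except StopIteration as e:
    print("result:", e.value)   # prints "result: 42"
```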
If I'm right, I don't see how you can construct a system of concurrency from this, as the control flow is still in one place, but is operating in a different context when resuming a continuation.
Like Combuster said, a continuation system includes no native support for concurrency. However, it does include native (extremely easy to implement, since bookkeeping gets delegated to user-space) support for parallelism.

Take X separate processors, each running its own computation, plus a central (lock-protected or otherwise synchronized) store of continuations wrapped in a "thread" abstraction: together they form an SMP/NUMA scheduler that runs on X processors, for any value of X. Making it run on an additional Y processors wouldn't even require recompiling the scheduler, because it can use that same synchronization to count the available processors at run time. You would only need to add the processors to your machine and make sure the kernel (which exports a continuation-based interface) can recognize and start them. The user-land scheduler will then simply count them and use them with no extra effort.

Basically, continuations are far more scalable as OS-level primitives than threads, since threads require that you recompile your kernel when you upgrade from 1 to 2 to 32 processors for best performance. A continuation-based kernel lets you either swap out a user-land scheduler on each upgrade or even just build one scheduler that detects the number of available processors and runs itself appropriately. The latter type of scheduler could trivially scale to very large numbers of processors.
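A toy sketch of that last kind of scheduler, in Python rather than a real kernel interface: continuations are modeled as plain callables pulled from one synchronized queue, and the worker count is discovered at run time. All names here are illustrative assumptions, not part of any actual kernel.

```python
# User-land scheduler sketch: discovers the processor count at run time
# instead of being compiled for a fixed number of CPUs.

import os
import queue
import threading

def run_scheduler(continuations):
    ready = queue.Queue()                 # central, synchronized store
    for k in continuations:
        ready.put(k)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                k = ready.get_nowait()    # grab the next continuation
            except queue.Empty:
                return
            r = k()                       # resume it to completion
            with lock:
                results.append(r)

    ncpus = os.cpu_count() or 1           # counted at run time, not compile time
    workers = [threading.Thread(target=worker) for _ in range(ncpus)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results

print(sorted(run_scheduler([lambda i=i: i * i for i in range(8)])))
# prints [0, 1, 4, 9, 16, 25, 36, 49]
```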
speal
Member
Member
Posts: 43
Joined: Wed Mar 07, 2007 10:09 am
Location: Minneapolis, Minnesota
Contact:

Post by speal »

The scalability argument makes perfect sense. I like the idea. Is there some way to build some level of thread-like concurrency on top of continuations that doesn't hurt this scalability? The idea I mentioned before of resuming continuations during idle time and caching the results would be one case, but it's certainly a specialized one.
Crazed123
Member
Member
Posts: 248
Joined: Thu Oct 21, 2004 11:00 pm

Post by Crazed123 »

You use events. When a timer event occurs, the kernel runs the event handler by doing the equivalent of call/cc: it packages the interrupted computation into a continuation and passes that continuation to the event-handler continuation, running the event handler as the new current computation.

The event handler counts the tick, schedules, does its work, etc., and then decides on a new continuation to resume. If that new continuation differs from the old one, the old one can be discarded entirely or saved someplace. Then the scheduler throws to the new continuation, which resumes in the very kernel interrupt handler that originally interrupted it. Then the kernel does a return-from-interrupt and the new continuation runs.

Now, to emphasize, you can implement threads as data structures in a scheduler like the above with two states:

1) Running. The thread has no continuation stored in it.
2) Suspended. The thread contains a continuation.

The scheduler could differentiate multiple kinds of "suspended" (like ran out of timeslice, blocked on I/O, etc), but fundamentally that's your thread concurrency right there.
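The two-state thread structure above can be sketched with generators standing in for kernel continuations; a yield point plays the role of the timer event. This is a simplified illustration under those assumptions, not a real event-driven kernel.

```python
# Threads as scheduler data structures with exactly two states:
#   Running:   no continuation stored in the thread
#   Suspended: the thread holds a continuation

from collections import deque

class Thread:
    def __init__(self, cont):
        self.cont = cont              # Suspended: holds a continuation

def scheduler(threads):
    ready = deque(Thread(c) for c in threads)
    trace = []
    while ready:
        t = ready.popleft()
        k, t.cont = t.cont, None      # now Running: no continuation stored
        try:
            trace.append(next(k))     # resume until the next "timer event"
            t.cont = k                # Suspended again: store continuation
            ready.append(t)
        except StopIteration:
            pass                      # computation finished; discard it
    return trace

def task(name, steps):
    for i in range(steps):
        yield f"{name}:{i}"

print(scheduler([task("A", 2), task("B", 3)]))
# prints ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```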

Now, the big question is how the kernel can present continuations and events as a coherent abstraction that fits in with the rest of the system. Honestly, every design I invented ended up in Microkernel Hell: too many ultra-complicated low-level data structures infesting every place user-land touches the kernel, and abstractions that user-level programs can't extend (which was part of the point in the first place) because the kernel provides a million different interfaces for three different things.
Avarok
Member
Posts: 102
Joined: Thu Aug 30, 2007 9:09 pm

Post by Avarok »

:D Your post on page four was vastly better than its predecessors. Well said.

That said, my threading implementation doesn't require any knowledge whatsoever of how many processors are operating, except possibly removing locks in non-SMP environments; so it's not inherently flawed. To be honest, I never realized this was beneficial, yet I did it this way anyway.
Crazed123 wrote:the old one can be discarded entirely or saved someplace. Then the scheduler throws to the new continuation, which resumes in the very kernel interrupt handler that originally interrupted it. Then the kernel does a return-from-interrupt and the new continuation runs.
I keep three pools of stack pointers now: Running, Ready, and Sleeping. Each "stack pointer" points to the stack after the execution state has been saved onto it. This is done by:

Code: Select all

push fs          ; save the segment registers that pusha doesn't cover
push gs
fxsave _bla_     ; save FPU/SSE state (512-byte area, must be 16-byte aligned)
pushf            ; save EFLAGS
pusha            ; save the general-purpose registers
Saving CR3 (memory isolation) is an option too, which makes a thread a process.

No thread is ever active on two CPUs, because it gets moved off the Ready pool when it's switched into. CPUs just go to the Ready pool when their timer fires. The cool thing is, there are no critical sections in my context-switch code; just two CMPXCHGs.
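The claim-on-switch idea can be sketched like this: a CPU takes a thread out of the Ready pool with a single compare-and-swap, so no thread is ever run by two CPUs at once. Python has no CMPXCHG, so `cas()` below emulates it with a lock; on real hardware that lock disappears, and every name here is illustrative.

```python
# Lock-free-style Ready pool: each CPU claims a thread slot with an
# (emulated) compare-and-swap, guaranteeing single-ownership.

import threading

_cas_lock = threading.Lock()

def cas(cell, index, expected, new):
    """Emulated compare-and-swap on cell[index] (stands in for CMPXCHG)."""
    with _cas_lock:
        if cell[index] is expected:
            cell[index] = new
            return True
        return False

def claim_next(ready):
    """Scan the Ready pool; atomically claim the first runnable slot."""
    for i in range(len(ready)):
        t = ready[i]
        if t is not None and cas(ready, i, t, None):
            return t          # this CPU alone now runs t
    return None

ready = ["t0", "t1", "t2"]
claimed = []

def cpu():
    while (t := claim_next(ready)) is not None:
        claimed.append(t)     # "switch into" the thread

cpus = [threading.Thread(target=cpu) for _ in range(2)]
for c in cpus:
    c.start()
for c in cpus:
    c.join()
print(sorted(claimed))        # every thread claimed exactly once
```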

I'm currently in love with this strategy: it's pre-emptive, efficient, atomic, and simple, and everything I want seems to fall out of its design naturally.

The major challenge in my head so far has been IPC / Shared Memory / Semaphores / Locks / SMP stuff.
There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies.
- C. A. R. Hoare
Meor
Posts: 13
Joined: Fri Mar 14, 2008 11:29 am

Post by Meor »

I'll throw my hat in the ring and say threads with STM/HTM will be the best concurrency model.

Transactions are encapsulation/information hiding applied to concurrency.
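A toy sketch of that claim: a transaction records the versions of everything it reads, and its commit succeeds only if nothing it read has changed in the meantime — otherwise it retries. This is a heavy simplification (real STMs track read/write sets per memory word, and HTM does it in the cache hardware); all names below are illustrative.

```python
# Minimal versioned-read STM sketch: optimistic execution, validate at
# commit, retry on conflict.

import threading

class TVar:
    """A transactional variable: a value plus a version counter."""
    def __init__(self, value):
        self.value, self.version = value, 0

_commit_lock = threading.Lock()

def atomically(txn):
    while True:
        reads, writes = {}, {}
        def read(tv):
            if tv not in writes:
                reads[tv] = tv.version      # record version at first read
            return writes.get(tv, tv.value)
        def write(tv, v):
            writes[tv] = v                  # buffer writes until commit
        result = txn(read, write)
        with _commit_lock:                  # validate-and-commit atomically
            if all(tv.version == v for tv, v in reads.items()):
                for tv, v in writes.items():
                    tv.value, tv.version = v, tv.version + 1
                return result
            # conflict: another commit intervened; retry the transaction

counter = TVar(0)

def increment(read, write):
    write(counter, read(counter) + 1)

workers = [threading.Thread(
    target=lambda: [atomically(increment) for _ in range(100)])
    for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(counter.value)          # prints 400: no increments lost
```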
binutils
Member
Posts: 214
Joined: Thu Apr 05, 2007 6:07 am

Post by binutils »

Say, single process in 15nm circuit? or 3000 threads in 60nm circuit?