Hi,
There are three different ways to implement threads that are common...
The first way is to implement threads entirely in user-space (e.g. as a library), so that the kernel isn't involved at all. The problem here is that if one thread asks the kernel to do something and the kernel blocks (e.g. "open()", where the kernel needs to wait for disk I/O), then all threads within the process end up blocked, because the kernel doesn't know about the other threads.
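For example, here's a minimal sketch of this kind of user-space threading built on POSIX's ucontext functions (obsolescent, but still available on Linux/glibc) - the names and structure are just for illustration. The kernel only ever sees one thread of execution, so a blocking call inside either function would stall both "threads":

    /* Two cooperative user-space "threads" built on ucontext. The kernel
       sees a single thread of execution, so one blocking syscall would
       stop both of them. */
    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, t1_ctx, t2_ctx;
    static char t1_stack[64 * 1024], t2_stack[64 * 1024];

    static void thread1(void)
    {
        printf("thread1: first half\n");
        swapcontext(&t1_ctx, &t2_ctx);   /* user-space "thread switch" */
        printf("thread1: second half\n");
    }                                    /* returning follows uc_link to main */

    static void thread2(void)
    {
        printf("thread2: running\n");
        swapcontext(&t2_ctx, &t1_ctx);   /* switch back; never resumed */
    }

    int main(void)
    {
        getcontext(&t1_ctx);
        t1_ctx.uc_stack.ss_sp   = t1_stack;
        t1_ctx.uc_stack.ss_size = sizeof(t1_stack);
        t1_ctx.uc_link          = &main_ctx;
        makecontext(&t1_ctx, thread1, 0);

        getcontext(&t2_ctx);
        t2_ctx.uc_stack.ss_sp   = t2_stack;
        t2_ctx.uc_stack.ss_size = sizeof(t2_stack);
        t2_ctx.uc_link          = &main_ctx;
        makecontext(&t2_ctx, thread2, 0);

        swapcontext(&main_ctx, &t1_ctx); /* hand control to thread1 */
        printf("main: done\n");
        return 0;
    }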
The second way is to implement threads entirely in the kernel. This solves the problem above (e.g. if one thread is blocked by the kernel, the kernel can still make other threads in the same process run). The problem here is performance - for example, every thread switch involves additional CPL=3 -> CPL=0 -> CPL=3 transitions, and the kernel's scheduler is often more complex because it needs to handle everything.
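As it happens, this kernel-only model is what Linux's normal pthreads implementation (NPTL) uses - one kernel-level thread per user-level thread. A quick (Linux-specific) way to see it is that each pthread gets its own kernel task ID:

    /* On Linux, NPTL pthreads are kernel-level threads (1:1): every pthread
       is a separate kernel task with its own thread ID, all sharing one
       process ID. */
    #include <pthread.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static void *worker(void *arg)
    {
        /* SYS_gettid is Linux-specific: the kernel's ID for this thread */
        printf("thread %ld: pid=%d tid=%ld\n",
               (long)arg, (int)getpid(), (long)syscall(SYS_gettid));
        return NULL;
    }

    int main(void)
    {
        pthread_t t[2];
        for (long i = 0; i < 2; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        for (int i = 0; i < 2; i++)
            pthread_join(t[i], NULL);
        printf("main: pid=%d tid=%ld\n",
               (int)getpid(), (long)syscall(SYS_gettid));
        return 0;
    }

Compile with "gcc -pthread"; every thread reports the same PID but a different TID, because each one is a separate kernel-scheduled thread.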
Now, the idea of scheduler activations seems to be to implement threads entirely in user-space, but when the kernel would need to block the process it doesn't - instead it makes an upcall to tell the user-space threading code that the current thread would have blocked, and the user-space threading code switches to a different thread so that the entire process doesn't become blocked. I'd assume the kernel also tells the user-space threading code when a blocking operation completes (so the user-space threading code can unblock the blocked thread).
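In rough sketch form, the user-space half might look something like this - note that this is hypothetical, no mainstream kernel exposes such an interface, and every non-standard name below is invented (and stubbed out so the file compiles):

    /* HYPOTHETICAL sketch of the user-space half of scheduler activations. */
    #include <stdio.h>

    /* Stand-ins for the user-space scheduler's own bookkeeping. */
    static void mark_thread_blocked(int id) { printf("thread %d blocked\n", id); }
    static void mark_thread_ready(int id)   { printf("thread %d ready\n", id); }
    static void run_next_ready_thread(void) { printf("switching threads\n"); }

    /* Upcall from the kernel: "the thread on this activation just blocked
       in a syscall - here's a fresh activation, pick something else". */
    static void upcall_blocked(int id)
    {
        mark_thread_blocked(id);
        run_next_ready_thread();
    }

    /* Upcall from the kernel: "that blocked operation has completed". */
    static void upcall_unblocked(int id)
    {
        mark_thread_ready(id);
    }

    int main(void)
    {
        /* A real system would register the upcall entry points with the
           kernel here (via some hypothetical sa_register() syscall) and
           then run its user-level scheduler; we just simulate the two
           upcalls described above. */
        upcall_blocked(1);
        upcall_unblocked(1);
        return 0;
    }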
This sounds like it'd give the performance of implementing threads in user-space without the problems. However, I'd expect it to cause additional problems of its own. Specifically, I'd be worried about race conditions (especially in a multi-CPU environment): because blocking and unblocking a thread is split between the kernel and the user-space code, it'd be difficult to make either look like an atomic operation, which would make things like semaphores difficult to implement reliably (without giving user-space code too much control and creating security problems).
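To make the semaphore worry concrete, here's a sketch of the classic "lost wakeup" window you get when the count lives in user-space but blocking happens in the kernel (block_current_thread() and wake_one_waiter() are hypothetical stand-ins):

    /* A user-level semaphore where blocking is delegated to the kernel. */
    #include <stdatomic.h>

    static void block_current_thread(void) { /* hypothetical: park in kernel */ }
    static void wake_one_waiter(void)      { /* hypothetical: unpark a waiter */ }

    typedef struct { atomic_int count; } usem_t;

    void usem_wait(usem_t *s)
    {
        if (atomic_fetch_sub(&s->count, 1) > 0)
            return;                    /* uncontended: stays in user-space */

        /* RACE WINDOW: another CPU may call usem_post() right now and try
           to wake this thread *before* it has actually blocked. Unless
           "decide to block" and "actually block" are made to look atomic,
           that wakeup is lost and this thread sleeps forever. */
        block_current_thread();
    }

    void usem_post(usem_t *s)
    {
        if (atomic_fetch_add(&s->count, 1) < 0)
            wake_one_waiter();
    }

This is essentially the race that Linux's futex() was designed to close - FUTEX_WAIT re-checks the user-space value inside the kernel before blocking, so a concurrent wake can't slip through the window.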
Of course I did say there are three different ways to implement threads that are common, and I've only described two of them so far. What's the third way?
The third way is to implement threads in the kernel *and* in user space, and to map user-level threads onto kernel-level threads. If a user-level thread needs to do something that would block, then only one kernel-level thread blocks (the kernel-level thread that the user-level thread was mapped onto at the time); the remaining user-level threads can be mapped onto the remaining kernel-level threads, and you can create more kernel-level threads if you need to. In this case there's no "kernel needs to block" problem, most thread switches can be done by faster user-level code without the CPL=3 -> CPL=0 -> CPL=3 transitions and other overhead, and the kernel can still make thread switches appear atomic (as the kernel is in full control of the kernel-level thread switches).
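As a very rough sketch of the mapping (Linux/glibc again, and a real hybrid scheduler would also handle blocking, preemption, migration, growing the kernel-thread pool, etc.), here are four cooperative user-level contexts multiplexed onto two kernel-level pthreads:

    /* Minimal M:N sketch: USER_THREADS user-level contexts multiplexed
       onto KERNEL_THREADS kernel-level threads. */
    #include <pthread.h>
    #include <stdio.h>
    #include <ucontext.h>

    #define KERNEL_THREADS 2
    #define USER_THREADS   4

    static ucontext_t user_ctx[USER_THREADS];
    static char user_stack[USER_THREADS][64 * 1024];

    /* Each kernel thread gets its own scheduler context (thread-local). */
    static __thread ucontext_t sched_ctx;

    static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
    static int next_ready = 0;       /* trivial "run queue": just an index */

    static void user_thread(int id)
    {
        printf("user thread %d on kernel thread %lu\n",
               id, (unsigned long)pthread_self());
        /* A blocking syscall here would only block this one kernel-level
           thread; the other keeps running the remaining user threads. */
        setcontext(&sched_ctx);      /* done - back to this worker's loop */
    }

    static void *worker(void *arg)
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&q_lock);
            int i = (next_ready < USER_THREADS) ? next_ready++ : -1;
            pthread_mutex_unlock(&q_lock);
            if (i < 0)
                return NULL;         /* no ready user-level threads left */
            /* Save this kernel thread's state, run the user-level thread;
               it jumps back here via setcontext() when it's finished. */
            swapcontext(&sched_ctx, &user_ctx[i]);
        }
    }

    int main(void)
    {
        pthread_t kt[KERNEL_THREADS];

        for (int i = 0; i < USER_THREADS; i++) {
            getcontext(&user_ctx[i]);
            user_ctx[i].uc_stack.ss_sp   = user_stack[i];
            user_ctx[i].uc_stack.ss_size = sizeof(user_stack[i]);
            makecontext(&user_ctx[i], (void (*)(void))user_thread, 1, i);
        }
        for (int i = 0; i < KERNEL_THREADS; i++)
            pthread_create(&kt[i], NULL, worker, NULL);
        for (int i = 0; i < KERNEL_THREADS; i++)
            pthread_join(kt[i], NULL);
        return 0;
    }

Run it a few times and the user-level threads land on different kernel-level threads. One caveat: glibc's swapcontext() still enters the kernel to save/restore the signal mask, so a serious M:N implementation would use its own lighter-weight context-switch code.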
This also means that a kernel developer can implement threads entirely in the kernel, and different processes can decide to use kernel-level threads only (one kernel-level thread per user-level thread), a mixture of kernel-level threads and user-level threads (many user-level threads mapped onto several kernel-level threads), or user-level threads only (one kernel-level thread for all user-level threads).
Note: Typically all of the above is hidden by a library (e.g. "pthreads") so that a programmer doesn't need to care how (or where) threading is implemented.
Basically, AFAIK scheduler activations are a complex mess, and there are better ways to solve the same problems.
Cheers,
Brendan