
Scheduler activations

Posted: Tue May 06, 2008 10:42 pm
by yang
Hi, does anyone understand why scheduler activations have not caught on? They sound reasonable to me, but according to Wikipedia they were introduced and then removed in both NetBSD and FreeBSD, and Ulrich Drepper maintains a page with a foreword implying that scheduler activations are not viable (but with no further elaboration). Wikipedia hints that they are more complex to implement - is this the principal reason behind their lack of success? Any details on that?

Re: Scheduler activations

Posted: Wed May 07, 2008 10:15 pm
by Brendan
Hi,
yang wrote:Hi, does anyone understand why scheduler activations have not caught on?
There are three common ways to implement threads....

The first way is to implement threads entirely in user-space (e.g. a library), so that the kernel isn't involved at all. The problem here is that if one thread asks the kernel to do something and the kernel blocks (e.g. "open()" where the kernel needs to wait for disk I/O), then all threads within the process end up blocked because the kernel doesn't know about the other threads.
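To make that concrete, here's a minimal sketch of a pure user-space thread switch using the POSIX ucontext API - the switch is just saving and restoring registers and a stack, with no kernel scheduler involvement:

Code:
/* Minimal sketch of pure user-space threading with getcontext /
 * makecontext / swapcontext. Nothing here touches the kernel's
 * scheduler; if either "thread" made a blocking system call, the
 * whole process would stop. */
#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, thread_ctx;
static char thread_stack[64 * 1024];

static void thread_func(void)
{
    printf("user-level thread: running\n");
    swapcontext(&thread_ctx, &main_ctx);   /* "yield" - pure user-space switch */
    printf("user-level thread: resumed\n");
}

int main(void)
{
    getcontext(&thread_ctx);
    thread_ctx.uc_stack.ss_sp = thread_stack;
    thread_ctx.uc_stack.ss_size = sizeof thread_stack;
    thread_ctx.uc_link = &main_ctx;        /* return here when it finishes */
    makecontext(&thread_ctx, thread_func, 0);

    swapcontext(&main_ctx, &thread_ctx);   /* switch in */
    printf("main: thread yielded\n");
    swapcontext(&main_ctx, &thread_ctx);   /* resume it */
    printf("main: thread finished\n");
    return 0;
}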

The second way is to implement threads entirely in the kernel. This solves the problem above (e.g. if one thread is blocked by the kernel, the kernel can still make other threads in the same process run). The problem here is performance - e.g. switching threads involves additional CPL=3 -> CPL=0 -> CPL=3 transitions, and the kernel's scheduler is often more complex because it needs to handle everything.
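For comparison, a sketch of the kernel-level version using pthreads (assuming a 1:1 threading implementation like Linux's NPTL, where each pthread is a kernel thread) - the thread blocked in the kernel doesn't stop the other one:

Code:
/* One thread blocks in the kernel (sleep()), the other keeps running,
 * because the kernel schedules them independently - at the cost of a
 * kernel-mode switch for every block/unblock. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static void *blocker(void *arg)
{
    (void)arg;
    sleep(2);                       /* blocks in the kernel */
    printf("blocker: woke up\n");
    return NULL;
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 4; i++) {
        printf("worker: still running (%d)\n", i);
        usleep(300 * 1000);
    }
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, blocker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}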

Now, the idea of scheduler activations seems to be to implement threads entirely in user-space, but when the kernel needs to block the process it doesn't - instead it tells the user-space threading code that it was going to block the process, and the user-space threading code does a thread switch or something so that the entire process doesn't become blocked. I'd assume the kernel also tells the user-space threading code when a blocking operation completes (so the user-space threading code can unblock a blocked thread).
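The notifications are delivered as "upcalls" into the user-space threading code. A hypothetical C interface - every name and signature here is invented for illustration, loosely following the four upcalls described in the Anderson et al. scheduler activations paper:

Code:
/* Hypothetical upcall interface for scheduler activations - all names
 * are made up for illustration. The user-level scheduler implements
 * these; the kernel invokes them by starting a fresh activation. */
typedef int sa_thread_t;   /* user-level thread id (illustrative) */
typedef int sa_cpu_t;      /* virtual processor id (illustrative) */

/* A new (virtual) processor was granted to this process: run a thread. */
void sa_processor_added(sa_cpu_t cpu);

/* A processor was taken away; ctx holds the user-level register state
 * of whatever was running on it, so it can be resumed elsewhere. */
void sa_processor_preempted(sa_cpu_t cpu, void *ctx);

/* The activation running thread t blocked in the kernel (e.g. on disk
 * I/O): pick another user-level thread for this processor. */
void sa_thread_blocked(sa_cpu_t cpu, sa_thread_t t);

/* The blocking operation for thread t completed: mark it runnable. */
void sa_thread_unblocked(sa_cpu_t cpu, sa_thread_t t, void *ctx);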

This sounds like it'd give the performance of implementing threads in user-space without the problems. However, I'd expect it'd cause additional problems. Specifically I'd be worried about race conditions (especially in a multi-CPU environment), as it'd be difficult to make blocking and unblocking a thread appear as an atomic operation (as the operation is split between the kernel and user-space code), which would make things like semaphores difficult to implement reliably (without giving user-space code too much control and creating security problems).
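To make the atomicity worry concrete, here's the classic lost-wakeup window, and a wait/wake pair sketched with Linux's futex system call, which closes the window by re-checking the value inside the kernel:

Code:
/* The lost-wakeup window when block/unblock is split between
 * user-space and the kernel:
 *
 *   CPU 0 (sem_wait)                    CPU 1 (sem_post)
 *   sees count == 0, decides to block
 *                                       count = 1; sees no sleeper
 *                                       yet, so wakes nobody
 *   asks the kernel to block it         -> sleeps forever
 *
 * FUTEX_WAIT only sleeps if the word still holds the expected value,
 * and that check happens atomically inside the kernel. */
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static void futex_wait(int *addr, int expected_value)
{
    /* If another CPU changed *addr first, this returns immediately
     * instead of sleeping - no gap for the wakeup to fall into. */
    syscall(SYS_futex, addr, FUTEX_WAIT, expected_value, NULL, NULL, 0);
}

static void futex_wake_one(int *addr)
{
    syscall(SYS_futex, addr, FUTEX_WAKE, 1, NULL, NULL, 0);
}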

Of course I did say there are three common ways to implement threads, and I've only described two of them so far. What's the third way?

The third way is to implement threads in the kernel *and* in user space, and to map user-level threads onto kernel-level threads. If a user-level thread needs to do something that would block then only one kernel-level thread would block (the kernel-level thread that the user-level thread was mapped onto at the time), the remaining user-level threads can be mapped onto the remaining kernel-level threads, and you could create more kernel-level threads if you need to. In this case there's no "kernel needs to block" problem, most thread switches can be done by faster user-level code without the "CPL=3 -> CPL=0 -> CPL=3" switching and other overhead, and the kernel can still make thread switches appear atomic (as the kernel is in full control of the kernel-level thread switches).
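A rough sketch of what the wrapping might look like - the runq_* helpers are invented for illustration and stubbed out here; a real M:N library would park and wake kernel threads in a pool:

Code:
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical library internals, stubbed so the sketch stands alone;
 * a real M:N library would wake or spawn a parked kernel thread so the
 * user-level run queue keeps draining while one kernel thread blocks. */
static void runq_ensure_spare_kernel_thread(void) { }
static void runq_release_spare_kernel_thread(void) { }

/* The threading library interposes on calls it knows *might* block: */
ssize_t my_read(int fd, void *buf, size_t count)
{
    runq_ensure_spare_kernel_thread();  /* other user-level threads keep running */
    ssize_t r = read(fd, buf, count);   /* only THIS kernel-level thread blocks  */
    runq_release_spare_kernel_thread();
    return r;
}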

This also means that a kernel developer can implement threads entirely in the kernel; and different processes can decide to use kernel-level threads only (one kernel-level thread per user-level thread), use a mixture of kernel-level threads and user-level threads (many user-level threads mapped onto several kernel-level threads), or use user-level threads only (one kernel-level thread for all user-level threads).

Note: Typically all of the above is hidden by a library (e.g. "pthreads") so that a programmer doesn't need to care how (or where) threading is implemented.

Basically, AFAIK scheduler activations are a complex mess and there's better ways to solve the same problems.


Cheers,

Brendan

Posted: Wed May 07, 2008 11:12 pm
by yang
Thanks a lot for the reply. I'm not sure I fully understand your description of the third alternative, however. When you say:
If a user-level thread needs to do something that would block then only one kernel-level thread would block (the kernel-level thread that the user-level thread was mapped onto at the time), the remaining user-level threads can be mapped onto the remaining kernel-level threads, and you could create more kernel-level threads if you need to.
How do you tell that a user-level thread is causing a kernel-level thread to block? Scheduler activations notify the user precisely so that the user can map user-level threads onto the remaining kernel-level threads. Scheduler activations are also known as M:N threading, which sounds like what you were trying to get at....

Posted: Thu May 08, 2008 1:07 am
by lukem95
The problem with userland task switching is also that a programmer can debug and hex-edit their program so that it will not switch where needed, or crashes the whole OS when run. If a virus were developed that made a slight alteration to the main drivers, so that they crashed at random points, the OS would need a total reinstall.

Posted: Thu May 08, 2008 3:29 am
by Brendan
Hi,
yang wrote:Thanks a lot for the reply. I'm not sure I fully understand your description of the third alternative, however. When you say:
If a user-level thread needs to do something that would block then only one kernel-level thread would block (the kernel-level thread that the user-level thread was mapped onto at the time), the remaining user-level threads can be mapped onto the remaining kernel-level threads, and you could create more kernel-level threads if you need to.
How do you tell that a user-level thread is causing a kernel-level thread to block?
It's easy to find out which kernel functions *might* block, and then assume they will block. For a silly example, consider a kernel API function that makes a thread sleep - the "sleep()" function in the library doesn't need a scheduler activation to know that the kernel function will probably block and doesn't need another scheduler activation to know when the kernel function returns.

There are probably better, more complex ways (e.g. consider the O_NONBLOCK file I/O mode and the EWOULDBLOCK error code).
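For example, a minimal sketch of the non-blocking style using standard POSIX calls (error handling mostly omitted):

Code:
/* The call returns immediately with EAGAIN/EWOULDBLOCK instead of
 * blocking, so a threading library could switch user-level threads
 * and retry the operation later. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[128];

    /* Put stdin into non-blocking mode. */
    fcntl(STDIN_FILENO, F_SETFL, fcntl(STDIN_FILENO, F_GETFL) | O_NONBLOCK);

    ssize_t r = read(STDIN_FILENO, buf, sizeof buf);
    if (r < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        /* The call would have blocked; a user-level scheduler could
         * run another user-level thread here and retry this later. */
        printf("read would block - run another user-level thread\n");
    }
    return 0;
}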
yang wrote:Scheduler activations notify the user precisely so that the user can map user-level threads onto the remaining kernel-level threads. Scheduler activations are also known as M:N threading, which sounds like what you were trying to get at....
Scheduler activations use "virtual processors", where a virtual processor corresponds directly to a physical processor. E.g. rather than having M user-level threads mapped onto N kernel-level threads mapped onto O CPUs, you get M user-level threads mapped directly onto O CPUs, with no kernel-level threads.


Cheers,

Brendan

Posted: Thu May 08, 2008 7:58 am
by bewing
The only inherent advantage to it that I can see is that the userland job gets to manage the affinities of groups of subthreads -- because it has superior knowledge of those affinities.

As Brendan said, the other advantages depend entirely on whether there is a lot of overhead in a kernel thread switch. Now, this happens to be true on the current x86 architecture ... but that is a flaw in the chip design, not an inherent feature of kernel thread switching.

The big drawback seems to be that it gives far too much control over task priority to a userland app. (And that it adds another level of signalling/IPC to deal with.)

And really, all this "user thread" stuff is just about avoiding blocking on the main execution thread of a job. When the code is written in the first place, the programmer should be able to clearly tell which calls are non-blocking library calls. Any call that is not a non-blocking library call should be spun off as a separate kernel thread that can block. It is the OS designer's job to make sure that spinning off kernel threads has very little overhead. If a userland job wants to abstract all of its internal functions as "threads" with an "execution queue", then that is its problem -- but abstracting things in that way adds overhead.
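A rough sketch of that "spin it off as a kernel thread" idea using pthreads (error handling omitted):

Code:
/* The main thread hands the blocking read() to its own kernel thread
 * and keeps going; it collects the result when it actually needs it. */
#include <pthread.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

struct read_job { int fd; char buf[128]; ssize_t result; };

/* Runs on its own kernel thread, so it's allowed to block. */
static void *do_read(void *arg)
{
    struct read_job *job = arg;
    job->result = read(job->fd, job->buf, sizeof job->buf);
    return NULL;
}

int main(void)
{
    struct read_job job = { .fd = STDIN_FILENO };
    pthread_t t;

    pthread_create(&t, NULL, do_read, &job);  /* blocking call spun off */
    printf("main thread: still running, not blocked\n");

    pthread_join(t, NULL);                    /* collect result when needed */
    printf("read returned %zd bytes\n", job.result);
    return 0;
}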

Posted: Thu May 08, 2008 11:12 am
by yang
Brendan wrote:Hi,
yang wrote:Thanks a lot for the reply. I'm not sure I fully understand your description of the third alternative, however. When you say:
If a user-level thread needs to do something that would block then only one kernel-level thread would block (the kernel-level thread that the user-level thread was mapped onto at the time), the remaining user-level threads can be mapped onto the remaining kernel-level threads, and you could create more kernel-level threads if you need to.
How do you tell that a user-level thread is causing a kernel-level thread to block?
It's easy to find out which kernel functions *might* block, and then assume they will block.
But this only works for system calls and other explicit forms of blocking - on which point I somewhat agree, though these folks would argue the LAIO style of performing non-blocking operations inline is better. What about implicitly blocking issues such as paging?

Posted: Thu May 08, 2008 11:34 am
by Brendan
Hi,
yang wrote:But this only works for system calls and other explicit forms of blocking - on which point I somewhat agree, though these folks would argue the LAIO style of performing non-blocking operations inline is better.
Doh - I get a 404 from that URL...

Note: For my OS I prefer doing everything with asynchronous messaging, using "kernel-level only" threads (partly because of how I do thread-local storage - each thread has its own address space, where part of that address space is shared between all threads that belong to the same process, and therefore switching between threads that belong to the same process is as expensive as switching between threads that belong to different processes).
yang wrote:What about implicitly blocking issues such as paging?
In this case (for M:N threading) I'd assume that one kernel-level thread and one user-level thread would be blocked, leaving you with "M-1:N-1" threading. E.g. with 3 kernel-level threads being used by 6 user-level threads, if one thread implicitly blocks you'd end up with 2 kernel-level threads being used by 5 user-level threads.

Worst case would be if your thread is holding a re-entrancy lock when it implicitly blocks, causing other threads to wait for your thread to be unblocked (and for the lock to be freed).


Cheers,

Brendan