OSDev.org

The Place to Start for Operating System Developers


 Post subject: Xv6 scheduler context - why?
Posted: Wed Mar 20, 2024 8:31 am

Joined: Tue Apr 03, 2018 2:44 am
Posts: 403
One thing that has intrigued me about Xv6 is its use of a scheduler thread per core.

When Xv6 needs to schedule another process, it first switches context to the scheduler context, picks the next process to execute, and then switches context to that process.
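
For reference, xv6's per-core scheduler loop looks roughly like this (paraphrased from memory of the public x86 xv6 source, so treat the details as approximate; swtch saves the callee-saved registers and stack pointer of the current context and loads those of the other):

Code:
void
scheduler(void)
{
  struct proc *p;
  struct cpu *c = mycpu();
  c->proc = 0;

  for(;;){
    sti();                    // enable interrupts on this core
    acquire(&ptable.lock);
    for(p = ptable.proc; p < &ptable.proc[NPROC]; p++){
      if(p->state != RUNNABLE)
        continue;
      c->proc = p;
      switchuvm(p);           // load the process's address space
      p->state = RUNNING;
      swtch(&(c->scheduler), p->context);  // run it until it yields back
      switchkvm();            // back to the kernel address space
      c->proc = 0;
    }
    release(&ptable.lock);
  }
}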

According to the text, it does this so that it doesn't stay in the old process's context, which could be a problem if there is no next context to switch to.

In my kernel, I just have an idle thread context, which is always runnable and never knowingly sleeps (it doesn't execute user code and doesn't use file-backed memory, so it shouldn't sleep in page faults), so my scheduler always has a thread to run next (I schedule threads, not processes).
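
Concretely, my idle thread is little more than the following sketch (the two helper names are placeholders for my kernel's internals, not real APIs):

Code:
// Always runnable, never blocks: the scheduler can always pick it.
static void idle_thread(void)
{
    for (;;) {
        do_idle_work();         // e.g. a GC step, zeroing free pages
        wait_for_interrupt();   // hlt on x86; wake on the next interrupt
    }
}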

I'm just wondering what other people use, an idle process/thread or a scheduler context?

And why?

For the idle process/thread, the benefits I see are:
  • Fewer context switches - Switching to another process/thread is always a single context switch, instead of two.
  • An idle process/thread can do work that you'd rather do at idle. In my kernel, I run my garbage collector in the idle thread. But you could do things like cleaning unused pages, or building up a cache of zeroed pages so they can be handed out quickly when needed.

For a separate scheduler context, the benefits I see are:

  • I like the separation of running the scheduler outside of any single process/thread context. If there truly is no other process/thread to schedule, you're not left in the context of the last process/thread, which could be racking up CPU usage unnecessarily.
  • You can ease restrictions on idle-level processes/threads, so they may sleep but can still do the idle work you want done when there is nothing else to run.

In terms of costs, the only cost difference I see between the two is the extra context switch required for the scheduler thread concept. If the actual context switch is cheap, and in my kernel it is the cost of a setjmp/longjmp, then an extra context switch may be insignificant.
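
To make "the cost of a setjmp/longjmp" concrete, the core of such a switch is essentially the sketch below. It assumes both threads were started so that their jmp_buf is valid, and struct thread is illustrative; strictly speaking, longjmp across stacks is outside what ISO C guarantees, but it is a long-standing kernel/green-threads trick:

Code:
#include <setjmp.h>

struct thread {
    jmp_buf ctx;    /* saved callee-saved registers + stack pointer */
    /* ... other per-thread state ... */
};

void switch_to(struct thread *from, struct thread *to)
{
    if (setjmp(from->ctx) == 0)   /* save 'from'; returns 0 right now */
        longjmp(to->ctx, 1);      /* resume 'to'; does not return */
    /* Someone longjmp'ed back to 'from': execution continues here. */
}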

Of course, there is more to a process context than just the state managed by setjmp/longjmp. A user-level process will have an address space as well, and it may be using floating-point registers. Those can be handled with lazy switching in both scenarios, so I don't see them as an issue: the scheduler context has neither floating-point state nor a user process address space, so we can use lazy switching there as well.
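
For what it's worth, lazy floating-point switching on x86 is usually done with the CR0.TS bit and the #NM (device-not-available) exception. A minimal sketch, where the lower-case wrappers stand in for the actual instructions (clts, fxsave, fxrstor, fninit) and the struct thread fields are illustrative:

Code:
static struct thread *fpu_owner;   /* thread whose state is live in the FPU */

void fpu_on_context_switch(void)
{
    set_cr0_ts();                  /* next FP instruction raises #NM */
}

void nm_handler(struct thread *current)
{
    clear_cr0_ts();                /* clts: allow FP instructions again */
    if (fpu_owner == current)
        return;                    /* state is already ours */
    if (fpu_owner)
        fxsave(&fpu_owner->fpu);   /* spill the previous owner's registers */
    if (current->fpu_used) {
        fxrstor(&current->fpu);    /* reload this thread's registers */
    } else {
        fpu_init_state();          /* fninit: fresh state on first use */
        current->fpu_used = 1;
    }
    fpu_owner = current;
}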

I think I might have just talked myself into an Xv6-style scheduler thread, but I'm curious what other people think or use.


 Post subject: Re: Xv6 scheduler context - why?
Posted: Wed Mar 20, 2024 9:48 am

Joined: Tue Sep 12, 2023 12:41 pm
Posts: 2
I think your idle thread solution is very interesting conceptually, and it's true that it has the advantage of requiring only one context switch.

However, I'm personally more of a fan of the scheduler having its own context, like any other process.
It offers separation for the scheduler's data (and its retrieval) and gives a sense of properness.

But I guess it boils down to what a specific scheduler really does and what it needs.


 Post subject: Re: Xv6 scheduler context - why?
Posted: Wed Mar 20, 2024 3:26 pm

Joined: Wed Oct 01, 2008 1:55 pm
Posts: 3195
I have both. I have a scheduler context (rather, a stack) per core, though I wouldn't say switching to it counts as a full context switch. It's more that the scheduler saves the registers of the current thread in the thread control block (if needed), loads the scheduler stack, decides which thread to run next, and then loads the registers for the new thread from its thread control block.
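
In rough pseudo-C, the sequence is something like this (all names invented for illustration):

Code:
void reschedule(struct thread *cur)
{
    if (cur)
        save_registers(&cur->tcb);    /* only if it may run again later */
    switch_to_scheduler_stack();      /* the per-core scheduler stack */
    struct thread *next = pick_next_thread();
    load_registers(&next->tcb);       /* resumes 'next'; does not return */
}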

The idle threads are something else. Each is a normal thread (one per core) in the system process with the lowest priority, so it gets switched to only when no other thread is ready on that core. The idle thread basically does hlt in a loop.
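
So each one is essentially just:

Code:
static void idle(void)              /* one per core, lowest priority */
{
    for (;;)
        __asm__ volatile ("hlt");   /* sleep until the next interrupt */
}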

I also have a single system thread that makes longer-term decisions about moving threads between cores to balance load. This is a normal but high-priority thread that is scheduled just like any other thread.


 Post subject: Re: Xv6 scheduler context - why?
Posted: Wed Mar 20, 2024 4:16 pm

Joined: Thu Aug 11, 2005 11:00 pm
Posts: 1110
Location: Tartu, Estonia
My preferred approach is the following, which is not quite any of "scheduler context" or "idle thread", but rather a bit of each:

When a thread enters the kernel, execution continues with a per-core kernel stack. All registers are saved in the thread's state / context save area (which is not on the stack, but part of the thread data block). The only thing that is still owned by the thread (or rather its containing process) at this point is the page table (pointed to by CR3 if you are on x86).

When a new thread is selected for execution, CR3 (or whatever its equivalent on a different architecture) is switched to the new thread (unless it runs in the same process / memory space). The scheduler exits, and returns a pointer to the new thread's saved register state. Immediately before returning to user space, the new thread's registers are loaded from there.

All processes share the same kernel page table mappings, therefore changing CR3 while in the kernel does not do anything, unless you switch back to user space, where the different user space mapping takes over.

In addition, I have one kernel-only "master" page table, which is active at boot already, before there are any user threads, while the cores are initialized. When there is no new thread to run, CR3 is switched to this page table instead, and the scheduler returns a null pointer. Instead of loading any thread's registers, the CPU is parked in a hlt until it is woken up by an interrupt. So in this sense it is rather in an "idle state", but not quite an "idle thread". Also the kernel page table is not really a "scheduler context", since the kernel (not only the scheduler) uses it only at boot and idle, but not at every scheduler invocation.
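
Written out as (hypothetical) code, the exit path looks roughly like this; every name here is invented for illustration:

Code:
void kernel_exit(void)
{
    struct regs *next = schedule();    /* may switch CR3 as a side effect */
    if (next) {
        return_to_user(next);          /* load saved registers, iret/sysret */
    } else {
        load_cr3(kernel_master_pt);    /* kernel-only "master" page table */
        for (;;)
            hlt();                     /* parked; an interrupt handler will
                                          re-enter the scheduler eventually */
    }
}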

_________________
Programmers' Hardware Database // GitHub user: xenos1984; OS project: NOS


 Post subject: Re: Xv6 scheduler context - why?
Posted: Thu Mar 21, 2024 3:28 am

Joined: Wed Oct 01, 2008 1:55 pm
Posts: 3195
xenos wrote:
My preferred approach is the following, which is not quite any of "scheduler context" or "idle thread", but rather a bit of each: [...]


Seems rather similar to my solution, except that my task save & restore operates on the kernel stack and saves the kernel ss:esp rather than the user-space one. That's more or less a requirement for supporting kernel threads, and for forced (or voluntary) scheduling of kernel threads or of user threads that happen to be running in kernel space.

The reason I use a null thread rather than the scheduler stack has to do with the locks that are tied to the scheduler stack, and also with "legacy" reasons, as I've changed this a few times in the past. Actually, I used hardware task switching to begin with, and then I didn't have any scheduler context at all. I had to change this because multicore and hardware task switching don't go well together and create race conditions.

