Kernel "Threads" on the x86

Hi all! 2nd thread, whooo!
In the midst of writing a task scheduler, I've decided that I would like to be able to create "kernel"-ring privileged threads, in addition to "normal" user-ring ones. I've already got all of the code for user-mode task-switching, and it is working perfectly (based on PIT interrupts). I can have multiple processes, they can make syscalls, tasks may enter a "sleep" state, and it's all really cool.
I encountered problems when I switched the descriptors for tasks I was making from user-PL to kernel-PL (CS and SS), expecting everything to run smoothly. As it turns out, when the CPU doesn't have to switch privilege levels after an interrupt, it doesn't push ESP or SS, nor will an IRET returning to the same PL pop them. This has several obvious problems, including the fact that interrupts will use the kernel-PL task's stack instead of the global kernel stack, etc.
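To spell it out, the two frame layouts look like this (illustrative structs only, with my own names, for the no-error-code case):

Code:
#include <stdint.h>

/* No privilege change: the frame is pushed on the *current* stack,
   and there is no ESP/SS in it. */
struct iret_frame_same_pl {
    uint32_t eip;      /* lowest address, pushed last */
    uint32_t cs;
    uint32_t eflags;
};

/* Ring 3 -> ring 0: the CPU first loads SS:ESP from the TSS
   (SS0:ESP0), then additionally pushes the old SS:ESP. */
struct iret_frame_cross_pl {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
    uint32_t esp;      /* present only on a privilege change */
    uint32_t ss;       /* present only on a privilege change */
};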
I've had a good look at the manuals (as I hope you can tell) but I can't seem to find a feature of the x86 which could help me get around this problem.
Does anyone have any experience with doing this sort of thing already? Alternately, does anyone have any ideas?
Thanks in advance,
Keeley
P.S. As a side note, I'm quite aware that the Linux kernel does this already, but I haven't been able to identify how its arch-dependent and arch-independent code link together.
Kernel "Threads" on the x86
- escortkeel
- Posts: 9
- Joined: Mon Jan 28, 2013 4:46 am
- Location: Canberra, Australia
Kernel "Threads" on the x86
I'm Keeley Hoek. | Homepage | K-OS on GitHub
Re: Kernel "Threads" on the x86
escortkeel wrote:I encountered problems when I switched the descriptors for tasks I was making from user-PL to kernel-PL (CS and SS), expecting everything to run smoothly. As it turns out, when the CPU doesn't have to switch privilege levels after an interrupt, it doesn't push ESP or SS, nor will an IRET returning to the same PL pop them.

Wrong. When changing privilege levels the CPU does a stack switch before pushing/popping ESP etc. Those pushes/pops are performed anyway; please read the manual.
escortkeel wrote:including the fact that interrupts will use the kernel-PL task's stack instead of the global kernel stack, etc.

It depends on your design. Well-studied designs are one kernel thread per user thread, or one kernel thread per core.
- Combuster
- Member
- Posts: 9301
- Joined: Wed Oct 18, 2006 3:45 am
- Libera.chat IRC: [com]buster
- Location: On the balcony, where I can actually keep 1½m distance
Re: Kernel "Threads" on the x86
escortkeel wrote:This has several obvious problems, including the fact that interrupts will use the kernel-PL task's stack

Why is having different kernel stacks a problem? My kernel does exactly that and gets along just fine with it.
bluemoon wrote:Those push/pop are performed anyway

No, they're not. It requires long mode for interrupts to consistently push those stack frames. And given that the OP mentioned ESP rather than RSP, I think that's going to be a pretty poor assumption.
That said, long mode does provide the ability to choose a stack per interrupt, even for same-privilege interrupts (the Interrupt Stack Table). It's also annoyingly non-reentrant, and I don't know if 64-bit processors are allowed to be a requirement.
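Concretely: each long-mode IDT entry carries a 3-bit IST index, and a nonzero index makes the CPU load RSP from the corresponding TSS slot on every delivery of that vector. A sketch, with field names of my own choosing:

Code:
#include <stdint.h>

/* Long-mode IDT gate. A nonzero `ist` makes the CPU switch to
   tss.ist[ist - 1] whenever this vector fires, privilege change
   or not. That is also why it is non-reentrant: a nested interrupt
   on the same vector reuses (and clobbers) that same stack. */
struct idt_entry64 {
    uint16_t offset_low;    /* handler address bits 0..15   */
    uint16_t selector;      /* kernel code segment selector */
    uint8_t  ist;           /* bits 0-2: IST index, 0 = off */
    uint8_t  type_attr;     /* gate type, DPL, present bit  */
    uint16_t offset_mid;    /* handler address bits 16..31  */
    uint32_t offset_high;   /* handler address bits 32..63  */
    uint32_t reserved;
} __attribute__((packed));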
Re: Kernel "Threads" on the x86
This example shows my approach. Any questions?
If you have seen bad English in my words, tell me what's wrong, please.
Re: Kernel "Threads" on the x86
Thanks everyone!
I've sorted out how everything is going to work now on paper (with a kernel-stack per user-task). Thanks for the design tips! The only dilemma that I have now is returning from a ring-0 stack to another ring-0 stack; I'll explain what I mean. Consider the following events:
1. Process 1 makes a syscall, and the CPU's stack is switched to Process 1's kernel stack through the TSS.
2. Process 1 is preempted while in kernel-land. This is fine, as the CPU's state has been dumped to Process 1's kernel stack by the interrupt.
3. The "real" kernel (ISR) is now running on Process 1's kernel stack.
4. The "real" kernel decides that it would be a good idea to switch the current task to Process 2.
5. The "real" kernel tampers with the IRET stuff which was pushed to the stack during the interrupt so that everything points to Process 2. The TSS's ESP0 is set to Process 2's kernel stack.
6. The "real" kernel performs the IRET.
7. Process 2 is now running. Yay. It makes a syscall, and the CPU's stack is switched to Process 2's kernel stack through the TSS.
8. Process 2 is preempted while in kernel-land.
9. The "real" kernel (ISR) is now running on Process 2's kernel stack.
10. The "real" kernel decides that it would be a good idea to switch the current task back to Process 1.
11. The "real" kernel tampers with the IRET stuff which was pushed to the stack during the interrupt so that everything points to Process 1. The TSS's ESP0 is set to Process 1's kernel stack.
12. The "real" kernel performs the IRET.
>>> We now have a problem. Because the "real" kernel is returning to another _kernel_ ring piece of code, it won't switch ESP (from the IRET stuff on the stack). Hence, when the code handling Process 1's syscall gets control back and returns from the current function, it will be returning onto _Process 2_'s stack, which obviously won't end well.
I see two remotely viable solutions to this problem:
1. This one seems like the most stupid: set up a fake user-mode proxy task so every task switch goes kernel->user->kernel and never kernel->kernel. I've never heard of anyone doing this, and it sounds super slow.
2. Instead of playing with the IRET-stuff created during the last interrupt and then returning on the current stack, I could switch stacks just before returning and craft the IRET-stuff on the target stack (sketched below). That way, if we are remaining in kernel-land, ESP will remain correct.
Both solutions here seem to involve unnecessary special cases, though. Obviously I'm doing something wrong, and might completely misunderstand the mechanisms behind interrupts and context switches.
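For concreteness, here's roughly what I mean by option 2. This is a completely untested sketch; every name is invented, and saving/restoring the general-purpose registers is omitted:

Code:
#include <stdint.h>

struct task {
    uint32_t kstack_top;       /* top of this task's kernel stack */
    uint32_t saved_eip;        /* where to resume it              */
    uint32_t saved_eflags;
};

struct tss_entry { uint32_t prev, esp0, ss0; /* ...rest omitted... */ };
extern struct tss_entry tss;

#define KERNEL_CS 0x08         /* assumed GDT layout */

/* Same-PL IRET frame: no ESP/SS. */
struct iret_frame { uint32_t eip, cs, eflags; };

void switch_to_kernel_task(struct task *next)
{
    /* future ring3->ring0 entries land on next's kernel stack */
    tss.esp0 = next->kstack_top;

    /* craft the IRET frame on *next's* stack, not the current one
       (a real switch would reuse the frame next saved when it was
       last interrupted, rather than building a fresh one) */
    struct iret_frame *f = (struct iret_frame *)next->kstack_top - 1;
    f->eip    = next->saved_eip;
    f->cs     = KERNEL_CS;
    f->eflags = next->saved_eflags;

    /* pivot ESP onto that frame and return into next */
    __asm__ volatile("mov %0, %%esp\n\t"
                     "iret"
                     : : "r"(f) : "memory");
}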
So I'm interested, how do your kernels handle this situation?
I'm Keeley Hoek. | Homepage | K-OS on GitHub
Re: Kernel "Threads" on the x86
For a one-kernel-stack-per-user-thread design, scheduling between kernel contexts is done simply by switching stack pointers.
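Something along these lines. A minimal, untested sketch (GNU C, 32-bit, cdecl; all names invented):

Code:
#include <stdint.h>

struct tss_entry { uint32_t prev, esp0, ss0; /* ...rest omitted... */ };
extern struct tss_entry tss;

struct task {
    uint32_t saved_esp;        /* ESP while the task is off-CPU */
    uint32_t kstack_top;       /* for tss.esp0                  */
};

void stack_switch(uint32_t *save_old_esp, uint32_t new_esp);

/* Push the callee-saved registers on the outgoing kernel stack, save
   ESP, load the incoming task's saved ESP, pop its registers, and
   ret, straight into the incoming task's own earlier call to
   stack_switch. The interrupted IRET frame higher up on each kernel
   stack is never touched; it unwinds naturally later. */
__asm__(".globl stack_switch     \n"
        "stack_switch:           \n"
        "    push %ebp           \n"
        "    push %ebx           \n"
        "    push %esi           \n"
        "    push %edi           \n"
        "    mov  20(%esp), %eax \n"   /* arg 1: &old->saved_esp */
        "    mov  %esp, (%eax)   \n"
        "    mov  24(%esp), %esp \n"   /* arg 2: new->saved_esp  */
        "    pop  %edi           \n"
        "    pop  %esi           \n"
        "    pop  %ebx           \n"
        "    pop  %ebp           \n"
        "    ret                 \n");

void schedule_switch(struct task *old, struct task *new)
{
    tss.esp0 = new->kstack_top;   /* for the next ring3->ring0 entry */
    stack_switch(&old->saved_esp, new->saved_esp);
    /* execution resumes here when `old` is switched back in */
}

The timer ISR just calls schedule_switch() like any other function; when it eventually returns, it IRETs off whichever kernel stack it is standing on by then, so no IRET frames ever need to be rewritten.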
Re: Kernel "Threads" on the x86
@Bluemoon, thanks a lot. Just for information, what is the alternative to a one-kernel-stack-per-user-stack design? Is it documented on the wiki? (Well, obviously it is just one kernel stack for everything, but how would that work with full preemption?)
Thanks again.
I'm Keeley Hoek. | Homepage | K-OS on GitHub
Re: Kernel "Threads" on the x86
escortkeel wrote:@Bluemoon, thanks a lot. Just for information, what is the alternative to a one-kernel-stack-per-user-stack design? Is it documented on the wiki? (Well, obviously it is just one kernel stack for everything, but how would that work with full preemption?)

Some kernels use one kernel stack per core. There are some advantages and limitations to this approach, but it works very well in a microkernel.
Pre-emption in such a kernel is usually done at predefined pre-emption points that ensure that the kernel stack of the thread being pre-empted doesn't contain any important state information. If it does, that state information is saved somewhere.
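A sketch of what such a pre-emption point might look like (hypothetical names throughout):

Code:
extern volatile int need_resched;           /* set by the timer interrupt */
struct thread;
extern struct thread *current;
void save_continuation(struct thread *t);   /* stash surviving state */
void schedule(void);                        /* picks a new thread    */

/* Called only at points where the per-core kernel stack holds no
   state that must survive, so the stack can simply be abandoned. */
void preempt_point(void)
{
    if (need_resched) {
        save_continuation(current);  /* if anything must live on    */
        schedule();                  /* never returns to this frame */
    }
}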
If a trainstation is where trains stop, what is a workstation ?
Re: Kernel "Threads" on the x86
I can tell you what I do:
* The kernel has one process context (page tables etc.). The kernel has 0 threads at boot, but might have some at a later time.
* Each process has 0..n threads.
* Each thread has a ring-0 and a ring-3 stack. Inter-privilege-level switches and task switching are done through one TSS per core / "hardware thread" and the interrupt handler stub.
System calls and interruptible interrupt handlers can be interrupted if so specified. This is controlled by the APIC and some variables, including enabling interrupts inside the gate. I also use some variables to control nesting etc.
Syscalls can be synchronous (blocking) or asynchronous (non-blocking), meaning they either return only after the requested work is done, or return at once. Both can create kernel thread(s), i.e. a new thread with a new stack in the kernel process. This way I can schedule and prioritize syscalls in the same manner as other threads.
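To illustrate that last point (all names invented):

Code:
struct syscall_req {
    int  (*handler)(void *arg);
    void  *arg;
    int    flags;               /* e.g. SYS_ASYNC            */
    int    priority;            /* used when scheduling async */
};

#define SYS_ASYNC 0x1

/* assumed helper: new kernel thread with its own kernel stack */
int kthread_create(int (*fn)(void *), void *arg, int priority);

int sys_dispatch(struct syscall_req *req)
{
    if (req->flags & SYS_ASYNC) {
        /* hand the work to a kernel thread and return at once;
           the thread is scheduled and prioritized like any other */
        return kthread_create(req->handler, req->arg, req->priority);
    }
    return req->handler(req->arg);   /* blocking: do the work now */
}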
regards
Thomas