Kernel "Threads" on the x86

Hi all! 2nd thread, whooo!
In the midst of writing a task scheduler, I've decided that I would like to be able to create "kernel"-ring privileged threads, in addition to "normal" user-ring ones. I've already got all of the code for user-mode task-switching, and it is working perfectly (based on PIT interrupts). I can have multiple processes, they can make syscalls, tasks may enter a "sleep" state, and it's all really cool.
I encountered problems when I switched the descriptors for tasks I was making from user-PL to kernel-PL (CS and SS), expecting everything to run smoothly. As it turns out, when the CPU doesn't have to switch privilege levels after an interrupt, it doesn't push ESP or SS, nor will an IRET returning to the same PL pop them. This has several obvious problems, including the fact that interrupts will use the kernel-PL task's stack instead of the global kernel stack, etc.
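To spell it out, the two frame layouts look like this (illustrative structs only, with my own names, for the no-error-code case):

Code:
#include <stdint.h>

/* No privilege change: the frame is pushed on the *current* stack,
   and there is no ESP/SS in it. */
struct iret_frame_same_pl {
    uint32_t eip;      /* lowest address, pushed last */
    uint32_t cs;
    uint32_t eflags;
};

/* Ring 3 -> ring 0: the CPU first loads SS:ESP from the TSS
   (SS0:ESP0), then additionally pushes the old SS:ESP. */
struct iret_frame_cross_pl {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
    uint32_t esp;      /* present only on a privilege change */
    uint32_t ss;       /* present only on a privilege change */
};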
I've had a good look at the manuals (as I hope you can tell) but I can't seem to find a feature of the x86 which could help me get around this problem.
Does anyone have any experience with doing this sort of thing already? Alternately, does anyone have any ideas?
Thanks in advance,
Keeley
P.S. As a side note, I'm quite aware that the Linux kernel does this already, but I haven't been able to identify how its arch-dependent and arch-independent code link together.
Kernel "Threads" on the x86
- escortkeel
- Posts: 9
- Joined: Mon Jan 28, 2013 4:46 am
- Location: Canberra, Australia
Kernel "Threads" on the x86
I'm Keeley Hoek. | Homepage | K-OS on GitHub
Re: Kernel "Threads" on the x86
escortkeel wrote:I encountered problems when I switched the descriptors for tasks I was making from user-PL to kernel-PL (CS and SS), expecting everything to run smoothly. As it turns out, when the CPU doesn't have to switch privilege levels after an interrupt, it doesn't push ESP or SS, nor will an IRET returning to the same PL pop them.

Wrong. When changing privilege levels the CPU does a stack switch before pushing/popping ESP etc. Those pushes/pops are performed anyway; please read the manual.
escortkeel wrote:including the fact that interrupts will use the kernel-PL task's stack instead of the global kernel stack, etc.

It depends on your design. Well-studied designs are one kernel thread per user thread, or one kernel thread per core.
- Combuster
- Member
- Posts: 9301
- Joined: Wed Oct 18, 2006 3:45 am
- Libera.chat IRC: [com]buster
- Location: On the balcony, where I can actually keep 1½m distance
Re: Kernel "Threads" on the x86
escortkeel wrote:This has several obvious problems, including the fact that interrupts will use the kernel-PL task's stack

Why is having different kernel stacks a problem? My kernel does exactly that and gets along just fine with it.
bluemoon wrote:Those push/pop are performed anyway

No, they're not. It requires long mode for interrupts to consistently push those stack frames. And given that the OP mentioned ESP rather than RSP, I think that's going to be a pretty poor assumption.
That said, long mode does provide the ability to choose a stack per interrupt, even for same-privilege interrupts (the Interrupt Stack Table). It's also annoyingly non-reentrant, and I don't know if 64-bit processors are allowed to be a requirement.
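Concretely: each long-mode IDT entry carries a 3-bit IST index, and a nonzero index makes the CPU load RSP from the corresponding TSS slot on every delivery of that vector. A sketch, with field names of my own choosing:

Code:
#include <stdint.h>

/* Long-mode IDT gate. A nonzero `ist` makes the CPU switch to
   tss.ist[ist - 1] whenever this vector fires, privilege change
   or not. That is also why it is non-reentrant: a nested interrupt
   on the same vector reuses (and clobbers) that same stack. */
struct idt_entry64 {
    uint16_t offset_low;    /* handler address bits 0..15   */
    uint16_t selector;      /* kernel code segment selector */
    uint8_t  ist;           /* bits 0-2: IST index, 0 = off */
    uint8_t  type_attr;     /* gate type, DPL, present bit  */
    uint16_t offset_mid;    /* handler address bits 16..31  */
    uint32_t offset_high;   /* handler address bits 32..63  */
    uint32_t reserved;
} __attribute__((packed));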
Re: Kernel "Threads" on the x86
This example shows my approach. Any questions?
If you have seen bad English in my words, tell me what's wrong, please.
Re: Kernel "Threads" on the x86
Thanks everyone!
I've sorted out how everything is going to work now on paper (with a kernel-stack per user-task). Thanks for the design tips! The only dilemma that I have now is returning from a ring-0 stack to another ring-0 stack; I'll explain what I mean. Consider the following events:
1. Process 1 makes a syscall, and the CPU's stack is switched to Process 1's kernel stack through the TSS.
2. Process 1 is preempted while in kernel-land. This is fine, as the CPU's state has been dumped to Process 1's kernel stack by the interrupt.
3. The "real" kernel (ISR) is now running on Process 1's kernel stack.
4. The "real" kernel decides that it would be a good idea to switch the current task to Process 2.
5. The "real" kernel tampers with the IRET stuff which was pushed to the stack during the interrupt so that everything points to Process 2. The TSS's ESP0 is set to Process 2's kernel stack.
6. The "real" kernel performs the IRET.
7. Process 2 is now running. Yay. It makes a syscall, and the CPU's stack is switched to Process 2's kernel stack through the TSS.
8. Process 2 is preempted while in kernel-land.
9. The "real" kernel (ISR) is now running on Process 2's kernel stack.
10. The "real" kernel decides that it would be a good idea to switch the current task back to Process 1.
11. The "real" kernel tampers with the IRET stuff which was pushed to the stack during the interrupt so that everything points to Process 1. The TSS's ESP0 is set to Process 1's kernel stack.
12. The "real" kernel performs the IRET.
>>> We now have a problem. Because the "real" kernel is returning to another _kernel_ ring piece of code, it won't switch ESP (from the IRET stuff on the stack). Hence, when the code handling Process 1's syscall gets control back and returns from the current function, it will be returning onto _Process 2_'s stack, which obviously won't end well.
I see two remotely viable solutions to this problem:
1. This one seems like the most stupid: set up a fake user-mode proxy task so every task switch goes kernel->user->kernel and never kernel->kernel. I've never heard of anyone doing this, and it sounds super slow.
2. Instead of playing with the IRET-stuff created during the last interrupt and then returning on the current stack, I could switch stacks just before returning and craft the IRET-stuff on the target stack (sketched below). That way, if we are remaining in kernel-land, ESP will remain correct.
Both solutions here seem to involve unnecessary special cases, though. Obviously I'm doing something wrong, and might completely misunderstand the mechanisms behind interrupts and context switches.
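For concreteness, here's roughly what I mean by option 2. This is a completely untested sketch; every name is invented, and saving/restoring the general-purpose registers is omitted:

Code:
#include <stdint.h>

struct task {
    uint32_t kstack_top;       /* top of this task's kernel stack */
    uint32_t saved_eip;        /* where to resume it              */
    uint32_t saved_eflags;
};

struct tss_entry { uint32_t prev, esp0, ss0; /* ...rest omitted... */ };
extern struct tss_entry tss;

#define KERNEL_CS 0x08         /* assumed GDT layout */

/* Same-PL IRET frame: no ESP/SS. */
struct iret_frame { uint32_t eip, cs, eflags; };

void switch_to_kernel_task(struct task *next)
{
    /* future ring3->ring0 entries land on next's kernel stack */
    tss.esp0 = next->kstack_top;

    /* craft the IRET frame on *next's* stack, not the current one
       (a real switch would reuse the frame next saved when it was
       last interrupted, rather than building a fresh one) */
    struct iret_frame *f = (struct iret_frame *)next->kstack_top - 1;
    f->eip    = next->saved_eip;
    f->cs     = KERNEL_CS;
    f->eflags = next->saved_eflags;

    /* pivot ESP onto that frame and return into next */
    __asm__ volatile("mov %0, %%esp\n\t"
                     "iret"
                     : : "r"(f) : "memory");
}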
So I'm interested, how do your kernels handle this situation?
I'm Keeley Hoek. | Homepage | K-OS on GitHub
Re: Kernel "Threads" on the x86
For a one-kernel-stack-per-user-thread design, scheduling between kernel contexts is done simply by switching stack pointers.
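Something along these lines. A minimal, untested sketch (GNU C, 32-bit, cdecl; all names invented):

Code:
#include <stdint.h>

struct tss_entry { uint32_t prev, esp0, ss0; /* ...rest omitted... */ };
extern struct tss_entry tss;

struct task {
    uint32_t saved_esp;        /* ESP while the task is off-CPU */
    uint32_t kstack_top;       /* for tss.esp0                  */
};

void stack_switch(uint32_t *save_old_esp, uint32_t new_esp);

/* Push the callee-saved registers on the outgoing kernel stack, save
   ESP, load the incoming task's saved ESP, pop its registers, and
   ret, straight into the incoming task's own earlier call to
   stack_switch. The interrupted IRET frame higher up on each kernel
   stack is never touched; it unwinds naturally later. */
__asm__(".globl stack_switch     \n"
        "stack_switch:           \n"
        "    push %ebp           \n"
        "    push %ebx           \n"
        "    push %esi           \n"
        "    push %edi           \n"
        "    mov  20(%esp), %eax \n"   /* arg 1: &old->saved_esp */
        "    mov  %esp, (%eax)   \n"
        "    mov  24(%esp), %esp \n"   /* arg 2: new->saved_esp  */
        "    pop  %edi           \n"
        "    pop  %esi           \n"
        "    pop  %ebx           \n"
        "    pop  %ebp           \n"
        "    ret                 \n");

void schedule_switch(struct task *old, struct task *new)
{
    tss.esp0 = new->kstack_top;   /* for the next ring3->ring0 entry */
    stack_switch(&old->saved_esp, new->saved_esp);
    /* execution resumes here when `old` is switched back in */
}

The timer ISR just calls schedule_switch() like any other function; when it eventually returns, it IRETs off whichever kernel stack it is standing on by then, so no IRET frames ever need to be rewritten.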
Re: Kernel "Threads" on the x86
@Bluemoon, thanks a lot. Just for information, what is the alternative to a one-kernel-stack-per-user-stack design? Is it documented on the wiki? (Well, obviously it is just one kernel stack for everything, but how would that work with full preemption?)
Thanks again.
I'm Keeley Hoek. | Homepage | K-OS on GitHub
Re: Kernel "Threads" on the x86
escortkeel wrote:@Bluemoon, thanks a lot. Just for information, what is the alternative to a one-kernel-stack-per-user-stack design? Is it documented on the wiki? (Well, obviously it is just one kernel stack for everything, but how would that work with full preemption?)

Some kernels use one kernel stack per core. There are some advantages and limitations to this approach, but it works very well in a microkernel.
Pre-emption in such a kernel is usually done at predefined pre-emption points that ensure that the kernel stack of the thread being pre-empted doesn't contain any important state information. If it does, that state information is saved somewhere.
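A sketch of what such a pre-emption point might look like (hypothetical names throughout):

Code:
extern volatile int need_resched;           /* set by the timer interrupt */
struct thread;
extern struct thread *current;
void save_continuation(struct thread *t);   /* stash surviving state */
void schedule(void);                        /* picks a new thread    */

/* Called only at points where the per-core kernel stack holds no
   state that must survive, so the stack can simply be abandoned. */
void preempt_point(void)
{
    if (need_resched) {
        save_continuation(current);  /* if anything must live on    */
        schedule();                  /* never returns to this frame */
    }
}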
If a trainstation is where trains stop, what is a workstation ?
Re: Kernel "Threads" on the x86
I can tell you what I do:
* The kernel has one process context (page tables etc.). The kernel has 0 threads at boot, but might have some at a later time.
* Each process has 0..n threads.
* Each thread has a ring-0 and a ring-3 stack. Inter-privilege-level switches and task switching are done through one TSS per core / "hardware thread" and the interrupt handler stub.
System calls and interruptible interrupt handlers can be interrupted if so specified. This is controlled by the APIC and some variables, including enabling interrupts inside the gate. I also use some variables to control nesting etc.
Syscalls can be synchronous (blocking) or asynchronous (non-blocking), meaning they either return only after the requested work is done, or return at once. Both can create kernel thread(s), i.e. a new thread with a new stack in the kernel process. This way I can schedule and prioritize syscalls in the same manner as other threads.
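To illustrate that last point (all names invented):

Code:
struct syscall_req {
    int  (*handler)(void *arg);
    void  *arg;
    int    flags;               /* e.g. SYS_ASYNC            */
    int    priority;            /* used when scheduling async */
};

#define SYS_ASYNC 0x1

/* assumed helper: new kernel thread with its own kernel stack */
int kthread_create(int (*fn)(void *), void *arg, int priority);

int sys_dispatch(struct syscall_req *req)
{
    if (req->flags & SYS_ASYNC) {
        /* hand the work to a kernel thread and return at once;
           the thread is scheduled and prioritized like any other */
        return kthread_create(req->handler, req->arg, req->priority);
    }
    return req->handler(req->arg);   /* blocking: do the work now */
}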
regards
Thomas