Does each process have its own kernel stack, or do we just need a single kernel stack to be initialized?
Recently I implemented software multitasking (before implementing user mode), where all registers are stored on the stack on a task switch.
But it's becoming a nightmare now that I know Intel CPUs load SS and ESP from the TSS as the kernel stack on an interrupt (I'm using a single TSS).
Could you give me some suggestions, please?
[SOLVED] Single Kernel Stack Or Allocate for Each Process?
Last edited by irvanherz on Wed Jan 04, 2017 1:39 am, edited 1 time in total.
Re: Single Kernel Stack Or Allocate for Each Process?
Yes, most people use a separate kernel stack per thread. It is possible to do it with only one stack, but a separate stack per thread is the common choice. You don't need multiple TSSs: just patch the one TSS with the new thread's stack address on each task switch, i.e. replace TSS.ESP0 with the new thread's ESP0, and you're good.
First you store the current context, then you switch stacks (current ESP, TSS.ESP0, and CR3), then you restore the context. It's really very simple.
To do this, you need to store ESP0 somewhere other than on the stack, so that it can be restored from another thread without first needing to know where the stack is. Usually it is stored in a thread structure that holds various information about the thread, including the thread's ESP0 and CR3.
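The three steps above can be sketched roughly as follows. This is a hosted simulation, not kernel code: the names `struct thread`, `struct cpu`, `saved_esp` and `switch_to` are made up for illustration, the TSS is reduced to two fields, and ESP/CR3 are plain variables standing in for the real registers (a real kernel would load them in assembly).

```c
#include <stdint.h>

/* Only the TSS fields relevant here; the real 32-bit TSS has many more. */
struct tss {
    uint32_t esp0;   /* stack pointer the CPU loads on a ring 3 -> ring 0 transition */
    uint16_t ss0;    /* stack segment loaded together with esp0 */
};

/* Hypothetical per-thread structure; ESP0 and CR3 live here, not on the stack. */
struct thread {
    uint32_t saved_esp;  /* kernel ESP at the moment the thread was switched out */
    uint32_t esp0;       /* top of this thread's kernel stack */
    uint32_t cr3;        /* physical address of the thread's page directory */
};

/* Stand-in for per-CPU register state (assumption: variables instead of registers). */
struct cpu {
    uint32_t esp;
    uint32_t cr3;
    struct tss tss;          /* the single TSS that gets patched */
    struct thread *current;
};

void switch_to(struct cpu *cpu, struct thread *next)
{
    struct thread *prev = cpu->current;

    if (prev == next)
        return;
    if (prev)
        prev->saved_esp = cpu->esp;  /* 1. store the outgoing thread's kernel ESP */

    cpu->tss.esp0 = next->esp0;      /* 2. patch TSS.ESP0 for the incoming thread */
    if (cpu->cr3 != next->cr3)
        cpu->cr3 = next->cr3;        /*    change address space only when it differs */

    cpu->esp = next->saved_esp;      /* 3. continue on the incoming kernel stack */
    cpu->current = next;
}
```

Because `esp0` is read from the thread structure, the switch never has to look inside the incoming thread's stack to find out where that stack is.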
Re: Single Kernel Stack Or Allocate for Each Process?
On top of that, I suggest you allocate special stacks for double faults and non-maskable interrupts.
They will be used when you need to handle something bad, or for hardware watchdogs.
Re: Single Kernel Stack Or Allocate for Each Process?
Hi,
irvanherz wrote: Could you give me suggestions, please?
There are advantages and disadvantages.
The main advantages of "single kernel stack per CPU" are:
- It reduces memory consumed by kernel stacks (e.g. with 4 KiB per kernel stack, 1000 threads and 8 CPUs it'd be "4 MiB vs. 32 KiB").
- It improves the efficiency of CPU's caches, because each CPU's kernel stack is likely to remain in that CPU's caches.
- It's slightly easier to optimise kernel stacks for NUMA (the "kernel stack that this CPU is using was allocated for a different NUMA domain" problem).
- It's faster when 2 or more things that would cause task switches occur between "kernel entry" and "kernel exit" (e.g. between SYSCALL and SYSEXIT, or between an interrupt handler starting and IRET). This is because you end up saving user-space thread state at "kernel entry" and restoring user-space thread state at "kernel exit", so things that cause task switches to occur between "kernel entry" and "kernel exit" just cause a "which thread to return to" variable to be changed.
The main advantages of "single kernel stack per thread" are:
- It's faster when nothing causes a task switch between "kernel entry" and "kernel exit". This is because you're not saving so much user-space thread state at kernel entry or restoring it at kernel exit.
- It's much easier to handle kernel pre-emption (e.g. when the kernel is in the middle of doing something lengthy/expensive and a high priority task unblocks, you can switch to the high priority task). For "single kernel stack per CPU" you either end up with poor latency (because you can't preempt the kernel and more important things have to wait) and/or end up using special/explicit "synch points" (where lengthy/expensive things are broken up into multiple smaller things separated by some sort of "should I switch to something else now that I've completed that last smaller thing" check).
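To illustrate the "which thread to return to" variable from the per-CPU list, here is a minimal hosted sketch. Everything in it is an assumption made for illustration: the register set in `struct regs`, the names `kernel_entry`, `request_switch` and `kernel_exit`, and the "highest priority wins" policy. It only shows that several switch requests between entry and exit cost one save and one restore in total.

```c
/* Illustrative user-space register set (not the real x86 layout). */
struct regs { unsigned long ip, sp, gpr[6]; };

struct thread {
    struct regs user;   /* full user state lives here, not on a kernel stack */
    int priority;
};

struct cpu {
    struct thread *current;     /* thread that entered the kernel */
    struct thread *return_to;   /* thread to resume at kernel exit */
    unsigned state_copies;      /* counts saves and restores, for illustration */
};

/* Kernel entry: save the interrupted thread's user state once. */
void kernel_entry(struct cpu *c, const struct regs *live)
{
    c->current->user = *live;
    c->return_to = c->current;
    c->state_copies++;
}

/* Anything that would cause a task switch only retargets the exit. */
void request_switch(struct cpu *c, struct thread *candidate)
{
    if (candidate->priority > c->return_to->priority)
        c->return_to = candidate;
}

/* Kernel exit: restore whichever thread was chosen last. */
void kernel_exit(struct cpu *c, struct regs *live)
{
    c->current = c->return_to;
    *live = c->current->user;
    c->state_copies++;
}
```

With per-thread kernel stacks, each of those requests would instead be a full switch between kernel contexts, which is the cost the per-CPU design avoids in this case.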
Essentially; "single kernel stack per CPU" tends to favour micro-kernels, and "single kernel stack per thread" tends to favour monolithic kernels.
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: Single Kernel Stack Or Allocate for Each Process?
Quote: yes, most people do use a separate kernel stack per thread, while it is possible to do it with only one stack, most people use a separate stack -- you don't need multiple TSSs, just patch the one TSS with the new thread's stack address on task switch -- just replace the TSS.ESP0 with the new TSS.ESP0 and you're good
The kernel stack is mapped in kernel space, right?
I think it will be dangerous when a stack overflow happens.
Quote: On the top of that, I suggest you to allocate special stacks for double faults, and non maskable interrupts. they will be used when you need to handle something bad or hardware watchdogs
Thanks for your suggestions.
Brendan wrote: Essentially; "single kernel stack per CPU" tends to favour micro-kernels, and "single kernel stack per thread" tends to favour monolithic kernels.
OK, I'll prefer a single kernel stack per thread. But is 4 KB enough for the kernel to service the task? Or should I switch to another stack when the kernel needs to do some heavy work?
Re: Single Kernel Stack Or Allocate for Each Process?
irvanherz wrote: OK, I'll prefer single kernel stack per thread. But, is 4KB enough for kernel in servicing the task? Or, should I switch another stack when kernel need to do some hard works?
4 KB is used because it is the smallest space a stack can take on x86, since pages are 4 KB at their smallest. You obviously should provide demand paging (or something else) to allow the stack to grow as much as is needed.
- Monk
Re: Single Kernel Stack Or Allocate for Each Process?
Hi,
irvanherz wrote: OK, I'll prefer single kernel stack per thread. But, is 4KB enough for kernel in servicing the task? Or, should I switch another stack when kernel need to do some hard works?
For "lean and mean micro-kernel in assembly", I've used 2 KiB kernel stacks without any problem. For monolithic you'd want larger, and for C (or worse, C++) you'd want larger. The best way is to worry about it later - take an educated guess (or just use something huge for now), and then measure it to find out what you actually do need. Typically (for measurement) it's enough to pre-fill the stack/s with a magic value (e.g. "0xF00DBABE") and let the system run for a while, then check to see how many of those magic values got overwritten (and then add some more for "just in case").
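The magic-value measurement could look something like this. It is a sketch under a few assumptions: the stack is an array of 32-bit words, it grows downward (as on x86) so unused words sit at the low end, and the helper names `stack_prefill` and `stack_high_water` are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_MAGIC 0xF00DBABEu

/* Fill a stack that is not yet in use with the magic value. */
void stack_prefill(uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++)
        base[i] = STACK_MAGIC;
}

/* Scan up from the lowest address until the first overwritten word.
 * Everything from that word to the top counts as used.
 * Returns the deepest observed usage in bytes. */
size_t stack_high_water(const uint32_t *base, size_t words)
{
    size_t untouched = 0;

    while (untouched < words && base[untouched] == STACK_MAGIC)
        untouched++;
    return (words - untouched) * sizeof(uint32_t);
}
```

Call `stack_prefill` when the stack is allocated and `stack_high_water` after the system has run for a while. The result is a lower bound on the real worst case, since a rarely taken code path may simply not have run yet, which is why adding some margin on top is sensible.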
Also note that (especially if you're expecting to need larger stacks and/or support IRQ nesting) it's probably worth considering using additional stacks for IRQ handlers. For example; ignoring IRQs your worst case might be 3 KiB, and with IRQ nesting (where one IRQ handler interrupts another that interrupted another) the actual worst case might be 6 KiB. Instead of having 8 KiB of kernel stack for every thread (to cope with "worst case including IRQ nesting"), maybe you only need 4 KiB for each thread plus a "4 KiB per CPU" stack that is only used by IRQ handlers (where you switch to the special "IRQ only stack" at the start of an IRQ handler). Note: This is something that Linux does.
If you're considering dynamically resized stacks (e.g. allocating more if you get a page fault because your existing stack needs to be larger); be extremely careful. The basic problem is something like (e.g.) you run out of stack and get a page fault, but then the CPU triple faults because there isn't enough stack to start the page fault handler. However; there's a huge number of corner cases and variations beyond just that basic scenario. For one random example; maybe something else causes a page fault, the page fault handler starts but is immediately (before it can save CR2) interrupted by an NMI, then the CPU runs out of stack space while trying to start the NMI handler and that causes a second page fault (which overwrites the previous value in CR2 that wasn't saved yet); and now you have to dig your way out of "trouble, 3 layers deep".
Cheers,
Brendan
Re: Single Kernel Stack Or Allocate for Each Process?
I actually have both. Each thread has its own kernel stack so it can be preempted in the kernel. Each core also has its own stack that is used by the scheduler. As soon as a thread blocks, the scheduler switches to the per-core stack, and the stack pointer is then reloaded with a thread's kernel stack once the scheduler picks a new thread. This makes it possible to distribute the SMP scheduler so it can run on all cores at the same time. I also have a TSS per thread, but I don't use hardware task switching anymore. I use software to read and write the registers in the TSS instead.
As somebody pointed out, there should be special handlers for double faults and stack faults that are TSS-based. Otherwise, triple faults will occur when the kernel stack is exhausted or bad.
Re: Single Kernel Stack Or Allocate for Each Process?
OK, it's all clear now. Thanks to you all.