Hi,
gerryg400 wrote:I've been thinking about this for a while. You'd need one kernel stack per CPU (and could pre-allocate them rather than dynamically allocate them).... etc.
Brendan, have you actually decided to go this way or are you still thinking about it? I've decided. It's one kstack per core for me.
I've decided to implement something like this for the next version of my kernel. Once I've implemented it, I'll decide whether or not to keep it.
OSwhatever wrote:I was thinking that you simply use a lock-free stack storing stack candidates; if that stack is empty you allocate a new stack from the kernel heap. Upon kernel entry you pluck a kernel stack from the lock-free stack and push it back when leaving. This will be more code for a system call but still feasible performance-wise I think. If the thread is preempted in the kernel it simply keeps the kernel stack during the time it is blocked.
That would work too. It'd have slightly higher overhead and use more RAM, but it would make it easier to support kernel threads and pre-emption.
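For what it's worth, here's a minimal sketch of that idea - a Treiber-style lock-free stack of spare kernel stacks, popped on kernel entry and pushed back on exit. All the names (stack_node, kstack_pool, alloc_from_heap, KSTACK_SIZE) are made up for illustration, and a real version would also need to deal with the ABA problem (e.g. with tagged pointers):

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical node: the first word of each free kernel stack links to the next */
struct stack_node {
    struct stack_node *next;
};

static struct stack_node *kstack_pool = NULL;   /* top of the free-stack list */

extern void *alloc_from_heap(size_t size);      /* assumed kernel heap allocator */

#define KSTACK_SIZE 4096

/* On kernel entry: pop a spare stack, or allocate a fresh one if the pool is empty */
void *get_kernel_stack(void)
{
    struct stack_node *top = __atomic_load_n(&kstack_pool, __ATOMIC_ACQUIRE);
    while (top != NULL) {
        /* On CAS failure, 'top' is reloaded with the current pool head */
        if (__atomic_compare_exchange_n(&kstack_pool, &top, top->next,
                                        false, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
            return top;     /* caution: 'top->next' above is where ABA bites */
    }
    return alloc_from_heap(KSTACK_SIZE);
}

/* On kernel exit: push the stack back onto the pool */
void put_kernel_stack(void *stack)
{
    struct stack_node *node = stack;
    node->next = __atomic_load_n(&kstack_pool, __ATOMIC_ACQUIRE);
    while (!__atomic_compare_exchange_n(&kstack_pool, &node->next, node,
                                        false, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
        ;   /* CAS failure reloads node->next, so just retry */
}
```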
FlashBurn wrote:I also like the idea of having just one kernel stack per core, but I don't like the idea that my kernel is not preemptive.
OSwhatever wrote:The "global kernel lock" was something that was ridiculed in the earlier days of Linux. I'm not sure if you can seriously sell a non-re-entrant kernel today. The current multicore trend motivates it even more.
You're both getting confused.
For "kernel stack per CPU", all CPUs can be running kernel code at the same time. There is no "big kernel lock" and it's not a "non re-entrant kernel".
The "no pre-emption" disadvantage would only mean that task switches/pre-emption that occurs while a thread is running in kernel space would be postponed until the kernel has finished what it's doing (and is ready to return to user-space). For example, imagine a thread calls the kernel to allocate some memory. For a "fully pre-emptable" kernel, an IRQ can occur while the kernel is allocating the memory the thread wanted and the task switch can happen immediately; while for "kernel stack per CPU" the task switch would be postponed until after the memory has been allocated (and the kernel is ready to return to user space).
In my experience, it's not quite like that though. For a "fully pre-emptable" kernel that supports SMP, when it's allocating memory the kernel has to acquire some sort of lock (more likely several locks - e.g. one for the process' address space and one for a "free page stack"). These would be some type of "busy waiting" lock, because (if there's contention) spinning is going to be a lot faster and more efficient than expensive task switches. In that case, allowing task switches while a lock is held is a very bad idea - instead, if an IRQ occurs while the "fully pre-emptable" kernel is allocating memory, the task switch is postponed until after the kernel releases the lock.
For my previous kernels I made this an implicit part of acquiring any lock. Acquiring a lock increments a "locks held" variable; if a task switch is requested while "locks held != 0", the scheduler sets a "task switches postponed" flag and doesn't do the task switch. Releasing a lock decrements the "locks held" variable; if the variable becomes zero and the "task switches postponed" flag is set, the postponed task switch happens then.
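As a rough sketch (the names are mine, and in a real kernel locks_held and task_switch_postponed would live in per-CPU data, with IRQs disabled around the checks), that bookkeeping looks something like:

```c
#include <stdbool.h>

/* Per-CPU bookkeeping - assume these live in a per-CPU data area */
static int  locks_held            = 0;      /* spinlocks this CPU currently holds */
static bool task_switch_postponed = false;

extern void raw_spin_lock(volatile int *lock);    /* assumed low-level spinlock ops */
extern void raw_spin_unlock(volatile int *lock);
extern void do_task_switch(void);                 /* assumed scheduler entry point */

void acquire_lock(volatile int *lock)
{
    /* Increment first, so an IRQ can never see "lock held, counter zero" */
    locks_held++;
    raw_spin_lock(lock);
}

void release_lock(volatile int *lock)
{
    raw_spin_unlock(lock);
    if (--locks_held == 0 && task_switch_postponed) {
        task_switch_postponed = false;
        do_task_switch();             /* do the switch that was postponed earlier */
    }
}

/* Called by the scheduler (e.g. from the timer IRQ) when it wants a switch */
void request_task_switch(void)
{
    if (locks_held != 0)
        task_switch_postponed = true; /* postpone until the last lock is released */
    else
        do_task_switch();
}
```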
Of course "when the kernel releases the lock" is often when the kernel is finished doing what it's doing and is about to return to user space. In this case, "fully pre-emptable" is very similar to "no pre-emption" anyway.
Not all things are like that though. There are roughly three groups of kernel API functions:
- functions that are so simple that they can be completed without acquiring any lock (including things like "get_process_ID()" and things that can be done with lock-free or wait-free code). These are typically so fast that nobody cares anyway.
- functions that acquire one or more locks at the same time (where in practice "fully pre-emptable" is very similar to "no pre-emption" anyway)
- functions that are lengthy/expensive (e.g. creating and destroying processes). In previous kernels I've been especially careful to split these into pieces, to ensure there are "gaps" where no locks are held (and pre-emption can occur). In a few places I've even deliberately released a lock and then immediately re-acquired the same lock (see the sketch after this list). For "no pre-emption" you'd do the same thing - split lengthy/expensive operations into pieces - you'd just split things in a different way (e.g. explicit "pre-emption points", or some sort of "kernel helper" in user space).
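For illustration only (process_lock, destroy_one_thread and the struct are made-up names, and acquire_lock/release_lock are the wrappers sketched above), the "deliberately release and immediately re-acquire" trick looks like this - the gap between the release and the re-acquire is where a postponed task switch can finally happen:

```c
struct process { int thread_count; /* ... */ };

extern void acquire_lock(volatile int *lock);   /* wrappers from the sketch above */
extern void release_lock(volatile int *lock);
extern void destroy_one_thread(struct process *proc);

static volatile int process_lock;               /* hypothetical lock */

/* Hypothetical lengthy operation, split into bounded pieces */
void destroy_process_threads(struct process *proc)
{
    acquire_lock(&process_lock);
    while (proc->thread_count > 0) {
        destroy_one_thread(proc);   /* one bounded piece of work */

        /* Deliberate gap: dropping the lock lets a postponed task switch
         * (and any other CPU waiting on the lock) happen here. */
        release_lock(&process_lock);
        acquire_lock(&process_lock);
    }
    release_lock(&process_lock);
}
```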
Basically, for a micro-kernel, in practice "fully pre-emptable" ends up being very similar to "no pre-emption" anyway - the "no pre-emption" version would only be slightly less pre-emptable.
Again, for a monolithic kernel (where lots of kernel functions involve messing about with device drivers, etc., and a lot of them end up being lengthy/expensive) I wouldn't consider something like this. For a micro-kernel, the only lengthy/expensive functions are process creation and destruction, thread creation and destruction, and the part of virtual memory management that scans the virtual address space (and decides which pages are good candidates to send to swap space).
Cheers,
Brendan