Hey there,
I'm currently refactoring some of my kernel-code, and I came to the conclusion that my stack management is probably not the best. I'm writing a Microkernel, and my original approach was this: For each new thread started, the kernel allocated only a kernel-stack, the userspace was responsible for getting it's own stack. Then I began implementing IPC with popup-threads and figured that I needed a simple way to allocate a userspace-stack for them. So I started to use a bitmap for allocating a stack for the popup-threads.
However, I'm sure that my approach is far from being optimal. Not only does it probably lead to duplicated code, it also prevents me from using kernel-based stack-protection mechanisms. I then wondered how other hobby operating systems handle this, and started to poke around. Some just handed out page-sized chunks, starting from the top of the address space, and didn't care about freeing the memory (which would be especially bad for my kernel, since it uses popup-threads).
So, how would a good stack-management look like? Even if I don't plan to implement stack-protection mechanisms at the moment, I want to have the opportunity to do it later without reimplementing everything. Where would I place the stacks? And what if a thread needs more than a single page?
Another problem that bothers me is, when and how do I clean up threads? When a thread exits, the kernel uses the kernel-stack of the thread, so I can't immediately free it. But when do I do it, then? Should I let the scheduler check if there are threads that can be cleaned up, or do I add them to a list so the idle-thread does the cleanup (and what would happen if the CPU is kept busy)? Or should I switch to a one-stack-per-CPU-approach?
I'm interested in how other OSes handle these things, but the really small ones often don't handle this very well, and the big ones (Linux etc) are too complicated to understand without investing hours. If you have solved these problems in your own OS (or in one you know), I'd be thankfull for a link to the code.
Stack management
- darkinsanity
- Member
- Posts: 45
- Joined: Wed Sep 17, 2008 3:59 am
- Location: Germany
Re: Stack management
When my kernel allocates a user stack for a thread it calls the same function in my kernel that mmap eventually calls. In other words the allocated memory looks to the thread just like any memory that it had allocated itself. My mmap has an option MAP_GUARD to put a guard page at the bottom of the stack. The kernel uses his when it allocates a stack for the thread.
When a thread exits the kernel calls munmap on the region that it allocated for the stack.
I use the 'one kernel stack per core method' so disposing of kernel stacks is not an issue I need to deal with.
When a thread exits the kernel calls munmap on the region that it allocated for the stack.
I use the 'one kernel stack per core method' so disposing of kernel stacks is not an issue I need to deal with.
If a trainstation is where trains stop, what is a workstation ?
Re: Stack management
Hi,
To help you make that decision, the advantages are:
Cheers,
Brendan
I normally have a FIFO queue of "thread data blocks" (which includes the thread's kernel stack) for threads that have been terminated but not completely cleaned up yet. This solved a few problems. For my messaging, messages are sent to a thread, and if the same thread ID is re-used too quickly then software might not know the old thread was terminated and accidentally send messages to the new thread. The FIFO queue prevents that by ensuring the same thread ID isn't re-used too quickly. The other thing it helped with is resource usage. When a thread is being spawned the kernel checks that FIFO queue of "thread data blocks", and if the oldest one is old enough it's recycled (which helps performance a little because it's already allocated and mapped into kernel space). Finally; when the kernel is running low on free physical pages it goes looking for pages it can free; which includes checking the FIFO queue of "thread data blocks" and freeing them if they're old enough.darkinsanity wrote:Another problem that bothers me is, when and how do I clean up threads? When a thread exits, the kernel uses the kernel-stack of the thread, so I can't immediately free it. But when do I do it, then? Should I let the scheduler check if there are threads that can be cleaned up, or do I add them to a list so the idle-thread does the cleanup (and what would happen if the CPU is kept busy)?
Switching from one kernel stack per thread to one kernel stack per CPU involves changing semantics that a lot of the micro-kernel's existing code may rely on. You'll have to weigh up the advantages and disadvantages and decide if it's something you want to do.darkinsanity wrote:Or should I switch to a one-stack-per-CPU-approach?
To help you make that decision, the advantages are:
- Less memory usage
- Slightly faster spawning and terminating threads
- Better cache locality for kernel (e.g. less cache and TLB misses when kernel accesses its stack after a thread switch)
- No easy way to have "kernel threads" that do things like cleaning up kernel data and/or doing work before it's needed (pre-fetching, pre-calculating, whatever) when there's nothing more important to do.
- When the kernel is in the middle of doing something less important; there's no way to stop and switch to something more important (e.g. creating a new lower priority process when a high priority IRQ occurs). This means that to avoid potential latency problems you might end up splitting up the more expensive operations into multiple small pieces so that you've got a chance to switch to more important things in-between those smaller pieces. For example; rather than creating a new process in one go; you might create some sort of "process data block", check if there's something more important to switch to, create a new/empty virtual address space for the process, check if there's something more important to switch to, then create a new "thread data block" for the process' initial thread.
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: Stack management
My approach is to put thread into PENDING_CLEANUP state and switch away from it, like if thread's time-slice was used up. But, as long as scheduling algorithm is concerned, PENDING_CLEANUP is no different from RUNNABLE.darkinsanity wrote:Another problem that bothers me is, when and how do I clean up threads? When a thread exits, the kernel uses the kernel-stack of the thread, so I can't immediately free it. But when do I do it, then?
Eventually scheduler selects the thread again, but instead of switching to it, cleanup is performed.
If something looks overcomplicated, most likely it is.
- darkinsanity
- Member
- Posts: 45
- Joined: Wed Sep 17, 2008 3:59 am
- Location: Germany
Re: Stack management
That sounds reasonable. But how does your function decide where to place the stack? At the moment, my kernel has no infrastructure for managing userspace memory besides pagetables.gerryg400 wrote:When my kernel allocates a user stack for a thread it calls the same function in my kernel that mmap eventually calls. In other words the allocated memory looks to the thread just like any memory that it had allocated itself. My mmap has an option MAP_GUARD to put a guard page at the bottom of the stack. The kernel uses his when it allocates a stack for the thread.
I'm currently thinking about using a doubly-linked list of structs describing memory areas (probably using page-granularity). When memory is allocated (via malloc or for a stack), I'd simply search the list for a free place and add it there. When allocating a stack, I'd search the list backwards, to keep the stacks away from heap allocations.
Implementation should be fairly easy, but I'm worried about efficiency, especially because I'm using popup-threads and thus need fast thread-spawning and stack allocation.
Re: Stack management
Why not allocate a page (or pages) for your user stack and map it to the same address for each process? You can always grow the stack, via a page fault, if needs be.
Re: Stack management
I use that method. I have a doubly linked list of mapping structures. The mm just finds a space and inserts a new structure. I don't bother with going backwards since there will be multiple threads in the process although it might help keep the heap contiguous. I'm also not too worried about speed at this point.darkinsanity wrote:That sounds reasonable. But how does your function decide where to place the stack? At the moment, my kernel has no infrastructure for managing userspace memory besides pagetables.gerryg400 wrote:When my kernel allocates a user stack for a thread it calls the same function in my kernel that mmap eventually calls. In other words the allocated memory looks to the thread just like any memory that it had allocated itself. My mmap has an option MAP_GUARD to put a guard page at the bottom of the stack. The kernel uses his when it allocates a stack for the thread.
I'm currently thinking about using a doubly-linked list of structs describing memory areas (probably using page-granularity). When memory is allocated (via malloc or for a stack), I'd simply search the list for a free place and add it there. When allocating a stack, I'd search the list backwards, to keep the stacks away from heap allocations.
Implementation should be fairly easy, but I'm worried about efficiency, especially because I'm using popup-threads and thus need fast thread-spawning and stack allocation.
Later if you find that speed is an issue you could keep a small cache of stacks available in each process and only release stacks if they are not re-used quickly.
If a trainstation is where trains stop, what is a workstation ?