Dynamic growing of ring 0 stacks

Andy1988 · Post by **Andy1988** » Tue Nov 09, 2010 3:17 pm

Hello,
I'm trying to dynamically grow my kernel thread stacks. I create a stack which has an initial size of one page. As soon as a page fault occurs, I want to grow the stack or report a stack overflow for this thread if the stack already reached the maximum size.
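The grow-or-overflow decision described above can be sketched roughly as follows. This is only a sketch under assumptions: `struct kstack`, `classify_stack_fault` and the one-page guard area below the limit are hypothetical names and choices, not anything from an existing kernel. It covers only the decision and says nothing about which stack the fault handler itself runs on.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Hypothetical per-thread stack region; the stack grows downwards. */
struct kstack {
    uintptr_t top;        /* one past the highest stack address */
    uintptr_t mapped_low; /* lowest address that is currently mapped */
    uintptr_t limit_low;  /* lowest address the stack may ever grow to */
};

enum fault_action { FAULT_NOT_STACK, FAULT_GROW, FAULT_OVERFLOW };

/*
 * Classify a faulting address (CR2 on x86) against one stack region.
 * Assumes limit_low >= PAGE_SIZE so the guard-page subtraction cannot wrap.
 */
static enum fault_action classify_stack_fault(const struct kstack *s,
                                              uintptr_t addr)
{
    if (addr >= s->mapped_low)
        return FAULT_NOT_STACK;   /* already mapped, or above the stack */
    if (addr >= s->limit_low)
        return FAULT_GROW;        /* unmapped, but still within the limit */
    if (addr >= s->limit_low - PAGE_SIZE)
        return FAULT_OVERFLOW;    /* hit the assumed guard page */
    return FAULT_NOT_STACK;       /* far below: unrelated fault */
}

/* New lowest mapped address after growing far enough to cover addr. */
static uintptr_t grown_mapped_low(uintptr_t addr)
{
    return addr & ~(uintptr_t)(PAGE_SIZE - 1);
}
```

A page fault handler would call `classify_stack_fault` with the current thread's region and either map pages down to `grown_mapped_low(addr)` or kill the thread with a stack overflow report.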

The problem is that kernel threads run in ring 0. As soon as an instruction tries to write to an unmapped page of the stack, a page fault occurs. The CPU then tries to push eflags, cs, eip and the error code onto that same stack, which results in a double fault. The double fault cannot push these values either, which leads to a triple fault. Sad thing!
OK, so I need to switch the stack as soon as a fault occurs in any of the kernel threads as you would do in userspace using a TSS.

But since the kernel threads are already running in ring 0, there is no privilege level change, so the CPU has no reason to load the stack segment and stack pointer from the SS0 and ESP0 values in the TSS. At least that's how I understand the usage of the TSS when using software scheduling.
Is there any way to handle this without moving the kernel threads to a less privileged level?

Hangin10 · Post by **Hangin10** » Tue Nov 09, 2010 3:36 pm

Use a task gate for the double fault handler. That way you can use a known good stack to handle the double fault. However, if you are using a kernel stack per thread, that would make things overcomplicated (IMO). How much are you going to need to do during one interrupt? Linux uses 8KB for its kernel stack size. I think I just saw a thread recently where someone said they were thinking of reducing it.
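For reference, a task gate is just an IDT entry with a different layout: it holds a TSS selector instead of a handler offset. Below is a minimal sketch of building one for 32-bit protected mode. The names `idt_task_gate` and `make_task_gate` are made up for illustration, and the TSS it points to still has to be set up separately with its own ESP, SS, EIP and CR3. Note that long mode has no hardware task switching, so this only applies to 32-bit kernels.

```c
#include <stdint.h>

/* One 8-byte IDT entry on 32-bit x86, laid out as a task gate. */
struct idt_task_gate {
    uint16_t reserved0;    /* unused for task gates */
    uint16_t tss_selector; /* GDT selector of the TSS to switch to */
    uint8_t  reserved1;    /* unused */
    uint8_t  type_attr;    /* P (bit 7) | DPL (bits 6..5) | type (bits 4..0) */
    uint16_t reserved2;    /* unused */
};

_Static_assert(sizeof(struct idt_task_gate) == 8, "IDT entries are 8 bytes");

#define IDT_PRESENT         0x80u
#define IDT_TYPE_TASK_GATE  0x05u
#define DOUBLE_FAULT_VECTOR 8

/* Build a present task gate that switches to the given TSS. */
static struct idt_task_gate make_task_gate(uint16_t tss_selector, unsigned dpl)
{
    struct idt_task_gate g = {0};
    g.tss_selector = tss_selector;
    g.type_attr = (uint8_t)(IDT_PRESENT | ((dpl & 3u) << 5) | IDT_TYPE_TASK_GATE);
    return g;
}
```

The result would be stored in IDT slot `DOUBLE_FAULT_VECTOR`, with the selector (say 0x28, a hypothetical GDT slot) referring to a TSS descriptor for the dedicated double fault task.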

As far as I know Linux does not expand its kernel stacks, so if a giant monolithic mess like Linux doesn't need much stack space, perhaps you can do better?

Although if you are making an x86-64 SMP kernel, pushing the entire SSE state on the stack uses a lot more space than pushad used to.

Andy1988 · Post by **Andy1988** » Tue Nov 09, 2010 3:54 pm

Hangin10 wrote: Use a task gate for the double fault handler. That way you can use a known good stack to handle the double fault. However, if you are using a kernel stack per thread, that would make things overcomplicated (IMO). How much are you going to need to do during one interrupt? Linux uses 8KB for its kernel stack size. I think I just saw a thread recently where someone said they were thinking of reducing it.

To be honest, I don't know how much I'll need. I don't even know if this dynamic growing is worth the hassle.
I got the scheduler working for kernel tasks and I'm currently enhancing my virtual memory management which, I think, got rather advanced.
I can easily create several memory spaces, fill them with regions of different kinds (stacks, demand paged, swapped, memory-mapped I/O, etc.) which all inherit from each other, clone them, set them as the current space and so on.
And the only use case I currently have for demand paging is the stacks, so I decided to implement this for the stacks first.

Also, it took me almost 2 days to figure out why this thing triple faults instead of executing my page fault handler.

I only got the idea after patching custom debugging messages into Bochs to understand this. Now it makes perfect sense.

Hangin10 wrote: As far as I know Linux does not expand its kernel stacks, so if a giant monolithic mess like Linux doesn't need much stack space, perhaps you can do better?

Yes, they have static sizes. At least that's what I found with Google.

But I'll try the task gate, thanks!

pcmattman · Post by **pcmattman** » Tue Nov 09, 2010 4:31 pm

Generally speaking it shouldn't be necessary to expand kernel/ring0 stacks. Drivers and other ring0 services should be using the kernel heap for most allocations, and rarely end up with a large stack frame (there are exceptions to this, of course).

In userspace however it's a different story - but it's also easier to create growing stacks in userspace. I don't yet have growing stacks, but everything is in place for them to be implemented: you end up with the stacks in high memory, with a specific growth limit (so they don't grow too far), and the heap in lower memory, and they just grow towards each other. The heap growth doesn't usually depend on page faults, the stack growth usually does.

Dynamic growth of stacks isn't perfect though - it is possible to end up with a lot of growth in a very short time that creates a flood of #PFs (each of which locks out your physical and virtual memory managers - nasty performance hit on MP systems!). To get around this you might use speculative mapping, which is still not perfect because you may well speculate that you need to allocate, say, 16 KB as a result of a #PF when the application only really wanted 4 KB. On the other hand, the advantage is reduced memory usage for software which only uses a few KB of stack space and doesn't need that big 2 MB static stack you would give it if no dynamic stack growth was available.
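To make the speculative mapping trade-off concrete, here is a rough sketch of how a fault handler might decide how many pages to map at once. The function `pages_to_map` and its parameters are hypothetical. It assumes a downward-growing stack, 4 KB pages, and a fixed number of extra pages chosen by the caller.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/*
 * How many pages to map in response to one stack fault.
 * mapped_low:  lowest address currently mapped (page aligned)
 * limit_low:   lowest address the stack may ever reach (page aligned)
 * extra_pages: speculative pages to add beyond what the fault needs
 * Returns 0 if the fault is not inside the growable part of the stack.
 */
static size_t pages_to_map(uintptr_t fault_addr, uintptr_t mapped_low,
                           uintptr_t limit_low, size_t extra_pages)
{
    uintptr_t fault_page = fault_addr & ~(uintptr_t)(PAGE_SIZE - 1);

    if (fault_page >= mapped_low || fault_page < limit_low)
        return 0;

    size_t needed = (mapped_low - fault_page) / PAGE_SIZE; /* covers the fault */
    size_t room   = (mapped_low - limit_low) / PAGE_SIZE;  /* pages left to grow */
    size_t want   = needed + extra_pages;

    return want > room ? room : want; /* never grow past the limit */
}
```

With `extra_pages` set to 3, a fault one page below the mapped area maps 16 KB even though only 4 KB was strictly needed, which is exactly the over-allocation risk described above. In exchange, the next three pages of growth cause no further #PFs.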

It's one of those design decisions that has a variety of impacts across the system - and can really bite later if you aren't aware of the caveats!

OSDev.org
