Handling kernel stack overflow

StevensOsDevAccount · Post by **StevensOsDevAccount** » Fri May 27, 2011 8:22 pm

I'd like to handle stack overflows in the kernel.
I imagine I'd have a value at the end of the kernel's stack that has different permissions, or something else that raises an interrupt when the kernel's stack overflows.
I'm just going to have a simple error message and a crash when that happens for the kernel.
How do I get a pointer to the end of the stack and set up the interrupt handling for this?
I'm using gcc and gas.

bluemoon · Post by **bluemoon** » Fri May 27, 2011 8:35 pm

A common technique is to place a guard page, i.e. a page that is marked not present, on the boundary.
If an access is made to that page, a page fault occurs and the faulting address is available in CR2.
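
A minimal sketch of the check a page fault handler could make, assuming a stack that grows down and a single unmapped guard page directly below it. The `struct kstack` layout and both function names are hypothetical; reading CR2 itself needs a line of inline assembly and is left out so the logic stays plain C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Hypothetical description of one kernel stack. The stack grows down from
 * `top` toward `bottom`; the guard page is the unmapped page just below
 * `bottom`. */
struct kstack {
    uintptr_t bottom; /* lowest mapped address of the stack (page aligned) */
    uintptr_t top;    /* one past the highest address of the stack */
};

/* Lowest address of the guard page. */
uintptr_t guard_page_start(const struct kstack *s)
{
    assert(s->bottom % PAGE_SIZE == 0 && s->bottom >= PAGE_SIZE);
    return s->bottom - PAGE_SIZE;
}

/* True if the faulting address (the value read from CR2) lies inside the
 * guard page, which is how a stack overflow shows up in this scheme. */
bool fault_is_stack_overflow(const struct kstack *s, uintptr_t cr2)
{
    uintptr_t guard = guard_page_start(s);
    return cr2 >= guard && cr2 < s->bottom;
}
```

A handler would call `fault_is_stack_overflow(&current_stack, cr2)` before treating the fault as an ordinary page fault; `current_stack` is whatever per-thread record the kernel keeps.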

When a page fault occurs the CPU also pushes useful information onto the stack.
For a kernel thread this becomes a problem: the stack itself is not accessible, so the page fault turns into a double fault.

To solve this problem you can use a task gate for the page fault handler, so that there is always a good stack ready for use.
There is probably some other, more ingenious solution, but the task gate method should work.
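
As a rough illustration of what "use a task gate" means at the descriptor level, here is a sketch of a 32-bit protected-mode IDT entry. The helper names and the selector values used with them are assumptions; the field layout and the type codes (0x5 task gate, 0xE 32-bit interrupt gate) follow the x86 gate descriptor format.

```c
#include <assert.h>
#include <stdint.h>

/* One 8-byte entry of a 32-bit protected-mode IDT. */
struct idt_entry {
    uint16_t offset_low;  /* handler offset bits 0..15 (unused by task gates) */
    uint16_t selector;    /* code selector, or TSS selector for a task gate */
    uint8_t  reserved;    /* must be zero */
    uint8_t  type_attr;   /* present bit, DPL and gate type */
    uint16_t offset_high; /* handler offset bits 16..31 (unused by task gates) */
} __attribute__((packed));

#define IDT_PRESENT    0x80u
#define IDT_TYPE_TASK  0x05u
#define IDT_TYPE_INT32 0x0Eu

/* Task gate: the CPU switches to the task described by the TSS selector,
 * which brings its own known-good stack. */
struct idt_entry make_task_gate(uint16_t tss_selector)
{
    struct idt_entry e = {0};
    e.selector = tss_selector;
    e.type_attr = IDT_PRESENT | IDT_TYPE_TASK; /* DPL 0 */
    return e;
}

/* Ordinary interrupt gate: the handler runs on the current stack unless the
 * privilege level changes. */
struct idt_entry make_interrupt_gate(uint32_t handler, uint16_t code_selector)
{
    struct idt_entry e = {0};
    assert(code_selector != 0);
    e.offset_low = (uint16_t)(handler & 0xFFFFu);
    e.offset_high = (uint16_t)(handler >> 16);
    e.selector = code_selector;
    e.type_attr = IDT_PRESENT | IDT_TYPE_INT32; /* DPL 0 */
    return e;
}
```

With this, `idt[14] = make_task_gate(PF_TSS_SELECTOR)` would route page faults through a separate task, where `idt` and `PF_TSS_SELECTOR` stand for the kernel's own table and a GDT selector it has set up.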

StevensOsDevAccount · Post by **StevensOsDevAccount** » Fri May 27, 2011 8:40 pm

Aw shoot. So I have to set up paging then?
Maybe I'll delay this for a little while.
Edit:
P.S. I was thinking of calling my kernel NotAnOS. Has it been done?

bluemoon · Post by **bluemoon** » Fri May 27, 2011 8:56 pm

Paging is not that hard. And if you plan to use it, use it from the very beginning.

I would say that for the scale of any OSdev project, minimal paging(*) can be considered trivial.
* with sbrk() or the like, and no swapping, sharing, copy-on-write, explode-on-access, etc.

Those advanced features can be added once you have a basic kernel skeleton.

This is a pretty good tutorial: http://www.osdever.net/tutorials/view/m ... nagement-2
And don't forget the wiki.

Brendan · Post by **Brendan** » Fri May 27, 2011 9:27 pm

Hi,

StevensOsDevAccount wrote: I'd like to handle stack overflows in the kernel.
I imagine I'd have a value at the end of the kernel's stack that has different permissions, or something else that raises an interrupt when the kernel's stack overflows.
I'm just going to have a simple error message and a crash when that happens for the kernel.
How do I get a pointer to the end of the stack and set up the interrupt handling for this?
I'm using gcc and gas.

When invoking an interrupt/exception handler, the CPU only switches stacks when the privilege level is changed. This means any exception you use to detect kernel stack overflow would have to use a task gate to force a stack switch somewhere. Otherwise you will end up with a triple fault.

I can see 5 options:

1. Use segmentation, so that you get a general protection fault when the kernel overflows its stack, where all general protection faults are handled by a task gate to force a stack switch.
2. Use paging and a "guard page", so that you get a page fault when the kernel overflows its stack, where all page faults are handled by a task gate to force a stack switch.
3. Use option 1 or 2; but to avoid the overhead of unnecessary hardware task switching (don't forget some exceptions are normal and don't indicate that something went wrong, e.g. page faults for swapping), use an interrupt or trap gate for the exception and let it cause a double fault if there's a kernel stack overflow. In this case the double fault handler has to use a task gate and detect the kernel stack overflow.
4. Use the debug registers, so that you get a debug exception when the kernel writes to a byte near the end of the stack (so there's still enough room on the stack for the debug exception handler after the debug exception has occurred). To be honest here, I'm not too sure if this avoids the "stack overflow when stack overflow detected" problem or not; you may or may not need to use a task gate to handle it.
5. Insert "stack checks" in every function. At the start of every function the kernel checks its ESP to make sure there's enough space and "does something" if there isn't enough stack space. This doesn't involve any exceptions and avoids task gates. It's also a bit messy, and I'm not sure if it's possible to get GCC to auto-generate these stack checks.
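
The explicit stack check from the last option can be sketched as a small helper. `struct stack_limit`, its fields and both function names are hypothetical, and `__builtin_frame_address(0)` is used only as an approximation of ESP that GCC provides without inline assembly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread limits: the lowest usable stack address and the
 * headroom to keep free for interrupt handlers and the panic path. */
struct stack_limit {
    uintptr_t lowest;
    uintptr_t reserve;
};

/* True if `needed` more bytes can be used below `sp` while still leaving
 * `reserve` bytes free. Ordered to avoid unsigned wrap-around. */
bool stack_has_room(const struct stack_limit *lim, uintptr_t sp, uintptr_t needed)
{
    assert(lim != 0);
    if (sp < lim->lowest)
        return false;
    uintptr_t avail = sp - lim->lowest;
    return avail >= lim->reserve && avail - lim->reserve >= needed;
}

/* Approximate the current stack pointer with the current frame address. */
uintptr_t approx_stack_pointer(void)
{
    return (uintptr_t)__builtin_frame_address(0);
}
```

A function would start with something like `if (!stack_has_room(&lim, approx_stack_pointer(), 512)) panic(...)`, where `panic` is the kernel's own routine. One possible way to avoid writing that by hand is GCC's `-finstrument-functions`, which makes the compiler call `__cyg_profile_func_enter(void *this_fn, void *call_site)` on entry to each instrumented function; whether that fits a given kernel is untested here.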

Regardless of how you detect a kernel stack overflow, I'm not too sure what you're going to do when you do successfully detect a kernel stack overflow. Will you:

allocate more kernel stack space (and if so, why didn't you allocate enough to begin with?)
reset the computer (you could've done nothing and relied on triple faults for this)
attempt to do a kernel panic or something, and fail most of the time because the state of "everything" could be "anything", e.g. locks could still be held, kernel structures could be inconsistent, etc.
have a tiny OS embedded inside the kernel, so that the tiny OS can actually do something without relying on anything that existed before the kernel stack overflow was detected (where the "tiny OS" includes its own drivers, etc.; so it can reset the video and tell the user what happened, or write a core dump to disk, or...).

Typically, I prefer to write kernels that don't overflow their stack so that I don't need to worry...

Cheers,

Brendan

StevensOsDevAccount · Post by **StevensOsDevAccount** » Fri May 27, 2011 9:58 pm

Maybe I will just let it triple fault.
I just don't want weird file corruption stuff.
Safest thing is to quit.
I'll sleep on this for today. Good night everybody.

rdos · Post by **rdos** » Sat May 28, 2011 1:36 am

Brendan wrote: I can see 5 options:

1. Use segmentation, so that you get a general protection fault when the kernel overflows its stack, where all general protection faults are handled by a task gate to force a stack switch.

2. Use paging and a "guard page", so that you get a page fault when the kernel overflows its stack, where all page faults are handled by a task gate to force a stack switch.

3. Use option 1 or 2; but to avoid the overhead of unnecessary hardware task switching (don't forget some exceptions are normal and don't indicate that something went wrong, e.g. page faults for swapping), use an interrupt or trap gate for the exception and let it cause a double fault if there's a kernel stack overflow. In this case the double fault handler has to use a task gate and detect the kernel stack overflow.

In my design, when the kernel stack overflows, it wraps around and generates a stack fault. Stack faults are handled with a trap gate, so a kernel stack overflow generates a double fault. The double fault is handled with a TSS (there should be one TSS per CPU). Handling page faults (or stack faults) with a TSS is not a good idea, because they are common, especially page faults.
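
A sketch of the data involved in a TSS-based double fault handler, under some assumptions: the struct follows the 104-byte 32-bit hardware TSS layout, while the function names, the flat (non-wrapping) stack model and the `slack` heuristic are hypothetical and are not rdos's actual code. On a task switch through a task gate, the CPU saves the interrupted state into the outgoing TSS, so the handler can inspect the saved ESP there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit TSS as laid out by the hardware (104 bytes). */
struct tss32 {
    uint32_t link;                      /* selector of the previous task */
    uint32_t esp0, ss0, esp1, ss1, esp2, ss2;
    uint32_t cr3, eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint32_t es, cs, ss, ds, fs, gs;
    uint32_t ldtr;
    uint16_t trap;
    uint16_t iomap_base;
} __attribute__((packed));

/* Fill in the TSS that the double fault task gate points at, giving the
 * handler its own entry point and its own stack. */
void init_double_fault_tss(struct tss32 *tss, uint32_t handler_eip,
                           uint32_t stack_top, uint32_t cr3,
                           uint16_t code_sel, uint16_t data_sel)
{
    memset(tss, 0, sizeof *tss);
    tss->cr3 = cr3;
    tss->eip = handler_eip;
    tss->esp = stack_top;
    tss->eflags = 0x2u;                 /* reserved bit 1 set, IF clear */
    tss->cs = code_sel;
    tss->ss = tss->ds = tss->es = tss->fs = tss->gs = data_sel;
    tss->iomap_base = sizeof *tss;      /* no I/O permission bitmap */
}

/* Heuristic: the interrupted task's saved ESP is below, or within `slack`
 * bytes of, the lowest valid address of its kernel stack. */
bool df_looks_like_stack_overflow(const struct tss32 *faulting,
                                  uint32_t stack_lowest, uint32_t slack)
{
    assert(faulting != NULL);
    return faulting->esp < stack_lowest + slack;
}
```

Since a double fault is an abort, the saved state is not guaranteed to be precise, so a check like this is best treated as a hint for the error message rather than as proof.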

Brendan wrote: Regardless of how you detect a kernel stack overflow, I'm not too sure what you're going to do when you do successfully detect a kernel stack overflow. Will you:

allocate more kernel stack space (and if so, why didn't you allocate enough to begin with?)

reset the computer (you could've done nothing and relied on triple faults for this)

attempt to do a kernel panic or something, and fail most of the time because the state of "everything" could be "anything" - e.g. locks could still be held, kernel structures could be inconsistent, etc

have a tiny OS embedded inside the kernel, so that the tiny OS can actually do something without relying on anything that existed before the kernel stack overflow was detected (where the "tiny OS" includes its own drivers, etc.; so it can reset the video and tell the user what happened, or write a core dump to disk, or...).

Faults in the kernel are handled in one of two ways. If they occur within scheduler-locked regions, they invoke the crash debugger (which is almost like a tiny OS within the OS), while if they occur outside of scheduler-locked regions, the responsible thread is put in a debug list and blocked. The fault can then be inspected either in the crash debugger or in the kernel debugger. If the system contains a watchdog device driver, the machine will reboot on all faults (this is the "production" setup). In the production setup, there is an option to save fault information at fixed disk sectors before rebooting and then, when the reboot is finished, read it back and potentially transfer the state to an FTP server for later inspection.
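
To make the "save fault information at fixed disk sectors" idea concrete, here is a small sketch of packing a fault record into a 512-byte sector buffer and validating it after reboot. The record layout, the magic value and the function names are all invented for illustration and are not taken from rdos's system; the actual disk write is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u
#define FAULT_MAGIC 0x544C4146u /* hypothetical tag marking a valid record */

/* Hypothetical fault record; the fields are chosen for illustration only. */
struct fault_record {
    uint32_t magic;
    uint32_t vector;     /* exception number, e.g. 8 for a double fault */
    uint32_t error_code;
    uint32_t eip, esp, cr2;
    uint32_t checksum;   /* sum of all preceding fields */
};

static uint32_t fault_checksum(const struct fault_record *r)
{
    return r->magic + r->vector + r->error_code + r->eip + r->esp + r->cr2;
}

/* Fill a zeroed sector buffer with the record, stamping magic and checksum. */
void fault_record_pack(uint8_t sector[SECTOR_SIZE], struct fault_record r)
{
    assert(sizeof r <= SECTOR_SIZE);
    r.magic = FAULT_MAGIC;
    r.checksum = fault_checksum(&r);
    memset(sector, 0, SECTOR_SIZE);
    memcpy(sector, &r, sizeof r);
}

/* After reboot: returns true only if the sector holds a consistent record. */
bool fault_record_unpack(const uint8_t sector[SECTOR_SIZE], struct fault_record *out)
{
    memcpy(out, sector, sizeof *out);
    return out->magic == FAULT_MAGIC && out->checksum == fault_checksum(out);
}
```

The magic and checksum let the boot path tell a real record from a blank or stale sector before trying to upload anything.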

Just rebooting on kernel faults means these problems will never be solved, as a reboot does not give any information about what happened.

StevensOsDevAccount · Post by **StevensOsDevAccount** » Sat May 28, 2011 10:22 am

I like method 3 too if it makes performance better (I'm writing all my interrupt handlers in assembly for performance).

OSDev.org
