Double fault TSS problems

xeyes · Post by **xeyes** » Tue Feb 02, 2021 1:29 am

sj95126 wrote:
xeyes wrote:
sj95126 wrote: For example, if you've swapped out your division-by-zero handler, then a division exception would trigger a page fault. You'd be expecting this and can recover as you designed it to.
Isn't this a page fault, not a double fault?
No, it would be a double fault. If your handler is swapped out, either you've cleared the present flag in the IDT (P=0), or the address of the handler isn't valid (P=0), or both, but either way you're going to get a second exception because the handler cannot be executed.

Were you able to get the CPU to call your double fault handler by setting 0 as IDT entry 0's address and doing a div by 0?

I tried some time ago and the CPU only called my page fault handler. I have to use the following to reliably force a double fault:

xor %%esp, %%esp; ret

xeyes · Post by **xeyes** » Tue Feb 02, 2021 1:32 am

nullplan wrote:
sj95126 wrote:For example, if you've swapped out your division-by-zero handler, then a division exception would trigger a page fault. You'd be expecting this and can recover as you designed it to. Note that I'm not necessarily advocating for swapping out your exception handlers, just that you could.
Note that that is a terrible idea: the double fault is an abort-type exception, and therefore the return address given in the interrupt frame is invalid (unpredictable). Therefore, once the double fault handler is invoked, it must not return before setting that address to a known good value. The address of the faulting instruction in this case would be lost.
xeyes wrote:That's why I'm curious whether DF is simply unrecoverable in all cases.
No, only most of them. See below for details.
nexos wrote:IMO doing it this way allows for Double Faults to be handled cleanly. Linux and ReactOS do this, so it must be considered good practice.
Linux only handles very specific double faults, and panics for all others. In particular, it handles double faults occurring while on an ESPFIX stack. Since interrupts are disabled when the read-only ESPFIX stack is loaded, pretty much the only way to fail is if the IRET itself fails. And that can only happen due to invalid addresses in the interrupt frame, which can only happen because the program did something stupid, and therefore those double faults just emulate a general protection fault due to bad IRET.

This is not a concern for those of us who don't allow 16-bit code to run at all in our OSes, and so this use case disappears. My kernel only panics on double fault.

Thanks for sharing, that sounds like a really exotic case.

sj95126 · Post by **sj95126** » Tue Feb 02, 2021 12:16 pm

xeyes wrote:Were you able to get the CPU to call your double fault handler by setting 0 as IDT entry 0's address and doing a div by 0?

Err, ok, you got me. I answered without testing actual code first.

It turns out my answer was a little too specific. A #DE that triggers a #PF does not result in a double fault, because of the classification of #DE as a "contributory" exception. For those who were brain-fading on the specifics of this like I was, it's detailed in the Intel manuals, volume 3A, chapter 6, tables 6-4 and 6-5.

However, as #DE leading to a #NP *does* generate a double fault (I just tested it), my answer was still mostly correct. There theoretically could be a reason to design your code so that #DF is not a fatal event. But it's not really a good idea.

I'll stop arguing semantics now.
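
The classification rule described above can be sketched in C. This is only an illustration of the benign / contributory / page-fault scheme from the tables cited in the post, not code from anyone's kernel. The names `classify`, `escalates_to_double_fault` and `double_fault_examples` are made up for this sketch, and the vector lists cover only the classic exceptions, so newer manual revisions may list additional vectors.

```c
#include <assert.h>
#include <stdbool.h>

/* Classes used to decide whether a second exception, raised while the CPU
   is delivering the first one, escalates to #DF. */
enum exc_class { EXC_BENIGN, EXC_CONTRIBUTORY, EXC_PAGE_FAULT };

/* Architectural vector numbers for the exceptions discussed in the thread. */
enum { VEC_DE = 0, VEC_BP = 3, VEC_TS = 10, VEC_NP = 11,
       VEC_SS = 12, VEC_GP = 13, VEC_PF = 14 };

/* Hypothetical helper: map a vector to its class (classic vectors only). */
static enum exc_class classify(int vector)
{
    switch (vector) {
    case VEC_DE: case VEC_TS: case VEC_NP: case VEC_SS: case VEC_GP:
        return EXC_CONTRIBUTORY;
    case VEC_PF:
        return EXC_PAGE_FAULT;
    default:
        return EXC_BENIGN;
    }
}

/* True if `second`, raised during delivery of `first`, becomes a #DF
   instead of being handled serially. */
bool escalates_to_double_fault(int first, int second)
{
    enum exc_class a = classify(first);
    enum exc_class b = classify(second);

    if (a == EXC_CONTRIBUTORY)
        return b == EXC_CONTRIBUTORY;   /* contributory then contributory */
    if (a == EXC_PAGE_FAULT)
        return b != EXC_BENIGN;         /* #PF then contributory or #PF */
    return false;                       /* benign first: handled serially */
}

/* The two cases from the thread. */
void double_fault_examples(void)
{
    assert(!escalates_to_double_fault(VEC_DE, VEC_PF)); /* #DE then #PF */
    assert(escalates_to_double_fault(VEC_DE, VEC_NP));  /* #DE then #NP */
}
```

Under this model, a #DE whose handler page is unmapped just produces a #PF, while a #DE through an IDT gate marked not present produces a #DF, which matches what both posters observed.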

xeyes · Post by **xeyes** » Tue Feb 02, 2021 2:17 pm

sj95126 wrote:
xeyes wrote:Were you able to get the CPU to call your double fault handler by setting 0 as IDT entry 0's address and doing a div by 0?
Err, ok, you got me. I answered without testing actual code first.

It turns out my answer was a little too specific. A #DE that triggers a #PF does not result in a double fault, because of the classification of #DE as a "contributory" exception. For those who were brain-fading on the specifics of this like I was, it's detailed in the Intel manuals, volume 3A, chapter 6, tables 6-4 and 6-5.

However, as #DE leading to a #NP *does* generate a double fault (I just tested it), my answer was still mostly correct. There theoretically could be a reason to design your code so that #DF is not a fatal event. But it's not really a good idea.

I'll stop arguing semantics now.

I asked simply because I had a comment above the xor esp esp code saying "div0 doesn't work for this it calls PF handler", so I got curious about your example, as I could very well have set up something incorrectly back then. What is #NP though?

Yup I'm all set on letting the DF handler always panic.

nullplan · Post by **nullplan** » Tue Feb 02, 2021 2:38 pm

xeyes wrote:Thanks for sharing, that sounds like a really exotic case.

It's an x86 CPU misfeature: when returning from an interrupt in 32-bit or 64-bit mode to 16-bit mode, the high 16 bits of ESP don't get updated and leak. Malicious code could try to use those bits for some purpose; at the very least, they reveal where the current task's kernel stack is. It doesn't really matter to normal applications, as 16-bit mode also has 16-bit stacks, and therefore the high bits of ESP are ignored, but the info leak is still a problem.

In 32-bit mode, Linux uses some clever math and a stack segment with a nonzero base address to set the high bits of ESP to what userspace had them at without actually moving the stack. But in 64-bit mode that trick is unavailable, so instead they try to randomize the bits in question. At boot time, an ESPFIX stack is allocated (48 bytes) and mapped twice, once writable and once read-only. At least the read-only version is mapped to a random virtual address, with a spread so large that all of the 16 bits in question are affected by it. Before returning to 16-bit mode, Linux copies the interrupt frame to the writable version of the ESPFIX stack, then loads the read-only version into ESP and performs an IRET. If it works, great; if not, you get a double fault.

See the wiki page on CPU Bugs for more info, although this is basically all of it.
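
To make the leak concrete, here is a tiny C model of the behaviour described above: after the IRET only the low 16 bits of ESP are replaced, so the upper half still carries the kernel's value. The function names and the example addresses are invented for illustration; this models the arithmetic only and is not Linux code.

```c
#include <assert.h>
#include <stdint.h>

/* Model of ESP after an IRET to a 16-bit stack: only the low word is
   loaded from the interrupt frame, the high word keeps the kernel value. */
uint32_t esp_after_iret_to_16bit(uint32_t kernel_esp, uint16_t user_sp)
{
    return (kernel_esp & 0xFFFF0000u) | user_sp;
}

/* What userspace can learn: the upper 16 bits of the kernel stack pointer. */
uint16_t leaked_high_bits(uint32_t esp_seen_by_user)
{
    return (uint16_t)(esp_seen_by_user >> 16);
}

/* Example with made-up addresses. */
void esp_leak_example(void)
{
    uint32_t esp = esp_after_iret_to_16bit(0xC1234F00u, 0x0FF0u);
    assert(esp == 0xC1230FF0u);
    assert(leaked_high_bits(esp) == 0xC123u);
}
```

Randomizing the address of the read-only ESPFIX mapping, as described in the post, means the leaked upper bits no longer say anything useful about the real kernel stack.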

sj95126 · Post by **sj95126** » Tue Feb 02, 2021 2:49 pm

xeyes wrote:I asked simply because I had a comment above the xor esp esp code saying "div0 doesn't work for this it calls PF handler", so I got curious about your example, as I could very well have set up something incorrectly back then.
Sorry, that was snark in my own direction. I spent several posts insisting that you could recover from double faults, followed by insisting that it was a bad idea. I didn't need to be so persistent.
What is #NP though?

#NP is a "segment not present" exception. If the P (present) flag in an IDT entry is set to 0, and the relevant exception is triggered, it will result in #NP (which sometimes results in a double fault).
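
For anyone who wants to try the #NP route, a minimal sketch of a 32-bit IDT gate and its P flag might look like the following. The struct layout is the standard 8-byte protected-mode gate; the function names, the `0x8E` attribute byte (present, DPL 0, 32-bit interrupt gate) and the example handler address are assumptions chosen for this illustration, and loading the table with `lidt` is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 32-bit protected-mode IDT gate descriptor (8 bytes, no padding). */
struct idt_gate32 {
    uint16_t offset_low;   /* handler address bits 15:0 */
    uint16_t selector;     /* code segment selector */
    uint8_t  reserved;     /* must be zero */
    uint8_t  type_attr;    /* bit 7 = P, bits 6:5 = DPL, bits 3:0 = gate type */
    uint16_t offset_high;  /* handler address bits 31:16 */
};

_Static_assert(sizeof(struct idt_gate32) == 8, "IDT gate must be 8 bytes");

#define IDT_PRESENT 0x80u

/* Hypothetical helper: build a gate from its parts. */
struct idt_gate32 idt_gate32_make(uint32_t handler, uint16_t selector,
                                  uint8_t type_attr)
{
    struct idt_gate32 g;
    g.offset_low  = (uint16_t)(handler & 0xFFFFu);
    g.selector    = selector;
    g.reserved    = 0;
    g.type_attr   = type_attr;
    g.offset_high = (uint16_t)(handler >> 16);
    return g;
}

/* Set or clear the P flag without touching DPL or gate type. */
void idt_gate32_set_present(struct idt_gate32 *g, bool present)
{
    if (present)
        g->type_attr |= IDT_PRESENT;
    else
        g->type_attr &= (uint8_t)~IDT_PRESENT;
}

bool idt_gate32_is_present(const struct idt_gate32 *g)
{
    return (g->type_attr & IDT_PRESENT) != 0;
}

/* Example: mark the vector 0 gate not present. */
void idt_gate32_example(void)
{
    struct idt_gate32 g = idt_gate32_make(0x00101234u, 0x08, 0x8E);
    idt_gate32_set_present(&g, false);
    assert(!idt_gate32_is_present(&g));
    assert(g.type_attr == 0x0E);
}
```

With the gate for vector 0 marked not present like this, a division by zero should take the #DE then #NP path described above rather than reaching the #DE handler.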

OSDev.org
