Double fault TSS problems
Hello,
This is my first question for my new OS! First, I am taking more time to structure it, and I think it is structured well. Anyway, I have just finished writing my GDT and IDT code. This was the first time I based it solely on the Intel SDM, and I decided to use a separate TSS for double faults. Executing int $0x08 works: the handler gets called. To replicate the most common cause of double faults (an invalid kernel stack), I tried setting ESP to NULL and then executing an int. Logically, the double fault should be raised and then a task switch to my TSS should occur. Instead, it triple faulted! Everything looks okay in Bochs, but I have no clue what's going on. It's acting as if it weren't a task gate at all, even though it is a task gate in the IDT. The relevant code is at https://github.com/Nexware-Project/micr ... 6/cpu/i386
Thanks,
nexos
Re: Double fault TSS problems
Sounds like an invalid TSS.
[I have contributed all my knowledge to this answer.]
Skylight: https://github.com/austanss/skylight
I make stupid mistakes and my vision is terrible. Not a good combination.
NOTE: Never respond to my posts with "it's too hard".
Re: Double fault TSS problems
No, it isn't an invalid TSS: it works, just not when it's needed. Oh well, for now I will just make it reboot on a double fault and come back to it later.
Re: Double fault TSS problems
There are multiple problems. The first one being that the DF TSS is a local variable. You also need one more TSS, which will be the normally active one.
Re: Double fault TSS problems
Gigasoft wrote: "There are multiple problems. The first one being that the DF TSS is a local variable. You also need one more TSS, which will be the normally active one."
Really? I didn't know that. I will go try this and see what happens. The local variable would be a problem as well; I didn't notice that.
Edit: after fixing that, and another bug where I didn't set CR3 in the TSS, it now works. Thanks for your help!
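The three fixes from this exchange (a DF TSS with static storage instead of a local variable, a second TSS that is normally active, and a valid CR3 in the DF TSS) can be sketched in C roughly as follows. This is an illustration under stated assumptions, not nexos's actual code: the helper names (`init_df_tss`, `init_main_tss`, `gdt_tss_descriptor`, `idt_task_gate`) are hypothetical, it relies on GCC's `packed` attribute, and it only builds the TSS contents and descriptor values. Installing them (`lgdt`, `lidt`, `ltr`) is i386-only and is shown as a comment with made-up GDT slot numbers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit TSS layout from the Intel SDM (Vol. 3A, "32-Bit Task-State Segment"). */
struct tss32 {
    uint16_t link, rsv0;                 /* previous-task link (back link) */
    uint32_t esp0; uint16_t ss0, rsv1;
    uint32_t esp1; uint16_t ss1, rsv2;
    uint32_t esp2; uint16_t ss2, rsv3;
    uint32_t cr3, eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint16_t es, rsv4, cs, rsv5, ss, rsv6, ds, rsv7, fs, rsv8, gs, rsv9;
    uint16_t ldt, rsv10;
    uint16_t trap, iomap_base;
} __attribute__((packed));
_Static_assert(sizeof(struct tss32) == 104, "32-bit TSS must be 104 bytes");
_Static_assert(offsetof(struct tss32, cr3) == 0x1C, "CR3 lives at offset 0x1C");

/* File-scope storage: a TSS living on some init function's stack would be
   garbage long before a double fault actually happens. */
static struct tss32 main_tss; /* normally active TSS, loaded with ltr at boot;
                                 the outgoing state is saved here on the switch */
static struct tss32 df_tss;   /* target of the #DF task gate */

/* GDT descriptor for an available 32-bit TSS (type 9, P=1, DPL=0). */
static uint64_t gdt_tss_descriptor(uint32_t base, uint32_t limit) {
    return (uint64_t)(limit & 0xFFFF)
         | (uint64_t)(base & 0xFFFFFF) << 16
         | (uint64_t)0x89 << 40
         | (uint64_t)((limit >> 16) & 0xF) << 48
         | (uint64_t)(base >> 24) << 56;
}

/* IDT task gate (type 5, P=1, DPL=0): only the TSS selector is used. */
static uint64_t idt_task_gate(uint16_t tss_selector) {
    return (uint64_t)tss_selector << 16 | (uint64_t)0x85 << 40;
}

/* The normally active TSS mainly needs the ring-0 stack (ESP0/SS0). */
static void init_main_tss(uint32_t esp0, uint16_t ss0) {
    memset(&main_tss, 0, sizeof main_tss);
    main_tss.esp0 = esp0;
    main_tss.ss0 = ss0;
    main_tss.iomap_base = sizeof main_tss; /* no I/O permission bitmap */
}

/* Everything the task switch loads into the CPU must be valid here. */
static void init_df_tss(uint32_t cr3, uint32_t handler_eip, uint16_t kernel_cs,
                        uint16_t kernel_ds, uint32_t stack_top) {
    memset(&df_tss, 0, sizeof df_tss);
    df_tss.cr3 = cr3;          /* loaded into CR3 by the task switch */
    df_tss.eip = handler_eip;
    df_tss.eflags = 0x2;       /* bit 1 is reserved (always 1); IF=0 */
    df_tss.esp = stack_top;    /* a dedicated stack known to be valid */
    df_tss.cs = kernel_cs;
    df_tss.ss = df_tss.ds = df_tss.es = df_tss.fs = df_tss.gs = kernel_ds;
    df_tss.iomap_base = sizeof df_tss;
}

/* Kernel-side installation (i386 only, slots 5 and 6 are hypothetical):
 *   gdt[5] = gdt_tss_descriptor((uint32_t)&main_tss, sizeof main_tss - 1);
 *   gdt[6] = gdt_tss_descriptor((uint32_t)&df_tss,   sizeof df_tss - 1);
 *   idt[8] = idt_task_gate(6 << 3);
 *   asm volatile("ltr %0" :: "r"((uint16_t)(5 << 3)));
 */
```

The `ltr` of a separate, normally active TSS matters because a task switch first saves the current register state into the TSS that TR points at; without a valid one there, the switch to the DF task cannot complete.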
Re: Double fault TSS problems
I've been curious whether the handler can do anything to 'recover' from this, like trying to fix the back-linked TSS and IRETing to it, or killing the faulting user program and moving on to something else.
Does a double fault always mean that the kernel itself is buggy and that a panic is the only way out?
Re: Double fault TSS problems
xeyes wrote: "Does double fault always mean that the kernel itself is buggy and that panic is the only way out?"
Not necessarily: a double fault may be the expected result, depending on how you've implemented your handlers.
For example, if you've swapped out your division-by-zero handler, then a division exception would trigger a page fault. You'd be expecting this and can recover as you designed it to. Note that I'm not necessarily advocating for swapping out your exception handlers, just that you could.
I don't recommend something like this, for the fairly simple reason that *usually* a double fault in the kernel is very bad, and often unrecoverable, and you're better off not making the double fault handler too complicated so it can do the important job of shutting down as responsibly as possible - crash dump, etc. etc. Once you're in a double fault, it's best not to push your luck too far.
Re: Double fault TSS problems
The reason is that double faults normally occur when the kernel stack is invalid: a page fault is triggered the next time a push or pop occurs. Since the CPU is already in kernel mode, it tries to push the exception state onto that same invalid stack, which faults again and hence triggers a double fault. The double fault handler also needs a valid stack, so without a stack switch the CPU triple faults. By using a task gate for double faults that points to its own TSS, the double fault handler runs on a known-good stack and can cleanly trap these issues, which makes debugging a little simpler.
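As a sketch of the debugging side of that idea: when #DF is delivered through a task gate, the CPU stores the selector of the interrupted task in the DF TSS's link field, and that task's registers were saved into its own TSS during the switch. A DF handler can therefore look up the old TSS and report where things went wrong. The function names below are hypothetical, and the lookup assumes identity-mapped memory. Treat the reported values as hints only: the SDM describes the saved CS:EIP for #DF as undefined, although the saved ESP (for example the NULL ESP in the test above) is often the telling clue.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Byte offsets of two fields in a 32-bit TSS (Intel SDM Vol. 3A). */
#define TSS_EIP_OFF 0x20
#define TSS_ESP_OFF 0x38

/* Base address held in an 8-byte GDT descriptor (bits 16-39 and 56-63). */
static uint32_t descriptor_base(uint64_t desc) {
    return (uint32_t)((desc >> 16) & 0xFFFFFF) | (uint32_t)(desc >> 56) << 24;
}

/* Read a 32-bit field without alignment assumptions (x86 is little-endian). */
static uint32_t tss_field(const uint8_t *tss, size_t off) {
    uint32_t v;
    memcpy(&v, tss + off, sizeof v);
    return v;
}

/* Format the interrupted task's saved EIP/ESP for a panic message. */
static int df_report(char *buf, size_t len, uint16_t back_link, const uint8_t *old_tss) {
    return snprintf(buf, len,
                    "double fault: previous task sel=0x%04x eip=0x%08" PRIx32
                    " esp=0x%08" PRIx32,
                    (unsigned)back_link, tss_field(old_tss, TSS_EIP_OFF),
                    tss_field(old_tss, TSS_ESP_OFF));
}

/* Kernel-side use (i386, identity mapping assumed; `gdt`, `df_tss` and
 * `msg` are hypothetical names for the kernel's own objects):
 *   const uint8_t *old =
 *       (const uint8_t *)(uintptr_t)descriptor_base(gdt[df_tss.link >> 3]);
 *   df_report(msg, sizeof msg, df_tss.link, old);
 */
```

Shifting the selector right by 3 drops the RPL and TI bits to get the GDT index.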
Re: Double fault TSS problems
sj95126 wrote: "For example, if you've swapped out your division-by-zero handler, then a division exception would trigger a page fault. You'd be expecting this and can recover as you designed it to."
Isn't this a page fault, not a double fault?
Re: Double fault TSS problems
nexos wrote: "The reason why is because double faults normally occur because the kernel stack is invalid, then a page fault triggers next time a push or pop occurs. In kernel mode, the CPU tries to push the state on the stack, which is invalid, hence triggering a double fault. The double fault handler needs a valid stack, hence it does a triple fault. By using a task gate for double faults which points to a TSS, the double fault handler can cleanly trap these issues, hence making debugging a little simpler."
Most DF occurrences I've seen are the page fault handler itself page faulting and recursively using the kernel stack until it reaches an unmapped page.
It doesn't make much sense to fix a stack trashed by a runaway PF (or other) handler, as it's too hard to figure out what went wrong and what can be done when running as the DF handler.
That's why I'm curious whether a DF is simply unrecoverable in all cases.
Re: Double fault TSS problems
xeyes wrote (replying to sj95126's division-by-zero example): "Isn't this a page fault not double fault?"
No, it would be a double fault. If your handler is swapped out, either you've cleared the present flag in the IDT entry (P=0), or the handler's address isn't valid (P=0 in its page table entry), or both, but either way you're going to get a second exception because the handler cannot be executed.
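Whether a second exception actually escalates to #DF is decided by exception class, per the Intel SDM (Vol. 3A, "Conditions for Generating a Double Fault"). The C sketch below encodes that table for the legacy vectors 0 to 19 only; newer vectors and the triple-fault case are deliberately left out. By this table, a #NP raised while delivering a #DE (the IDT entry has P=0) is a double fault, whereas a #PF following a #DE is handled serially as an ordinary page fault.

```c
/* Exception classes and outcomes from the SDM's double-fault table. */
enum exc_class { EXC_BENIGN, EXC_CONTRIBUTORY, EXC_PAGE_FAULT };
enum exc_outcome { HANDLE_SERIALLY, DOUBLE_FAULT };

/* Classify a legacy exception vector (0-19). Simplified: newer vectors such
   as #VE or #CP are not modelled, and vector 8 (#DF) as the first exception
   is out of scope (a further fault there means shutdown, i.e. triple fault). */
static enum exc_class classify(unsigned vector) {
    switch (vector) {
    case 0:  /* #DE */
    case 10: /* #TS */
    case 11: /* #NP */
    case 12: /* #SS */
    case 13: /* #GP */
        return EXC_CONTRIBUTORY;
    case 14: /* #PF */
        return EXC_PAGE_FAULT;
    default: /* #DB, NMI, #BP, #OF, #BR, #UD, #NM, #MF, #AC, #MC, #XM */
        return EXC_BENIGN;
    }
}

/* Outcome when `second` is raised while the CPU is delivering `first`. */
static enum exc_outcome combine(unsigned first, unsigned second) {
    enum exc_class a = classify(first), b = classify(second);
    if (a == EXC_CONTRIBUTORY && b == EXC_CONTRIBUTORY)
        return DOUBLE_FAULT;
    if (a == EXC_PAGE_FAULT && b != EXC_BENIGN)
        return DOUBLE_FAULT;
    return HANDLE_SERIALLY;
}
```

The page-fault-then-page-fault row is the one behind the runaway-stack scenario discussed in this thread.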
Re: Double fault TSS problems
IMO, doing it this way allows double faults to be handled cleanly. Linux and ReactOS do this, so it must be considered good practice.
Re: Double fault TSS problems
sj95126 wrote: "For example, if you've swapped out your division-by-zero handler, then a division exception would trigger a page fault. You'd be expecting this and can recover as you designed it to. Note that I'm not necessarily advocating for swapping out your exception handlers, just that you could."
Note that that is a terrible idea: the double fault is an abort-type exception, and therefore the return address given in the interrupt frame is invalid (unpredictable). Therefore, once the double fault handler is invoked, it must not return before setting that address to a known good value. The address of the faulting instruction in this case would be lost.
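To make the "known good value" point concrete, here is a minimal C sketch for a #DF reached through an ordinary interrupt gate (not a task gate) with no privilege change, where the CPU pushes EFLAGS, CS, EIP and an error code that is always 0 for #DF. The recovery policy, the function name `df_redirect` and the choice to clear IF are all hypothetical; the only point shown is that the saved EIP must be overwritten before any IRET.

```c
#include <stdint.h>

/* Stack at entry to a 32-bit #DF handler via an interrupt gate, same
   privilege level (no ESP/SS pushed). */
struct df_frame {
    uint32_t error_code; /* always 0 for #DF; pop it before IRET */
    uint32_t eip;        /* undefined for an abort: never resume here as-is */
    uint32_t cs;         /* 32-bit slot; upper 16 bits are not guaranteed */
    uint32_t eflags;
};

/* Hypothetical policy: redirect the return to a known-good entry point
   (e.g. code that terminates the current worker thread). Returns 0 when
   redirected, -1 when the caller should panic instead. */
static int df_redirect(struct df_frame *f, uint32_t known_good_eip, uint16_t kernel_cs) {
    if (f->error_code != 0)
        return -1;                  /* not a well-formed #DF frame */
    f->eip = known_good_eip;        /* replace the unpredictable return address */
    f->cs = kernel_cs;
    f->eflags &= ~(uint32_t)0x200;  /* assumption: recovery stub starts with IF=0 */
    return 0;
}
```

This only works if the stack itself survived, which is exactly what is not guaranteed after a double fault, so panicking remains the safer default.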
xeyes wrote: "That's why I'm curious is DF simply unrecoverable in all cases."
No, only most of them. See below for details.
nexos wrote: "IMO doing this way allows for Double Faults to be handled cleanly. Linux and ReactOS do this, so it must be considered good practice"
Linux only handles very specific double faults and panics on all others. In particular, it handles double faults that occur while on an ESPFIX stack. Since interrupts are disabled while the read-only ESPFIX stack is loaded, pretty much the only way to fail is if the IRET itself fails. That can only happen due to invalid addresses in the interrupt frame, which can only happen because the program did something stupid, and therefore those double faults just emulate a general protection fault caused by a bad IRET.
This is not a concern for those of us that don't allow 16-bit code to run at all in their OSes, and so this use case disappears. My kernel only panics on double fault.
Carpe diem!
Re: Double fault TSS problems
nullplan wrote: "Note that that is a terrible idea: The double fault is an abort type exception, and therefore the return address given in the interrupt frame is invalid (unpredictable). Therefore, once the double fault handler is invoked, it must not return before setting that address to a known good value. The address of the faulting instruction in this case would be lost."
Right, which is why I said I don't recommend it.
Theoretically, if you had a multithreaded kernel, and it was a worker thread that double faulted, and you've properly stored the intermediate results of that worker thread, and/or could restart its task, then you could simply abandon that thread and you wouldn't care about not having the return address.
But as I said, IMO, the safest thing to do with a double fault is shut down (panic) as gracefully as possible as quickly as possible. It might have been something very important that failed to happen as a result of the double fault, and you likely would only make things much much worse trying to investigate and/or repair it.
Re: Double fault TSS problems
Note that when I say handle, that could mean a panic. In all cases, my double fault handler will panic. It is just to prevent triple faults.