Double fault handler

Neuromancer · Post by **Neuromancer** » Mon Dec 13, 2004 10:25 am

Hi,
I have created a double fault handler in my OS. When a double fault exception occurs, a task switch takes place and switches to the handler. The task management works (I can jump manually to the handler with jmp <descriptor>:0) and the IDT setting for this interrupt is OK (I can call it manually with an int instruction).
To test it, I set ESP to a dummy value (0xDEADBEEF); since the page fault handler that should be called works on the stack, a double fault should occur. But the only thing I get is a computer freeze (or a reboot, though under QEMU everything just freezes up).
Has anyone had this kind of problem and could help me?
Thanks.
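
For readers following along, here is a minimal sketch of the setup described above: a 32-bit TSS for the handler task and an IDT task gate that points to it. It is not the poster's code. The names `tss32`, `idt_entry`, `df_tss_init` and `make_task_gate` are hypothetical, the selector and address values are up to the caller, and `__attribute__((packed))` assumes GCC or Clang.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* 32-bit TSS layout as defined by the IA-32 architecture (104 bytes). */
struct tss32 {
    uint16_t link, _r0;
    uint32_t esp0; uint16_t ss0, _r1;
    uint32_t esp1; uint16_t ss1, _r2;
    uint32_t esp2; uint16_t ss2, _r3;
    uint32_t cr3, eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint16_t es, _r4, cs, _r5, ss, _r6, ds, _r7, fs, _r8, gs, _r9;
    uint16_t ldt, _r10;
    uint16_t trap, iomap_base;
} __attribute__((packed));

/* 8-byte IDT entry; for a task gate only selector and type_attr are used. */
struct idt_entry {
    uint16_t offset_low;   /* ignored for a task gate */
    uint16_t selector;     /* GDT selector of the handler's TSS descriptor */
    uint8_t  reserved;
    uint8_t  type_attr;    /* present bit, DPL, gate type */
    uint16_t offset_high;  /* ignored for a task gate */
} __attribute__((packed));

/* Fill a TSS so that a task switch starts at `entry` on a known-good stack.
 * All arguments are values chosen by the kernel (hypothetical here). */
void df_tss_init(struct tss32 *t, uint32_t entry, uint32_t stack_top,
                 uint32_t cr3, uint16_t code_sel, uint16_t data_sel)
{
    memset(t, 0, sizeof *t);          /* never rely on memory being zero */
    t->eip    = entry;
    t->esp    = stack_top;
    t->esp0   = stack_top;
    t->ss0    = data_sel;
    t->cr3    = cr3;                  /* must be a valid page directory */
    t->eflags = 0x2u;                 /* reserved bit 1 set, IF clear */
    t->cs     = code_sel;
    t->ss = t->ds = t->es = t->fs = t->gs = data_sel;
    t->iomap_base = (uint16_t)sizeof *t;  /* no I/O permission bitmap */
}

/* Build a present, DPL 0 task gate (type 0x5) referring to `tss_sel`. */
struct idt_entry make_task_gate(uint16_t tss_sel)
{
    struct idt_entry e;
    memset(&e, 0, sizeof e);
    e.selector  = tss_sel;
    e.type_attr = 0x85;               /* P=1, DPL=0, type=0101b */
    return e;
}
```

The entry returned by `make_task_gate` would be stored in IDT slot 8 (the double fault vector), and the TSS itself still needs an available-TSS descriptor in the GDT, which is not shown here.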

Pype.Clicker · Post by **Pype.Clicker** » Mon Dec 13, 2004 11:19 am

Hmm, just to make sure: are you deadbeef'ing ESP while already running at CPL0? If not, the page fault will switch to ESP0 from the TSS instead of double-faulting ...

Candy · Post by **Candy** » Mon Dec 13, 2004 11:22 am

On what stack does the double fault handler push ITS information?

The Lazy Neuromancer · Tue Dec 14, 2004 8:52 am

The double fault handler uses a malloc'd stack, and its ESP is correct, since calling it directly works flawlessly.
By the way, Bochs does the Double fault correctly, but running it on a real machine makes it reboot.

Pype.Clicker · Post by **Pype.Clicker** » Tue Dec 14, 2004 9:51 am

Maybe you should try a CLI; HLT in the double-fault-handler task, or things could go wrong faster than expected. Remember that at this point the CPU is only one step away from resetting.

The Lazy Neuromancer · Wed Dec 15, 2004 10:33 am

I found out that the problem is in the task switching mechanism. Everything works on virtual machines when doing a jmp <tss desc>:0, but on real hardware it does not work.
I'm going to check that.

Pype.Clicker · Post by **Pype.Clicker** » Thu Dec 16, 2004 4:51 am

That sounds like an "assumed uninitialized memory will contain zeroes" problem to me ... staying tuned ...

Neuromancer · Post by **Neuromancer** » Thu Dec 16, 2004 11:10 am

What do you mean?

The kernel and the double fault TSS are memory_clear()'d, and the GDT is clean, too.

Pype.Clicker · Post by **Pype.Clicker** » Fri Dec 17, 2004 3:03 am

That was just one common mistake, due to the fact that Bochs' memory is always cleared while real hardware's isn't ...

Another common one comes from the Bochs BIOS enabling A20 at startup, while some real PCs still don't ...

... though I of course have no evidence that it's the cause of the trouble you're facing ...

distantvoices · Post by **distantvoices** » Fri Dec 17, 2004 3:51 am

@neuromancer: page fault handler ... are you by any chance using paging?

Have you zeroed out your page directory and the page tables ere inserting page tables/pages? That's a *very* nasty thing to debug, because you just don't see where the bloody bugs come from, for they don't follow any traceable pattern. I carried this kind of bug around for half a year ere I could muster the nerve to track it down with brute force, and with some well-placed thinking in a few spare hours between cooking and working.

Just check your paging code and, if need be, add code to zero out the page tables a priori.
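
To illustrate the advice above, here is a small sketch of clearing a page table before the first entry goes into it, for classic 32-bit two-level paging with 4 KiB pages. The helper names `paging_clear_table` and `paging_map` are hypothetical, the sketch assumes a single page table for brevity, and the tables are assumed to be directly addressable through the pointers passed in.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_ENTRIES 1024u
#define PTE_PRESENT  0x1u
#define PTE_WRITE    0x2u

typedef uint32_t pte_t;

/* Zero every entry so that stale memory is never read as a valid mapping. */
void paging_clear_table(pte_t *table)
{
    memset(table, 0, PAGE_ENTRIES * sizeof(pte_t));
}

/* Map one 4 KiB page. `pd` must already have been cleared by the caller;
 * `pt` is the (single) page table used for this sketch and `pt_phys` is its
 * physical address. The table is cleared before it is hooked into `pd`. */
void paging_map(pte_t *pd, pte_t *pt, uint32_t pt_phys,
                uint32_t virt, uint32_t phys)
{
    uint32_t pdi = virt >> 22;              /* top 10 bits: directory index */
    uint32_t pti = (virt >> 12) & 0x3FFu;   /* next 10 bits: table index */

    if (!(pd[pdi] & PTE_PRESENT)) {
        paging_clear_table(pt);             /* wipe garbage before first use */
        pd[pdi] = (pt_phys & ~0xFFFu) | PTE_PRESENT | PTE_WRITE;
    }
    pt[pti] = (phys & ~0xFFFu) | PTE_PRESENT | PTE_WRITE;
}
```

Without the clear, any leftover bit pattern in the table with bit 0 set would be treated by the CPU as a present mapping, which is the hard-to-trace behaviour described above.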

Neuromancer · Post by **Neuromancer** » Sat Dec 18, 2004 9:15 am

The TSSes, page directory and page table are placed in the .bss section of the kernel, so I don't do any dynamic allocation for them. But I'll check the code for the 5435346th time.
The A20 gate is enabled, since I am using GRUB.
I use paging (with global pages enabled on processors that support them).

However, thanks a lot guys.

I attach the source code URL: http://www.evilmafia.org/Nehemiah.tar.bz2

distantvoices · Post by **distantvoices** » Sat Dec 18, 2004 11:01 am

What kind of archive is that? I can neither open nor decompress it with tar -jvxf *.tar.bz2

Neuromancer · Post by **Neuromancer** » Sat Dec 18, 2004 4:51 pm

Yay, found the nasty, ugly, bad problem!

I had a bad bug in the memory manager: while scanning through the E820 map, it coalesced adjacent entries _without_ checking whether they were of the same type.
This caused the real system's memory map to be coalesced into one big block of reserved memory (since the first entry was reserved), leaving no memory for the physical page stack. So when I set up the CR3 for the double fault handler, I got a bogus physical page (a random value), cleared it (and it could point to a non-RAM address) and used it as the double fault task's page directory.
This did not happen on Bochs and QEMU because they do not report the entire memory map (that is, they report only free RAM, and the entries are not adjacent), so the buggy code never ran and I had no problem. ;D
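
As an illustration of the fix described above (not the code from the archive), here is a sketch of coalescing E820 entries only when they are both adjacent and of the same type. The struct layout, the name `e820_coalesce`, and the assumption that the map is already sorted by base address are all choices made for this example.

```c
#include <stdint.h>
#include <stddef.h>

#define E820_USABLE   1u
#define E820_RESERVED 2u

/* Simplified E820 entry; field names are illustrative. */
struct e820_entry {
    uint64_t base;
    uint64_t length;
    uint32_t type;
};

/* Merge neighbouring entries in place and return the new entry count.
 * Two entries are merged only if they touch AND have the same type.
 * Assumes `map` is sorted by base address. */
size_t e820_coalesce(struct e820_entry *map, size_t count)
{
    if (count == 0)
        return 0;

    size_t out = 0;                       /* index of the last kept entry */
    for (size_t i = 1; i < count; i++) {
        struct e820_entry *prev = &map[out];
        const struct e820_entry *cur = &map[i];

        if (cur->type == prev->type &&
            prev->base + prev->length == cur->base) {
            prev->length += cur->length;  /* same type and adjacent: merge */
        } else {
            map[++out] = *cur;            /* different type or gap: keep */
        }
    }
    return out + 1;
}
```

Dropping the `cur->type == prev->type` check reproduces the bug: a reserved first entry would absorb every adjacent usable region after it.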

@beyond infinity: maybe I uploaded the archive badly. I'll retry and post the working code at http://www.evilmafia.org/Nehemiah-working.tar.bz2 (added page invalidation on unmaps)

OSDev.org
