Double fault handler
Posted: Mon Dec 13, 2004 10:25 am
by Neuromancer
Hi,
I have created a double fault handler in my OS. When a double fault exception occurs, a task switch takes place and switches to the handler. The task management works (I can jump to the handler manually with jmp <descriptor>:0) and the IDT setting for this interrupt is OK (I can call it manually with an int instruction).
I'd like to test it, so I set ESP to a dummy value (0xDEADBEEF). Since the page fault handler that should be called works on the stack, a double fault should occur. But the only thing I get is a computer freeze (or a reboot; under QEMU everything just freezes up).
Has anyone had this kind of problem and could help me?
Thanks.
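For readers following along, here is a minimal sketch of the kind of IDT setting described above, assuming the standard IA-32 task gate layout. The names idt_entry_t and make_task_gate are made up for illustration and are not taken from Neuromancer's code.

```c
#include <assert.h>
#include <stdint.h>

/* One 8-byte IDT entry, laid out as an IA-32 task gate.
 * Hypothetical type name; not from the original poster's kernel. */
typedef struct {
    uint16_t reserved0;     /* unused in a task gate */
    uint16_t tss_selector;  /* GDT selector of the handler's TSS */
    uint8_t  reserved1;     /* unused in a task gate */
    uint8_t  type_attr;     /* P (bit 7) | DPL (bits 6-5) | type 0x5 */
    uint16_t reserved2;     /* unused in a task gate */
} idt_entry_t;

/* Build a present task gate that switches to the given TSS.
 * dpl is the privilege level required to reach the gate via "int". */
static idt_entry_t make_task_gate(uint16_t tss_selector, unsigned dpl)
{
    assert(dpl <= 3);
    idt_entry_t e = {0};
    e.tss_selector = tss_selector;
    e.type_attr = (uint8_t)(0x80u | (dpl << 5) | 0x05u);
    return e;
}
```

With an IDT array called idt (also an assumed name), something like idt[8] = make_task_gate(0x28, 0); would route vector 8 (#DF) through the TSS at the hypothetical GDT selector 0x28.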
Re:Double fault handler
Posted: Mon Dec 13, 2004 11:19 am
by Pype.Clicker
hmm, just to make sure, you're deadbeef'ing ESP while already running at ring 0 (CPL0) ? if not, the page fault will switch to the ESP0 stack from the TSS instead of double-faulting ...
Re:Double fault handler
Posted: Mon Dec 13, 2004 11:22 am
by Candy
On what stack does the double fault handler push ITS information?
Re:Double fault handler
Posted: Tue Dec 14, 2004 8:52 am
by The Lazy Neuromancer
The double fault handler uses a malloc'd stack, and its ESP is correct, since calling it directly works flawlessly.
By the way, Bochs handles the double fault correctly, but running it on a real machine makes it reboot.
Re:Double fault handler
Posted: Tue Dec 14, 2004 9:51 am
by Pype.Clicker
maybe you should try to CLI; HLT in the double-fault-handler task or things could go wrong faster than expected. Remember that at this point, the CPU is only one step (a triple fault) away from resetting.
Re:Double fault handler
Posted: Wed Dec 15, 2004 10:33 am
by The Lazy Neuromancer
I found out that the problem is in the task switching mechanism. Everything works on virtual machines when doing a jmp <tss desc>:0, but on real hardware it does not work.
I'm going to check that.
Re:Double fault handler
Posted: Thu Dec 16, 2004 4:51 am
by Pype.Clicker
that sounds like an "assumed non-initialized memory will contain zeroes" problem to me ... staying tuned ...
Re:Double fault handler
Posted: Thu Dec 16, 2004 11:10 am
by Neuromancer
What do you mean?
The kernel and the double fault TSS are memory_clear()'d and the GDT is clean, too.
Re:Double fault handler
Posted: Fri Dec 17, 2004 3:03 am
by Pype.Clicker
that was just one common mistake, due to the fact that BOCHS' memory is always cleared while real hardware's isn't ...
Another common one comes from BOCHS' BIOS enabling A20 at startup while some real PCs still don't ...
... though i have of course no evidence it's the cause of the trouble you're facing ...
Re:Double fault handler
Posted: Fri Dec 17, 2004 3:51 am
by distantvoices
@neuromancer: pagefault handler ... are you by any chance using paging?
Have you zeroed out your page directory and the page tables ere inserting page tables/pages? That's a *very* nasty thing to debug, because you just don't see where the bloody bugs come from, for they don't follow any well traceable pattern. I carried this kind of bug around for half a year ere I could muster the nerve to track it down with brute force - and with some well-placed thinking in a few spare hours between cooking and working.
Just check your paging code and, if needed, add code to zero out the page tables beforehand.
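As a rough sketch of that advice, assuming 32-bit non-PAE paging with 1024-entry tables: clear a table first, and only then hook it into the page directory. The names page_table_t, page_table_init and pd_install_table are hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE    4096u
#define PT_ENTRIES   1024u
#define PTE_PRESENT  0x001u
#define PTE_WRITABLE 0x002u

/* A page directory and a page table share this shape on 32-bit x86. */
typedef struct {
    uint32_t entry[PT_ENTRIES];
} page_table_t;

/* Mark every entry non-present, whatever garbage the memory held. */
static void page_table_init(page_table_t *t)
{
    memset(t, 0, sizeof *t);
}

/* Zero the new table first, then publish it in the directory.
 * pt_phys is the table's physical address (must be page aligned). */
static void pd_install_table(page_table_t *pd, unsigned index,
                             page_table_t *pt, uint32_t pt_phys)
{
    assert(index < PT_ENTRIES);
    assert((pt_phys & (PAGE_SIZE - 1u)) == 0);
    page_table_init(pt);
    pd->entry[index] = pt_phys | PTE_PRESENT | PTE_WRITABLE;
}
```

Doing the clear inside the install routine means a table can never become visible to the MMU with stale entries in it, which is the situation that behaves fine on an emulator with zeroed RAM and randomly on real hardware.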
Re:Double fault handler
Posted: Sat Dec 18, 2004 9:15 am
by Neuromancer
The TSSes, page directory and page table are put in the .bss section of the kernel, so I don't do any dynamic allocation for them. But I'll check the code for the 5435346th time.
The A20 gate is enabled since I am using GRUB.
I use paging (with global pages enabled on supported processors)
However, thanks a lot guys.
I attach the source code URL:
http://www.evilmafia.org/Nehemiah.tar.bz2
Re:Double fault handler
Posted: Sat Dec 18, 2004 11:01 am
by distantvoices
What kind of archive is that? I can neither open nor decompress it with tar -jvxf *.tar.bz2
Re:Double fault handler
Posted: Sat Dec 18, 2004 4:51 pm
by Neuromancer
Yay, found the nasty, ugly, bad problem!
I had a bad bug in the memory manager: while scanning through the E820 map, it coalesced adjacent entries _without_ checking whether they were of the same type.
This caused the real system memory map to be coalesced into one single big block of reserved memory (since the first entry was reserved), so I had no memory left for the physical page stack. So when I set up a CR3 for the double fault handler, I got a bogus physical page (random value), cleared it (and it could point to a non-RAM address) and used it to create the double fault's page directory.
This did not happen on Bochs and QEMU because they do not report the entire memory map (that is, they report only free RAM, and the entries are not adjacent), so the buggy code never ran and I had no problem. ;D
@beyond infinity: maybe I uploaded the archive badly. I'll retry and post the working code at
http://www.evilmafia.org/Nehemiah-working.tar.bz2 (added page invalidation on unmaps)
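For anyone hitting the same thing, here is a minimal sketch of the check that was missing, not the actual code from the archive. It assumes the map is already sorted by base address with no overlapping entries; e820_entry_t and e820_coalesce are made-up names, while the type values 1 (usable RAM) and 2 (reserved) are the standard E820 ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define E820_RAM      1u  /* usable memory */
#define E820_RESERVED 2u  /* not usable by the OS */

/* Simplified E820 entry; hypothetical struct, not from the archive. */
typedef struct {
    uint64_t base;
    uint64_t length;
    uint32_t type;
} e820_entry_t;

/* Merge neighbouring entries in place, but only when they touch AND
 * have the same type. Assumes the map is sorted by base and has no
 * overlaps. Returns the new number of entries. */
static size_t e820_coalesce(e820_entry_t *map, size_t count)
{
    if (count == 0)
        return 0;

    size_t out = 0;  /* index of the last entry kept so far */
    for (size_t i = 1; i < count; i++) {
        e820_entry_t *prev = &map[out];
        const e820_entry_t *cur = &map[i];
        int adjacent = (prev->base + prev->length == cur->base);

        if (adjacent && prev->type == cur->type)
            prev->length += cur->length;  /* safe to merge */
        else
            map[++out] = *cur;            /* keep as a separate entry */
    }
    return out + 1;
}
```

Dropping the prev->type == cur->type test reproduces the bug described above: a reserved first entry swallows every adjacent RAM region after it, and the page allocator ends up with nothing to hand out.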