Double fault handler

Post by Neuromancer »

Hi,
I have created a double fault handler in my OS. When a double fault exception occurs, a task switch takes place and switches to the handler. The task management works (I can jump to the handler manually with jmp <descriptor>:0), and the IDT entry for this interrupt is OK (I can invoke it manually with int 8).
To test it, I set ESP to a dummy value (0xDEADBEEF). Since the page fault handler that should then be called works on the stack, a double fault should occur. But the only thing I get is a computer freeze (or a reboot; under QEMU everything just freezes up).
Has anyone had this kind of problem and could help me?
Thanks.
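
For readers following along, here is a minimal sketch of the kind of setup described above: a task gate in IDT vector 8 that points at a dedicated double fault TSS. It is not taken from the poster's kernel. The names idt_entry, make_task_gate and install_double_fault_gate are hypothetical, and the TSS selector passed in is whatever GDT slot your kernel assigned to the double fault TSS.

```c
#include <stdint.h>
#include <string.h>

/* 32-bit protected-mode IDT entry (8 bytes). For a task gate the two
 * offset fields are unused; only the selector and the type byte matter. */
struct idt_entry {
    uint16_t offset_low;   /* unused for task gates, keep 0 */
    uint16_t selector;     /* GDT selector of the target TSS */
    uint8_t  reserved;     /* must be 0 */
    uint8_t  type_attr;    /* P (bit 7) | DPL (bits 6-5) | type (bits 4-0) */
    uint16_t offset_high;  /* unused for task gates, keep 0 */
} __attribute__((packed));

#define IDT_PRESENT      0x80u  /* P bit */
#define IDT_TASK_GATE    0x05u  /* gate type 00101b */
#define VEC_DOUBLE_FAULT 8      /* #DF vector number */

/* Build a present, DPL 0 task gate for the given TSS selector
 * (hypothetical helper). */
static struct idt_entry make_task_gate(uint16_t tss_selector)
{
    struct idt_entry e;
    memset(&e, 0, sizeof e);
    e.selector  = tss_selector;
    e.type_attr = IDT_PRESENT | IDT_TASK_GATE;  /* 0x85 */
    return e;
}

/* Place the gate in vector 8 of an IDT array (hypothetical helper). */
static void install_double_fault_gate(struct idt_entry *idt,
                                      uint16_t tss_selector)
{
    idt[VEC_DOUBLE_FAULT] = make_task_gate(tss_selector);
}
```

Using a task gate rather than an interrupt gate means the CPU loads a fresh ESP, CR3 and segment state from the TSS, so the handler does not depend on the stack that just faulted.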
Pype.Clicker
Member
Posts: 5964
Joined: Wed Oct 18, 2006 2:31 am
Location: In a galaxy, far, far away

Re:Double fault handler

Post by Pype.Clicker »

Hmm, just to make sure: are you already at CPL0 when you deadbeef ESP? If not, the page fault will load ESP0 from the TSS instead of double-faulting...
Candy
Member
Posts: 3882
Joined: Tue Oct 17, 2006 11:33 pm
Location: Eindhoven

Re:Double fault handler

Post by Candy »

On what stack does the double fault handler push ITS information?
The Lazy Neuromancer

Re:Double fault handler

Post by The Lazy Neuromancer »

The double fault handler uses a malloc'd stack, and its ESP is correct, since calling it directly works flawlessly.
By the way, Bochs handles the double fault correctly, but running the kernel on a real machine makes it reboot.

Re:Double fault handler

Post by Pype.Clicker »

Maybe you should try a CLI; HLT in the double-fault-handler task, or things could go wrong faster than expected. Remember that at this point, the CPU is only one step away from resetting (a fault here becomes a triple fault).
The Lazy Neuromancer

Re:Double fault handler

Post by The Lazy Neuromancer »

I found out that the problem is in the task switching mechanism. Everything works on virtual machines when doing a jmp <tss desc>:0, but on real hardware it does not work.
I'm going to check that.

Re:Double fault handler

Post by Pype.Clicker »

That sounds like an "assumed uninitialized memory will contain zeroes" problem to me... staying tuned...
Neuromancer

Re:Double fault handler

Post by Neuromancer »

What do you mean? :o
The kernel and the double fault TSS are memory_clear()'d and the GDT is clean, too.

Re:Double fault handler

Post by Pype.Clicker »

That was just one common mistake, due to the fact that Bochs' memory is always cleared, while real hardware's isn't...

Another common one comes from Bochs' BIOS enabling A20 at startup, while some real PCs still don't...

... though of course I have no evidence it's the cause of the trouble you're facing...
distantvoices
Member
Posts: 1600
Joined: Wed Oct 18, 2006 11:59 am
Location: Vienna/Austria

Re:Double fault handler

Post by distantvoices »

@neuromancer: page fault handler... are you by any chance using paging?

Have you zeroed out your page directory and page tables before inserting page tables/pages? That's a *very* nasty thing to debug, because you just don't see where the bloody bugs come from, for they don't follow any traceable pattern. I carried this kind of bug around for half a year before I could muster the nerve to track it down with brute force, and with some careful thinking in a few spare hours between cooking and working.

Just check your paging code and, if necessary, add code to zero out the page tables beforehand.
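
As an illustration of the advice above, here is a minimal sketch of clearing a 32-bit (non-PAE) page table before any entry is written into it. The names pte_t, page_table_clear and page_table_map are hypothetical and not from anyone's kernel in this thread. It assumes 4 KiB pages and 1024 entries per table.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_ENTRIES 1024u   /* entries in a 32-bit page table/directory */
#define PTE_PRESENT  0x001u  /* bit 0: entry is valid */
#define PTE_WRITABLE 0x002u  /* bit 1: writes allowed */

typedef uint32_t pte_t;

/* Zero every entry so that all slots start out not-present.
 * Leftover garbage with bit 0 set would otherwise be treated by
 * the MMU as a valid mapping to some random physical address. */
static void page_table_clear(pte_t *table)
{
    memset(table, 0, PAGE_ENTRIES * sizeof(pte_t));
}

/* Map one 4 KiB frame at the given index. The physical address is
 * truncated to a page boundary and the present bit is always set. */
static void page_table_map(pte_t *table, unsigned index,
                           uint32_t phys, uint32_t flags)
{
    table[index] = (phys & 0xFFFFF000u) | (flags & 0xFFFu) | PTE_PRESENT;
}
```

The point is the ordering: call page_table_clear() on a freshly obtained table first, and only then start calling page_table_map().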
... the osdever formerly known as beyond infinity ...
BlueillusionOS iso image
Neuromancer

Re:Double fault handler

Post by Neuromancer »

The TSSes, page directory and page table are placed in the kernel's .bss section, so I don't do any dynamic allocation. But I'll check the code for the 5435346th time.
The A20 gate is enabled, since I am using GRUB.
I use paging (with global pages enabled on supported processors).

However, thanks a lot guys.

Here is the source code URL: http://www.evilmafia.org/Nehemiah.tar.bz2

Re:Double fault handler

Post by distantvoices »

What kind of archive is that? I can neither open nor decompress it with tar -jvxf *.tar.bz2
Neuromancer

Re:Double fault handler

Post by Neuromancer »

Yay, found the nasty, ugly, bad problem! 8)
I had a bad bug in the memory manager: while scanning through the E820 map, it coalesced adjacent entries _without_ checking whether they were of the same type.
This caused the real system's memory map to be coalesced into one single big block of reserved memory (since the first entry was reserved), so I had no memory left for the physical page stack. So when I set up a CR3 for the double fault handler, I got a bogus physical page (a random value), cleared it (and it could point to a non-RAM address) and used it to create the double fault task's page directory.
This did not happen on Bochs and QEMU because they do not report the entire memory map (that is, they report only free RAM, and the entries are not adjacent), so the buggy code never ran and I had no problem. ;D
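
For anyone hitting the same thing, here is a minimal sketch of a coalescing pass that includes the missing type check. It is not the code from the archive. The struct e820_entry layout and the e820_coalesce name are assumptions for illustration, and the map is assumed to be already sorted by base address with no overlapping entries.

```c
#include <stddef.h>
#include <stdint.h>

#define E820_USABLE   1u  /* type 1: available RAM */
#define E820_RESERVED 2u  /* type 2: reserved, not usable */

/* Hypothetical in-kernel copy of one E820 map entry. */
struct e820_entry {
    uint64_t base;
    uint64_t length;
    uint32_t type;
};

/* Merge neighbouring entries in place and return the new entry count.
 * Two entries are merged only if they touch AND have the same type;
 * dropping the type test is what turns usable RAM into "reserved". */
static size_t e820_coalesce(struct e820_entry *map, size_t count)
{
    if (count == 0)
        return 0;

    size_t out = 0;  /* index of the last entry kept so far */
    for (size_t i = 1; i < count; i++) {
        struct e820_entry *prev = &map[out];
        int adjacent  = (prev->base + prev->length == map[i].base);
        int same_type = (prev->type == map[i].type);

        if (adjacent && same_type)
            prev->length += map[i].length;  /* extend previous entry */
        else
            map[++out] = map[i];            /* keep as separate entry */
    }
    return out + 1;
}
```

With a map such as usable, reserved, reserved, usable, usable (all touching), this leaves three entries instead of collapsing everything into the type of the first one.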

@beyond infinity: maybe I uploaded the archive badly. I'll retry and post the working code at http://www.evilmafia.org/Nehemiah-working.tar.bz2 (with page invalidation added on unmaps).