[Solved] Triple Fault on Second Page Fault

foliagecanine
Member
Posts: 148
Joined: Sun Aug 23, 2020 4:35 pm

Post by foliagecanine »

Hello.
I've run into a really strange bug.
Recently I've been working on fixing my task scheduler and related code. I decided to fix my fault handler, since it crashed the entire OS whenever a single program crashed.
When I use a test program that reads from address 0 (which is unmapped), it catches it and kills the process as it should. The OS continues to run fine. I can run programs, exit them, etc.
However, when I run the test program a second time, it causes a triple fault.

My first attempt to figure out what was happening was to add "-d int" to QEMU. However, QEMU spat out a hundred or so of the following:
check_exception old: 0xffffffff new 0xe
174874: v=0e e=0000 i=0 cpl=0 IP=0008:c0101526 pc=c0101526 SP=0010:c0155a41 CR2=00000000
Then after those it printed:
check_exception old: 0xffffffff new 0xd
175039: v=0d e=001a i=0 cpl=0 IP=0008:000f06ac pc=000f06ac SP=0010:00000fc8 env->regs[R_EAX]=000f6206
check_exception old: 0xd new 0xd
175040: v=08 e=0000 i=0 cpl=0 IP=0008:000f06ac pc=000f06ac SP=0010:00000fc8 env->regs[R_EAX]=000f6206
check_exception old: 0x8 new 0xd
(Triple Fault)
My second attempt was to step through and compare registers in both instances.
Unfortunately, the instruction that caused the triple fault was in usermode, so this was difficult.
From what I could tell, important registers like esp, eip, cr3, etc. were exactly the same in both runs.

What other methods can I use to figure this out?

(Code will be posted in next post)
Last edited by foliagecanine on Fri Feb 26, 2021 4:13 pm, edited 1 time in total.
My OS: TritiumOS
https://github.com/foliagecanine/tritium-os
void warranty(laptop_t laptop) { if (laptop.broken) return laptop; }
I don't get it: Why's the warranty void?
foliagecanine

Re: Triple Fault on Second Page Fault

Post by foliagecanine »

Important code:
idt-asm.asm (page fault idt entry points to page_fault):


orig_eax dw 0
retaddr dw 0
errcode dw 0

global page_fault

page_fault:
  mov dword [orig_eax],eax
  pop eax
  mov dword [errcode],eax
  mov [ready_esp],esp
  pop eax
  mov dword [retaddr],eax
  push eax
  mov eax, dword [orig_eax]
  pusha
  mov eax, dword [errcode]
  push eax
  mov eax, dword [retaddr]
  push eax
  call exception_page_fault
  popa
  iret
exceptions.c:


uint32_t address;

void exception_page_fault(uint32_t retaddr, uint32_t error) {
	disable_tasking();
	asm volatile("mov %%cr2, %0":"=a"(address):);
	if(error&4) {
		kerror("[Exception.Fault] Usermode Page Privilege Fault!");
		// ... bunch of printfs that print out the error code, fault address, and page permissions removed
		exit_program(1);
		// Should not reach here.
		for(;;);
		return;
	} else {
		terminal_setcolor(0x1F);
		terminal_refresh();
		kerror("[Exception.Fault] Kernel Page Fault!");
		// ... bunch of printfs that print out the error code, fault address, and page permissions removed
		abort();
		for(;;);
	}
}
task.c (contains exit_program, too long to post here): https://github.com/foliagecanine/tritiu ... 386/task.c
Octocontrabass
Member
Posts: 5568
Joined: Mon Mar 25, 2013 7:01 pm

Re: Triple Fault on Second Page Fault

Post by Octocontrabass »

foliagecanine wrote:My first attempt to figure out what was happening was to add "-d int" to QEMU. However, QEMU spat out a hundred or so of the following:
Without seeing the whole log I can't say for sure, but it kinda sounds like an exception is happening inside the exception handler, causing the stack to overflow and eventually leading to a triple fault. So exactly what code is at 0xc0101526? (You might find addr2line helpful here.)
foliagecanine

Re: Triple Fault on Second Page Fault

Post by foliagecanine »

Octocontrabass wrote:So exactly what code is at 0xc0101526?
It is the address of the page_fault function.
Also, QEMU says that the page containing the page_fault function IS mapped in the TLB right before the exception happens:
(qemu) info tlb
...
00000000c0101000: 0000000000101000 ----A---W
...
Gigasoft
Member
Posts: 856
Joined: Sat Nov 21, 2009 5:11 pm

Re: Triple Fault on Second Page Fault

Post by Gigasoft »

page_fault is overwritten by the "mov dword [errcode],eax" instruction. And don't forget to load DS and ES.
foliagecanine

Re: Triple Fault on Second Page Fault

Post by foliagecanine »

Gigasoft wrote:page_fault is overwritten by the "mov dword [errcode],eax" instruction. And don't forget to load DS and ES.
You were 100% correct!
I accidentally thought "dword -> dw" instead of "dword -> dd"
Works perfectly now.
Thank you Gigasoft!