[Solved] Triple Fault on Second Page Fault

Posted: Fri Feb 26, 2021 1:20 pm
by foliagecanine
Hello.
I've run into a really strange bug.
Recently I've been working on fixing my task scheduler. I decided to fix my fault handler as well, since it took down the entire OS whenever a single program crashed.
When I use a test program that reads from address 0 (which is unmapped), it catches it and kills the process as it should. The OS continues to run fine. I can run programs, exit them, etc.
However, when I run the test program a second time, it causes a triple fault.

My first attempt to figure out what was happening was to add "-d int" to QEMU. However, QEMU spat out a hundred or so of the following:
check_exception old: 0xffffffff new 0xe
174874: v=0e e=0000 i=0 cpl=0 IP=0008:c0101526 pc=c0101526 SP=0010:c0155a41 CR2=00000000
Then after those it printed:
check_exception old: 0xffffffff new 0xd
175039: v=0d e=001a i=0 cpl=0 IP=0008:000f06ac pc=000f06ac SP=0010:00000fc8 env->regs[R_EAX]=000f6206
check_exception old: 0xd new 0xd
175040: v=08 e=0000 i=0 cpl=0 IP=0008:000f06ac pc=000f06ac SP=0010:00000fc8 env->regs[R_EAX]=000f6206
check_exception old: 0x8 new 0xd
(Triple Fault)
My second attempt was to step through and compare registers in both instances.
Unfortunately, the instruction that caused the triple fault was in usermode, so this was difficult.
From what I could tell, important registers like esp, eip, cr3, etc. were exactly the same in both runs.

What other methods can I use to figure this out?

(Code will be posted in next post)

Re: Triple Fault on Second Page Fault

Posted: Fri Feb 26, 2021 1:20 pm
by foliagecanine
Important code:
idt-asm.asm (page fault idt entry points to page_fault):

orig_eax dw 0
retaddr dw 0
errcode dw 0

global page_fault

page_fault:
  mov dword [orig_eax],eax
  pop eax
  mov dword [errcode],eax
  mov [ready_esp],esp
  pop eax
  mov dword [retaddr],eax
  push eax
  mov eax, dword [orig_eax]
  pusha
  mov eax, dword [errcode]
  push eax
  mov eax, dword [retaddr]
  push eax
  call exception_page_fault
  popa
  iret
exceptions.c:

uint32_t address;

void exception_page_fault(uint32_t retaddr, uint32_t error) {
	disable_tasking();
	asm volatile("mov %%cr2, %0":"=a"(address):);
	if(error&4) {
		kerror("[Exception.Fault] Usermode Page Privilege Fault!");
		// ... bunch of printfs that print out the error code, fault address, and page permissions removed
		exit_program(1);
		// Should not reach here.
		for(;;);
		return;
	} else {
		terminal_setcolor(0x1F);
		terminal_refresh();
		kerror("[Exception.Fault] Kernel Page Fault!");
		// ... bunch of printfs that print out the error code, fault address, and page permissions removed
		abort();
		for(;;);
	}
}
task.c (contains exit_program, too long to post here): https://github.com/foliagecanine/tritiu ... 386/task.c

Re: Triple Fault on Second Page Fault

Posted: Fri Feb 26, 2021 2:00 pm
by Octocontrabass
foliagecanine wrote: My first attempt to figure out what was happening was to add "-d int" to QEMU. However, QEMU spat out a hundred or so of the following:
Without seeing the whole log I can't say for sure, but it kinda sounds like an exception is happening inside the exception handler, causing the stack to overflow and eventually leading to a triple fault. So exactly what code is at 0xc0101526? (You might find addr2line helpful here.)

Re: Triple Fault on Second Page Fault

Posted: Fri Feb 26, 2021 2:36 pm
by foliagecanine
Octocontrabass wrote: So exactly what code is at 0xc0101526?
It is the address of the page_fault function.
Also, QEMU says that the page containing the page_fault function IS mapped in the TLB right before the exception happens:
(qemu) info tlb
...
00000000c0101000: 0000000000101000 ----A---W
...

Re: Triple Fault on Second Page Fault

Posted: Fri Feb 26, 2021 3:30 pm
by Gigasoft
page_fault is overwritten by the "mov dword [errcode],eax" instruction. And don't forget to load DS and ES.

Re: Triple Fault on Second Page Fault

Posted: Fri Feb 26, 2021 4:13 pm
by foliagecanine
Gigasoft wrote: page_fault is overwritten by the "mov dword [errcode],eax" instruction. And don't forget to load DS and ES.
You were 100% correct!
I had mistakenly mapped "dword -> dw" instead of "dword -> dd", so each of the three variables was only 2 bytes wide.
Works perfectly now.
Thank you Gigasoft!