Understanding OS Crash

Posted: Fri Nov 26, 2021 11:11 pm
by jwhitehorn
I'm trying to expand my debugging, and could use a few pointers.

Some code that I've recently added is faulting. Based on some research I've added -no-reboot and -d int to my QEMU options. I now have this:
0: v=20 e=0000 i=0 cpl=0 IP=0008:ffffffff80106485 pc=ffffffff80106485 SP=0000:ffffffff8000ff88 env->regs[R_EAX]=0000000000000000
RAX=0000000000000000 RBX=0000000000000000 RCX=ffffffff80114560 RDX=0000000000000000
RSI=0000000000000000 RDI=ffffffff80114560 RBP=ffffffff8000ff88 RSP=ffffffff8000ff88
R8 =ffffffff8000ffc4 R9 =ffffffff8000ffc0 R10=ffffffff8000ffbc R11=0000000000000000
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
RIP=ffffffff80106485 RFL=00000282 [--S----] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 00000000 00000000
CS =0008 0000000000000000 00000000 00209800 DPL=0 CS64 [---]
SS =0000 0000000000000000 00000000 00000000
DS =0000 0000000000000000 00000000 00000000
FS =0000 ffffffff803f9800 00000000 00000000
GS =0000 0000000000000000 00000000 00000000
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0030 ffffffff803f9400 00000067 0000e900 DPL=3 TSS64-avl
GDT= ffffffff803f9000 0000003f
IDT= ffffffff803fa000 00000fff
1: v=40 e=0000 i=1 cpl=3 IP=0023:0000000000000015 pc=0000000000000015 SP=002b:0000000000001000 env->regs[R_EAX]=0000000000000007
RAX=0000000000000007 RBX=0000000000000000 RCX=0000000000000000 RDX=0000000000000000
RSI=000000000000002c RDI=0000000000000022 RBP=0000000000000000 RSP=0000000000001000
R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
RIP=0000000000000015 RFL=00000202 [-------] CPL=3 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 00000000 00000000
CS =0023 0000000000000000 00000000 0020f800 DPL=3 CS64 [---]
SS =002b 0000000000000000 00000000 0000f200 DPL=3 DS [-W-]
DS =0000 0000000000000000 00000000 00000000
FS =0000 ffffffff803f9800 00000000 00000000
GS =0000 0000000000000000 00000000 00000000
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0030 ffffffff803f9400 00000067 0000e900 DPL=3 TSS64-avl
GDT= ffffffff803f9000 0000003f
IDT= ffffffff803fa000 00000fff
CR0=80000011 CR2=0000000040000fa8 CR3=000000000dffe000 CR4=00000020
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=0000000000000000 CCD=ffffffff8dffffd8 CCO=EFLAGS
EFER=0000000000000500
check_exception old: 0xe new 0xe
3: v=08 e=0000 i=0 cpl=3 IP=0023:0000000000000015 pc=0000000000000015 SP=002b:0000000000001000 env->regs[R_EAX]=0000000000000007
RAX=0000000000000007 RBX=0000000000000000 RCX=0000000000000000 RDX=0000000000000000
RSI=000000000000002c RDI=0000000000000022 RBP=0000000000000000 RSP=0000000000001000
R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
RIP=0000000000000015 RFL=00000202 [-------] CPL=3 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 00000000 00000000
CS =0023 0000000000000000 00000000 0020f800 DPL=3 CS64 [---]
SS =002b 0000000000000000 00000000 0000f200 DPL=3 DS [-W-]
DS =0000 0000000000000000 00000000 00000000
FS =0000 ffffffff803f9800 00000000 00000000
GS =0000 0000000000000000 00000000 00000000
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0030 ffffffff803f9400 00000067 0000e900 DPL=3 TSS64-avl
GDT= ffffffff803f9000 0000003f
IDT= ffffffff803fa000 00000fff
CR0=80000011 CR2=0000000040000fa8 CR3=000000000dffe000 CR4=00000020
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=0000000000000000 CCD=ffffffff8dffffd8 CCO=EFLAGS
EFER=0000000000000500
check_exception old: 0x8 new 0xe
The problem is, I'm not sure what many of these fields mean, nor how I'd go about googling them to learn more. From my research I take it the "v=" bit corresponds to interrupt vectors (I think?), and that this is generally a snapshot of the CPU state at the time of each fault, but other than that I'm at a loss for how to dig into this.

Any tips/pointers, resources, or even just some basic terminology to search for would be greatly appreciated. Thanks!

Re: Understanding OS Crash

Posted: Sat Nov 27, 2021 12:13 am
by nullplan
So, the "v=" part tells you what vector was activated. First it was interrupt 20h, then 40h, then 8h. Well, that last one is not good, that's a double fault. What caused the double fault?
jwhitehorn wrote:check_exception old: 0xe new 0xe
Ah, so it was a page fault while handling a page fault. I'm guessing the page fault entry in the IDT is not filled out correctly. Or your page tables are wonky. Also, the IP looks weird: Offset 0x15? Do you really let user space execute in the null page?

In any case, most of these are simply registers. They tell you what the registers were at the time an exceptional condition occurred. But the stuff in between the register dumps is usually more important.
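For anyone reading later, here is a rough sketch of how the lines "in between the register dumps" can be pulled out and interpreted with a script. Nothing here comes from the thread itself: the names `RECORD`, `CHECK`, `escalation`, and `summarize` are made up for illustration, the regular expressions only assume the log layout shown in the first post, and reading `i=1` as "software int instruction" is my assumption about QEMU's output. The escalation rules follow the double-fault conditions described in the Intel manual (contributory exceptions and page faults), so treat the verdicts as a guide rather than an exact model of what QEMU does.

```python
import re

# Header line of one "-d int" record, e.g.
# "3: v=08 e=0000 i=0 cpl=3 IP=0023:0000000000000015 pc=..."
RECORD = re.compile(
    r"^\s*(\d+): v=([0-9a-f]+) e=([0-9a-f]+) i=(\d) cpl=(\d) "
    r"IP=([0-9a-f]+):([0-9a-f]+)"
)
# The "old/new" exception pair, e.g. "check_exception old: 0xe new 0xe".
CHECK = re.compile(r"^check_exception old: 0x([0-9a-f]+) new 0x([0-9a-f]+)")

CONTRIBUTORY = {0x0, 0xA, 0xB, 0xC, 0xD}  # #DE, #TS, #NP, #SS, #GP
PAGE_FAULT = 0xE
DOUBLE_FAULT = 0x8


def escalation(old, new):
    """Classify an old/new exception pair using the double-fault rules."""
    second_is_serious = new in CONTRIBUTORY or new == PAGE_FAULT
    if old == DOUBLE_FAULT and second_is_serious:
        return "triple fault"
    if old == PAGE_FAULT and second_is_serious:
        return "double fault"
    if old in CONTRIBUTORY and new in CONTRIBUTORY:
        return "double fault"
    return "handled normally"


def summarize(lines):
    """Return one short description per record header or check_exception line."""
    out = []
    for line in lines:
        m = RECORD.match(line)
        if m:
            n, v, _e, i, cpl, _cs, ip = m.groups()
            # Assumption: i=1 marks a software "int" instruction.
            kind = "software int" if i == "1" else "exception or hardware IRQ"
            out.append(
                f"#{n}: vector 0x{int(v, 16):02x} ({kind}) "
                f"at cpl={cpl}, rip=0x{int(ip, 16):x}"
            )
            continue
        m = CHECK.match(line)
        if m:
            old, new = (int(g, 16) for g in m.groups())
            out.append(f"old=0x{old:x} new=0x{new:x}: {escalation(old, new)}")
    return out


# Lines copied from the log in the first post (trailing fields trimmed).
sample = [
    "1: v=40 e=0000 i=1 cpl=3 IP=0023:0000000000000015 pc=0000000000000015",
    "check_exception old: 0xe new 0xe",
    "3: v=08 e=0000 i=0 cpl=3 IP=0023:0000000000000015 pc=0000000000000015",
    "check_exception old: 0x8 new 0xe",
]
for row in summarize(sample):
    print(row)
```

Run against the posted log, the last two lines of output are the interesting ones: a page fault on top of a page fault becomes the double fault (v=08), and a further page fault while delivering that double fault is what resets the machine.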

Re: Understanding OS Crash

Posted: Sat Nov 27, 2021 12:38 am
by iansjack
A problem with the stack whilst running the page fault handler is another likely cause.

Re: Understanding OS Crash

Posted: Sat Nov 27, 2021 8:05 am
by jwhitehorn
Thank you both, that helps a lot. It also validates my suspicion, based on the code I'd recently added before this behavior started occurring.

For future edification, I do want to ask how you knew this:
nullplan wrote:What caused the double fault?
jwhitehorn wrote:check_exception old: 0xe new 0xe
Ah, so it was a page fault while handling a page fault.
I've found a listing of x86 interrupts, but I'm not sure what to call these - "check exceptions"? Because I'm not finding much when I google, and I'd love to know how you knew that 0xE was a page fault.

Re: Understanding OS Crash

Posted: Sat Nov 27, 2021 11:33 am
by iansjack
Intel and AMD provide very detailed programmer's manuals. (Check their Web sites.) These provide this sort of information and are required reading. (Not necessarily read from start to end!)

Re: Understanding OS Crash

Posted: Sat Nov 27, 2021 1:18 pm
by jwhitehorn
iansjack wrote:Intel and AMD provide very detailed programmer's manuals. (Check their Web sites.) These provide this sort of information and are required reading. (Not necessarily read from start to end!)
Thank you.

I am aware that this is required knowledge - that is why I'm asking for clarification.

I have many, and have read many more, technical references from both AMD and Intel. In this case, however, I lacked the knowledge of what this was even called.

From additional searching I found this resource. Hopefully this clarification is helpful for anyone else reading this in the future.

Re: Understanding OS Crash

Posted: Sat Nov 27, 2021 1:34 pm
by nullplan
jwhitehorn wrote: and I'd love to know how you knew that 0xE was a page fault.
0xe is 14. So I looked at my own IDT code to check what interrupt 14 is. If I didn't have that, I would have looked at a CPU manual to check the same thing. Both AMD and Intel provide lengthy lists about the exceptions you can get.
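As a quick illustration of that lookup, the architectural exception list can be kept as a small table. This is only a sketch: the mnemonics and names follow the exception table in the Intel manual, while `X86_EXCEPTIONS` and `describe_vector` are names invented here. Vectors 32 and up (such as the 20h and 40h in the log) are whatever the OS assigned them to, so the table cannot name those.

```python
# Architecturally defined x86 exception vectors (0-21); 15 is reserved.
X86_EXCEPTIONS = {
    0: ("#DE", "Divide Error"),
    1: ("#DB", "Debug"),
    2: ("NMI", "Non-Maskable Interrupt"),
    3: ("#BP", "Breakpoint"),
    4: ("#OF", "Overflow"),
    5: ("#BR", "BOUND Range Exceeded"),
    6: ("#UD", "Invalid Opcode"),
    7: ("#NM", "Device Not Available"),
    8: ("#DF", "Double Fault"),
    9: ("", "Coprocessor Segment Overrun"),
    10: ("#TS", "Invalid TSS"),
    11: ("#NP", "Segment Not Present"),
    12: ("#SS", "Stack-Segment Fault"),
    13: ("#GP", "General Protection"),
    14: ("#PF", "Page Fault"),
    16: ("#MF", "x87 Floating-Point Exception"),
    17: ("#AC", "Alignment Check"),
    18: ("#MC", "Machine Check"),
    19: ("#XM", "SIMD Floating-Point Exception"),
    20: ("#VE", "Virtualization Exception"),
    21: ("#CP", "Control Protection Exception"),
}


def describe_vector(vector):
    """Map an interrupt vector number to a human-readable description."""
    if vector in X86_EXCEPTIONS:
        mnemonic, name = X86_EXCEPTIONS[vector]
        return f"{mnemonic} {name}".strip()
    if vector < 32:
        return "reserved"
    return "OS-defined interrupt"


# The values seen in the log: check_exception 0xe, then v=08, v=20, v=40.
for v in (0xE, 0x8, 0x20, 0x40):
    print(f"0x{v:02x} = {v}: {describe_vector(v)}")
```

So `describe_vector(0xe)` gives "#PF Page Fault" and `describe_vector(0x8)` gives "#DF Double Fault", which is the same answer you would get from the manual or from your own IDT setup code.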

Re: Understanding OS Crash

Posted: Sat Nov 27, 2021 1:59 pm
by jwhitehorn
nullplan wrote: 0xe is 14. So I looked at my own IDT code to check what interrupt 14 is. If I didn't have that, I would have looked at a CPU manual to check the same thing. Both AMD and Intel provide lengthy lists about the exceptions you can get.
Thank you as well.

I did end up finding a table that defined all those, so I'm good now. But I do appreciate both of your help!