
Debug question

Posted: Thu Dec 22, 2016 2:08 am
by bzt
Ok, in short, I've messed up something with my APIC+IOAPIC code which generates a double fault in qemu as well as in bochs. I wanted to see why, so I've recompiled bochs to print out all interrupt calls. This is what I've got:

Code:

(0) [0x0000001057a6] 0008:ffffffffffe077a6 (unk. ctxt): iret                      ; 48cf
<bochs:2> s
Next at t=16454120
(0) [0x0000003180e8] 0023:00000000002000e8 (unk. ctxt): ret                       ; c3
<bochs:3> print-stack
Stack address size 8
 | STACK 0x00000000001fffa0 [0x00000000:0x002080e9]
 | STACK 0x00000000001fffa8 [0x00000000:0x002060e9]
 | STACK 0x00000000001fffb0 [0x00000000:0x002040e9]
 | STACK 0x00000000001fffb8 [0x00000000:0x002020e8]
 | STACK 0x00000000001fffc0 [0x00000000:0x002000f9]
 | STACK 0x00000000001fffc8 [0x00000000:0x00000001]
 | STACK 0x00000000001fffd0 [0x00000000:0x001fffd8]
 | STACK 0x00000000001fffd8 [0x00000000:0x001fffe8]
 | STACK 0x00000000001fffe0 [0x00000000:0x00000000]
 | STACK 0x00000000001fffe8 [0x00006d65:0x74737973]
 | STACK 0x00000000001ffff0 [0x00000000:0x00000000]
 | STACK 0x00000000001ffff8 [0x00000000:0x00000000]
 | STACK 0x0000000000200000 [0x00010102:0x464c457f]
 | STACK 0x0000000000200008 [0x00000000:0x00000000]
 | STACK 0x0000000000200010 [0x00000001:0x003e0003]
 | STACK 0x0000000000200018 [0x00000000:0x000000f9]
<bochs:4> s
00016454120e[CPU0  ] interrupt(): vector = 08, TYPE = 0, EXT = 1
Next at t=16454121
(0) [0x000000103081] 0008:ffffffffffe05081 (unk. ctxt): lock bts qword ptr ds:0xffffffffffe1605c, 0x00 ; f0480fba2c255c60e1ff00
<bochs:5> 
As you can see, everything seems to be okay:
bochs:2 - the iret in kernel space is executed, and control transfers to user space.
bochs:3 - in user space the stack is ok: it is mapped, there is a valid return address on top of the stack, etc.
bochs:4 - but when I step through the "ret", a double fault is raised IMMEDIATELY!
bochs:5 - my double fault handler starts

How is that possible? It's not a pending interrupt or NMI, that would have been printed out as well! It's not a page fault either! And it's definitely not a problem in one of my ISRs, as a) they are working just fine with PIC, b) the first ISR that gets called is the double fault's handler, right away!

I've tried to mask NMI to be sure, and printed out bochs' signal_event() calls as well, but nothing, the double fault is the first exception. According to the AMD and Intel specs, that should never ever happen! And it's not a problem with bochs, as it's also raised in qemu...

So my question is: has anybody seen something like this before? How can I debug this? Any ideas?

EDIT: here's the output with PIC:

Code:

(0) [0x0000003180e8] 0023:00000000002000e8 (unk. ctxt): ret                       ; c3
<bochs:3> print-stack
Stack address size 8
 | STACK 0x00000000001fffa0 [0x00000000:0x002080e9]
 | STACK 0x00000000001fffa8 [0x00000000:0x002060e9]
 | STACK 0x00000000001fffb0 [0x00000000:0x002040e9]
 | STACK 0x00000000001fffb8 [0x00000000:0x002020e8]
 | STACK 0x00000000001fffc0 [0x00000000:0x002000f9]
 | STACK 0x00000000001fffc8 [0x00000000:0x00000001]
 | STACK 0x00000000001fffd0 [0x00000000:0x001fffd8]
 | STACK 0x00000000001fffd8 [0x00000000:0x001fffe8]
 | STACK 0x00000000001fffe0 [0x00000000:0x00000000]
 | STACK 0x00000000001fffe8 [0x00006d65:0x74737973]
 | STACK 0x00000000001ffff0 [0x00000000:0x00000000]
 | STACK 0x00000000001ffff8 [0x00000000:0x00000000]
 | STACK 0x0000000000200000 [0x00010102:0x464c457f]
 | STACK 0x0000000000200008 [0x00000000:0x00000000]
 | STACK 0x0000000000200010 [0x00000001:0x003e0003]
 | STACK 0x0000000000200018 [0x00000000:0x000000f9]
<bochs:4> page 0x1fffa0
PML4: 0x000000000001b027    ps         A pcd pwt U W P
PDPE: 0x000000000001c027    ps         A pcd pwt U W P
 PDE: 0x800000000001d027 XD ps         A pcd pwt U W P
 PTE: 0x8000000000020007 XD    g pat d a pcd pwt U W P
linear page 0x00000000001ff000 maps to physical page 0x000000020000
<bochs:5> s
Next at t=16452006
(0) [0x0000003470e9] 0023:00000000002080e9 (unk. ctxt): push rbp                  ; 55
<bochs:6> 


Re: Debug question

Posted: Thu Dec 22, 2016 2:35 am
by xenos
The interrupt message says EXT = 1, so that would be an external IRQ. Have you remapped IRQs? Have you disabled / masked them?

Re: Debug question

Posted: Thu Dec 22, 2016 3:26 am
by alexfru
vector = 08 most likely means IRQ0 (PIT interrupt).
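
To make the arithmetic explicit: with the legacy PIC left at its usual BIOS mapping, the master delivers IRQ0-7 at vectors 0x08-0x0F, so IRQ0 arrives on the same vector as the double fault exception. A minimal sketch of that mapping in C follows; the function names are made up for illustration, and the default bases (0x08 for the master, 0x70 for the slave) are the conventional PC values, assumed rather than read from this setup.

```c
#include <assert.h>
#include <stdint.h>

/* Conventional BIOS vector bases (assumed here, not read from hardware). */
#define PIC1_DEFAULT_BASE 0x08u /* master: IRQ0-7  -> 0x08-0x0F */
#define PIC2_DEFAULT_BASE 0x70u /* slave:  IRQ8-15 -> 0x70-0x77 */

/* Hypothetical helper: vector on which a legacy PIC delivers a given IRQ. */
static uint8_t pic_vector(uint8_t irq, uint8_t master_base, uint8_t slave_base)
{
    assert(irq < 16);
    return irq < 8 ? (uint8_t)(master_base + irq)
                   : (uint8_t)(slave_base + (irq - 8));
}

/* Vectors 0-31 are reserved for CPU exceptions. */
static int collides_with_exception(uint8_t vector)
{
    return vector < 32;
}
```

With the default bases, `pic_vector(0, PIC1_DEFAULT_BASE, PIC2_DEFAULT_BASE)` is 8, which `collides_with_exception` flags; with bases 0x20 and 0x28 the same IRQ lands on vector 0x20, outside the exception range.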

Re: Debug question

Posted: Thu Dec 22, 2016 8:47 am
by bzt
Thanks for the answer.

Yes, I've masked all ISA IRQs in the PIC as well as NMI, and only enabled and routed IRQ1. It seems that doesn't matter: if there is already a pending IRQ8 when you set TASKPRI (the local APIC task priority register) to 0, it will fire... So all ISA IRQs have to be routed, regardless of whether they are masked or not. What's interesting is that if I route the PIC to 20-2F before I mask all of its interrupts, the pending IRQ8 won't fire. I think it's a bug in the IOAPIC code when it's emulating the PIC.

So the solution is: it doesn't matter whether you want to use the PIC or the IOAPIC, you have to remap the PIC anyway.
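
A minimal sketch of that remap-then-mask order in C, assuming the standard 8259A ports and initialization words and reading "20-2F" as vectors 0x20-0x2F. The `outb_fn` hook, `pic_remap_and_mask` and `record_outb` are hypothetical names, not from any real kernel; a kernel would pass its own port-write wrapper, and real hardware may want a short I/O delay between writes.

```c
#include <assert.h>
#include <stdint.h>

/* Standard 8259A I/O ports on PC-compatible hardware. */
#define PIC1_CMD  0x20
#define PIC1_DATA 0x21
#define PIC2_CMD  0xA0
#define PIC2_DATA 0xA1

/* Hypothetical port-write hook; a kernel would pass its outb wrapper. */
typedef void (*outb_fn)(uint16_t port, uint8_t value);

/* Remap both PICs to the given vector bases first, then mask every line. */
static void pic_remap_and_mask(outb_fn out, uint8_t master_base, uint8_t slave_base)
{
    out(PIC1_CMD,  0x11);        /* ICW1: begin init, ICW4 will follow */
    out(PIC2_CMD,  0x11);
    out(PIC1_DATA, master_base); /* ICW2: vector base of the master */
    out(PIC2_DATA, slave_base);  /* ICW2: vector base of the slave */
    out(PIC1_DATA, 0x04);        /* ICW3: slave attached to IRQ2 */
    out(PIC2_DATA, 0x02);        /* ICW3: slave cascade identity */
    out(PIC1_DATA, 0x01);        /* ICW4: 8086 mode */
    out(PIC2_DATA, 0x01);
    out(PIC1_DATA, 0xFF);        /* OCW1: mask all master lines */
    out(PIC2_DATA, 0xFF);        /* OCW1: mask all slave lines */
}

/* Hosted stand-in for outb that only records writes, for checking the order. */
struct port_write { uint16_t port; uint8_t value; };
static struct port_write pic_log[16];
static int pic_log_len;

static void record_outb(uint16_t port, uint8_t value)
{
    assert(pic_log_len < 16);
    pic_log[pic_log_len].port = port;
    pic_log[pic_log_len].value = value;
    pic_log_len++;
}
```

Calling `pic_remap_and_mask(record_outb, 0x20, 0x28)` on a host machine fills `pic_log` with the ten writes in order, so the sequence can be inspected without touching real ports.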

Re: Debug question

Posted: Thu Dec 22, 2016 12:55 pm
by hgoel
Just checking, because I encountered something very similar before: make sure your stacks in the TSS are valid.
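
A rough sketch of such a sanity check in C, assuming the 64-bit TSS layout from the AMD and Intel manuals and 48-bit canonical addresses. The struct and function names are hypothetical, the packed attribute is GCC/Clang specific, and the check only catches obviously bad values (zero RSP0, non-canonical pointers); it cannot tell whether the pages behind the stack are actually mapped.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit TSS as laid out in the AMD64/Intel manuals (104 bytes). */
struct tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];     /* RSP0-RSP2: stacks for privilege changes */
    uint64_t reserved1;
    uint64_t ist[7];     /* IST1-IST7: optional per-gate stacks */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iopb_offset;
} __attribute__((packed));

_Static_assert(sizeof(struct tss64) == 104, "unexpected TSS size");

/* Canonical with 48-bit addressing: bits 63..47 are all 0 or all 1. */
static int is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;
    return top == 0 || top == 0x1FFFF;
}

/* RSP0 must be set and canonical; IST entries are checked only if non-zero. */
static int tss_stacks_plausible(const struct tss64 *tss)
{
    assert(tss != 0);
    if (tss->rsp[0] == 0 || !is_canonical(tss->rsp[0]))
        return 0;
    for (int i = 0; i < 7; i++) {
        if (tss->ist[i] != 0 && !is_canonical(tss->ist[i]))
            return 0;
    }
    return 1;
}
```

An IST slot that is zero but still referenced by an IDT gate would also be a bug; detecting that needs the IDT as well, which this sketch leaves out.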