Debug question


Post by bzt »

OK, in short: I've messed something up in my APIC+IOAPIC code, and it generates a double fault in qemu as well as in bochs. To see why, I recompiled bochs to print out all interrupt calls. This is what I got:

Code:

(0) [0x0000001057a6] 0008:ffffffffffe077a6 (unk. ctxt): iret                      ; 48cf
<bochs:2> s
Next at t=16454120
(0) [0x0000003180e8] 0023:00000000002000e8 (unk. ctxt): ret                       ; c3
<bochs:3> print-stack
Stack address size 8
 | STACK 0x00000000001fffa0 [0x00000000:0x002080e9]
 | STACK 0x00000000001fffa8 [0x00000000:0x002060e9]
 | STACK 0x00000000001fffb0 [0x00000000:0x002040e9]
 | STACK 0x00000000001fffb8 [0x00000000:0x002020e8]
 | STACK 0x00000000001fffc0 [0x00000000:0x002000f9]
 | STACK 0x00000000001fffc8 [0x00000000:0x00000001]
 | STACK 0x00000000001fffd0 [0x00000000:0x001fffd8]
 | STACK 0x00000000001fffd8 [0x00000000:0x001fffe8]
 | STACK 0x00000000001fffe0 [0x00000000:0x00000000]
 | STACK 0x00000000001fffe8 [0x00006d65:0x74737973]
 | STACK 0x00000000001ffff0 [0x00000000:0x00000000]
 | STACK 0x00000000001ffff8 [0x00000000:0x00000000]
 | STACK 0x0000000000200000 [0x00010102:0x464c457f]
 | STACK 0x0000000000200008 [0x00000000:0x00000000]
 | STACK 0x0000000000200010 [0x00000001:0x003e0003]
 | STACK 0x0000000000200018 [0x00000000:0x000000f9]
<bochs:4> s
00016454120e[CPU0  ] interrupt(): vector = 08, TYPE = 0, EXT = 1
Next at t=16454121
(0) [0x000000103081] 0008:ffffffffffe05081 (unk. ctxt): lock bts qword ptr ds:0xffffffffffe1605c, 0x00 ; f0480fba2c255c60e1ff00
<bochs:5> 

As you can see, everything seems to be okay:
bochs:2 - the iret in kernel space is executed, and control transfers to user space.
bochs:3 - in user space the stack is OK: it is mapped, there is a valid return address on top of the stack, etc.
bochs:4 - but when I step through the "ret", a double fault is raised IMMEDIATELY!
bochs:5 - my double fault handler starts

How is that possible? It's not a pending interrupt or NMI, since that would have been printed out as well! It's not a page fault either! And it's definitely not a problem in one of my ISRs, as a) they work just fine with the PIC, and b) the first ISR that gets called is the double fault handler, right away!

I tried masking NMI to be sure, and printed out bochs' signal_event() calls as well, but found nothing: the double fault is the first exception. According to the AMD and Intel specs, that should never ever happen! And it's not a problem with bochs, as it's also raised in qemu...

So my question is: has anybody seen something like this before? How do I debug this? Any ideas?

EDIT: here's the output with PIC:

Code:

(0) [0x0000003180e8] 0023:00000000002000e8 (unk. ctxt): ret                       ; c3
<bochs:3> print-stack
Stack address size 8
 | STACK 0x00000000001fffa0 [0x00000000:0x002080e9]
 | STACK 0x00000000001fffa8 [0x00000000:0x002060e9]
 | STACK 0x00000000001fffb0 [0x00000000:0x002040e9]
 | STACK 0x00000000001fffb8 [0x00000000:0x002020e8]
 | STACK 0x00000000001fffc0 [0x00000000:0x002000f9]
 | STACK 0x00000000001fffc8 [0x00000000:0x00000001]
 | STACK 0x00000000001fffd0 [0x00000000:0x001fffd8]
 | STACK 0x00000000001fffd8 [0x00000000:0x001fffe8]
 | STACK 0x00000000001fffe0 [0x00000000:0x00000000]
 | STACK 0x00000000001fffe8 [0x00006d65:0x74737973]
 | STACK 0x00000000001ffff0 [0x00000000:0x00000000]
 | STACK 0x00000000001ffff8 [0x00000000:0x00000000]
 | STACK 0x0000000000200000 [0x00010102:0x464c457f]
 | STACK 0x0000000000200008 [0x00000000:0x00000000]
 | STACK 0x0000000000200010 [0x00000001:0x003e0003]
 | STACK 0x0000000000200018 [0x00000000:0x000000f9]
<bochs:4> page 0x1fffa0
PML4: 0x000000000001b027    ps         A pcd pwt U W P
PDPE: 0x000000000001c027    ps         A pcd pwt U W P
 PDE: 0x800000000001d027 XD ps         A pcd pwt U W P
 PTE: 0x8000000000020007 XD    g pat d a pcd pwt U W P
linear page 0x00000000001ff000 maps to physical page 0x000000020000
<bochs:5> s
Next at t=16452006
(0) [0x0000003470e9] 0023:00000000002080e9 (unk. ctxt): push rbp                  ; 55
<bochs:6> 


Re: Debug question

Post by xenos »

The interrupt message says EXT = 1, so that would be an external IRQ. Have you remapped IRQs? Have you disabled / masked them?
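To spot this kind of event quickly in a long bochs log, a small host-side filter can help. The following is only a sketch: it assumes the log line format shown in the first post ("interrupt(): vector = .., TYPE = .., EXT = .."), and the struct and function names are made up for illustration. It flags any event that is marked external (EXT = 1) but arrives on one of the vectors 0-31 that the CPU reserves for exceptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Fields taken from one "interrupt(): ..." log line (names are illustrative). */
struct intr_event {
    unsigned vector;
    unsigned type;
    unsigned ext;
};

/* Parse a log line of the form seen in the first post.
 * Returns true only if all three fields were found. */
static bool parse_intr_line(const char *line, struct intr_event *ev)
{
    assert(line != NULL && ev != NULL);
    const char *p = strstr(line, "interrupt(): vector = ");
    if (p == NULL)
        return false;
    /* The vector is printed in hex without a prefix, e.g. "08". */
    return sscanf(p, "interrupt(): vector = %x, TYPE = %u, EXT = %u",
                  &ev->vector, &ev->type, &ev->ext) == 3;
}

/* Vectors 0..31 are reserved for CPU exceptions, so an event flagged as
 * external on one of them points at an unremapped interrupt controller. */
static bool is_external_on_exception_vector(const struct intr_event *ev)
{
    assert(ev != NULL);
    return ev->ext == 1 && ev->vector < 32;
}
```

Fed the line from the first post, this reports vector 8 with EXT = 1, i.e. an external IRQ landing on the double fault vector rather than a genuine double fault.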

Re: Debug question

Post by alexfru »

vector = 08 most likely means IRQ0 (PIT interrupt).
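The reasoning behind this: with the power-on (BIOS) mapping, the master 8259 PIC delivers IRQ0-7 on vectors 0x08-0x0F and the slave delivers IRQ8-15 on 0x70-0x77, so IRQ0 shares vector 8 with the double fault exception. A minimal sketch of that mapping, with a hypothetical helper name, assuming the vector bases are passed in by the caller:

```c
#include <assert.h>
#include <stdint.h>

/* Return the legacy IRQ number (0..15) that a vector corresponds to for the
 * given PIC vector bases, or -1 if the vector belongs to neither PIC.
 * The 8259 ignores the low three bits of the base, hence the precondition. */
static int legacy_pic_irq_for_vector(uint8_t vector,
                                     uint8_t master_base,
                                     uint8_t slave_base)
{
    assert((master_base & 7) == 0 && (slave_base & 7) == 0);
    if (vector >= master_base && vector - master_base < 8)
        return vector - master_base;          /* IRQ0..7 on the master */
    if (vector >= slave_base && vector - slave_base < 8)
        return 8 + (vector - slave_base);     /* IRQ8..15 on the slave */
    return -1;
}
```

With the default bases 0x08 and 0x70, vector 8 maps to IRQ0. After remapping to 0x20 and 0x28, vector 8 no longer belongs to the PIC, so anything arriving there really is a double fault.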

Re: Debug question

Post by bzt »

Thanks for the answer.

Yes, I've masked all ISA IRQs in the PIC as well as NMI, and only enabled and routed IRQ1. It seems that doesn't matter: if you already have a pending IRQ8 when you set TASKPI to 0, it will fire... So all ISA IRQs have to be routed, regardless of whether they are masked or not. What's interesting is that if I route the PIC to 20-2F before I mask all of its interrupts, the pending IRQ8 won't fire. I think it's a bug in the IOAPIC code when it's emulating the PIC.

So the solution is: it doesn't matter whether you want to use the PIC or the IOAPIC, you'll have to remap the PIC anyway.
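For reference, the usual 8259 initialization sequence that remaps and then masks both PICs could be sketched like this. It is not taken from the code discussed in this thread: the helper builds the list of port writes instead of touching hardware, on the assumption that the kernel replays them with its own outb() (ideally with a short I/O delay between writes). All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct port_write {
    uint16_t port;
    uint8_t  value;
};

enum {
    PIC1_CMD  = 0x20, PIC1_DATA = 0x21,   /* master 8259 */
    PIC2_CMD  = 0xA0, PIC2_DATA = 0xA1,   /* slave 8259 */
    PIC_REMAP_WRITES = 10
};

/* Fill `out` with the writes that move the PICs to the given vector bases
 * (e.g. 0x20 and 0x28) and then mask every line. Returns the write count. */
static size_t pic_remap_sequence(uint8_t master_base, uint8_t slave_base,
                                 struct port_write out[PIC_REMAP_WRITES])
{
    assert(out != NULL);
    assert((master_base & 7) == 0 && (slave_base & 7) == 0);
    const struct port_write seq[PIC_REMAP_WRITES] = {
        { PIC1_CMD,  0x11 },         /* ICW1: start init, ICW4 follows */
        { PIC2_CMD,  0x11 },
        { PIC1_DATA, master_base },  /* ICW2: vector base */
        { PIC2_DATA, slave_base },
        { PIC1_DATA, 0x04 },         /* ICW3: slave attached on IRQ2 */
        { PIC2_DATA, 0x02 },         /* ICW3: slave cascade identity */
        { PIC1_DATA, 0x01 },         /* ICW4: 8086 mode */
        { PIC2_DATA, 0x01 },
        { PIC1_DATA, 0xFF },         /* OCW1: mask all master lines */
        { PIC2_DATA, 0xFF },         /* OCW1: mask all slave lines */
    };
    for (size_t i = 0; i < PIC_REMAP_WRITES; i++)
        out[i] = seq[i];
    return PIC_REMAP_WRITES;
}
```

The order follows the observation above: the vector bases are programmed first and the mask is written last, so a stray interrupt can no longer be delivered on an exception vector.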

Re: Debug question

Post by hgoel »

Just checking, because I encountered something very similar before: make sure the stacks in your TSS are valid.
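A rough sanity check along those lines might look like the sketch below. It assumes the 104-byte long mode TSS layout from the Intel and AMD manuals and a GCC/Clang style packed attribute. The function names and the ist_used_mask convention are invented for illustration. It only tests that the stack pointers are non-zero and canonical; whether they are actually mapped still has to be checked separately, for example with the bochs "page" command.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit TSS as laid out in the architecture manuals (104 bytes). */
struct __attribute__((packed)) tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];        /* RSP0..RSP2, used on privilege level changes */
    uint64_t reserved1;
    uint64_t ist[7];        /* IST1..IST7, selected per IDT gate */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iopb_offset;
};
_Static_assert(sizeof(struct tss64) == 104, "unexpected TSS size");

/* With 48-bit virtual addresses, bits 63..47 must all be equal. */
static bool is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;
    return top == 0 || top == 0x1FFFF;
}

static bool stack_pointer_plausible(uint64_t sp)
{
    return sp != 0 && is_canonical(sp);
}

/* ist_used_mask: bit (n - 1) set if some IDT gate selects ISTn
 * (a hypothetical convention for this sketch). */
static bool tss_stacks_plausible(const struct tss64 *tss, unsigned ist_used_mask)
{
    assert(tss != NULL);
    if (!stack_pointer_plausible(tss->rsp[0]))
        return false;
    for (unsigned i = 0; i < 7; i++) {
        if ((ist_used_mask & (1u << i)) && !stack_pointer_plausible(tss->ist[i]))
            return false;
    }
    return true;
}
```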