CPU still in Ring3 on Interrupt

zhiayang
Member
Posts: 368
Joined: Tue Dec 27, 2011 7:57 am
Libera.chat IRC: zhiayang

Re: CPU still in Ring3 on Interrupt

Post by zhiayang »

bluemoon wrote: How about "mov $0x28, %ax; ltr %ax"? That is the only thing that differs from how I do it in my kernel.

Without a serious debugging session, all I can offer is an educated guess.

Whoops, sorry for the long delay in replying; I didn't have my computer with me for the past few days.


No, changing 0x2B to 0x28 doesn't help.
Do you have any suggestions on where I should focus my debugging? Clearly, changing random values in the TSS/GDT isn't going to help, and I've probably gone through them about 10 times now.

The fundamental problem is that on executing the interrupt, QEMU sets CS=0x8 while Bochs sets CS=0xB... there aren't many other differences.
Thanks, and sorry for troubling you thus far.
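For reference, a segment selector packs a descriptor index (bits 15-3), a table indicator (bit 2, 0 = GDT) and the requested privilege level (bits 1-0) into 16 bits. So 0x8 and 0xB both name GDT entry 1 and differ only in RPL (0 vs 3), just as 0x28 and 0x2B both name entry 5. Since the CPL is held in the low two bits of CS, CS=0xB means the handler is running at ring 3. A small illustrative decoder (decode_selector is a hypothetical name, not from the kernel in this thread):

```c
#include <stdint.h>

/* Fields of an x86 segment selector. */
struct selector {
    unsigned index; /* bits 15..3: descriptor index */
    unsigned ti;    /* bit 2: 0 = GDT, 1 = LDT */
    unsigned rpl;   /* bits 1..0: requested privilege level */
};

static struct selector decode_selector(uint16_t sel)
{
    struct selector s;
    s.index = sel >> 3;
    s.ti    = (sel >> 2) & 1u;
    s.rpl   = sel & 3u;
    return s;
}
```

Decoding 0x0B gives index 1, TI 0, RPL 3; decoding 0x08 gives index 1, TI 0, RPL 0.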

Re: CPU still in Ring3 on Interrupt

Post by zhiayang »

Bump please?

I've checked and rechecked the GDT and TSS... and there don't seem to be any issues there...
If anyone needs more info, the repo's in my sig.


Thanks!
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Re: CPU still in Ring3 on Interrupt

Post by Combuster »

I don't expect to spot the issue based on the descriptions alone.

Give me a way to test it locally. The fact that two simulations disagree indicates that you're doing something corner-case in general.
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]

Re: CPU still in Ring3 on Interrupt

Post by zhiayang »

Combuster wrote: I don't expect to spot the issue based on the descriptions alone.

Give me a way to test it locally. The fact that two simulations disagree indicates that you're doing something corner-case in general.
Hi Combuster,

Thanks for the reply. If you're running OS X, the binutils and clang binaries should be in the repo. Otherwise, I'm using binutils 2.22 and clang 3.2. Most things should work, except for the disk mounting in the makefile.

Use output/disk.img as the disk image. About 64-128 MB of memory should do.



In past cases where QEMU and Bochs disagreed, it was usually because Bochs is somewhat stricter in its behaviour... and the problem was usually in my code. This time I don't even know where to look for the issue.


EDIT: I'm not sure of the state of the disk image in the repo (it's an HDD image, by the way), so use this:
https://dl.dropboxusercontent.com/u/20627716/disk.img

In QEMU it should produce a nice Division By Zero error, complete with register dumps and everything.
In Bochs, when single-stepping through the code, it stalls in a GPF loop trying to change %ds and/or access %cr2.

Re: CPU still in Ring3 on Interrupt

Post by Combuster »

First bug found: your code doesn't bother to check for long mode support. :wink:

Re: CPU still in Ring3 on Interrupt

Post by Combuster »

Second problem: I get a crash from a page fault, caused by what looks like a (builtin) memset to an invalid address, and CS equals 0x8 rather than 0xB...


00031492763i[CPU0 ] CS.mode = 64 bit
00031492763i[CPU0 ] SS.mode = 64 bit
00031492763i[CPU0 ] EFER   = 0x00000501
00031492763i[CPU0 ] | RAX=0000000000000000  RBX=00000000ffffffff
00031492763i[CPU0 ] | RCX=0000000000000004  RDX=0000000000000004
00031492763i[CPU0 ] | RSP=0000000000a34d40  RBP=0000000000a34d78
00031492763i[CPU0 ] | RSI=0000000000000000  RDI=000000010027f0f0
00031492763i[CPU0 ] |  R8=000000010027f0f0   R9=0000000000000400
00031492763i[CPU0 ] | R10=000000000000000a  R11=000000000000000b
00031492763i[CPU0 ] | R12=0000000000000020  R13=0000000000000040
00031492763i[CPU0 ] | R14=000000000000027f  R15=0000000000000280
00031492763i[CPU0 ] | IOPL=0 id vip vif ac vm RF nt of df IF tf sf zf af pf cf
00031492763i[CPU0 ] | SEG sltr(index|ti|rpl)     base    limit G D
00031492763i[CPU0 ] |  CS:0008( 0001| 0|  0) 00000000 ffffffff 1 0
00031492763i[CPU0 ] |  DS:0010( 0002| 0|  0) 00000000 ffffffff 1 0
00031492763i[CPU0 ] |  SS:0010( 0002| 0|  0) 00000000 ffffffff 1 0
00031492763i[CPU0 ] |  ES:0010( 0002| 0|  0) 00000000 ffffffff 1 0
00031492763i[CPU0 ] |  FS:0010( 0002| 0|  0) 00000000 ffffffff 1 0
00031492763i[CPU0 ] |  GS:0010( 0002| 0|  0) 00000000 ffffffff 1 0
00031492763i[CPU0 ] |  MSR_FS_BASE:0000000000000000
00031492763i[CPU0 ] |  MSR_GS_BASE:0000000000000000
00031492763i[CPU0 ] | RIP=0000000000206cf7 (0000000000206cf7)
00031492763i[CPU0 ] | CR0=0xe0000013 CR2=0xfffffffffffffff8
00031492763i[CPU0 ] | CR3=0x00001000 CR4=0x00000620
(0).[31492763] [0x0000000000206cf7] 0008:0000000000206cf7 (unk. ctxt): rep stosq qword ptr es:[rdi], rax ; f348ab
00031492763e[CPU0 ] exception(): 3rd (14) exception with no resolution, shutdown status is 00h, resetting

Re: CPU still in Ring3 on Interrupt

Post by zhiayang »

Combuster wrote: Second problem: I get a crash from a page fault, caused by what looks like a (builtin) memset to an invalid address, and CS equals 0x8 rather than 0xB...

00031492763i[CPU0 ] CS.mode = 64 bit
00031492763i[CPU0 ] SS.mode = 64 bit
00031492763i[CPU0 ] EFER   = 0x00000501
00031492763i[CPU0 ] | RAX=0000000000000000  RBX=00000000ffffffff
00031492763i[CPU0 ] | RCX=0000000000000004  RDX=0000000000000004
00031492763i[CPU0 ] | RSP=0000000000a34d40  RBP=0000000000a34d78
00031492763i[CPU0 ] | RSI=0000000000000000  RDI=000000010027f0f0
00031492763i[CPU0 ] |  R8=000000010027f0f0   R9=0000000000000400
00031492763i[CPU0 ] | R10=000000000000000a  R11=000000000000000b
00031492763i[CPU0 ] | R12=0000000000000020  R13=0000000000000040
00031492763i[CPU0 ] | R14=000000000000027f  R15=0000000000000280
00031492763i[CPU0 ] | IOPL=0 id vip vif ac vm RF nt of df IF tf sf zf af pf cf
00031492763i[CPU0 ] | SEG sltr(index|ti|rpl)     base    limit G D
00031492763i[CPU0 ] |  CS:0008( 0001| 0|  0) 00000000 ffffffff 1 0
00031492763i[CPU0 ] |  DS:0010( 0002| 0|  0) 00000000 ffffffff 1 0
00031492763i[CPU0 ] |  SS:0010( 0002| 0|  0) 00000000 ffffffff 1 0
00031492763i[CPU0 ] |  ES:0010( 0002| 0|  0) 00000000 ffffffff 1 0
00031492763i[CPU0 ] |  FS:0010( 0002| 0|  0) 00000000 ffffffff 1 0
00031492763i[CPU0 ] |  GS:0010( 0002| 0|  0) 00000000 ffffffff 1 0
00031492763i[CPU0 ] |  MSR_FS_BASE:0000000000000000
00031492763i[CPU0 ] |  MSR_GS_BASE:0000000000000000
00031492763i[CPU0 ] | RIP=0000000000206cf7 (0000000000206cf7)
00031492763i[CPU0 ] | CR0=0xe0000013 CR2=0xfffffffffffffff8
00031492763i[CPU0 ] | CR3=0x00001000 CR4=0x00000620
(0).[31492763] [0x0000000000206cf7] 0008:0000000000206cf7 (unk. ctxt): rep stosq qword ptr es:[rdi], rax ; f348ab
00031492763e[CPU0 ] exception(): 3rd (14) exception with no resolution, shutdown status is 00h, resetting

Thanks for your reply,

The function surrounding RIP=0x206CF7 is indeed a memset... could you tell me what %rip was before the call to <_ZN7Library6Memory3SetEPvim>?

Weird though:

1. Somehow the IDT is messed up... Could you do an IDT dump?
2. The Bochs PCI-BGA thing is enabled, right?
3. CR2 = 0xFFFFFFFFFFFFFFF8 is quite weird, quite weird indeed. There isn't anything that immediately arouses my suspicion.
4. Of course, I don't get any of these errors...


EDIT:
If CS=0x8 at the point of the fault, either your Bochs does magic that mine doesn't, or (the more likely case) the fault happened in kernel mode.

It could be my TSS setup code: I haphazardly assume there's usable memory from 4-5 MB, but that shouldn't cause any issues...


Here's my bochsrc for reference:

# configuration file generated by Bochs
plugin_ctrl: unmapped=1, biosdev=1, speaker=1, extfpuirq=1, parallel=1, serial=1, gameport=1, iodebug=1
config_interface: textconfig
display_library: x
memory: host=512, guest=512
romimage: file="/usr/local/Cellar/bochs/2.6/share/bochs/BIOS-bochs-latest"
vgaromimage: file="/usr/local/Cellar/bochs/2.6/share/bochs/VGABIOS-lgpl-latest"
boot: floppy, cdrom, disk
floppy_bootsig_check: disabled=0
# no floppy
# no floppyb
ata0: enabled=1, ioaddr1=0x1f0, ioaddr2=0x3f0, irq=14
ata0-master: type=disk, path="disk.img", mode=flat, cylinders=162, heads=8, spt=63, model="Generic 1234", biosdetect=auto, translation=auto
ata0-slave: type=none
ata1: enabled=1, ioaddr1=0x170, ioaddr2=0x370, irq=15
ata1-master: type=disk, path="Data.img", mode=flat, cylinders=162, heads=8, spt=63, model="Generic 1234", biosdetect=auto, translation=auto
ata1-slave: type=none
ata2: enabled=0
ata3: enabled=0
pci: enabled=1, chipset=i440fx, slot1=pcivga
vga: extension=vbe, update_freq=60
cpu: count=1, ips=8000000, model=corei5_arrandale_m520, reset_on_triple_fault=0, cpuid_limit_winnt=0, ignore_bad_msrs=1, mwait_is_nop=0
print_timestamps: enabled=0
debugger_log: -
magic_break: enabled=1
port_e9_hack: enabled=0
private_colormap: enabled=0
clock: sync=none, time0=local, rtc_sync=0
# no cmosimage
# no loader
log: bochs.log
logprefix: %t%e%d
debug: action=ignore
info: action=report
error: action=report
panic: action=ask
keyboard: type=mf, serial_delay=250, paste_delay=100000, user_shortcut=none
mouse: type=ps2, enabled=0, toggle=ctrl+mbutton
parport1: enabled=0
parport2: enabled=0
com1: enabled=0
com2: enabled=0
com3: enabled=0
com4: enabled=0


Also, I should check for long mode support, shouldn't I? :p
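A minimal sketch of that check in C, assuming GCC or Clang (the `<cpuid.h>` helpers `__get_cpuid_max` and `__cpuid` are compiler-provided). The test is CPUID leaf 0x80000001, EDX bit 29 (LM), after confirming via leaf 0x80000000 that the extended leaf exists at all. In a real boot path this has to run in 32-bit code before EFER.LME and long-mode paging are enabled; on 32-bit targets `__get_cpuid_max` also tests whether CPUID itself is available. `long_mode_supported` and `cpu_has_long_mode` are hypothetical names, not functions from this kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CPUID_EXT_MAX      0x80000000u /* EAX = highest extended leaf */
#define CPUID_EXT_FEATURES 0x80000001u /* extended feature flags */
#define CPUID_EDX_LM       (1u << 29)  /* EDX bit 29: Long Mode */

/* Pure check on raw CPUID results, so it can be tested without the CPU. */
static bool long_mode_supported(uint32_t max_ext_leaf, uint32_t ext_edx)
{
    if (max_ext_leaf < CPUID_EXT_FEATURES)
        return false; /* leaf 0x80000001 does not exist */
    return (ext_edx & CPUID_EDX_LM) != 0;
}

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>

/* Query the running CPU (GCC/Clang only). */
static bool cpu_has_long_mode(void)
{
    unsigned int eax, ebx, ecx, edx;
    unsigned int max_ext = __get_cpuid_max(CPUID_EXT_MAX, NULL);

    if (max_ext < CPUID_EXT_FEATURES)
        return false;
    __cpuid(CPUID_EXT_FEATURES, eax, ebx, ecx, edx);
    (void)eax; (void)ebx; (void)ecx; /* only EDX is needed */
    return long_mode_supported(max_ext, edx);
}
#endif
```

Splitting the bit test from the CPUID query keeps the decision logic testable on any host; on failure the boot code would typically print an error and halt instead of attempting the switch.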

Re: CPU still in Ring3 on Interrupt

Post by Combuster »

For reference, I just grabbed the hard disk image in question so as not to bother with setting up the tooling.

Here is the full mix-up, with annotated dumps. And yes, you are pulling edge-case stunts with these settings:
00125984685d[CPU0 ] page walk for address 0x000000010027d0f0
00125984717d[CPU0 ] page walk for address 0x000000010027e0f0

00125984749d[CPU0 ] page walk for address 0x000000010027f0f0
00125984749d[CPU0 ] PAE PTE: entry not present
00125984749d[CPU0 ] page fault for address 000000010027f0f0 @ 0000000000206cfa
00125984749d[CPU0 ] exception(0x0e): error_code=0002
00125984749d[CPU0 ] interrupt(): vector = 0e, TYPE = 3, EXT = 1
00125984749d[CPU0 ] interrupt(long mode): INTERRUPT TO SAME PRIVILEGE
00125984749d[CPU0 ] interrupt(long mode): trap to IST, vector = 1
00125984749d[CPU0 ] page walk for address 0xfffffffffffffff8
00125984749d[CPU0 ] PAE PDPE: entry not present
00125984749d[CPU0 ] page fault for address fffffffffffffff8 @ 0000000000206cf7
00125984749d[CPU0 ] exception(0x0e): error_code=0002
00125984749d[CPU0 ] exception(0x08): error_code=0000
00125984749d[CPU0 ] interrupt(): vector = 08, TYPE = 3, EXT = 1
00125984749d[CPU0 ] interrupt(long mode): INTERRUPT TO SAME PRIVILEGE
00125984749d[CPU0 ] interrupt(long mode): trap to IST, vector = 1
00125984749d[CPU0 ] page walk for address 0xfffffffffffffff8
00125984749d[CPU0 ] PAE PDPE: entry not present
00125984749d[CPU0 ] page fault for address fffffffffffffff8 @ 0000000000206cf7
00125984749d[CPU0 ] exception(0x0e): error_code=0002
Next at t=31506443
(0) [0x0000000000206cf7] 0008:0000000000206cf7 (unk. ctxt): rep stosq qword ptr es:[rdi], rax ; f348ab
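The error_code=0002 in this trace can be read off the low bits of the page-fault error code (Intel SDM Vol. 3, page-fault exception): bit 0 clear means the page was not present, bit 1 set means a write, and bit 2 clear means a supervisor-mode access. So the first fault is a kernel-mode write to the unmapped address 0x10027f0f0 held in RDI by the rep stosq. The nested faults at 0xfffffffffffffff8 come right after "trap to IST", which would be consistent with an IST stack pointer of 0: the first push would land at 0 minus 8. A minimal decoder covering bits 0-4 only (pf_error and decode_pf_error are illustrative names):

```c
#include <stdbool.h>
#include <stdint.h>

/* Low five bits of the page-fault (#PF, vector 14) error code. */
struct pf_error {
    bool protection; /* bit 0: 1 = protection violation, 0 = page not present */
    bool write;      /* bit 1: 1 = write access, 0 = read */
    bool user;       /* bit 2: 1 = access made at CPL 3 */
    bool rsvd;       /* bit 3: reserved bit set in a paging-structure entry */
    bool fetch;      /* bit 4: fault caused by an instruction fetch */
};

static struct pf_error decode_pf_error(uint64_t code)
{
    struct pf_error e;
    e.protection = (code >> 0) & 1;
    e.write      = (code >> 1) & 1;
    e.user       = (code >> 2) & 1;
    e.rsvd       = (code >> 3) & 1;
    e.fetch      = (code >> 4) & 1;
    return e;
}
```

Combined with CR2, which holds the faulting linear address, this is usually enough to tell a stray pointer from a missing mapping.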
<bochs:5> print-stack
Stack address size 8
| STACK 0x0000000000a34d40 [0x00000000:0x002019a8]
| STACK 0x0000000000a34d48 [0x00000040:0x00000270]
| STACK 0x0000000000a34d50 [0x00000000:0x00000040]
| STACK 0x0000000000a34d58 [0x00000000:0x002171e4]
| STACK 0x0000000000a34d60 [0x00000000:0x00000020]
| STACK 0x0000000000a34d68 [0x00000000:0x002171f0]
| STACK 0x0000000000a34d70 [0x00000000:0x002171e0]
| STACK 0x0000000000a34d78 [0x00000000:0x00a34db8]
| STACK 0x0000000000a34d80 [0x00000000:0x0020ae84]
| STACK 0x0000000000a34d88 [0x00000020:0x00210b00]
| STACK 0x0000000000a34d90 [0x00000000:0x00000020]
| STACK 0x0000000000a34d98 [0x00000000:0x00210700]
| STACK 0x0000000000a34da0 [0x00000000:0x0000000a]
| STACK 0x0000000000a34da8 [0x00000000:0x00000000]
| STACK 0x0000000000a34db0 [0x00000000:0x0000000a]
| STACK 0x0000000000a34db8 [0x00000000:0x00a34ef8]

<bochs:6> info tab
cr3: 0x0000000000001000
0x00000000-0x007dafff -> 0x0000000000000000-0x00000000007dafff
0x007db000-0x007dbfff -> 0x0000000000207000-0x0000000000207fff
0x007e1000-0x007e1fff -> 0x000000000020a000-0x000000000020afff
0x007ed000-0x007edfff -> 0x0000000000000000-0x0000000000000fff
0x007ef000-0x007effff -> 0x0000000000000000-0x0000000000000fff
0x007f1000-0x007f1fff -> 0x0000000000205000-0x0000000000205fff
0x007f6000-0x007f6fff -> 0x0000001f007f6000-0x0000001f007f6fff
0x007fd000-0x007fdfff -> 0x0000000000204000-0x0000000000204fff
0x00a33000-0x00a3afff -> 0x0000000000a33000-0x0000000000a3afff
0xfffff000-0xffffffff -> 0x00000000fffff000-0x00000000ffffffff

<bochs:9> info gdt
Global Descriptor Table (base=0x00000000002007b0, limit=55):
GDT[0x00]=??? descriptor hi=0x00000000, lo=0x00000000
GDT[0x01]=Code segment, base=0x00000000, limit=0xffffffff, Execute/Read, Conforming, Accessed, 64-bit (Conforming, why?!)
GDT[0x02]=Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed
GDT[0x03]=Code segment, base=0x00000000, limit=0xffffffff, Execute/Read, Conforming, 64-bit
GDT[0x04]=Data segment, base=0x00000000, limit=0xffffffff, Read/Write
GDT[0x05]=32-Bit TSS (Available) at 0x00500000, length 0x00068
GDT[0x06]=??? descriptor hi=0x00000000, lo=0x00000000

<bochs:10> info idt 0 63
Interrupt Descriptor Table (base=0x0000000000216080, limit=4095):
IDT[0x00]=64-Bit Interrupt Gate target=0x0008:0000000000204724, DPL=3 (Wrong! This allows any task to call INT 0 rather than being exclusively for #DE)
IDT[0x01]=64-Bit Interrupt Gate target=0x0008:000000000020472d, DPL=0
IDT[0x02]=64-Bit Interrupt Gate target=0x0008:0000000000204736, DPL=0
IDT[0x03]=64-Bit Interrupt Gate target=0x0008:000000000020473f, DPL=0
IDT[0x04]=64-Bit Interrupt Gate target=0x0008:0000000000204748, DPL=0
IDT[0x05]=64-Bit Interrupt Gate target=0x0008:0000000000204751, DPL=0
IDT[0x06]=64-Bit Interrupt Gate target=0x0008:000000000020475a, DPL=0
IDT[0x07]=64-Bit Interrupt Gate target=0x0008:0000000000204763, DPL=0
IDT[0x08]=64-Bit Interrupt Gate target=0x0008:000000000020476c, DPL=0
IDT[0x09]=64-Bit Interrupt Gate target=0x0008:0000000000204773, DPL=0
IDT[0x0a]=64-Bit Interrupt Gate target=0x0008:000000000020477c, DPL=0
IDT[0x0b]=64-Bit Interrupt Gate target=0x0008:0000000000204783, DPL=0
IDT[0x0c]=64-Bit Interrupt Gate target=0x0008:000000000020478a, DPL=0
IDT[0x0d]=64-Bit Interrupt Gate target=0x0008:0000000000204791, DPL=0
IDT[0x0e]=64-Bit Interrupt Gate target=0x0008:0000000000204798, DPL=0
IDT[0x0f]=64-Bit Interrupt Gate target=0x0008:000000000020479f, DPL=0
IDT[0x10]=64-Bit Interrupt Gate target=0x0008:00000000002047a8, DPL=0
IDT[0x11]=64-Bit Interrupt Gate target=0x0008:00000000002047b1, DPL=0
IDT[0x12]=64-Bit Interrupt Gate target=0x0008:00000000002047ba, DPL=0
IDT[0x13]=64-Bit Interrupt Gate target=0x0008:00000000002047c3, DPL=0
IDT[0x14]=64-Bit Interrupt Gate target=0x0008:00000000002047cc, DPL=0
IDT[0x15]=64-Bit Interrupt Gate target=0x0008:00000000002047d5, DPL=0
IDT[0x16]=64-Bit Interrupt Gate target=0x0008:00000000002047de, DPL=0
IDT[0x17]=64-Bit Interrupt Gate target=0x0008:00000000002047e7, DPL=0
IDT[0x18]=64-Bit Interrupt Gate target=0x0008:00000000002047f0, DPL=0
IDT[0x19]=64-Bit Interrupt Gate target=0x0008:00000000002047f9, DPL=0
IDT[0x1a]=64-Bit Interrupt Gate target=0x0008:0000000000204802, DPL=0
IDT[0x1b]=64-Bit Interrupt Gate target=0x0008:000000000020480b, DPL=0
IDT[0x1c]=64-Bit Interrupt Gate target=0x0008:0000000000204814, DPL=0
IDT[0x1d]=64-Bit Interrupt Gate target=0x0008:000000000020481d, DPL=0
IDT[0x1e]=64-Bit Interrupt Gate target=0x0008:0000000000204826, DPL=0
IDT[0x1f]=64-Bit Interrupt Gate target=0x0008:000000000020482f, DPL=0
IDT[0x20]=64-Bit Interrupt Gate target=0x0008:000000000020a7a8, DPL=0
IDT[0x21]=64-Bit Interrupt Gate target=0x0008:0000000000204856, DPL=0
IDT[0x22]=64-Bit Interrupt Gate target=0x0008:000000000020485c, DPL=0
IDT[0x23]=64-Bit Interrupt Gate target=0x0008:0000000000204862, DPL=0
IDT[0x24]=64-Bit Interrupt Gate target=0x0008:0000000000204868, DPL=0
IDT[0x25]=64-Bit Interrupt Gate target=0x0008:000000000020486e, DPL=0
IDT[0x26]=64-Bit Interrupt Gate target=0x0008:0000000000204874, DPL=0
IDT[0x27]=64-Bit Interrupt Gate target=0x0008:000000000020487a, DPL=0
IDT[0x28]=64-Bit Interrupt Gate target=0x0008:0000000000204880, DPL=0
IDT[0x29]=64-Bit Interrupt Gate target=0x0008:0000000000204886, DPL=0
IDT[0x2a]=64-Bit Interrupt Gate target=0x0008:000000000020488c, DPL=0
IDT[0x2b]=64-Bit Interrupt Gate target=0x0008:0000000000204892, DPL=0
IDT[0x2c]=64-Bit Interrupt Gate target=0x0008:0000000000204898, DPL=0
IDT[0x2d]=64-Bit Interrupt Gate target=0x0008:000000000020489e, DPL=0
IDT[0x2e]=64-Bit Interrupt Gate target=0x0008:00000000002048a4, DPL=0
IDT[0x2f]=64-Bit Interrupt Gate target=0x0008:00000000002048aa, DPL=0
(rest of the entries are zeroed)

<bochs:16> x /40x 0x216080
[bochs]:
0x0000000000216080 <bogus+ 0>: 0x00084724 0x0020ee01 0x00000000 0x00000000 (Why the IST?!)
0x0000000000216090 <bogus+ 16>: 0x0008472d 0x00208e01 0x00000000 0x00000000
0x00000000002160a0 <bogus+ 32>: 0x00084736 0x00208e01 0x00000000 0x00000000
0x00000000002160b0 <bogus+ 48>: 0x0008473f 0x00208e01 0x00000000 0x00000000
0x00000000002160c0 <bogus+ 64>: 0x00084748 0x00208e01 0x00000000 0x00000000
0x00000000002160d0 <bogus+ 80>: 0x00084751 0x00208e01 0x00000000 0x00000000
0x00000000002160e0 <bogus+ 96>: 0x0008475a 0x00208e01 0x00000000 0x00000000
0x00000000002160f0 <bogus+ 112>: 0x00084763 0x00208e01 0x00000000 0x00000000
0x0000000000216100 <bogus+ 128>: 0x0008476c 0x00208e01 0x00000000 0x00000000
0x0000000000216110 <bogus+ 144>: 0x00084773 0x00208e01 0x00000000 0x00000000

<bochs:17> info tss
tr:s=0x0, base=0x0000000000000000, valid=1 (no TSS?!)

I'll leave it to you to write up a proper summary of what you thought you were doing :wink:

Re: CPU still in Ring3 on Interrupt

Post by zhiayang »

Next at t=31506443
(0) [0x0000000000206cf7] 0008:0000000000206cf7 (unk. ctxt): rep stosq qword ptr es:[rdi], rax ; f348ab
<bochs:5> print-stack
Stack address size 8
 | STACK 0x0000000000a34d40 [0x00000000:0x002019a8]
 | STACK 0x0000000000a34d48 [0x00000040:0x00000270]
 | STACK 0x0000000000a34d50 [0x00000000:0x00000040]
 | STACK 0x0000000000a34d58 [0x00000000:0x002171e4]
 | STACK 0x0000000000a34d60 [0x00000000:0x00000020]
 | STACK 0x0000000000a34d68 [0x00000000:0x002171f0]
 | STACK 0x0000000000a34d70 [0x00000000:0x002171e0]
 | STACK 0x0000000000a34d78 [0x00000000:0x00a34db8]
 | STACK 0x0000000000a34d80 [0x00000000:0x0020ae84]
 | STACK 0x0000000000a34d88 [0x00000020:0x00210b00]
 | STACK 0x0000000000a34d90 [0x00000000:0x00000020]
 | STACK 0x0000000000a34d98 [0x00000000:0x00210700]
 | STACK 0x0000000000a34da0 [0x00000000:0x0000000a]
 | STACK 0x0000000000a34da8 [0x00000000:0x00000000]
 | STACK 0x0000000000a34db0 [0x00000000:0x0000000a]
 | STACK 0x0000000000a34db8 [0x00000000:0x00a34ef8]

Looking at the stack (since you didn't tell me %rip before the call to memset, I'll have to infer it from the stack):

 | STACK 0x0000000000a34d78 [0x00000000:0x00a34db8]
This appears to be the saved-RBP slot of the stack frame; 8 bytes above it should be the return address: 0x0020ae84... which happens to be in some CPUID code.
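The reasoning above assumes the kernel is built with frame pointers, so each frame stores the caller's RBP at [rbp] and the return address at [rbp+8]. A sketch of that walk over a captured stack dump, where the `struct qword` table is a hypothetical stand-in for reading guest memory. Fed the print-stack values, it starts at RBP=0xa34d78, yields 0x20ae84, and then stops because the next frame's return-address slot (0xa34dc0) lies outside the 16 captured qwords.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One captured stack slot: address and the 64-bit value stored there. */
struct qword { uint64_t addr; uint64_t value; };

/* Look up a qword in the dump; false if that address was not captured. */
static bool read_qword(const struct qword *dump, size_t n, uint64_t addr, uint64_t *out)
{
    for (size_t i = 0; i < n; i++) {
        if (dump[i].addr == addr) {
            *out = dump[i].value;
            return true;
        }
    }
    return false;
}

/* Follow the RBP chain: [rbp] = caller's RBP, [rbp + 8] = return address.
 * Stores up to max return addresses in ret and returns how many were found. */
static size_t walk_frames(const struct qword *dump, size_t n, uint64_t rbp,
                          uint64_t *ret, size_t max)
{
    size_t count = 0;
    while (count < max && rbp != 0) {
        uint64_t saved_rbp, ret_addr;
        if (!read_qword(dump, n, rbp + 8, &ret_addr) ||
            !read_qword(dump, n, rbp, &saved_rbp))
            break;                 /* frame lies outside the captured dump */
        ret[count++] = ret_addr;
        if (saved_rbp <= rbp)
            break;                 /* stack grows down: callers sit higher */
        rbp = saved_rbp;
    }
    return count;
}
```

Each return address found this way can then be matched against the kernel's symbol table to name the caller.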

That code, I must admit, isn't mine, but it hasn't broken before...
And it's tempting to see what's below:

 | STACK 0x0000000000a34db8 [0x00000000:0x00a34ef8]
...

Some questions:
1. Is there any output on the screen?
2. If so, what output? If things go as expected (and the fault is indeed in the CPUID code), you should at least see a version number and the GRUB memory map.
3. If not, I think bochsrc may need some adjustment :p




Next...

<bochs:6> info tab
cr3: 0x0000000000001000
0x00000000-0x007dafff -> 0x0000000000000000-0x00000000007dafff
0x007db000-0x007dbfff -> 0x0000000000207000-0x0000000000207fff
0x007e1000-0x007e1fff -> 0x000000000020a000-0x000000000020afff
0x007ed000-0x007edfff -> 0x0000000000000000-0x0000000000000fff
0x007ef000-0x007effff -> 0x0000000000000000-0x0000000000000fff
0x007f1000-0x007f1fff -> 0x0000000000205000-0x0000000000205fff
0x007f6000-0x007f6fff -> 0x0000001f007f6000-0x0000001f007f6fff
0x007fd000-0x007fdfff -> 0x0000000000204000-0x0000000000204fff
0x00a33000-0x00a3afff -> 0x0000000000a33000-0x0000000000a3afff
0xfffff000-0xffffffff -> 0x00000000fffff000-0x00000000ffffffff
Looks normal.

<bochs:9> info gdt   
Global Descriptor Table (base=0x00000000002007b0, limit=55):
GDT[0x00]=??? descriptor hi=0x00000000, lo=0x00000000
GDT[0x01]=Code segment, base=0x00000000, limit=0xffffffff, Execute/Read, Conforming, Accessed, 64-bit (Conforming, why?!)
GDT[0x02]=Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed
GDT[0x03]=Code segment, base=0x00000000, limit=0xffffffff, Execute/Read, Conforming, 64-bit
GDT[0x04]=Data segment, base=0x00000000, limit=0xffffffff, Read/Write
GDT[0x05]=32-Bit TSS (Available) at 0x00500000, length 0x00068
GDT[0x06]=??? descriptor hi=0x00000000, lo=0x00000000
I'm not sure what caused me to choose 'conforming' over non-conforming... I think it was something I read in the manual about switching to conforming/non-conforming segments from a different privilege level... But it escapes me now.
Could this be the cause of CS being 0xB?
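For what it's worth, the conforming flag would explain CS=0xB. According to the Intel SDM's section on protection of exception- and interrupt-handler procedures, a handler in a conforming code segment runs at the CPL of the interrupted code instead of at the segment's DPL. Interrupting ring 3 through a gate whose selector points at a conforming segment therefore leaves CPL at 3, giving CS = 0x08 | 3 = 0xB, and the handler's load of a DPL-0 data selector into DS, or its read of CR2, then raises #GP. The sketch models only that rule; 0x9A (non-conforming) and 0x9F (conforming, accessed) are example access bytes, not values read from this kernel, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of the access byte (byte 5) of a code/data segment descriptor. */
#define ACC_S      (1u << 4) /* 1 = code/data, 0 = system descriptor */
#define ACC_EXEC   (1u << 3) /* 1 = code segment */
#define ACC_DC     (1u << 2) /* code: conforming; data: expand-down */
#define ACC_DPL(a) (((unsigned)(a) >> 5) & 3u)

static bool is_conforming_code(uint8_t access)
{
    return (access & ACC_S) && (access & ACC_EXEC) && (access & ACC_DC);
}

/* CPL the handler runs at when an interrupt gate's selector points at a
 * code segment with this access byte and the interrupted code ran at cpl.
 * Assumes segment DPL <= cpl (otherwise the CPU raises #GP instead). */
static unsigned handler_cpl(uint8_t access, unsigned cpl)
{
    if (is_conforming_code(access))
        return cpl;          /* conforming: privilege level unchanged */
    return ACC_DPL(access);  /* non-conforming: CPL becomes the DPL */
}
```

Clearing bit 2 of the kernel code descriptor's access byte (0x9F becomes 0x9B) makes the handler run at ring 0 again.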

<bochs:10> info idt 0 63
Interrupt Descriptor Table (base=0x0000000000216080, limit=4095):
IDT[0x00]=64-Bit Interrupt Gate target=0x0008:0000000000204724, DPL=3 (Wrong! This allows any task to call INT 0 rather than being exclusively for #DE)
Correct! I intentionally set it as such; it's easy to identify. I just use INT $0x00 to test if I can return to Ring0 with interrupts.

<bochs:16> x /40x 0x216080
[bochs]:
0x0000000000216080 <bogus+       0>:	0x00084724 0x0020ee01 0x00000000 0x00000000 (Why the IST?!)
0x0000000000216090 <bogus+      16>:	0x0008472d 0x00208e01 0x00000000 0x00000000
0x00000000002160a0 <bogus+      32>:	0x00084736 0x00208e01 0x00000000 0x00000000
0x00000000002160b0 <bogus+      48>:	0x0008473f 0x00208e01 0x00000000 0x00000000
0x00000000002160c0 <bogus+      64>:	0x00084748 0x00208e01 0x00000000 0x00000000
0x00000000002160d0 <bogus+      80>:	0x00084751 0x00208e01 0x00000000 0x00000000
0x00000000002160e0 <bogus+      96>:	0x0008475a 0x00208e01 0x00000000 0x00000000
0x00000000002160f0 <bogus+     112>:	0x00084763 0x00208e01 0x00000000 0x00000000
0x0000000000216100 <bogus+     128>:	0x0008476c 0x00208e01 0x00000000 0x00000000
0x0000000000216110 <bogus+     144>:	0x00084773 0x00208e01 0x00000000 0x00000000
I'm going to assume this is a dump of the in-memory IDT...
Why not the IST?

Again: 0xEE instead of 0x8E sets DPL=3, so I can INT $0x00 from Ring3.
The IST is supposed to switch the interrupt handler to a known-good stack... 0x9000, to be precise. Could Bochs have something there?
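To read those raw dwords directly: a long-mode IDT entry is 16 bytes. The first dword holds offset bits 15-0 and the selector; the second holds the IST index (bits 2-0), the gate type (bits 11-8), DPL (bits 14-13), the present bit (15) and offset bits 31-16; the third holds offset bits 63-32. So 0x00084724 0x0020ee01 decodes to selector 0x08, target 0x204724, IST=1, type 0xE (64-bit interrupt gate), DPL=3, present, which is the "01" Combuster flagged. A sketch decoder (gate64 and decode_gate64 are illustrative names):

```c
#include <stdint.h>

/* Decoded long-mode IDT gate. */
struct gate64 {
    uint64_t offset;   /* handler address */
    uint16_t selector; /* code segment selector */
    unsigned ist;      /* 0 = no IST switch, 1..7 = TSS.IST[n] */
    unsigned type;     /* 0xE = interrupt gate, 0xF = trap gate */
    unsigned dpl;      /* highest CPL allowed to use INT n on this gate */
    unsigned present;
};

/* w[] holds the four dwords of one entry, in the order Bochs prints them. */
static struct gate64 decode_gate64(const uint32_t w[4])
{
    struct gate64 g;
    g.offset   = (uint64_t)(w[0] & 0xFFFFu)
               | ((uint64_t)(w[1] >> 16) << 16)
               | ((uint64_t)w[2] << 32);
    g.selector = (uint16_t)(w[0] >> 16);
    g.ist      = w[1] & 0x7u;
    g.type     = (w[1] >> 8) & 0xFu;
    g.dpl      = (w[1] >> 13) & 0x3u;
    g.present  = (w[1] >> 15) & 0x1u;
    return g;
}
```

Applied to the second entry (0x0008472d 0x00208e01), the same decoder gives DPL=0 and IST=1, so every vector in the dump switches to IST1.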

<bochs:17> info tss
tr:s=0x0, base=0x0000000000000000, valid=1 (no TSS?!)

There is a TSS: again, this makes me think you're crashing somewhere before the attempt to enter user mode.
This is what it should show:

tr:s=0x28, base=0x0000000000500000, valid=1
Again, Bochs insists on using the legacy-mode structure to display this, so I get register values rather than IST values. But it doesn't matter; the TSS should be there.
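Since Bochs prints the TSS with the legacy 32-bit layout, it may help to compare against the long-mode layout (Intel SDM Vol. 3, 64-bit task-state segment): RSP0-RSP2 start at 0x04, IST1-IST7 at 0x24, and the I/O map base sits at 0x66, for 104 (0x68) bytes in total, matching the "length 0x00068" in the GDT dump. In long mode the TSS descriptor is 16 bytes and spans two GDT slots, which is presumably why GDT[0x06] reads as all zeros, and type 0x9 (labelled "32-Bit TSS (Available)" by Bochs) denotes an available 64-bit TSS. The sketch relies on the GCC/Clang `packed` attribute; `tss_set_ist` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Long-mode (64-bit) TSS. Packed because RSP0 starts at offset 4,
 * which is not 8-byte aligned. __attribute__((packed)) is a GCC/Clang extension. */
struct __attribute__((packed)) tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];     /* RSP0..RSP2: stacks used on a CPL change to ring 0..2 */
    uint64_t reserved1;
    uint64_t ist[7];     /* IST1..IST7: chosen by a gate's 3-bit IST field */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iomap_base; /* offset of the I/O permission bitmap */
};

_Static_assert(sizeof(struct tss64) == 0x68, "64-bit TSS is 104 bytes");
_Static_assert(offsetof(struct tss64, rsp) == 0x04, "RSP0 at 0x04");
_Static_assert(offsetof(struct tss64, ist) == 0x24, "IST1 at 0x24");
_Static_assert(offsetof(struct tss64, iomap_base) == 0x66, "I/O map base at 0x66");

/* Point IST slot n (1..7) at the top of a stack; other values are ignored. */
static void tss_set_ist(struct tss64 *t, unsigned n, uint64_t stack_top)
{
    if (n >= 1 && n <= 7)
        t->ist[n - 1] = stack_top;
}
```

With IST=1 in every gate, the stack switch reads `ist[0]`, so that slot (0x9000 in this kernel) must be filled in before the TSS is loaded with ltr.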


Finally: you shouldn't need to recompile anything. If you can mount the disk image (and you should be able to), open /System/system.cfg and set all the PRINTXXX options to true.
That should give a rough idea of where the crash is occurring.

Bochs should switch to 1024x640 at 32bpp. (Don't change the resolution; there are serious bugs in my scrolling code that somehow only let it work at this resolution.)

You will always see the version number.
If you set PRINTMEM to true, you should see the GRUB memory map.
That shouldn't cause a crash, so you should see a printout of the total amount of memory.

If it crashes now (i.e. after displaying total memory), it's definitely the CPUID code.
If not, you should then see PCI information, followed by ATA drive information. Neither should cause a crash; both simply display what was already probed earlier.

After that, you should see something similar to readelf's output: the ELF signature and so on, then the program headers and section headers.
Right after, I init the TSS and jump to user mode. That works flawlessly.
In usermode [src/userspace/hello/hello.cpp] I simply do INT $0x00.

That concludes my summary of what I'm doing (:




PS: What are the 'edge case stunts' you're talking about? My settings are fairly straightforward with the exception of multiple HDD images being used (which are there to test some FS things)...

Thanks Combuster!

Re: CPU still in Ring3 on Interrupt

Post by zhiayang »

God damn, what was I thinking?

I must have misread the manual or something; it was indeed the conforming bit. (See? Stop conforming, young teenage kids! Rebel against societal norms! Conforming causes crashes!)

Once I set that to 0, it works flawlessly!
QEMU probably doesn't do the privilege checks described in the manual...


But I'm still interested in finding out why you get a page fault, so don't go away, Combuster!


EDIT: Yup, this also solved the issue in VBox. But it turns out both VBox and Bochs have terribly slow disk I/O speeds...

Re: CPU still in Ring3 on Interrupt

Post by Combuster »

I'm going to skip all the RTFM questions.
requimrar wrote: Next...

<bochs:6> info tab
cr3: 0x0000000000001000
0x00000000-0x007dafff -> 0x0000000000000000-0x00000000007dafff
0x007db000-0x007dbfff -> 0x0000000000207000-0x0000000000207fff
0x007e1000-0x007e1fff -> 0x000000000020a000-0x000000000020afff
0x007ed000-0x007edfff -> 0x0000000000000000-0x0000000000000fff
0x007ef000-0x007effff -> 0x0000000000000000-0x0000000000000fff
0x007f1000-0x007f1fff -> 0x0000000000205000-0x0000000000205fff
0x007f6000-0x007f6fff -> 0x0000001f007f6000-0x0000001f007f6fff
0x007fd000-0x007fdfff -> 0x0000000000204000-0x0000000000204fff
0x00a33000-0x00a3afff -> 0x0000000000a33000-0x0000000000a3afff
0xfffff000-0xffffffff -> 0x00000000fffff000-0x00000000ffffffff
Looks normal.
And it turns out to include non-existent memory.
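One rough way to spot this in an `info tab` listing is to compare each physical range against the amount of guest RAM (512 MB in the bochsrc posted earlier). For example, 0x007f6000 maps to 0x0000001f007f6000, far beyond 512 MB. The sketch below flags such ranges; it treats RAM as a single block starting at 0, so it also flags legitimate non-RAM mappings such as the page at 0xfffff000 (top of the 4 GB space, normally firmware/ROM), and its hits are only candidates. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One line of Bochs' "info tab" output: virtual -> physical, inclusive bounds. */
struct mapping {
    uint64_t virt_lo, virt_hi;
    uint64_t phys_lo, phys_hi;
};

/* True if the whole physical range lies below the end of RAM.
 * Simplification: RAM is one contiguous block at 0; MMIO holes are ignored. */
static bool inside_ram(const struct mapping *m, uint64_t ram_bytes)
{
    return m->phys_lo <= m->phys_hi && m->phys_hi < ram_bytes;
}

/* Writes the indices of mappings not backed by RAM into out; returns the count. */
static size_t find_suspect_mappings(const struct mapping *maps, size_t n,
                                    uint64_t ram_bytes, size_t *out, size_t max_out)
{
    size_t found = 0;
    for (size_t i = 0; i < n && found < max_out; i++) {
        if (!inside_ram(&maps[i], ram_bytes))
            out[found++] = i;
    }
    return found;
}
```

In the kernel itself, the equivalent check would go wherever page tables are populated: refuse to map a frame the memory map never reported as usable.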

<bochs:10> info idt 0 63
Interrupt Descriptor Table (base=0x0000000000216080, limit=4095):
IDT[0x00]=64-Bit Interrupt Gate target=0x0008:0000000000204724, DPL=3 (Wrong! This allows any task to call INT 0 rather than being exclusively for #DE)
Correct! I intentionally set it as such; it's easy to identify. I just use INT $0x00 to test if I can return to Ring0 with interrupts.
Interrupts 0-31 are officially reserved for architectural purposes. You will need to fix it at some point.

Re: CPU still in Ring3 on Interrupt

Post by zhiayang »

Combuster wrote: I'm going to skip all the RTFM questions.
And it turns out to include non-existent memory.

Oh... I'll need to look into that.
What are the RTFM questions? I definitely need to RTFM and re-read the part about the conforming bit.

Interrupts 0-31 are officially reserved for architectural purposes. You will need to fix it at some point.
I know. This is an intentional, temporary thing. I already have another interrupt vector for system calls and the like.
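A sketch of that split, assuming 64-bit interrupt gates as in the dumps above: every exception vector gets DPL=0, and only a dedicated system-call vector gets DPL=3, so an INT on an exception vector from ring 3 raises #GP instead of entering the handler. SYSCALL_VECTOR = 0x80 is a placeholder (the thread doesn't say which vector is used), and encode_gate64 and gate_dpl_for are hypothetical helpers. The output uses the dword order of the `x /40x` dump, so the handler at 0x204724 with IST=1 encodes as 0x00084724 0x00208e01 (DPL=0) or 0x00084724 0x0020ee01 (DPL=3).

```c
#include <stdint.h>

#define GATE_TYPE_INTERRUPT64 0xEu
#define SYSCALL_VECTOR        0x80u /* hypothetical: any vector >= 32 works */

/* Encode a long-mode interrupt gate into four little-endian dwords. */
static void encode_gate64(uint64_t handler, uint16_t selector, unsigned ist,
                          unsigned dpl, uint32_t out[4])
{
    out[0] = (uint32_t)(handler & 0xFFFFu) | ((uint32_t)selector << 16);
    out[1] = (ist & 0x7u)
           | (GATE_TYPE_INTERRUPT64 << 8)
           | ((dpl & 0x3u) << 13)
           | (1u << 15)                          /* present */
           | (uint32_t)(handler & 0xFFFF0000u);  /* offset bits 31..16 */
    out[2] = (uint32_t)(handler >> 32);          /* offset bits 63..32 */
    out[3] = 0;                                  /* reserved */
}

/* Policy: only the system-call vector may be raised with INT from ring 3. */
static unsigned gate_dpl_for(unsigned vector)
{
    return vector == SYSCALL_VECTOR ? 3u : 0u;
}
```

Usage would be a loop over all vectors calling `encode_gate64(handler[v], 0x08, ist, gate_dpl_for(v), &idt[v * 4])`, where `handler` and `idt` are placeholder arrays.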


I can't just let this slide: you haven't told me exactly where the code crashes. (I'm fairly certain it's the CPUID code, based on the stack dump alone, anyway.)
bluemoon
Member
Posts: 1761
Joined: Wed Dec 01, 2010 3:41 am
Location: Hong Kong

Re: CPU still in Ring3 on Interrupt

Post by bluemoon »

requimrar wrote: QEMU probably doesn't do the privilege checks described in the manual...
lol. You've learned it the hard way.
QEMU is known to cut corners for speed, and many other things are not emulated as strictly as real hardware behaves. It's meant to run Linux and Windows without exploding, not necessarily to provide realistic/correct emulation.

Re: CPU still in Ring3 on Interrupt

Post by zhiayang »

bluemoon wrote:
requimrar wrote: QEMU probably doesn't do the privilege checks described in the manual...
lol. You've learned it the hard way.
QEMU is known to cut corners for speed, and many other things are not emulated as strictly as real hardware behaves. It's meant to run Linux and Windows without exploding, not necessarily to provide realistic/correct emulation.

I would totally be using Bochs IF:

1. It didn't show 2 icons in my dock.
2. It scrolled faster -> I know I should use an offscreen buffer but... I'm lazy!
3. Disk I/O via ports were faster. Both VBox and Bochs seem to suffer from the same problem; apparently port I/O is slow as hell... I actually got the impression that my OS hung on VBox, but in reality it was just taking forever to load the config file...




Haha.
I knew QEMU cut corners, but I didn't know to what extent... guess now I do (:
Thanks bluemoon and Combuster.

(PS Combuster, any more details on the #PF you're getting?)

Re: CPU still in Ring3 on Interrupt

Post by Combuster »

Requimrar: Grab this and enjoy your own guru meditation :wink: