Paging Issue (Triple Fault)
Posted: Thu May 04, 2017 3:41 am
by HackerCow
Hey everyone,
I'm building my own OS, and (as many others, I imagine), I'm having a bit of trouble getting paging to work. I've decided to implement a higher-half kernel, and after a bunch of failed attempts and plenty of hair-pulling I eventually decided to follow the (alternate) Higher Half tutorial from the OSDev wiki.
At this point - just to see if I can get things running at all - I've literally replaced my entire init routine and linker script with the code from the tutorial (http://wiki.osdev.org/User:Glauxosdever ... Bare_Bones).
The problem for me is that there's no real way to check if the paging setup was successful (I'm using qemu). The system just triple faults.
I've managed to track down the exact instruction that causes the triple fault, the call to the init function:
Code: Select all
...
# Set up the stack.
mov $stack_top, %esp
# Enter the high-level kernel.
# Putting a hlt before this line works just fine
call init
# There is a while(1); on the first line of init(), so the call itself seems to be what causes the triple fault
...
qemu gives the following output on the crash (over and over again, obviously):
Code: Select all
Triple fault
CPU Reset (CPU 0)
EAX=2badb002 EBX=00009500 ECX=00126000 EDX=0012c003
ESI=0012d000 EDI=001274b4 EBP=c012c924 ESP=c012c50c
EIP=c010115d EFL=00000096 [--S-AP-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
CS =0008 00000000 ffffffff 00cf9a00 DPL=0 CS32 [-R-]
SS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
DS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
FS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
GS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT= 000cb268 00000027
IDT= 00000000 000003ff
CR0=80010011 CR2=00000040 CR3=00126000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=00000418 CCD=c012c50c CCO=SUBL
EFER=0000000000000000
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
CPU Reset (CPU 0)
EAX=00000032 EBX=0000000a ECX=535ff912 EDX=000000e6
ESI=00000000 EDI=00100000 EBP=00000000 ESP=00000fc4
EIP=000ef043 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
CS =0008 00000000 ffffffff 00cf9b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
DS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
FS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
GS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT= 000f7080 00000037
IDT= 000f70be 00000000
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=00000000 CCD=00000001 CCO=LOGICL
EFER=0000000000000000
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
Side note: I'm calling qemu with the following options:
Code: Select all
-d cpu_reset -s -kernel build/src/kernel/kernel -serial file:serial.log -drive file=harddrive.img,format=raw
I'd really appreciate some help on this, since I'm pretty much lost here. A good starting point would even be to know whether or not the wiki tutorial is correct (there are some syntax errors in there, so there's reason to doubt its correctness).
Re: Paging Issue (Triple Fault)
Posted: Thu May 04, 2017 5:27 am
by LtG
A few things..
While modifying the code to assist in debugging (inserting a hlt, for example) can work, it's usually better to just use a debugger (gdb). It might take a few hours or a few days to learn, but if you are going to continue with osdev you're going to need it anyway, so you might as well pay the price now and get the benefits going forward instead of postponing it and paying for that for as long as you postpone.
It seems you have enabled protected mode and paging and you have set a CR3 to point to 0x00126000, however what's there? Remember the CR3 points to physical memory, not virtual though that's probably not the issue here (guessing)..
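As a quick cross-check of that reading of the dump, here is a minimal host-side Python sketch (not kernel code and not from the tutorial; the helper name `decode_cr0` is made up for illustration) that lists which CR0 flags are set in the value QEMU printed. The bit positions are the architectural ones for 32-bit x86.

```python
# CR0 bit positions (architectural, x86). Only the commonly used flags are listed.
CR0_FLAGS = {
    0: "PE",   # protected mode enable
    1: "MP",
    2: "EM",
    3: "TS",
    4: "ET",
    5: "NE",
    16: "WP",  # write protect
    18: "AM",
    29: "NW",
    30: "CD",
    31: "PG",  # paging enable
}


def decode_cr0(value):
    """Return the names of the CR0 flags set in value, lowest bit first."""
    return [name for bit, name in sorted(CR0_FLAGS.items()) if value & (1 << bit)]


# CR0 from the first register dump (taken at the triple fault).
print(decode_cr0(0x80010011))
# CR0 from the second dump (after the CPU reset).
print(decode_cr0(0x00000011))
```

For the first dump this reports PE and PG (plus ET and WP), so protected mode and paging were indeed both on at the time of the fault; after the reset only PE and ET remain.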
Not much info to go on, so I'd start by looking at the page directory and associated tables as well as the destination of your CALL to init. Assuming the CALL is the last instruction run when you get the error, then:
EIP=c010115d
So where would this virtual address point to in physical memory (based on the page directory and tables you created)? With gdb (or another debugger) you can easily set a breakpoint at the CALL instruction and then check the page directory (and tables) just before the CALL to make sure that the _virtual_ address the CALL is going to jump to is mapped properly. Also it could be that the issue is with your stack, given that CALL uses the stack implicitly. You can easily test that by replacing the CALL with a JMP and seeing whether the error happens later.
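To know which entries to look at, the virtual address has to be split into its paging indices. A minimal Python sketch of that arithmetic follows, assuming plain 32-bit paging with 4 KiB pages (no PAE, no 4 MiB pages), which matches CR4=00000000 in the dump. The function name `split_vaddr` is hypothetical.

```python
def split_vaddr(vaddr):
    """Split a 32-bit virtual address for two-level 4 KiB paging.

    Returns (page directory index, page table index, offset within the page).
    """
    pd_index = (vaddr >> 22) & 0x3FF  # top 10 bits select the page directory entry
    pt_index = (vaddr >> 12) & 0x3FF  # next 10 bits select the page table entry
    offset = vaddr & 0xFFF            # low 12 bits are the offset inside the page
    return pd_index, pt_index, offset


# EIP and ESP taken from the register dump.
for name, addr in (("EIP", 0xC010115D), ("ESP", 0xC012C50C)):
    pd, pt, off = split_vaddr(addr)
    print(f"{name}={addr:#010x}: PDE {pd}, PTE {pt}, offset {off:#x}")
```

For EIP=c010115d this gives page directory entry 768, page table entry 257 and offset 0x15d, so those are the two entries worth checking in the debugger before the CALL.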
You could also do a disassembly of the whole thing (if it's not too large) and post it here, it might help.
Finally, did you do a full copy/paste of the wiki tutorial? I'm assuming it should work if you follow the tutorial and don't deviate from it at all, and only then start modifying; that way it's easier for you to find out which deviations (changes you want) cause crashes and focus on those..
Re: Paging Issue (Triple Fault)
Posted: Thu May 04, 2017 5:49 am
by iansjack
According to CR2 the faulting address is 0x40. So why would your code be trying to read or write that address? You really need to implement a minimal exception handler for page faults that halts execution. You can then examine the error code pushed to the stack.
Another thing you can do is to manually inspect your page table to see that it looks reasonable.
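Once a page-fault handler is in place, the error code on the stack can be read bit by bit. A small Python sketch of that decoding follows, covering the five low bits defined for the x86 page-fault error code. The helper name `decode_pf_error` is hypothetical, and the sample values are illustrative only; the thread never shows the actual error code.

```python
# (bit, meaning when set, meaning when clear or None if nothing to report)
PF_BITS = [
    (0, "protection violation (page present)", "page not present"),
    (1, "write access", "read access"),
    (2, "user mode", "supervisor mode"),
    (3, "reserved bit set in a paging entry", None),
    (4, "instruction fetch", None),
]


def decode_pf_error(code):
    """Describe an x86 page-fault error code as a list of strings."""
    description = []
    for bit, if_set, if_clear in PF_BITS:
        if code & (1 << bit):
            description.append(if_set)
        elif if_clear is not None:
            description.append(if_clear)
    return description


# Illustrative values only.
print(decode_pf_error(0x0))
print(decode_pf_error(0x2))
```

An error code of 0, for example, would mean a supervisor-mode read from a page that is not present.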
Re: Paging Issue (Triple Fault)
Posted: Thu May 04, 2017 6:10 am
by HackerCow
Thanks for the replies!
LtG wrote:
it's usually better to just use a debugger (gdb). It might take a few hours or a few days to learn, but if you are going to continue with osdev you're going to need it anyway, so you might as well pay the price now and get the benefits going forward instead of postponing it and paying for that for as long as you postpone.
I'm quite aware of gdb, and I've been using it for some time (for osdev and other projects). With this particular issue, however, gdb throws an error when trying to connect to the remote target (qemu):
Code: Select all
Remote 'g' packet reply is too long: 000000000000000000000000000000000000000000000000630 (snip)
I assumed this has to do with my bug and tried debugging without gdb, though you're right; using gdb would be better.
LtG wrote:
It seems you have enabled protected mode and paging and you have set a CR3 to point to 0x00126000, however what's there? Remember the CR3 points to physical memory, not virtual though that's probably not the issue here (guessing)..
AFAIK CR3 should point to the page directory used during boot. I haven't checked this, however.
LtG wrote:
Also it could be that the issue is with your stack, given that CALL uses the stack implicitly. You can easily test that by replacing the CALL with a JMP and seeing whether the error happens later.
That's what I thought, though everything that has something to do with the stack looks sane to me (and is copied directly from the tutorial)
LtG wrote:
You could also do a disassembly of the whole thing (if it's not too large) and post it here, it might help.
There you go:
https://pastebin.com/2Sqj0ngJ
Keep in mind that I have a lot of other stuff in there, though 99% of it isn't called due to the while(1); in init().
LtG wrote:
Finally, did you do a full copy/paste of the wiki tutorial? I'm assuming it should work if you follow the tutorial and don't deviate from it at all, and only then start modifying; that way it's easier for you to find out which deviations (changes you want) cause crashes and focus on those..
The tutorial doesn't provide much (it's just a Bare Bones thing), but I've copied the init code and the linker script 1:1 (which is a closed system as far as I understand it).
iansjack wrote:
According to CR2 the faulting address is 0x40. So why would your code be trying to read or write that address? You really need to implement a minimal exception handler for page faults that halts execution. You can then examine the error code pushed to the stack.
I do have proper exception handlers, though I can't get to a point where I can register them since I can't even call init.
I've wondered about 0x40 too. I guess gdb could really help me here, but for some reason it doesn't work for this issue.
iansjack wrote:
Another thing you can do is to manually inspect your page table to see that it looks reasonable.
How would I go about doing that? I can't set up the serial connection, I can't print to the screen, I can't use gdb.
In conclusion, I think my best bet is getting gdb to work again... (unless someone can spot my issue just from the code)
Re: Paging Issue (Triple Fault)
Posted: Thu May 04, 2017 6:19 am
by LtG
I don't have time to do proper quoting, but my guess (and it is just a guess) is that the GDB issue you are seeing is related to how GDB and QEMU communicate. That is, one side is sending x86_32 info and the other is sending x86_64 info, and I think GDB doesn't know how to change it on the fly.. After starting GDB:
set architecture i386:x86-32
target remote localhost:1234
or:
set architecture i386:x86-64
Those will come in handy once you start to move to long mode, where you want to set a breakpoint right after the OS starts (still in real or protected mode), then disconnect GDB (but leave the OS stopped at the breakpoint) and start a new GDB with the 64-bit arch.. a bit annoying, but what can you do..
"AFAIK CR3 should point to the page directory used during boot. I haven't checked this, however."
What do you mean "during boot"? It points to what you set it to point to.. or who set it?
"That's what I thought, though everything that has something to do with the stack looks sane to me (and is copied directly from the tutorial)"
Note, if your paging is messed up, then the stack (which is virtual address, thus implicitly uses paging if paging is enabled) won't work either. So the CALL uses stack implicitly which uses paging implicitly, and the CALL also uses paging implicitly, all of this assuming paging is enabled. It affects _every_ memory address except the CR3..
"How would I go about doing that? I can't set up the serial connection, I can't print to the screen, I can't use gdb."
Use GDB =)
I hope my advice regarding that helps..
I'll try to check the disassembly at some point but don't have the time for it now.. hopefully GDB will resolve this for you..
[edit] In case it wasn't clear, you start GDB without any command line parameters in this case (namely without the host to connect to, as that is done from within GDB with "target remote ...").
Re: Paging Issue (Triple Fault)
Posted: Thu May 04, 2017 6:25 am
by iansjack
Your gdb problem: are you specifying the object file name of your kernel on the command line?
Inspecting memory: use the debugger built in to the qemu monitor. Either write an exception routine for page faults which halts the processor (best) or else insert a "hlt" instruction (or a "jmp .") after the page-table setup but before the crash. Then you can inspect data in the monitor. You can also do an "info mem" to see what qemu thinks of your page table.
Re: Paging Issue (Triple Fault)
Posted: Thu May 04, 2017 6:31 am
by HackerCow
LtG wrote:
set architecture i386:x86-32
target remote localhost:1234
Oh hey, that did it! I guess some update must have messed something up there... (I use Arch Linux, so updates are pretty frequent). Thanks!
LtG wrote:
What do you mean "during boot"? It points to what you set it to point to.. or who set it?
Here's what I do (what the tutorial does):
Code: Select all
movl $(boot_pagedir - 0xC0000000), %ecx
movl %ecx, %cr3
I'm assuming boot_pagedir lies at the address that you mentioned. I guess I'll check that, since gdb is actually working now
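The subtraction in that snippet is the usual higher-half conversion between a symbol's link-time (virtual) address and its load-time (physical) address. A small Python sketch of the arithmetic follows, assuming the kernel is linked at a fixed offset of 0xC0000000 above where it is loaded, as the `boot_pagedir - 0xC0000000` expression suggests. The function names are made up for illustration.

```python
KERNEL_VIRT_BASE = 0xC0000000  # offset taken from the movl instruction above


def virt_to_phys(vaddr):
    """Convert a higher-half kernel virtual address to its physical address."""
    if vaddr < KERNEL_VIRT_BASE:
        raise ValueError(f"{vaddr:#x} is below the higher-half base")
    return vaddr - KERNEL_VIRT_BASE


def phys_to_virt(paddr):
    """Inverse conversion, valid only for memory covered by the fixed offset."""
    return paddr + KERNEL_VIRT_BASE


# CR3 from the register dump holds a physical address.
cr3 = 0x00126000
print(f"CR3={cr3:#010x} -> expected virtual address of boot_pagedir: {phys_to_virt(cr3):#010x}")
```

If the assumption holds, boot_pagedir should show up at 0xc0126000 in the symbol table (for instance via `info address boot_pagedir` in gdb), which would confirm that CR3 points at the intended page directory.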
LtG wrote:
Note, if your paging is messed up, then the stack (which is virtual address, thus implicitly uses paging if paging is enabled) won't work either. So the CALL uses stack implicitly which uses paging implicitly, and the CALL also uses paging implicitly, all of this assuming paging is enabled. It affects _every_ memory address except the CR3..
Yes, I'm aware of that. I guess I'll just debug the issue with gdb and see what I can find out. Thanks for your help so far!
Re: Paging Issue (Triple Fault)
Posted: Thu May 04, 2017 6:35 am
by HackerCow
iansjack wrote:
You can also do an "info mem" to see what qemu thinks of your page table.
Thanks for that tip! The following mappings are active at the beginning of init (break init, continue in gdb):
Code: Select all
00000000c0100000-00000000c012d000 000000000002d000 -rw
00000000c03ff000-00000000c0400000 0000000000001000 -rw
I'm not sure that's what I expected... It looks like the page tables are messed up.
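One way to read that output is to check specific addresses against it. The Python sketch below parses the two lines exactly as pasted and tests a few addresses from the register dump. The helper names are hypothetical, and it assumes each line is `start-end size flags` with an exclusive end, which is consistent with the sizes shown.

```python
INFO_MEM = """\
00000000c0100000-00000000c012d000 000000000002d000 -rw
00000000c03ff000-00000000c0400000 0000000000001000 -rw
"""


def parse_info_mem(text):
    """Parse 'info mem' lines into (start, end, flags) tuples, end exclusive."""
    ranges = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip anything that is not a mapping line
        start_s, end_s = parts[0].split("-")
        ranges.append((int(start_s, 16), int(end_s, 16), parts[2]))
    return ranges


def is_mapped(ranges, addr):
    """True if addr falls inside one of the parsed ranges."""
    return any(start <= addr < end for start, end, _ in ranges)


ranges = parse_info_mem(INFO_MEM)
for name, addr in (("EIP", 0xC010115D), ("ESP", 0xC012C50C),
                   ("CR2", 0x00000040), ("load address", 0x00100000)):
    state = "mapped" if is_mapped(ranges, addr) else "NOT mapped"
    print(f"{name} {addr:#010x}: {state}")
```

With these two ranges, both EIP and ESP from the dump are mapped, while nothing in low memory is, including the faulting address 0x40 from CR2.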
Re: Paging Issue (Triple Fault)
Posted: Thu May 04, 2017 6:42 am
by Octocontrabass
HackerCow wrote:
(unless someone can spot my issue just from the code)
Code: Select all
c010115d: 65 a1 14 00 00 00 mov %gs:0x14,%eax
What compiler are you using? What are the options you're passing to it?
Re: Paging Issue (Triple Fault)
Posted: Thu May 04, 2017 6:50 am
by HackerCow
Octocontrabass wrote:
What compiler are you using? What are the options you're passing to it?
Code: Select all
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/6.3.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc-multilib/src/gcc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared --enable-threads=posix --enable-libmpx --with-system-zlib --with-isl --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object --enable-linker-build-id --enable-lto --enable-plugin --enable-install-libiberty --with-linker-hash-style=gnu --enable-gnu-indirect-function --enable-multilib --disable-werror --enable-checking=release
Thread model: posix
gcc version 6.3.1 20170306 (GCC)
I'm using CMake. Here's my output for "make VERBOSE=1":
https://pastebin.com/SAs5hZpU (home directory snipped)
You can see all the options there
Re: Paging Issue (Triple Fault)
Posted: Thu May 04, 2017 6:53 am
by HackerCow
Octocontrabass wrote:
Code: Select all
c010115d: 65 a1 14 00 00 00 mov %gs:0x14,%eax
Hold on, could it be that the stack protectors (-fstack-protector-strong) are causing this?
Edit: It was the stack protector (facepalm). Now THAT'S a stupid mistake.
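For anyone wondering where the 0x40 in CR2 came from: on 32-bit x86 Linux targets, GCC's stack protector loads its canary from %gs:0x14, which is the instruction at the faulting EIP. The Python sketch below works through one reading that is consistent with the register dump; it is not something confirmed in the thread. It assumes GS base 0 and IDT base 0 as shown in the dump, and 8-byte gate descriptors as used in 32-bit protected mode. The helper name is made up.

```python
GATE_SIZE = 8             # bytes per IDT gate descriptor in 32-bit protected mode
PAGE_FAULT_VECTOR = 14    # #PF
DOUBLE_FAULT_VECTOR = 8   # #DF


def idt_entry_address(idt_base, vector):
    """Linear address of the IDT gate descriptor for the given vector."""
    return idt_base + vector * GATE_SIZE


gs_base = 0x00000000   # GS base from the dump
idt_base = 0x00000000  # IDT base from the dump (no IDT loaded yet)

# Step 1: the canary load touches GS base + 0x14, which is unmapped low memory.
canary_address = gs_base + 0x14
# Step 2: delivering the page fault needs the #PF gate, also in unmapped low memory.
pf_gate = idt_entry_address(idt_base, PAGE_FAULT_VECTOR)
# Step 3: the resulting double fault needs the #DF gate; a fault here is the triple fault.
df_gate = idt_entry_address(idt_base, DOUBLE_FAULT_VECTOR)

print(f"canary load: {canary_address:#x}")
print(f"#PF gate:    {pf_gate:#x}")
print(f"#DF gate:    {df_gate:#x}")
```

Under those assumptions the last address touched before the reset is the double-fault gate at 0x40, which matches CR2=00000040.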
Re: Paging Issue (Triple Fault)
Posted: Thu May 04, 2017 6:58 am
by Octocontrabass
HackerCow wrote:
Target: x86_64-pc-linux-gnu
You may have missed it since it's only mentioned briefly at the top of the bare bones tutorial, but you should be using a cross-compiler.
Re: Paging Issue (Triple Fault)
Posted: Thu May 04, 2017 3:59 pm
by LtG
HackerCow, did you skip the cross compiler? If so, may I ask why?
I'm asking, assuming you skipped it, because my first guess would be that it seemed like a bore and you wanted to get to OS dev.. if that's the case then it might be useful to rethink the article:
- Emphasize the importance
- Current rough estimate on how much effort and time it takes (30-60 min CPU? and maybe 5 min user?)
- Maybe re-org the article; Quick steps with very short descriptions, no changes allowed!; Longer version that explains what is done and allows the users to customize
Not sure how feasible the last point is, but it's probably the most important one. I'm guessing a lot of people are intimidated by the process and postpone it which is why I suggested above to have a quick steps version that just tells the reader what to do without deviations and the longer version for later and those who want to do things their own way.. Also I'm guessing a lot of people don't "care" about it, after all they want to osdev, not mess with the tools =)
Re: Paging Issue (Triple Fault)
Posted: Fri May 05, 2017 2:59 am
by HackerCow
I did actually skip the cross compiler step, but not exactly for the reasons you mentioned. The reason was actually a combination of:
1) A misunderstanding as to what a cross compiler actually is
I always kinda assumed a cross compiler was just used if you compiled to a different architecture (e.g. x86-64 -> arm64). As a result, I thought for staying within the same architecture and just "switching" the bitness (64 -> 32) the -m32 flag would suffice. Since it worked, I just rolled with it and never questioned it. Obviously I was wrong, and after reading the article it makes a lot of sense too.
2) I just plain old missed it
Like Octocontrabass suggested, I just skipped over it. I'd say I'm fairly proficient in C, and I usually know my way around the toolchain pretty well, so I just skimmed over the first few parts of the bare bones tutorial. I guess that wasn't a good choice in hindsight, since I apparently missed some fairly important osdev-specific details in the process.
That being said, I still agree with your proposals regarding the article. I think it states the importance well enough if you're a complete beginner (seldom used C, no idea what gcc is,...), but I think people with a little bit of general C experience are tempted to skip over those parts.
Anyways, thanks for pointing me toward that!