Re: Switching to virtual memory management
Posted: Sat Nov 08, 2014 8:09 pm
by Brendan
Hi,
psychobeagle12 wrote:
Is it possible that I am going about this whole thing the wrong way? I feel like this is a fairly basic task of most systems and that I shouldn't bother continuing if I can't figure this part out!
To me, it seems like you're mostly going about it the right way, i.e. you have minor bugs in the implementation rather than large problems with the design.
For example, how does the ".startup" section get loaded at 0x0007E000 by multi-boot?
Cheers,
Brendan
Re: Switching to virtual memory management
Posted: Sat Nov 08, 2014 8:26 pm
by psychobeagle12
Brendan wrote:
To me, it seems like you're mostly going about it the right way, i.e. you have minor bugs in the implementation rather than large problems with the design.
For example, how does the ".startup" section get loaded at 0x0007E000 by multi-boot?
I don't think I understand the question. GRUB parses the ELF executable and determines the load addresses for each section, correct? At least, again, that was my understanding. So, based on my readelf output, GRUB will correctly load .startup to 0x7e000, .text to 0x00100000, and .data, .bss to 0x00102000. Again, I think I am misunderstanding the question, or maybe the point of the question. I am sure that there is something quite important in your response, my mind just isn't putting it together...
Re: Switching to virtual memory management
Posted: Sat Nov 08, 2014 8:41 pm
by eryjus
CRAP!!! I totally missed this the first time through... and the second, and the third....
You're missing a 0.
Re: Switching to virtual memory management
Posted: Sat Nov 08, 2014 8:44 pm
by psychobeagle12
eryjus wrote:
CRAP!!! I totally missed this the first time through... and the second, and the third....
You're missing a 0.
I WAS missing a zero lol. I corrected this several posts ago (it was a typo). See? Every time I think I have it beat, it still fails...
Re: Switching to virtual memory management
Posted: Sat Nov 08, 2014 8:54 pm
by eryjus
Bummer! I thought that was it. This has become as much a quest for me as it is for you!
What is bothering me is the value in CR2. It doesn't reconcile with the value in EIP and the code at that address. For a page fault, CR2 holds the linear address whose access caused the fault, and EIP points at the faulting instruction. The value in EAX/CR2 is well beyond anything you have mapped in your page tables. I honestly can't decide whether it's a page fault or a GPF that leads to the triple fault.
Someone smarter than me is going to have to chime in with a big shove in the right direction.
Re: Switching to virtual memory management
Posted: Sat Nov 08, 2014 8:59 pm
by psychobeagle12
That was one of the first problems I noticed as well, the completely odd value of CR2. I didn't think that a GPF following the PF would change CR2. Or does it? I felt that the sequence was PF->GPF->TF, since there are two references in the Bochs output to
Code:
interrupt(): vector must be within IDT table limits, IDT.limit = 0x0
It just seemed to make sense, given that paging had just been enabled.
Edit: I'm not the only one who, looking at this code, thinks that my way of setting up paging should work, right? Like I'm not on some wild goose chase to a solution to a deeper problem?
Re: Switching to virtual memory management
Posted: Sat Nov 08, 2014 9:38 pm
by Brendan
Hi,
psychobeagle12 wrote:
Brendan wrote:
For example, how does the ".startup" section get loaded at 0x0007E000 by multi-boot?
I don't think I understand the question. GRUB parses the ELF executable and determines the load addresses for each section, correct? At least, again, that was my understanding. So, based on my readelf output, GRUB will correctly load .startup to 0x7e000, .text to 0x00100000, and .data, .bss to 0x00102000. Again, I think I am misunderstanding the question, or maybe the point of the question. I am sure that there is something quite important in your response, my mind just isn't putting it together...
Sadly, the multi-boot specification itself says nothing about which load addresses are valid and which aren't, and it doesn't explicitly say that you can't ask to be loaded at (e.g.) physical address 0x00000000 (and trash the IVT and BDA), 0x0009C000 (and trash the EBDA), 0x000C0000 (video ROM) or 0x000F0000 (BIOS ROM).
My understanding is that (for multi-boot) nothing in the first 1 MiB of memory is guaranteed to be usable (e.g. there's no guarantee that 0x0007E000 isn't in use by the boot loader itself), and the only safe load addresses are 0x00100000 or higher (but not too much higher, as there's no guarantee that the computer has enough RAM either).
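To make that rule concrete, here is a minimal host-side C sketch (not kernel code) that scans an ELF file's program headers for PT_LOAD segments outside the range described above as safe. It assumes the loader places each segment at its p_paddr (as GRUB does) and uses the Multiboot `mem_upper` value (KiB of memory starting at 1 MiB) as a conservative ceiling; `first_unsafe_segment` is a hypothetical name.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

#define ONE_MIB 0x00100000u

/* Returns the index of the first PT_LOAD segment that starts below 1 MiB or
 * ends above 1 MiB + mem_upper_kib * 1024, or -1 if every loadable segment
 * fits inside that window. Assumes the loader uses p_paddr as the load address. */
static int first_unsafe_segment(const Elf32_Phdr *ph, size_t count,
                                uint32_t mem_upper_kib)
{
    uint64_t top = (uint64_t)ONE_MIB + (uint64_t)mem_upper_kib * 1024u;

    for (size_t i = 0; i < count; i++) {
        if (ph[i].p_type != PT_LOAD)
            continue;                      /* only PT_LOAD segments are loaded */
        uint64_t start = ph[i].p_paddr;    /* physical load address */
        uint64_t end = start + ph[i].p_memsz;
        if (start < ONE_MIB || end > top)
            return (int)i;
    }
    return -1;
}
```

With a segment at 0x7E000 (like the ".startup" section) the function flags it, while a segment at 0x00100000 passes as long as `mem_upper` covers it.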
Cheers,
Brendan
Re: Switching to virtual memory management
Posted: Sat Nov 08, 2014 10:04 pm
by Brendan
Hi,
psychobeagle12 wrote:
That was one of the first problems I noticed as well, the completely odd value of CR2. I didn't think that the GPF following PF would change CR2. Or does it?
GPF doesn't change CR2.
Have you tried putting a magic breakpoint ("xchg ebx,ebx") just before the "jmp 0x08:paging_code" instruction, and then inspecting the page directory, page tables, contents of RAM at (virtual address) 0xC0101500, etc., before the crash occurs?
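Inspecting the tables by hand amounts to repeating the MMU's two-level walk. A minimal C sketch of that walk for 32-bit non-PAE paging with 4 KiB pages, run against a byte image of physical RAM (for example, one dumped from the emulator), might look like this; `translate`, `rd32` and `wr32` are hypothetical helper names, and 4 MiB (PSE) pages are not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value at a physical offset in a RAM image. */
static uint32_t rd32(const uint8_t *mem, uint32_t off)
{
    return (uint32_t)mem[off] | (uint32_t)mem[off + 1] << 8 |
           (uint32_t)mem[off + 2] << 16 | (uint32_t)mem[off + 3] << 24;
}

/* Write a little-endian 32-bit value (used to build test tables). */
static void wr32(uint8_t *mem, uint32_t off, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        mem[off + i] = (uint8_t)(v >> (8 * i));
}

/* Walk 32-bit non-PAE tables like the MMU (4 KiB pages only). Returns false
 * if the walk would fault or the tables lie outside the RAM image. */
static bool translate(const uint8_t *mem, size_t mem_size, uint32_t cr3,
                      uint32_t vaddr, uint32_t *paddr)
{
    uint32_t pde_addr = (cr3 & 0xFFFFF000u) + (vaddr >> 22) * 4;
    if ((size_t)pde_addr + 4 > mem_size) return false;
    uint32_t pde = rd32(mem, pde_addr);
    if (!(pde & 1)) return false;              /* page table not present */

    uint32_t pte_addr = (pde & 0xFFFFF000u) + ((vaddr >> 12) & 0x3FF) * 4;
    if ((size_t)pte_addr + 4 > mem_size) return false;
    uint32_t pte = rd32(mem, pte_addr);
    if (!(pte & 1)) return false;              /* page not present */

    *paddr = (pte & 0xFFFFF000u) | (vaddr & 0xFFF);
    return true;
}
```

Fed the values from the dumps later in this thread (page directory at 0x7E000, PDE 768 = 0x00080023, PTE 257 = 0x00201003), it translates 0xC0101500 to physical 0x00201500, while 0xE0000011 fails because PDE 896 is not present.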
Cheers,
Brendan
Re: Switching to virtual memory management
Posted: Sun Nov 09, 2014 5:18 am
by Combuster
CR2=CR0 is a rather typical symptom. Consider:
Code:
mov eax, cr0
or eax, cr0_pg      ; set the paging bit (CR0.PG, bit 31)
mov cr0, eax        ; paging is on from here; EAX still holds the new CR0 value
(...)
EAX now contains the value of CR0. When paging kicks in and the mapping is off, the code that follows looks like empty memory instead:
Code:
db 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
Those bytes disassemble to a long sequence of add [eax], al; add [eax], al; and so on. We also know that EAX equals CR0, which makes this a very specific symptom: there actually is a page table for that address (otherwise, CR2=EIP), but your page table points at the wrong location.
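Why zero bytes turn into `add [eax], al`: opcode 0x00 is ADD r/m8, r8, and a ModRM byte of 0x00 has mod=00, reg=000 and rm=000, which in 32-bit addressing selects [EAX] as the destination and AL as the source. A small C sketch of that decoding (the helper names are illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields of an x86 ModRM byte: mod (bits 7-6), reg (bits 5-3), rm (bits 2-0). */
typedef struct { uint8_t mod, reg, rm; } ModRM;

static ModRM split_modrm(uint8_t b)
{
    ModRM m = { (uint8_t)(b >> 6), (uint8_t)((b >> 3) & 7), (uint8_t)(b & 7) };
    return m;
}

/* True if opcode 0x00 (ADD r/m8, r8) plus this ModRM byte encode
 * "add [eax], al" in 32-bit code: mod=00/rm=000 -> [EAX], reg=000 -> AL. */
static bool is_add_mem_eax_al(uint8_t opcode, uint8_t modrm)
{
    ModRM m = split_modrm(modrm);
    return opcode == 0x00 && m.mod == 0 && m.rm == 0 && m.reg == 0;
}
```

So every pair of zero bytes the CPU fetches from an unloaded page executes as one `add [eax], al`.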
Re: Switching to virtual memory management
Posted: Sun Nov 09, 2014 7:24 am
by psychobeagle12
I was actually only paying attention to the Bochs log. I'll add the magic breakpoint, run a debugging session and see what I can find out. So now, operating under the (untested) assumption that my page tables are off, is there anything in the code that jumps out as causing the inconsistency?
Edit:
OK, I did a memory dump right after the crash (Bochs breaks on the triple fault), and as far as I can tell the page directory and page tables are all correct, as are all of the mappings. Here is part of page table 0:
Code:
0x0007f000 <bogus+ 0>: 0x00000003 0x00001003 0x00002003 0x00003003
0x0007f010 <bogus+ 16>: 0x00004003 0x00005003 0x00006003 0x00007003
0x0007f020 <bogus+ 32>: 0x00008003 0x00009003 0x0000a003 0x0000b003
0x0007f030 <bogus+ 48>: 0x0000c003 0x0000d003 0x0000e003 0x0000f003
0x0007f040 <bogus+ 64>: 0x00010003 0x00011003 0x00012003 0x00013003
0x0007f050 <bogus+ 80>: 0x00014003 0x00015003 0x00016003 0x00017003
0x0007f060 <bogus+ 96>: 0x00018003 0x00019003 0x0001a003 0x0001b003
0x0007f070 <bogus+ 112>: 0x0001c003 0x0001d003 0x0001e003 0x0001f003
0x0007f080 <bogus+ 128>: 0x00020003 0x00021003 0x00022003 0x00023003
0x0007f090 <bogus+ 144>: 0x00024003 0x00025003 0x00026003 0x00027003
0x0007f0a0 <bogus+ 160>: 0x00028003 0x00029003 0x0002a003 0x0002b003
0x0007f0b0 <bogus+ 176>: 0x0002c003 0x0002d003 0x0002e003 0x0002f003
0x0007f0c0 <bogus+ 192>: 0x00030003 0x00031003 0x00032003 0x00033003
0x0007f0d0 <bogus+ 208>: 0x00034003 0x00035003 0x00036003 0x00037003
0x0007f0e0 <bogus+ 224>: 0x00038003 0x00039003 0x0003a003 0x0003b003
0x0007f0f0 <bogus+ 240>: 0x0003c003 0x0003d003 0x0003e003 0x0003f003
0x0007f100 <bogus+ 256>: 0x00040003 0x00041003 0x00042003 0x00043003
0x0007f110 <bogus+ 272>: 0x00044003 0x00045003 0x00046003 0x00047003
0x0007f120 <bogus+ 288>: 0x00048003 0x00049003 0x0004a003 0x0004b003
0x0007f130 <bogus+ 304>: 0x0004c003 0x0004d003 0x0004e003 0x0004f003
0x0007f140 <bogus+ 320>: 0x00050003 0x00051003 0x00052003 0x00053003
0x0007f150 <bogus+ 336>: 0x00054003 0x00055003 0x00056003 0x00057003
0x0007f160 <bogus+ 352>: 0x00058003 0x00059003 0x0005a003 0x0005b003
0x0007f170 <bogus+ 368>: 0x0005c003 0x0005d003 0x0005e003 0x0005f003
0x0007f180 <bogus+ 384>: 0x00060003 0x00061003 0x00062003 0x00063003
0x0007f190 <bogus+ 400>: 0x00064003 0x00065003 0x00066003 0x00067003
0x0007f1a0 <bogus+ 416>: 0x00068003 0x00069003 0x0006a003 0x0006b003
0x0007f1b0 <bogus+ 432>: 0x0006c003 0x0006d003 0x0006e003 0x0006f003
0x0007f1c0 <bogus+ 448>: 0x00070003 0x00071003 0x00072003 0x00073003
0x0007f1d0 <bogus+ 464>: 0x00074003 0x00075003 0x00076003 0x00077003
0x0007f1e0 <bogus+ 480>: 0x00078003 0x00079003 0x0007a003 0x0007b003
0x0007f1f0 <bogus+ 496>: 0x0007c003 0x0007d003 0x0007e003 0x0007f003
0x0007f200 <bogus+ 512>: 0x00080003 0x00081003 0x00082003 0x00083003
0x0007f210 <bogus+ 528>: 0x00084003 0x00085003 0x00086003 0x00087003
0x0007f220 <bogus+ 544>: 0x00088003 0x00089003 0x0008a003 0x0008b003
0x0007f230 <bogus+ 560>: 0x0008c003 0x0008d003 0x0008e003 0x0008f003
0x0007f240 <bogus+ 576>: 0x00090003 0x00091003 0x00092003 0x00093003
0x0007f250 <bogus+ 592>: 0x00094003 0x00095003 0x00096003 0x00097003
0x0007f260 <bogus+ 608>: 0x00098003 0x00099003 0x0009a003 0x0009b003
0x0007f270 <bogus+ 624>: 0x0009c003 0x0009d003 0x0009e003 0x0009f003
0x0007f280 <bogus+ 640>: 0x000a0003 0x000a1003 0x000a2003 0x000a3003
0x0007f290 <bogus+ 656>: 0x000a4003 0x000a5003 0x000a6003 0x000a7003
0x0007f2a0 <bogus+ 672>: 0x000a8003 0x000a9003 0x000aa003 0x000ab003
0x0007f2b0 <bogus+ 688>: 0x000ac003 0x000ad003 0x000ae003 0x000af003
0x0007f2c0 <bogus+ 704>: 0x000b0003 0x000b1003 0x000b2003 0x000b3003
0x0007f2d0 <bogus+ 720>: 0x000b4003 0x000b5003 0x000b6003 0x000b7003
0x0007f2e0 <bogus+ 736>: 0x000b8003 0x000b9003 0x000ba003 0x000bb003
0x0007f2f0 <bogus+ 752>: 0x000bc003 0x000bd003 0x000be003 0x000bf003
0x0007f300 <bogus+ 768>: 0x000c0003 0x000c1003 0x000c2003 0x000c3003
0x0007f310 <bogus+ 784>: 0x000c4003 0x000c5003 0x000c6003 0x000c7003
0x0007f320 <bogus+ 800>: 0x000c8003 0x000c9003 0x000ca003 0x000cb003
0x0007f330 <bogus+ 816>: 0x000cc003 0x000cd003 0x000ce003 0x000cf003
0x0007f340 <bogus+ 832>: 0x000d0003 0x000d1003 0x000d2003 0x000d3003
0x0007f350 <bogus+ 848>: 0x000d4003 0x000d5003 0x000d6003 0x000d7003
0x0007f360 <bogus+ 864>: 0x000d8003 0x000d9003 0x000da003 0x000db003
0x0007f370 <bogus+ 880>: 0x000dc003 0x000dd003 0x000de003 0x000df003
0x0007f380 <bogus+ 896>: 0x000e0003 0x000e1003 0x000e2003 0x000e3003
0x0007f390 <bogus+ 912>: 0x000e4003 0x000e5003 0x000e6003 0x000e7003
0x0007f3a0 <bogus+ 928>: 0x000e8003 0x000e9003 0x000ea003 0x000eb003
0x0007f3b0 <bogus+ 944>: 0x000ec003 0x000ed003 0x000ee003 0x000ef003
0x0007f3c0 <bogus+ 960>: 0x000f0003 0x000f1003 0x000f2003 0x000f3003
0x0007f3d0 <bogus+ 976>: 0x000f4003 0x000f5003 0x000f6003 0x000f7003
0x0007f3e0 <bogus+ 992>: 0x000f8003 0x000f9003 0x000fa003 0x000fb003
0x0007f3f0 <bogus+ 1008>: 0x000fc003 0x000fd003 0x000fe003 0x000ff003
0x0007f400 <bogus+ 1024>: 0x00100023 0x00101003 0x00102003 0x00103003
0x0007f410 <bogus+ 1040>: 0x00104003 0x00105003 0x00106003 0x00107003
0x0007f420 <bogus+ 1056>: 0x00108003 0x00109003 0x0010a003 0x0010b003
0x0007f430 <bogus+ 1072>: 0x0010c003 0x0010d003 0x0010e003 0x0010f003
0x0007f440 <bogus+ 1088>: 0x00110003 0x00111003 0x00112003 0x00113003
0x0007f450 <bogus+ 1104>: 0x00114003 0x00115003 0x00116003 0x00117003
0x0007f460 <bogus+ 1120>: 0x00118003 0x00119003 0x0011a003 0x0011b003
0x0007f470 <bogus+ 1136>: 0x0011c003 0x0011d003 0x0011e003 0x0011f003
0x0007f480 <bogus+ 1152>: 0x00120003 0x00121003 0x00122003 0x00123003
0x0007f490 <bogus+ 1168>: 0x00124003 0x00125003 0x00126003 0x00127003
0x0007f4a0 <bogus+ 1184>: 0x00128003 0x00129003 0x0012a003 0x0012b003
0x0007f4b0 <bogus+ 1200>: 0x0012c003 0x0012d003 0x0012e003 0x0012f003
0x0007f4c0 <bogus+ 1216>: 0x00130003 0x00131003 0x00132003 0x00133003
0x0007f4d0 <bogus+ 1232>: 0x00134003 0x00135003 0x00136003 0x00137003
0x0007f4e0 <bogus+ 1248>: 0x00138003 0x00139003 0x0013a003 0x0013b003
0x0007f4f0 <bogus+ 1264>: 0x0013c003 0x0013d003 0x0013e003 0x0013f003
0x0007f500 <bogus+ 1280>: 0x00140003 0x00141003 0x00142003 0x00143003
0x0007f510 <bogus+ 1296>: 0x00144003 0x00145003 0x00146003 0x00147003
0x0007f520 <bogus+ 1312>: 0x00148003 0x00149003 0x0014a003 0x0014b003
0x0007f530 <bogus+ 1328>: 0x0014c003 0x0014d003 0x0014e003 0x0014f003
0x0007f540 <bogus+ 1344>: 0x00150003 0x00151003 0x00152003 0x00153003
0x0007f550 <bogus+ 1360>: 0x00154003 0x00155003 0x00156003 0x00157003
0x0007f560 <bogus+ 1376>: 0x00158003 0x00159003 0x0015a003 0x0015b003
0x0007f570 <bogus+ 1392>: 0x0015c003 0x0015d003 0x0015e003 0x0015f003
0x0007f580 <bogus+ 1408>: 0x00160003 0x00161003 0x00162003 0x00163003
0x0007f590 <bogus+ 1424>: 0x00164003 0x00165003 0x00166003 0x00167003
0x0007f5a0 <bogus+ 1440>: 0x00168003 0x00169003 0x0016a003 0x0016b003
0x0007f5b0 <bogus+ 1456>: 0x0016c003 0x0016d003 0x0016e003 0x0016f003
0x0007f5c0 <bogus+ 1472>: 0x00170003 0x00171003 0x00172003 0x00173003
0x0007f5d0 <bogus+ 1488>: 0x00174003 0x00175003 0x00176003 0x00177003
0x0007f5e0 <bogus+ 1504>: 0x00178003 0x00179003 0x0017a003 0x0017b003
0x0007f5f0 <bogus+ 1520>: 0x0017c003 0x0017d003 0x0017e003 0x0017f003
0x0007f600 <bogus+ 1536>: 0x00180003
And here is a small portion of page table 768:
Code:
0x00080000 <bogus+ 0>: 0x00100003 0x00101003 0x00102003 0x00103003
0x00080010 <bogus+ 16>: 0x00104003 0x00105003 0x00106003 0x00107003
0x00080020 <bogus+ 32>: 0x00108003 0x00109003 0x0010a003 0x0010b003
0x00080030 <bogus+ 48>: 0x0010c003 0x0010d003 0x0010e003 0x0010f003
0x00080040 <bogus+ 64>: 0x00110003 0x00111003 0x00112003 0x00113003
0x00080050 <bogus+ 80>: 0x00114003 0x00115003 0x00116003 0x00117003
0x00080060 <bogus+ 96>: 0x00118003 0x00119003 0x0011a003 0x0011b003
0x00080070 <bogus+ 112>: 0x0011c003 0x0011d003 0x0011e003 0x0011f003
0x00080080 <bogus+ 128>: 0x00120003 0x00121003 0x00122003 0x00123003
0x00080090 <bogus+ 144>: 0x00124003 0x00125003 0x00126003 0x00127003
And the page directory entries (0 and 768):
Code:
0x0007e000 <bogus+ 0>: 0x0007f023
0x0007ec00 <bogus+ 0>: 0x00080023
Re: Switching to virtual memory management
Posted: Sun Nov 09, 2014 11:11 am
by Brendan
Hi,
Bit 5 of a page table entry is an "accessed" bit. When the CPU accesses the page, it sets the bit.
Code:
0x0007f400 <bogus+ 1024>: 0x00100023 0x00101003 0x00102003 0x00103003
The first page table entry here indicates that you've accessed something in the identity mapped area, in the page at 0x00100000.
Page directory entries have a similar "accessed" bit.
Code:
0x0007e000 <bogus+ 0>: 0x0007f023
0x0007ec00 <bogus+ 0>: 0x00080023
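The flag bits being read here can be decoded mechanically. This is a minimal C sketch for 32-bit non-PAE entries; the macro and struct names are made up for illustration, while the bit positions (present = 0, writable = 1, user = 2, accessed = 5, dirty = 6) are the architectural ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low flag bits of 32-bit (non-PAE) page directory and page table entries. */
#define PG_PRESENT  (1u << 0)
#define PG_WRITABLE (1u << 1)
#define PG_USER     (1u << 2)
#define PG_ACCESSED (1u << 5)   /* set by the CPU when the entry is used */
#define PG_DIRTY    (1u << 6)   /* meaningful in PTEs: set on a write */

typedef struct {
    uint32_t frame;             /* physical address of the page or page table */
    bool present, writable, user, accessed, dirty;
} PageEntry;

static PageEntry decode_entry(uint32_t e)
{
    PageEntry d;
    d.frame    = e & 0xFFFFF000u;
    d.present  = (e & PG_PRESENT)  != 0;
    d.writable = (e & PG_WRITABLE) != 0;
    d.user     = (e & PG_USER)     != 0;
    d.accessed = (e & PG_ACCESSED) != 0;
    d.dirty    = (e & PG_DIRTY)    != 0;
    return d;
}
```

For example, 0x00100023 decodes to frame 0x00100000, present, writable and accessed, whereas its neighbour 0x00101003 is present but not accessed.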
Both of these were accessed. We already know something in the identity-mapped area was accessed. What was accessed in kernel space? None of the page table entries shown have the "accessed" bit set, but I can only see 40 of them. This implies that whatever was accessed was not in the range from 0xC0000000 to 0xC0027FFF, but would have been in the range 0xC0028000 to 0xC03FFFFF. The address 0xC0101510 is in that range.
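The address arithmetic behind this argument (and behind finding the PTE for 0xC0101510) is just bit slicing of the virtual address. A C sketch with hypothetical helper names, which also computes the last address covered by the first N entries of a table, such as the 40 entries (10 lines of 4) visible in the page table 768 dump:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u

/* Page directory index: top 10 bits of the virtual address. */
static uint32_t pd_index(uint32_t va) { return va >> 22; }

/* Page table index: middle 10 bits of the virtual address. */
static uint32_t pt_index(uint32_t va) { return (va >> 12) & 0x3FFu; }

/* Physical address of the PTE for va, given the page table's physical base. */
static uint32_t pte_address(uint32_t pt_base, uint32_t va)
{
    return pt_base + pt_index(va) * 4u;
}

/* Last virtual address covered by the first n (n >= 1) entries of the
 * page table that maps the 4 MiB region starting at region_base. */
static uint32_t last_covered(uint32_t region_base, uint32_t n)
{
    return region_base + n * PAGE_SIZE - 1u;
}
```

With page table 768 at physical 0x80000, the PTE for 0xC0101510 sits at 0x80404 (index 257), and 40 entries reach only as far as 0xC0027FFF.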
If we assume the CPU did access 0xC0101510 then you'd have to wonder what is at that virtual address. Does it contain a "jmp $" instruction, or does it contain an "add [eax], al" instruction (because the virtual page is full of zeroes)? If it did contain a "jmp $" then there's no likely way that it could've caused an exception (the only possible way would be an NMI).
All of this suggests that Combuster is right: the CPU successfully accesses the page, but it's full of zeros, so the CPU gets an "add [eax], al" instruction. The value in EAX is still the value you loaded into CR0 (e.g. 0xE0000011), so the CPU tries to add 0x11 to the byte at (virtual address) 0xE0000011, and this triggers the page fault ("page not present").
If that is the case, the next question is why the page (at physical address 0x00100000) contains zeros. Was it loaded properly and then trashed by something, or was it not loaded properly?
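The faulting address can be predicted from CR0 alone. This C sketch models a single `add [eax], al` step, assuming, as in the example above, that EAX still holds CR0 = 0xE0000011 (PG, CD, NW, ET and PE set); the names are illustrative, not from any real API.

```c
#include <assert.h>
#include <stdint.h>

#define CR0_PE (1u << 0)    /* protected mode enable */
#define CR0_ET (1u << 4)    /* extension type */
#define CR0_NW (1u << 29)   /* not write-through */
#define CR0_CD (1u << 30)   /* cache disable */
#define CR0_PG (1u << 31)   /* paging enable */

/* One execution of "add [eax], al": the byte at linear address EAX is read,
 * AL is added to it, and the result is written back. */
typedef struct {
    uint32_t target;        /* linear address accessed (reported in CR2 on a fault) */
    uint8_t addend;         /* value added: AL, the low byte of EAX */
    uint32_t pd_index;      /* page directory slot the MMU consults */
} ZeroPageStep;

static ZeroPageStep add_eax_al(uint32_t eax)
{
    ZeroPageStep s;
    s.target = eax;
    s.addend = (uint8_t)(eax & 0xFFu);
    s.pd_index = eax >> 22;
    return s;
}
```

For 0xE0000011 the access lands in page directory slot 896, which the setup shown in this thread never fills, hence the not-present page fault with CR2 equal to the CR0 value.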
Cheers,
Brendan
Re: Switching to virtual memory management
Posted: Sun Nov 09, 2014 11:24 am
by psychobeagle12
Ah, I see now. This is becoming much clearer. I'll look into the page at 0xC0101510!
Edit: If I'm correct, the PTE I'm interested in is at 0x80404:
Code:
0x00080400 <bogus+ 1024>: 0x00200003 0x00201003 0x00202023 0x00203003
The page was accessed by the processor! So my page most likely IS being trashed!! Ok, problem narrowed down. Now to figure out why my page is being trashed. One last time though, it ISN'T a problem with the mapping code, right? I am mapping everything correctly?
Edit: I just had a further AHA moment! My virtual address (0xC0101000) maps to physical 0x201000, but the code is loaded at physical address 0x101000!!! I'll fix it.
Thank you guys so much for helping me bounce this around!
Edit (edit): Made the following changes:
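Code:
mov eax,PAGE_TABLE_768_ADDR+0x400   ; start at PTE 256 of the kernel page table (virtual 0xC0100000)
mov ebx,0x100000 | PAGE_PRIVILEGE   ; ...and map it to physical 0x00100000
mov ecx,PAGE_TABLE_ENTRIES-256      ; fill the remaining entries of that table
The kernel bootstrap is working now.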
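A C model of the corrected mapping, under stated assumptions: PAGE_TABLE_ENTRIES is 1024 (standard for 32-bit non-PAE tables) and PAGE_PRIVILEGE is 0x003 (present | writable, matching the low bits in the dumps). The rest of the assembly loop isn't shown, so `map_run` is a hypothetical reconstruction of what it presumably does, not the poster's code.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_TABLE_ENTRIES 1024u   /* assumed: 1024 PTEs per 32-bit table */
#define PAGE_SIZE          0x1000u
#define PAGE_PRIVILEGE     0x003u  /* assumed: present | writable */

/* Fill pt[first .. first+count-1] so consecutive virtual pages map to
 * consecutive physical frames starting at phys, stopping at the table end. */
static void map_run(uint32_t *pt, uint32_t first, uint32_t phys, uint32_t count)
{
    for (uint32_t i = 0; i < count && first + i < PAGE_TABLE_ENTRIES; i++)
        pt[first + i] = (phys + i * PAGE_SIZE) | PAGE_PRIVILEGE;
}
```

Starting at entry 256 maps virtual 0xC0101000 to physical 0x101000 (entry 257 becomes 0x00101003), whereas filling from entry 0 reproduces the 0x00201003 seen in the dump at 0x80404.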
Again, a thousand thanks!