00014041523i[BIOS ] Booting from 0000:7c00
00016224067i[CPU0 ] CPU is in protected mode (active)
00016224067i[CPU0 ] CS.d_b = 32 bit
00016224067i[CPU0 ] SS.d_b = 32 bit
00016224067i[CPU0 ] EFER = 0x00000000
00016224067i[CPU0 ] | RAX=00000000e0000011 RBX=0000000000001000
00016224067i[CPU0 ] | RCX=0000000000000404 RDX=0000000000000400
00016224067i[CPU0 ] | RSP=0000000000009fd7 RBP=0000000000009ff7
00016224067i[CPU0 ] | RSI=00000000000e476c RDI=000000000000ffac
00016224067i[CPU0 ] | R8=0000000000000000 R9=0000000000000000
00016224067i[CPU0 ] | R10=0000000000000000 R11=0000000000000000
00016224067i[CPU0 ] | R12=0000000000000000 R13=0000000000000000
00016224067i[CPU0 ] | R14=0000000000000000 R15=0000000000000000
00016224067i[CPU0 ] | IOPL=0 id vip vif ac vm RF nt of df if tf SF zf af PF cf
00016224067i[CPU0 ] | SEG selector base limit G D
00016224067i[CPU0 ] | SEG sltr(index|ti|rpl) base limit G D
00016224067i[CPU0 ] | CS:0008( 0001| 0| 0) 00000000 ffffffff 1 1
00016224067i[CPU0 ] | DS:0010( 0002| 0| 0) 00000000 ffffffff 1 1
00016224067i[CPU0 ] | SS:0010( 0002| 0| 0) 00000000 ffffffff 1 1
00016224067i[CPU0 ] | ES:0010( 0002| 0| 0) 00000000 ffffffff 1 1
00016224067i[CPU0 ] | FS:0010( 0002| 0| 0) 00000000 ffffffff 1 1
00016224067i[CPU0 ] | GS:0010( 0002| 0| 0) 00000000 ffffffff 1 1
00016224067i[CPU0 ] | MSR_FS_BASE:0000000000000000
00016224067i[CPU0 ] | MSR_GS_BASE:0000000000000000
00016224067i[CPU0 ] | RIP=0000000000101af7 (0000000000101af7)
00016224067i[CPU0 ] | CR0=0xe0000011 CR2=0x0000000000007ee8
00016224067i[CPU0 ] | CR3=0x00000000 CR4=0x00000000
00016224067i[CPU0 ] 0x0000000000101af7>> popad : 61
00016224067e[CPU0 ] exception(): 3rd (14) exception with no resolution, shutdown status is 00h, resetting
00016224067i[SYS ] bx_pc_system_c::Reset(HARDWARE) called
00016224067i[CPU0 ] cpu hardware reset
This kernel runs at 0x100000 (1 MB) in physical memory and maps itself to 1 MB in virtual memory (correctly), so enabling paging should cause no problem with the IP: execution should just continue smoothly, because the code that was running before the addressing mode switch is still in exactly the same place. However, according to the Bochs log, cr3 is not being set to the address it should be, where a formatted page directory is waiting. Interrupts were disabled prior to this, but I'm still triple faulting on the page fault (14). It looks like the real problem is that cr3 isn't being set. Help appreciated.
Thanks for the responses. Actually, I've found that loading cr3 is not the problem. My page frame allocator generates a bitmap of free and used pages, and the first page frame I allocated was at address 0x00 and is used as my page directory, so the cr3 value is correct. I also debugged and the memory values at that location are correct. I am, however, curious about the value of cr2. I think this is the problem. As far as I know, cr2 is supposed to be the address of the instruction that caused the fault. Bochs is reporting CR2=0x0000000000007ee8. No code should be executing there. My kernel is at 1 MB (0x100000, five zeros) and maps itself to the same virtual address (0x100000), so that when paging is enabled the instruction pointer doesn't end up pointing at code that is no longer mapped where it was executing a moment ago in physical RAM. Experienced paging users, is 0x7ee8 really the address of the instruction that caused the fault? Because if so, something is terribly wrong. Help appreciated.
Alright, with some debugging I fixed the problem, but I would REALLY like to know why this happened. I know why the fix works, but I want to know why the problem occurred in the first place. Since cr2 said the address that "caused" the fault (14/0xE) was 0x7ee8, I mapped the page at physical address 0x7000 to virtual address 0x7000 and tried again. I got the same triple fault, but Bochs reported that cr2 now contained 0x9fd3. So I also mapped the page at physical address 0x9000 to virtual address 0x9000 before enabling paging. This time when I ran the OS, no fault occurred. So the problem was fixed with this code:
; @param1 (DWORD) Page directory address.
; @param2 (DWORD) Physical address of page to be mapped.
; @param3 (DWORD) Virtual address to map to.
; @return (DWORD) Page directory address.
global _vm_map
_vm_map:
So my question is: why do those two pages have to be mapped into the virtual address space? Program flow never went anywhere near either of those two pages, and I access no data there either. What's their importance?
CR2 does not contain the address of the instruction that caused the page fault, but the address whose access caused the page fault. For example, if you try to perform a memory read / write to some variable and this access fails, you will get the address of that variable in CR2.
It is thus very likely that your code accessed memory that is not correctly mapped in your page tables. And it's indeed very likely that this memory is the stack, since the page fault happens at the popad instruction (0x101af7 in your log), which reads from the stack.
Gerrryg400, you could be a detective. You're 100% correct, I forgot that I put the stack there. But I'm still a bit concerned, because the stack I allocated for the kernel initially starts at address 0x9FFF. This explains why the page at 0x9000 was necessary, and it could explain why the page at 0x7000 was necessary only if the page at 0x8000 was also necessary and I had a gigantic stack. However, a page fault was never raised for an address in the 0x8000 page, and I don't believe my kernel's stack ever covers more than 4 KB, so needing the page at 0x7000 is still an unsolved case.
Your log shows RSP=0x9fd7, so the stack is around there. Not sure how you are loading, but do you have any stray pointers still pointing to the area around 0x7000? If you're using GRUB it might be a vestige of that, like GRUB's GDT or the Multiboot structures.
You should aim to create your own stack and unmap all the pages you are not using. It's a good test that everything is perfect. My advice is to fix all early problems properly and in a way you understand. These things are a lot harder to nail down when the crash happens once every few days when you have 4 cores running hundreds of processes.