OSDev.org

Posted: **Thu Feb 10, 2022 1:49 pm**

Octocontrabass wrote:
22OsC wrote:page fault with e=0000...
That means a data read from a not-present page (instead of an instruction fetch). I can't tell you what's wrong without more information, though.

22OsC wrote:in this gif there is ----A---W, not sure what A stands for (accessed?) in the kernel mapping but not sure if this can break something
It means "accessed". It won't break anything.
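
For reference, the e= field in QEMU's interrupt log is the page-fault error code, and its low bits can be decoded mechanically. The following is only a minimal sketch; `PageFaultInfo` and `decode_page_fault` are made-up names for illustration, not anything from the kernel being discussed.

```cpp
#include <cstdint>

// Low bits of the x86 page-fault error code (the "e=" field in QEMU's log).
struct PageFaultInfo {
    bool present;      // bit 0: 0 = page not present, 1 = protection violation
    bool write;        // bit 1: 0 = read access, 1 = write access
    bool user;         // bit 2: 0 = supervisor mode, 1 = user mode
    bool reserved;     // bit 3: a reserved bit was set in a paging structure
    bool instruction;  // bit 4: the access was an instruction fetch
};

// Hypothetical helper: split an error code into its individual flags.
constexpr PageFaultInfo decode_page_fault(std::uint64_t error_code) {
    return PageFaultInfo{
        (error_code & (1u << 0)) != 0,
        (error_code & (1u << 1)) != 0,
        (error_code & (1u << 2)) != 0,
        (error_code & (1u << 3)) != 0,
        (error_code & (1u << 4)) != 0,
    };
}

// e=0000: supervisor data read from a not-present page.
static_assert(!decode_page_fault(0x0).present, "page is not present");
static_assert(!decode_page_fault(0x0).write, "access is a read");
static_assert(!decode_page_fault(0x0).instruction, "not an instruction fetch");
```

A page fault handler could print these flags next to CR2 and the saved RIP instead of the raw number.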

I can give you my source code for the VMM and PMM:
vmm.cpp
vmm.h
pmm.cpp
pmm.h
bootparams is a global variable holding the stivale2 info. (I know I could use the stivale2 tags directly, but I plan to support multiple bootloaders without changing the code; for now I'm testing with stivale2 and the Limine bootloader.)

Posted: **Thu Feb 10, 2022 2:02 pm**

I was thinking more like QEMU's interrupt log and a disassembly of the faulting code. How will I find the bug in your code with no information about where to look?

Posted: **Thu Feb 10, 2022 2:29 pm**

Octocontrabass wrote:I was thinking more like QEMU's interrupt log and a disassembly of the faulting code. How will I find the bug in your code with no information about where to look?

oh right...
https://mab.to/pJMQ1J12L
this is the objdump of the kernel + all qemu interrupt log

Posted: **Thu Feb 10, 2022 2:44 pm**

According to QEMU's interrupt log, the faulting instruction is at 0xffffffff8002e4b8 due to a memory read from 0x100000, but according to your kernel disassembly the instruction at that address is NOP, which does not access memory.

Most likely, the wrong page has been mapped. In QEMU, memory that has not been initialized is typically filled with zero, and when zero-filled memory is executed in 64-bit mode, it's decoded as "add [rax], al". Notice how according to QEMU's log, RAX contains the value 0x100000, the address of the faulting access.

Do you know the physical address where your kernel was loaded? (If you set a breakpoint before the write to CR3, you can use "info mem" and "info tlb" to examine the bootloader's page tables.)
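
When comparing page tables for one specific address, it can help to split the virtual address into its table indices first. This is a small sketch assuming 4-level paging with 4 KiB pages; `PageIndices` and `split_virtual_address` are illustrative names only.

```cpp
#include <cstdint>

// Indices into each paging level for one virtual address
// (assumes 4-level paging and 4 KiB pages).
struct PageIndices {
    unsigned pml4;
    unsigned pdpt;
    unsigned pd;
    unsigned pt;
    unsigned offset;
};

// Hypothetical helper: each level is selected by 9 bits of the address,
// and the low 12 bits are the offset inside the page.
constexpr PageIndices split_virtual_address(std::uint64_t va) {
    return PageIndices{
        static_cast<unsigned>((va >> 39) & 0x1FF),
        static_cast<unsigned>((va >> 30) & 0x1FF),
        static_cast<unsigned>((va >> 21) & 0x1FF),
        static_cast<unsigned>((va >> 12) & 0x1FF),
        static_cast<unsigned>(va & 0xFFF),
    };
}

// The faulting instruction address from the log.
static_assert(split_virtual_address(0xffffffff8002e4b8ULL).pml4 == 511, "PML4 index");
static_assert(split_virtual_address(0xffffffff8002e4b8ULL).pdpt == 510, "PDPT index");
static_assert(split_virtual_address(0xffffffff8002e4b8ULL).offset == 0x4b8, "page offset");
```

With the indices in hand, the same entries can be looked up in both the bootloader's tables and the new tables to see where they diverge.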

Posted: **Thu Feb 10, 2022 2:50 pm**

Octocontrabass wrote:According to QEMU's interrupt log, the faulting instruction is at 0xffffffff8002e4b8 due to a memory read from 0x100000, but according to your kernel disassembly the instruction at that address is NOP, which does not access memory.

Most likely, the wrong page has been mapped. In QEMU, memory that has not been initialized is typically filled with zero, and when zero-filled memory is executed in 64-bit mode, it's decoded as "add [rax], al". Notice how according to QEMU's log, RAX contains the value 0x100000, the address of the faulting access.

Do you know the physical address where your kernel was loaded? (If you set a breakpoint before the write to CR3, you can use "info mem" and "info tlb" to examine the bootloader's page tables.)

I do know the physical address where the kernel is located, from the stivale2_struct_tag_kernel_base_address tag. So, are you telling me to start mapping the kernel at its physical base and then run the debugger again?

Posted: **Thu Feb 10, 2022 3:05 pm**

Wait, if you have the stivale2_struct_tag_kernel_base_address tag, that means you're using fully virtual mappings, and KERNEL_VMA_OFFSET can't be a constant. You can calculate the actual kernel VMA offset by subtracting physical_base_address from virtual_base_address.

I was suggesting using the debugger to compare the bootloader's page tables with your new page tables. If the kernel mappings change after you load CR3, you have done something wrong.
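
The subtraction described above could look roughly like this. `KernelBaseAddress` is a simplified stand-in that carries only the two address fields named in the post, not the real stivale2 struct, and the numbers in the check are invented for illustration.

```cpp
#include <cstdint>

// Simplified stand-in for the two address fields of the kernel base address
// tag (assumption: the real tag also carries a tag header, omitted here).
struct KernelBaseAddress {
    std::uint64_t physical_base_address;
    std::uint64_t virtual_base_address;
};

// Offset to subtract from a kernel virtual address to get its physical address.
constexpr std::uint64_t kernel_vma_offset(const KernelBaseAddress& base) {
    return base.virtual_base_address - base.physical_base_address;
}

// Hypothetical helper: translate an address inside the kernel image.
constexpr std::uint64_t kernel_virt_to_phys(const KernelBaseAddress& base,
                                            std::uint64_t virt) {
    return virt - kernel_vma_offset(base);
}

// Invented example values: kernel loaded at physical 0x100000 and linked at
// 0xffffffff80000000.
static_assert(kernel_vma_offset(KernelBaseAddress{0x100000, 0xffffffff80000000ULL})
                  == 0xffffffff7ff00000ULL,
              "offset is virtual base minus physical base");
```

Because the offset is computed at run time from what the bootloader reports, it stays correct even if the kernel is loaded at a different physical address on the next boot.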

Posted: **Thu Feb 10, 2022 3:45 pm**

Octocontrabass wrote:Wait, if you have the stivale2_struct_tag_kernel_base_address tag, that means you're using fully virtual mappings, and KERNEL_VMA_OFFSET can't be a constant. You can calculate the actual kernel VMA offset by subtracting physical_base_address from virtual_base_address.

I was suggesting using the debugger to compare the bootloader's page tables with your new page tables. If the kernel mappings change after you load CR3, you have done something wrong.

I did that, and now there is no longer a page fault followed by a triple fault. Now the page fault exception handler is executed, and the info it prints is:
CR2: 8018b00c (it should begin with 0xFFFFF... but it's print's fault here)
Description: Supervisory process tried to write to a non-present page entry
Write Operation: Read-Only
Caused By An Instruction Fetch: No
CPU Reserved Bits: Unreserved
Kernel-Mode
Page: Not Present

and then CR2 is only 80000000, with the same description, etc.

I see that before the new PML4, there was a region mapped from 0 to ffe00000, and that the page size is 0x200000. What should I do?

Posted: **Thu Feb 10, 2022 4:35 pm**

22OsC wrote:CR2: 8018b00c (it should begin with 0xFFFFF... but it's print's fault here)

That address appears to be somewhere within your kernel, although it's hard to say for sure until you fix your print. (It's a good idea to confirm with the QEMU log.) You should also print the address of the faulting instruction so you can find the code that's causing the page fault, in case it's not a problem with how your page tables are set up.

Is that address mapped correctly? You didn't include the "info tlb" output that covers that virtual address.
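
A truncated CR2 like the one quoted above is what you would get if a 64-bit value went through a 32-bit formatting path, though that is only a guess about this particular print routine. As a minimal sketch of a formatter that always emits all 16 hex digits (`Hex64` and `to_hex64` are made-up names):

```cpp
#include <cstdint>

// Fixed-size buffer: 16 hex digits plus a terminating NUL.
struct Hex64 {
    char text[17];
};

// Hypothetical helper: format a full 64-bit value, most significant nibble
// first, without relying on printf or any 32-bit intermediate type.
constexpr Hex64 to_hex64(std::uint64_t value) {
    Hex64 out{};
    for (int i = 0; i < 16; ++i) {
        const unsigned nibble =
            static_cast<unsigned>((value >> ((15 - i) * 4)) & 0xF);
        out.text[i] = "0123456789abcdef"[nibble];
    }
    out.text[16] = '\0';
    return out;
}

// The upper half of the address survives formatting.
static_assert(to_hex64(0xffff80008018b00cULL).text[0] == 'f', "top nibble kept");
static_assert(to_hex64(0xffff80008018b00cULL).text[15] == 'c', "low nibble kept");
```

The result's `text` member can be handed to whatever string output routine the kernel already has.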

22OsC wrote:I see that before the new PML4, there was a region mapped from 0 to ffe00000, and that the page size is 0x200000. What should I do?

Do you want that region to be mapped? If you do, map it. You can use 2MiB pages if you like, but you need to check the MTRRs to make sure your large pages won't span a memory type boundary.
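
As a rough sketch of the alignment side of that choice only: the helper below picks a 2 MiB page when both addresses are 2 MiB aligned and enough of the range remains, and falls back to 4 KiB pages otherwise. It does not look at the MTRRs at all, so it covers just half of the check described above. All names are hypothetical.

```cpp
#include <cstdint>

constexpr std::uint64_t kPage4K = 0x1000;
constexpr std::uint64_t kPage2M = 0x200000;

// True when a 2 MiB page fits here as far as alignment and length go.
// NOTE: memory types (MTRRs) are NOT checked in this sketch.
constexpr bool can_use_2m_page(std::uint64_t virt, std::uint64_t phys,
                               std::uint64_t remaining) {
    return virt % kPage2M == 0 && phys % kPage2M == 0 && remaining >= kPage2M;
}

// Counts how many page-table entries a mapping of `length` bytes would need
// when large pages are used wherever the alignment allows it.
constexpr std::uint64_t count_entries(std::uint64_t virt, std::uint64_t phys,
                                      std::uint64_t length) {
    std::uint64_t entries = 0;
    while (length > 0) {
        const std::uint64_t step =
            can_use_2m_page(virt, phys, length) ? kPage2M : kPage4K;
        const std::uint64_t mapped = step < length ? step : length;
        virt += step;
        phys += step;
        length -= mapped;
        ++entries;
    }
    return entries;
}

// A 4 GiB range starting at physical 0 needs 2048 entries of 2 MiB each.
static_assert(count_entries(0xFFFF800000000000ULL, 0, 0x100000000ULL) == 2048,
              "4 GiB / 2 MiB");
```

A real mapping loop would call the page-table code in place of the counter and would additionally refuse a large page that crosses a memory type boundary.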

Posted: **Thu Feb 10, 2022 4:52 pm**

Octocontrabass wrote:
22OsC wrote:CR2: 8018b00c (it should begin with 0xFFFFF... but it's print's fault here)
That address appears to be somewhere within your kernel, although it's hard to say for sure until you fix your print. (It's a good idea to confirm with the QEMU log.) You should also print the address of the faulting instruction so you can find the code that's causing the page fault, in case it's not a problem with how your page tables are set up.

Is that address mapped correctly? You didn't include the "info tlb" output that covers that virtual address.

22OsC wrote:I see that before the new PML4, there was a region mapped from 0 to ffe00000, and that the page size is 0x200000. What should I do?
Do you want that region to be mapped? If you do, map it. You can use 2MiB pages if you like, but you need to check the MTRRs to make sure your large pages won't span a memory type boundary.

So I checked the TLB, and the address 8018b00c is very similar to FFFFFFFF8018b000; I see that the page fault is at FFFFFFFF8018b00c. Also, if I let the IDT keep catching page faults, it throws another page fault with CR2 0x22 and e=0000 at RIP ffffffff80019033 (and then a triple fault).

Posted: **Thu Feb 10, 2022 6:26 pm**

22OsC wrote:So I checked the TLB, and the address 8018b00c is very similar to FFFFFFFF8018b000; I see that the page fault is at FFFFFFFF8018b00c.

Is that address mapped correctly? What does the QEMU log say about the page fault?

22OsC wrote:Also, if I let the IDT keep catching page faults, it throws another page fault with CR2 0x22 and e=0000 at RIP ffffffff80019033 (and then a triple fault).

If you return from your page fault handler, you should see the same page fault again.

Posted: **Thu Feb 10, 2022 10:42 pm**

I hope y'all don't mind me barging in, but I'm suffering from a similar problem. I just switched my kernel to use stivale2 via Limine and am running into triple faults (even though my IDT is loaded). (I feel like I've been here before...)
Oddly, a hardware breakpoint in GDB has absolutely no effect whatsoever. A watchpoint does, however. Setting a watchpoint on CR3 causes me to catch what I believe to be OVMF's page table configuration. But something very odd happens after that: when I tell GDB to continue program execution, the firmware never boots anything. It just sits there doing absolutely nothing. So I have absolutely no idea how to catch Limine altering CR3 or anything because I'll always end up catching the firmware doing it first. The "info status" says the VM is running perfectly fine. It doesn't appear to be possible for me to set a count for the number of times I see CR3 change, so... I'm quite confused. Why wouldn't hardware breakpoints work? I'm running GDB properly:

Code:

[cargo-make] INFO - Execute Command: "qemu-system-x86_64" "-machine" "q35,smm=off,vmport=off" "-cpu" "max,kvm=off" "-m" "8G" "-device" "virtio-balloon" "-nographic" "-device" "qemu-xhci,id=input" "-device" "usb-kbd,bus=input.0" "-device" "usb-tablet,bus=input.0" "-audiodev" "pa,id=audio0,out.mixing-engine=off,out.stream-name=kernel,in.stream-name=kernel" "-device" "intel-hda" "-device" "hda-duplex,audiodev=audio0" "-rtc" "base=localtime,clock=host,driftfix=slew" "-drive" "file=fat:rw:target/boot" "-drive" "if=pflash,format=raw,file=/usr/share/OVMF/x64/OVMF_CODE.fd,readonly=on" "-drive" "file=disk-nvme.qcow2,if=none,id=NVME01" "-device" "nvme,drive=NVME01,serial=0001" "-drive" "id=disk,file=disk-sata.qcow2,if=none" "-device" "ahci,id=ahci" "-device" "ide-hd,drive=disk,bus=ahci.0" "-debugcon" "file:qemu.log" "-global" "isa-debugcon.iobase=0x402" "-d" "int" "-D" "qemu2.log" "-device" "qemu-xhci,id=audio" "-device" "usb-audio,audiodev=usbaudio,bus=audio.0" "-audiodev" "pa,id=usbaudio,out.mixing-engine=off,out.stream-name=kernel-alsa,in.stream-name=kernel-alsa" "-device" "virtio-net,netdev=nic" "-netdev" "user,hostname=kernel,id=nic" "-device" "virtio-rng-pci,rng=rng0" "-object" "rng-random,id=rng0,filename=/dev/urandom" "-device" "virtio-gpu" "-global" "driver=cfi.pflash01,property=secure,value=on" "-no-reboot" "-no-shutdown" "-s" "-S"

Posted: **Fri Feb 11, 2022 6:20 am**

Octocontrabass wrote:
22OsC wrote:So I checked the TLB, and the address 8018b00c is very similar to FFFFFFFF8018b000; I see that the page fault is at FFFFFFFF8018b00c.
Is that address mapped correctly? What does the QEMU log say about the page fault?

Everything is fine until exception 0xe is thrown:

Code:

check_exception old: 0xffffffff new 0xe
  1503: v=0e e=0002 i=0 cpl=0 IP=0008:ffffffff80016fa3 pc=ffffffff80016fa3 SP=0010:ffffffff802a97b0 CR2=ffff80008018b00c
RAX=ffff80008018b00c RBX=0000000000000000 RCX=0000000000000003 RDX=000000000018b00c
RSI=0000000000000010 RDI=000000000000005b RBP=ffffffff802a9840 RSP=ffffffff802a97b0
R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000030 R11=0000000000000010
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
RIP=ffffffff80016fa3 RFL=00000086 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 0000000000000000 00000000 00009300 DPL=0 DS   [-WA]
CS =0008 0000000000000000 00000000 00209a00 DPL=0 CS64 [-R-]
SS =0010 0000000000000000 00000000 00009200 DPL=0 DS   [-W-]
DS =0010 0000000000000000 00000000 00009300 DPL=0 DS   [-WA]
FS =001b 0000000000000000 00000000 0000f300 DPL=3 DS   [-WA]
GS =001b 0000000000000000 00000000 0000f300 DPL=3 DS   [-WA]
LDT=0000 0000000000000000 00000000 00008200 DPL=0 LDT
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT=     ffffffff80043d00 00000037
IDT=     ffffffff80055ca0 00000fff
CR0=80000013 CR2=ffff80008018b00c CR3=0000000000100000 CR4=00000620
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=000000000018b00c CCD=ffff80008018b00c CCO=ADDQ    
EFER=0000000000000d00

I see something is not working right with paging: CR2 is ffff80008018b00c, but the highest normal mapping is at ffff80000604a000.

Octocontrabass wrote:
22OsC wrote:Also, if I let the IDT keep catching page faults, it throws another page fault with CR2 0x22 and e=0000 at RIP ffffffff80019033 (and then a triple fault).
If you return from your page fault handler, you should see the same page fault again.

I didn't know that. Well, I put an infinite loop in the handler.

edit: I found that I get the page fault when I try to access the framebuffer, and that the infinite page faults happen because the handler tries to write the fault info to the screen. Sooooo... I think this is the last thing that is not working (I hope). I see that my FB address is now 0xffff800080000000; do I have to map it to the physical address (0xffff800080000000 - 0xFFFF800000000000)?

edit2: I fixed the framebuffer, but the page fault is still the same. It's occurring here because, after loading the new PML4, I try to allocate pages for the heap like here; the heap address is 0xFFFF800000000000 and the page count is 0x10 (16).

Posted: **Fri Feb 11, 2022 1:08 pm**

Ethin wrote:So I have absolutely no idea how to catch Limine altering CR3 or anything because I'll always end up catching the firmware doing it first.

Hold on, back up. Why are you trying to catch Limine altering CR3? The triple fault is happening after Limine hands control to your kernel, isn't it?

22OsC wrote:I see something is not working right with paging: CR2 is ffff80008018b00c, but the highest normal mapping is at ffff80000604a000.

So, the address is actually 0xFFFF80008018B00C instead of 0xFFFFFFFF8018B00C. That's a big difference!

According to the image you posted earlier, your new page tables don't map addresses in that range the same way the bootloader did. It looks like that address is part of the HHDM area, so the bootloader set it up with a fixed offset between virtual and physical address. You can find that offset in stivale2_struct_tag_hhdm.

Right now, you're only mapping available memory, and there's no correspondence between the virtual and physical address, but the Stivale2 HHDM maps at least 4GB whether it's available memory or not, and the offset between physical and virtual addresses is fixed to a single value.

22OsC wrote:edit: I found that I get the page fault when I try to access the framebuffer, and that the infinite page faults happen because the handler tries to write the fault info to the screen. Sooooo... I think this is the last thing that is not working (I hope). I see that my FB address is now 0xffff800080000000; do I have to map it to the physical address (0xffff800080000000 - 0xFFFF800000000000)?

The framebuffer is not available memory, but it's included in the HHDM area since it's within the first 4GB of the physical address space.
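
To make the fixed-offset idea concrete, here is a minimal sketch of HHDM address translation. The offset is hard-coded to the 0xFFFF800000000000 value from the subtraction quoted above purely for illustration; a real kernel would read it from the bootloader's HHDM tag. The function names are invented.

```cpp
#include <cstdint>

// Assumption for illustration: the HHDM offset from the thread's example.
// In a real kernel this comes from the bootloader, not from a constant.
constexpr std::uint64_t kHhdmOffset = 0xFFFF800000000000ULL;
constexpr std::uint64_t kFourGiB = 0x100000000ULL;

// Every physical address appears at the same fixed distance in the HHDM.
constexpr std::uint64_t phys_to_hhdm(std::uint64_t phys) {
    return phys + kHhdmOffset;
}

constexpr std::uint64_t hhdm_to_phys(std::uint64_t virt) {
    return virt - kHhdmOffset;
}

// True when the address falls in the first 4 GiB of the HHDM, the part
// described above as mapped whether or not it is available memory.
constexpr bool in_minimum_hhdm(std::uint64_t virt) {
    return virt >= kHhdmOffset && virt - kHhdmOffset < kFourGiB;
}

// The framebuffer address and the faulting address from the thread.
static_assert(hhdm_to_phys(0xFFFF800080000000ULL) == 0x80000000ULL, "framebuffer");
static_assert(in_minimum_hhdm(0xFFFF80008018B00CULL), "fault address is in the HHDM");
static_assert(!in_minimum_hhdm(0xFFFFFFFF8018B00CULL), "kernel-half address is not");
```

New page tables that want to keep the HHDM would map every 2 MiB or 4 KiB step of that range so that `phys_to_hhdm` holds for each page, not just for pages the memory map marks as available.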

22OsC wrote:edit2: I fixed the framebuffer, but the page fault is still the same. It's occurring here because, after loading the new PML4, I try to allocate pages for the heap like here; the heap address is 0xFFFF800000000000 and the page count is 0x10 (16).

Why are you placing your heap inside the HHDM area? I think you need to take a step back and figure out how you're currently using the virtual address space.

Posted: **Fri Feb 11, 2022 3:16 pm**

Octocontrabass wrote:
22OsC wrote:I see something is not working right with paging: CR2 is ffff80008018b00c, but the highest normal mapping is at ffff80000604a000.
So, the address is actually 0xFFFF80008018B00C instead of 0xFFFFFFFF8018B00C. That's a big difference!

After a few tweaks the only Page Fault I get now is CR2=0000000000900041.

According to the image you posted earlier, your new page tables don't map addresses in that range the same way the bootloader did. It looks like that address is part of the HHDM area, so the bootloader set it up with a fixed offset between virtual and physical address. You can find that offset in stivale2_struct_tag_hhdm.

Right now, you're only mapping available memory, and there's no correspondence between the virtual and physical address, but the Stivale2 HHDM maps at least 4GB whether it's available memory or not, and the offset between physical and virtual addresses is fixed to a single value.

I made the mapping as similar as possible. I mapped the kernel the same way the bootloader did, but the CR2 address is 0000000000900041. This happens in the bitmap get function after I change the PML4; before that, it wasn't a problem. Do I have to re-initialize the bitmap after applying the new PML4?

Octocontrabass wrote:
22OsC wrote:edit: I found that I get the page fault when I try to access the framebuffer, and that the infinite page faults happen because the handler tries to write the fault info to the screen. Sooooo... I think this is the last thing that is not working (I hope). I see that my FB address is now 0xffff800080000000; do I have to map it to the physical address (0xffff800080000000 - 0xFFFF800000000000)?
The framebuffer is not available memory, but it's included in the HHDM area since it's within the first 4GB of the physical address space.

I know that.

Octocontrabass wrote:
22OsC wrote:edit2: I fixed the framebuffer, but the page fault is still the same. It's occurring here because, after loading the new PML4, I try to allocate pages for the heap like here; the heap address is 0xFFFF800000000000 and the page count is 0x10 (16).
Why are you placing your heap inside the HHDM area? I think you need to take a step back and figure out how you're currently using the virtual address space.

Where should I place my heap, then? Sorry if I sound stupid, but I'm not sure where I can place it. Can the heap be anywhere? Like, I don't have to use a canonical address or anything like that?

Posted: **Fri Feb 11, 2022 3:31 pm**

22OsC wrote:After a few tweaks the only Page Fault I get now is CR2=0000000000900041.

You might be using a physical address somewhere instead of a virtual address. (This is why it's a good idea to use uintptr_t for physical addresses instead of a pointer.)
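
One way to act on that advice, sketched under assumptions: wrap physical addresses in a small integer-based type so they cannot be dereferenced by accident, and require an explicit conversion through the direct map. `PhysAddr`, `kDirectMapOffset`, `phys_to_virt`, and `phys_to_ptr` are all invented names, and the offset is a placeholder that a real kernel would obtain from the bootloader.

```cpp
#include <cstdint>

static_assert(sizeof(std::uintptr_t) == 8, "sketch assumes a 64-bit target");

// A physical address is just a number; it has no operator* or operator->,
// so the compiler rejects any attempt to use it as a pointer.
struct PhysAddr {
    std::uintptr_t value;
};

// Placeholder direct-map offset (assumption for illustration only).
constexpr std::uintptr_t kDirectMapOffset = 0xFFFF800000000000ULL;

// The only sanctioned way to get from physical to virtual.
constexpr std::uintptr_t phys_to_virt(PhysAddr phys) {
    return phys.value + kDirectMapOffset;
}

// Typed pointer into the direct map for code that needs to touch the memory.
template <typename T>
T* phys_to_ptr(PhysAddr phys) {
    return reinterpret_cast<T*>(phys_to_virt(phys));
}

// If a value such as 0x900041 were a physical address, the access would have
// to go through the direct map rather than use the raw number.
static_assert(phys_to_virt(PhysAddr{0x900041}) == 0xFFFF800000900041ULL,
              "explicit translation");
```

With this in place, a physical frame number returned by the PMM cannot silently end up in a pointer variable; the conversion has to be written out at the call site.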

22OsC wrote:I made the mapping as similar as possible.

Since I can't see your current code, all I have for reference is your earlier code where you definitely were not setting up the HHDM area the same way the bootloader did.

22OsC wrote:Where should I place my heap, then? Sorry if I sound stupid, but I'm not sure where I can place it. Can the heap be anywhere? Like, I don't have to use a canonical address or anything like that?

You can pick any canonical address that isn't already being used for something else. Make sure there's enough unused virtual address space for your heap to grow.

If you don't want a HHDM area, you don't need to have one, but it means it'll be harder to access arbitrary physical addresses.
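
Both conditions for a heap base (canonical, and not already in use) can be checked with a few lines. This sketch assumes 48-bit virtual addresses (4-level paging); the region sizes and the alternative heap base in the checks are invented for illustration, and the names are hypothetical.

```cpp
#include <cstdint>

// With 48-bit virtual addresses, bits 63..47 must all be equal.
constexpr bool is_canonical(std::uint64_t va) {
    return (va >> 47) == 0 || (va >> 47) == 0x1FFFF;
}

struct Region {
    std::uint64_t base;
    std::uint64_t size;
};

// Overlap test written with subtractions so that a region ending at the very
// top of the address space does not overflow.
constexpr bool overlaps(Region a, Region b) {
    return a.base >= b.base ? (a.base - b.base < b.size)
                            : (b.base - a.base < a.size);
}

// Illustrative layout: a 4 GiB HHDM at the offset used in the thread.
constexpr Region kHhdm{0xFFFF800000000000ULL, 0x100000000ULL};

// The heap from the thread (16 pages at the HHDM base) collides with it.
static_assert(overlaps(Region{0xFFFF800000000000ULL, 0x10000}, kHhdm),
              "heap inside the HHDM");
// An invented alternative base that is canonical and clear of the HHDM.
static_assert(is_canonical(0xFFFF900000000000ULL), "candidate is canonical");
static_assert(!overlaps(Region{0xFFFF900000000000ULL, 0x10000}, kHhdm),
              "candidate does not collide");
```

A fuller version would keep a list of every region in use (kernel image, HHDM, framebuffer, stacks) and check the candidate heap range, including its maximum growth, against all of them.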

Page Fault and General Protection Fault

Re: Page Fault and General Protection Fault
