x86-64 PML4 setup

Questions about which tools to use, bugs, the best way to implement a function, etc. should go here. Don't forget to see if your question is answered in the wiki first! When in doubt post here.
vortexian
Posts: 13
Joined: Fri Apr 04, 2025 10:25 pm

x86-64 PML4 setup

Post by vortexian »

Hi all, first post here, I appreciate the time.

I'm working through the motions of getting a proper 64-bit environment set up for my kernel. The Limine bootloader drops me into long mode, but I want to set up my own structures as much as possible. I have a basic bitmap-based physical frame allocator that hands out physical addresses and zeroes the memory if requested. I'm flip-flopping between a triple fault after setting CR3 (the QEMU log shows my mapping isn't correct, as we continually fault on the instruction after setting CR3) and a page fault when trying to write to my structures (just a skill issue).

If I understand the 64-bit paging structure properly, CR3 will be loaded according to the AMD64 spec, where:
[63:52] are zero
[51:12] is the physical memory address for the PML4 table, and
[11:0] somewhat reserved (though I am leaving unchanged),

meaning that any update to CR3 should be structured accordingly (some tutorials and example code I have seen do a direct move of their PML4 pointer into CR3, but the AMD documentation doesn't seem to support that?)

The PML4 itself is a 4 KiB page in which every 8-byte entry is structured according to the document (some status and flag bits, plus a bitfield pointing to a PDP table), and so on down until we reach PTEs that point to an actual physical frame. I have a couple of clarification questions:

- Are the pointers to the next-level page structure specified according to AMD's document physical addresses that are right shifted by 12, or the raw physical address?
- I should be shifting the final physical address >> 12 before storing it into the PTE, correct?

I think I have some logic issues somewhere in my mapping code, which I have pasted below. I've been staring at this for a while and believe it's something simple I am missing but I haven't been able to nail it down.

My initialization code:

Code:

void initialize_paging(struct limine_memmap_response* r, struct limine_file* kr) {
    // We'll have to put PML4 in some dedicated physical address.
    // Initially zeroed-out.
    // PML4 itself has to be physically mapped as well.
    pml4_addr = frame_alloc(true);
    pml4e* table = (pml4e*)(pml4_addr + get_hhdmoff());
    map_page((uint64_t)table, pml4_addr, false, true, true, false);

    for (uint64_t i = 0; i < r->entry_count; i++) {
        struct limine_memmap_entry* e = r->entries[i];

        // Will map kernel VMA + xxx -> kernel PA
        if (e->type == LIMINE_MEMMAP_KERNEL_AND_MODULES) {
            for (uint64_t bytes = 0; bytes < e->length; bytes += FRAME_ALLOCATION_SIZE) {
                map_page(KERNEL_VMA + bytes, e->base + bytes, true, true, false, false);
                map_page(get_hhdmoff() + bytes, e->base + bytes, true, true, false, false);
            }
        }

        // We will say the framebuffer starts at the beginning of the page following end of kernel
        if (e->type == LIMINE_MEMMAP_FRAMEBUFFER) {
            for (uint64_t bytes = 0; bytes < e->length; bytes += FRAME_ALLOCATION_SIZE) {
                map_page(pg_round_up((uint64_t)&KERNEL_END) + bytes, e->base + bytes, false, true, true, true);
            }
        }
    }

    set_cr3(pml4_addr);
    logf(INFO, "Initialized kernel and framebuffer page tables. PML4 beginning at 0x%lx\n", pml4_addr);
}
And my map_page code:
(EDIT: the initial page fault crash has been fixed; I was incorrectly referencing KERNEL_END.)

Code:

// Maps a single page vaddr -> paddr
void map_page(uint64_t vaddr, uint64_t paddr, bool is_supervisor, bool writable, bool no_execute, bool writethru) {
    pml4e* pml4table = (pml4e*)(pml4_addr + get_hhdmoff());

    uint64_t pdpt;
    if (!pml4table[PML4_INDEX(vaddr)].present) {
        pdpt = frame_alloc(true);
        pml4table[PML4_INDEX(vaddr)].present = true;
        pml4table[PML4_INDEX(vaddr)].pdpe_ptr = pdpt;
        pdpt += get_hhdmoff();
    } else {
        pdpt = pml4table[PML4_INDEX(vaddr)].pdpe_ptr + get_hhdmoff();
    }

    pdpe* pdptable = (pdpe*)pdpt;
    uint64_t pd;
    if (!pdptable[PDPT_INDEX(vaddr)].present) {
        pd = frame_alloc(true);
        pdptable[PDPT_INDEX(vaddr)].present = true;
        pdptable[PDPT_INDEX(vaddr)].pde_ptr = pd;
        pd += get_hhdmoff();
    } else {
        pd = pdptable[PDPT_INDEX(vaddr)].pde_ptr + get_hhdmoff();
    }

    pde* pdetable = (pde*)pd;
    uint64_t pt;
    if (!pdetable[PD_INDEX(vaddr)].present) {
        pt = frame_alloc(true);
        pdetable[PDPT_INDEX(vaddr)].present = true;
        pdetable[PDPT_INDEX(vaddr)].pte_ptr = pt;
        pd += get_hhdmoff();
    } else {
        pt = pdetable[PD_INDEX(vaddr)].pte_ptr + get_hhdmoff();
    }

    pte* ptable = (pte*)pt;
    ptable[PT_INDEX(vaddr)].present = 1;
    ptable[PT_INDEX(vaddr)].rw = writable;
    ptable[PT_INDEX(vaddr)].us = is_supervisor;
    ptable[PT_INDEX(vaddr)].pwt = writethru;
    ptable[PT_INDEX(vaddr)].frame = paddr >> 12;
}
I have two calls to map_page() for the kernel data - I'm not 100% on where my stuff is being executed from due to Limine dumping me into a somewhat initialized environment. The linker script states that my code will begin running at KERNEL_VMA + ..., but the memory map information, file information and HHDM offset that Limine provide seem to say otherwise, so I am unsure which one is needed.

This current iteration of the code page faults on a physical address (my earlier mentioned skill issue), but some prior commits made it through initial setup and began page fault -> triple faulting on writing CR3, which is why I believe my mapping to be incorrect.

Are there any glaring issues with this code?
Thanks!
nullplan
Member
Posts: 1867
Joined: Wed Aug 30, 2017 8:24 am

Re: x86-64 PML4 setup

Post by nullplan »

vortexian wrote: Mon Apr 07, 2025 4:19 pm [63:52] are zero
[51:12] is the physical memory address for the PML4 table, and
[11:0] somewhat reserved (though I am leaving unchanged),
You are overcomplicating it. CR3 is the physical address of the PML4. And of course, the PML4 needs to be 4k aligned, which means the low 12 bits are all zero. Which also leaves them free to be used as flag bits, so they are used for the PCID stuff (but you can leave it zero for now).
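As a sketch of what that means in code (`make_cr3` and `set_cr3` are illustrative names of mine, not anything the spec or Limine gives you):

```c
#include <assert.h>
#include <stdint.h>

/* CR3 is simply the 4 KiB-aligned physical address of the PML4.
 * Alignment guarantees the low 12 bits are zero, which is what
 * leaves them free for the PCID flag bits later. */
static inline uint64_t make_cr3(uint64_t pml4_phys) {
    assert((pml4_phys & 0xfffull) == 0); /* must be page aligned */
    return pml4_phys;                    /* no shifting, no masking */
}

#ifdef __x86_64__
static inline void set_cr3(uint64_t cr3) {
    __asm__ volatile("mov %0, %%cr3" : : "r"(cr3) : "memory");
}
#endif
```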
vortexian wrote: Mon Apr 07, 2025 4:19 pm - Are the pointers to the next-level page structure specified according to AMD's document physical addresses that are right shifted by 12, or the raw physical address?
- I should be shifting the final physical address >> 12 before storing it into the PTE, correct?
Basically the same as above: Each page table entry is the physical address of the next level, with some flag bits ORed in.
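In other words, something like this minimal sketch (the flag macros are the standard bit positions; the helper names are mine, not your struct fields):

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT  (1ull << 0)
#define PTE_WRITABLE (1ull << 1)
#define PTE_USER     (1ull << 2)
#define PTE_NX       (1ull << 63)

/* A table entry is the raw physical address of the next level
 * (or of the final frame) ORed with flag bits -- no shifting.
 * The address is 4 KiB aligned, so bits [11:0] are free for flags. */
static inline uint64_t make_entry(uint64_t phys, uint64_t flags) {
    assert((phys & 0xfffull) == 0);
    return phys | flags;
}

/* Recovering the address just masks the flags back off;
 * 0x000ffffffffff000 keeps bits [51:12]. */
static inline uint64_t entry_addr(uint64_t entry) {
    return entry & 0x000ffffffffff000ull;
}
```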
vortexian wrote: Mon Apr 07, 2025 4:19 pm I have two calls to map_page() for the kernel data - I'm not 100% on where my stuff is being executed from due to Limine dumping me into a somewhat initialized environment. The linker script states that my code will begin running at KERNEL_VMA + ..., but the memory map information, file information and HHDM offset that Limine provide seem to say otherwise, so I am unsure which one is needed.
You can map your kernel to a fixed address no matter where it is loaded in physical RAM, provided it is loaded to a 4k boundary.
Carpe diem!
vortexian
Posts: 13
Joined: Fri Apr 04, 2025 10:25 pm

Re: x86-64 PML4 setup

Post by vortexian »

Ah, I see. Thank you!

I guess, then, that I am not mapping it correctly, or I am and Limine isn't capturing all of that code in one of the requests I specify? A page fault on the instruction fetch following the CR3 update means it must not be mapped.
Octocontrabass
Member
Posts: 5754
Joined: Mon Mar 25, 2013 7:01 pm

Re: x86-64 PML4 setup

Post by Octocontrabass »

vortexian wrote: Mon Apr 07, 2025 4:19 pmThe linker script states that my code will begin running at KERNEL_VMA + ..., but the memory map information, file information and HHDM offset that Limine provide seem to say otherwise, so I am unsure which one is needed.
The memory map tells you physical addresses only, and it doesn't give you enough information to know exactly which physical addresses belong to your kernel as opposed to any modules you may have also loaded.

By "file information" do you mean the Executable Address feature? That tells you the physical and virtual addresses of your kernel. This is what you need when you're mapping your kernel. (If you're not using KASLR, you don't need Limine to tell you the virtual address - it's whatever you specify in your linker script.)

The HHDM offset only applies to the direct mapping of all physical memory. Your kernel has its own mapping separate from the HHDM.
vortexian
Posts: 13
Joined: Fri Apr 04, 2025 10:25 pm

Re: x86-64 PML4 setup

Post by vortexian »

I see. I was using the Kernel file information and physical memory map, not the executable info (I believe there were two like that? Can’t check right now).

Does my page mapping function look fine? Ideally if I just tidy up who gets mapped I hope to be good after applying those fixes…
vortexian
Posts: 13
Joined: Fri Apr 04, 2025 10:25 pm

Re: x86-64 PML4 setup

Post by vortexian »

There was a bug in the page map function, where I was using PDPT_INDEX instead of PD_INDEX. Fixed that but it is still triple faulting. The QEMU logs report:

Code:

check_exception old: 0xffffffff new 0xe
     0: v=0e e=0010 i=0 cpl=0 IP=0028:ffffffff80002d9d pc=ffffffff80002d9d SP=0030:ffff8000bff88f40 CR2=ffffffff80002d9d
RAX=0000000000052000 RBX=00000000fd3e7000 RCX=0000000000000007 RDX=00000000bff28007
RSI=fff0000000000fff RDI=ffffffff8000a000 RBP=ffff8000bff88fc0 RSP=ffff8000bff88f40
R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
R12=00000000fd000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
RIP=ffffffff80002d9d RFL=00000046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0030 0000000000000000 00000000 00009300 DPL=0 DS   [-WA]
CS =0028 0000000000000000 00000000 00209b00 DPL=0 CS64 [-RA]
SS =0030 0000000000000000 00000000 00009300 DPL=0 DS   [-WA]
DS =0030 0000000000000000 00000000 00009300 DPL=0 DS   [-WA]
FS =0030 0000000000000000 00000000 00009300 DPL=0 DS   [-WA]
GS =0030 0000000000000000 00000000 00009300 DPL=0 DS   [-WA]
LDT=0000 0000000000000000 00000000 00008200 DPL=0 LDT
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT=     ffff800000014e08 00000037
IDT=     ffffffff80008460 00000fff
CR0=80010011 CR2=ffffffff80002d9d CR3=0000000000052000 CR4=00000020
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=0000000000000044 CCD=0000000000000000 CCO=EFLAGS
EFER=0000000000000d00
check_exception old: 0xe new 0xe
     1: v=08 e=0000 i=0 cpl=0 IP=0028:ffffffff80002d9d pc=ffffffff80002d9d SP=0030:ffff8000bff88f40 env->regs[R_EAX]=0000000000052000
RAX=0000000000052000 RBX=00000000fd3e7000 RCX=0000000000000007 RDX=00000000bff28007
RSI=fff0000000000fff RDI=ffffffff8000a000 RBP=ffff8000bff88fc0 RSP=ffff8000bff88f40
R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
R12=00000000fd000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
RIP=ffffffff80002d9d RFL=00000046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0030 0000000000000000 00000000 00009300 DPL=0 DS   [-WA]
CS =0028 0000000000000000 00000000 00209b00 DPL=0 CS64 [-RA]
SS =0030 0000000000000000 00000000 00009300 DPL=0 DS   [-WA]
DS =0030 0000000000000000 00000000 00009300 DPL=0 DS   [-WA]
FS =0030 0000000000000000 00000000 00009300 DPL=0 DS   [-WA]
GS =0030 0000000000000000 00000000 00009300 DPL=0 DS   [-WA]
LDT=0000 0000000000000000 00000000 00008200 DPL=0 LDT
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT=     ffff800000014e08 00000037
IDT=     ffffffff80008460 00000fff
CR0=80010011 CR2=ffffffff80008540 CR3=0000000000052000 CR4=00000020
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=0000000000000044 CCD=0000000000000000 CCO=EFLAGS
EFER=0000000000000d00
check_exception old: 0x8 new 0xe
Triple fault
The instruction at the faulting address is the one that comes right after setting CR3. I adapted the kernel mapping code to use the page-aligned linker-defined symbols for the virtual address and the physical base provided by the Limine request. I know the kernel code must physically reside at the addresses the address request gives me, since we're executing code up until that point, but when I go to update the mapping we triple fault. Are there any debugging tips for this?

When I inspect the page table setup before I update CR3, the data looks like it's mapped properly:

Code:

(gdb) p/x table[PML4_INDEX(0xffffffff80002d9d)]
$10 = {present = 0x1, rw = 0x0, us = 0x0, pwt = 0x0, pcd = 0x0, accessed = 0x0, ign1 = 0x0, zero1 = 0x0, zero2 = 0x0, dummy = 0x0,
  pdpe_ptr = 0x59000, dummy2 = 0x0, nx = 0x0}
(gdb) p/x ((pdpe*)(0x59000 + get_hhdmoff()))[PDPT_INDEX(0xffffffff80002d9d)]
$11 = {present = 0x1, rw = 0x0, us = 0x0, pwt = 0x0, pcd = 0x0, accessed = 0x0, ign1 = 0x0, zero = 0x0, ign2 = 0x0, dummy = 0x0,
  pde_ptr = 0x5a000, dummy2 = 0x0, nx = 0x0}
$12 = {present = 0x1, rw = 0x0, us = 0x0, pwt = 0x0, pcd = 0x0, accessed = 0x0, ign1 = 0x0, zero = 0x0, ign2 = 0x0, dummy = 0x0,
  pte_ptr = 0x5b000, dummy2 = 0x0, nx = 0x0}
(gdb) p/x ((pte*)(0x5b000+hhdmoff))[PT_INDEX(0xffffffff80002d9d)]
$13 = {present = 0x1, rw = 0x1, us = 0x1, pwt = 0x0, pcd = 0x0, accessed = 0x0, dirty = 0x0, pat = 0x0, g = 0x0, avail = 0x0, frame = 0xbff34,
  dummy = 0x0, dummy2 = 0x0, nx = 0x0}
(gdb) p/x r2->virtual_base
$14 = 0xffffffff80000000
(gdb) p/x r2->physical_base
$15 = 0xbff32000
Given that the faulting instruction pointer 0xffffffff80002d9d is in the 3rd page of kernel vmem, it checks out that our frame address is 0xbff34.

I don't mark the kernel pages as no-execute, but according to the error code QEMU provides, the fault was due to an instruction fetch (which only applies if No-Execute is supported and enabled), yet I never set NX to 1 for any of the pages I currently map.
sebihepp
Member
Posts: 210
Joined: Tue Aug 26, 2008 11:24 am
GitHub: https://github.com/sebihepp

Re: x86-64 PML4 setup

Post by sebihepp »

Are you sure qemu is using 4-level paging? Maybe LA57 is active?
Which request do you use to get the kernel's physical address?
vortexian
Posts: 13
Joined: Fri Apr 04, 2025 10:25 pm

Re: x86-64 PML4 setup

Post by vortexian »

I use these two requests for size and addresses:

Code:

__attribute__((used, section(".limine_requests")))
static volatile struct limine_kernel_file_request kernel_request = {
    .id = LIMINE_KERNEL_FILE_REQUEST,
    .revision = 0
};

__attribute__((used, section(".limine_requests")))
static volatile struct limine_kernel_address_request addr_request = {
    .id = LIMINE_KERNEL_ADDRESS_REQUEST,
    .revision = 0
};
The first gives me size, the second gives me physical_base and virtual_base for the kernel.

I don't actually know if it's using 4-level paging; I believed that to be the default, though I never explicitly configure it with the requests, so I will check.
vortexian
Posts: 13
Joined: Fri Apr 04, 2025 10:25 pm

Re: x86-64 PML4 setup

Post by vortexian »

sebihepp wrote: Tue Apr 08, 2025 1:33 pm Are you sure qemu is using 4-level paging? Maybe LA57 is active?
Which request do you use to get the kernel's physical address?
Ok, confirmed it's not using 5-level paging. But I'll insert the Limine request to force it to be 4-level.
sebihepp
Member
Posts: 210
Joined: Tue Aug 26, 2008 11:24 am
GitHub: https://github.com/sebihepp

Re: x86-64 PML4 setup

Post by sebihepp »

Can you dump the entire page tree for that failing virtual address?
I would like to see the entries leading to it. Maybe the problem is not in the pml1 but in pml3 or so.

Also, the file request loads the file into memory. You need to parse the ELF headers to find out the correct size, because the .bss section is normally not in the file, as it contains uninitialized memory.
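A sketch of that size calculation (the struct is trimmed to the standard Elf64_Phdr field order; `elf_image_end` is an illustrative helper of mine, not a Limine API):

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal ELF64 program-header layout (standard field order). */
typedef struct {
    uint32_t p_type;
    uint32_t p_flags;
    uint64_t p_offset;
    uint64_t p_vaddr;
    uint64_t p_paddr;
    uint64_t p_filesz;  /* bytes present in the file */
    uint64_t p_memsz;   /* bytes occupied in memory, includes .bss */
    uint64_t p_align;
} elf64_phdr;

#define PT_LOAD 1

/* The file size underestimates the image: .bss occupies memory
 * (p_memsz) but no file bytes (p_filesz). The loaded extent is the
 * highest p_vaddr + p_memsz over all PT_LOAD segments. */
static inline uint64_t elf_image_end(const elf64_phdr *ph, size_t n) {
    uint64_t end = 0;
    for (size_t i = 0; i < n; i++) {
        if (ph[i].p_type != PT_LOAD)
            continue;
        uint64_t seg_end = ph[i].p_vaddr + ph[i].p_memsz;
        if (seg_end > end)
            end = seg_end;
    }
    return end;
}
```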
Octocontrabass
Member
Posts: 5754
Joined: Mon Mar 25, 2013 7:01 pm

Re: x86-64 PML4 setup

Post by Octocontrabass »

vortexian wrote: Tue Apr 08, 2025 1:17 pm

Code:

pdpe_ptr = 0x59000
Hold on, are you using bit fields? If you're using bit fields, that should be 0x59 instead of 0x59000 because the lowest 12 bits of the address are replaced by other fields.
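To illustrate with a guessed layout (the field names mirror your gdb dump, but this struct is my assumption, not your actual definition):

```c
#include <stdint.h>

/* Illustrative PML4 entry as a bitfield: pdpe_ptr starts at bit 12,
 * so it must hold the physical address >> 12, not the raw address. */
typedef struct {
    uint64_t present  : 1;
    uint64_t rw       : 1;
    uint64_t us       : 1;
    uint64_t pwt      : 1;
    uint64_t pcd      : 1;
    uint64_t accessed : 1;
    uint64_t ign1     : 6;
    uint64_t pdpe_ptr : 40;  /* bits [51:12] of the entry */
    uint64_t ign2     : 11;
    uint64_t nx       : 1;
} pml4e;

/* What the hardware actually reads: the field value shifted back
 * up by 12. Storing the raw address 0x59000 in pdpe_ptr therefore
 * points the MMU at 0x59000 << 12, not 0x59000. */
static inline uint64_t entry_target(pml4e e) {
    return (uint64_t)e.pdpe_ptr << 12;
}
```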
vortexian
Posts: 13
Joined: Fri Apr 04, 2025 10:25 pm

Re: x86-64 PML4 setup

Post by vortexian »

Ah! That would probably do it. I am using bitfields in my structs - I only >> 12 for PTEs. Will look into it. Can't believe I missed that...
vortexian
Posts: 13
Joined: Fri Apr 04, 2025 10:25 pm

Re: x86-64 PML4 setup

Post by vortexian »

sebihepp wrote: Tue Apr 08, 2025 2:48 pm Can you dump the entire page tree for that failing virtual address?
I would like to see the entries leading to it. Maybe the problem is not in the pml1 but in pml3 or so.

Also, the file request loads the file into memory. You need to parse the ELF headers to find out the correct size, because the .bss section is normally not in the file, as it contains uninitialized memory.
So the above fixes maybe helped? I am still triple faulting, however, now on a stack address. Understandable that I don't map it in, because it's not part of the kernel or framebuffer - is this something I can solve with another Limine request for X amount of stack space, or should I grab the stack pointer upon entering kernel code and map X KiB around it?

edit: The instruction triggering this is a call instruction to my logging function which uses varargs, but the stack addresses are mapped writable and present. The error code from QEMU was 0x0003 - if I'm interpreting that right, it means we faulted on a page because of a protection violation, and due to a write. The fault happens immediately upon executing call, so I assume that the fault happens when call tries to push the return address onto the stack?
sebihepp
Member
Posts: 210
Joined: Tue Aug 26, 2008 11:24 am
GitHub: https://github.com/sebihepp

Re: x86-64 PML4 setup

Post by sebihepp »

Maybe you missed that the stack grows downwards?
With Limine requests you can request a certain stack size, but AFAIK, to get the physical address, the only solution is to walk the page tables manually while the paging from Limine is still active.
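A sketch of such a walk (assuming the tables are reachable through the HHDM; `phys_to_virt` just adds the HHDM offset, and large pages are not handled):

```c
#include <stdint.h>

#define PTE_ADDR_MASK 0x000ffffffffff000ull

/* Assumed to be filled in from Limine's HHDM response. */
static uint64_t g_hhdm;

/* The tables hold physical addresses; to dereference them from C
 * we go through the direct map. */
static inline uint64_t *phys_to_virt(uint64_t phys) {
    return (uint64_t *)(uintptr_t)(phys + g_hhdm);
}

/* Table index of vaddr at a given level: 3 = PML4 down to 0 = PT. */
static inline uint64_t pt_index(uint64_t vaddr, int level) {
    return (vaddr >> (12 + 9 * level)) & 0x1ff;
}

/* Translate vaddr under the tables rooted at cr3; 0 means unmapped. */
static inline uint64_t walk(uint64_t cr3, uint64_t vaddr) {
    uint64_t table_phys = cr3 & PTE_ADDR_MASK;
    for (int level = 3; level >= 0; level--) {
        uint64_t entry = phys_to_virt(table_phys)[pt_index(vaddr, level)];
        if (!(entry & 1))          /* present bit clear: hole */
            return 0;
        table_phys = entry & PTE_ADDR_MASK;
    }
    return table_phys | (vaddr & 0xfff);
}
```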

An error code of 0x3 means the page is not present during a write.
vortexian
Posts: 13
Joined: Fri Apr 04, 2025 10:25 pm

Re: x86-64 PML4 setup

Post by vortexian »

I thought 0x3 meant it was present, with a protection violation during/due to a write. The error code information for a page fault states that bit 0 is NOT set if the page wasn't present. I know the stack grows down - I grabbed RSP from the entry point in the kernel and map from that address downwards.
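For reference, decoding the low error-code bits looks like this (helper names are mine):

```c
#include <stdbool.h>
#include <stdint.h>

/* Page-fault error code, low bits: bit 0 set = page was present
 * (protection violation), clear = not-present fault; bit 1 = write
 * access; bit 2 = user mode; bit 4 = instruction fetch (only with
 * NX enabled). So 0x3 = present + write. */
static inline bool pf_present(uint64_t err) { return err & (1u << 0); }
static inline bool pf_write(uint64_t err)   { return err & (1u << 1); }
static inline bool pf_user(uint64_t err)    { return err & (1u << 2); }
static inline bool pf_ifetch(uint64_t err)  { return err & (1u << 4); }
```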

I'll look into walking the page table from Limine then.