Page table addressing in 64-bit mode

jbemmel · Post by **jbemmel** » Thu Jul 05, 2012 11:31 pm

I use the approach sketched in http://wiki.osdev.org/Paging#Manipulation for my 32-bit OS, and now I am implementing 64-bit support.

As many will know, in x86-64 mode the paging structure is extended from 2 to 4 levels, using 64-bit PTEs and 512 entries per table. This means one can address 4*9+12 = 48 bits of linear address space, which is plenty.

In 32-bit mode, you can use the upper 4MB of linear space for mapping page tables. Counting how much space is required can be confusing: you have 1 PD and 1024 PTs, so you might think you need 4MB + 4KB. However, the PD is recursively mapped at 0xfffff000 and essentially becomes a PT itself too.
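The 32-bit arithmetic can be sketched as follows. This is only an illustration, assuming the last page directory entry (index 1023) points back at the page directory itself; the names `REC32_PT_BASE`, `REC32_PD_BASE`, `rec32_pte_addr` and `rec32_pde_addr` are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; assumes PD[1023] holds the PD's own physical address. */
#define REC32_PT_BASE  0xFFC00000u  /* top 4MB: window onto all 1024 page tables */
#define REC32_PD_BASE  0xFFFFF000u  /* top 4KB: the page directory itself        */

/* The PD is simply the last 4KB page inside the window, which is why the
   window needs exactly 4MB and not 4MB + 4KB. */
static_assert(REC32_PD_BASE == REC32_PT_BASE + 1023u * 4096u,
              "PD is the last page of the 4MB window");

/* Linear address of the 4-byte PTE that maps vaddr. */
static inline uint32_t rec32_pte_addr(uint32_t vaddr)
{
    return REC32_PT_BASE + (vaddr >> 12) * 4u;
}

/* Linear address of the 4-byte PDE that covers vaddr. */
static inline uint32_t rec32_pde_addr(uint32_t vaddr)
{
    return REC32_PD_BASE + (vaddr >> 22) * 4u;
}
```

For instance, the PTE that maps the first page of the window itself lands at 0xfffff000, which is entry 0 of the page directory.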

In 64-bit mode, I'm planning to reserve 1GB of linear space for the upper 3 levels of page tables. The PML4 appears at 0xffffffff:fffff000 (-4KB). I think one can apply the same logic as for 32-bit and put the 512 PDPTs at 0xffffffff:ffe00000 (-2MB), and the 512x512 PDs at 0xffffffff:c0000000 (-1GB) - does that make sense?

Nessphoro · Post by **Nessphoro** » Fri Jul 06, 2012 12:06 am

Setting Up Paging With PAE

bluemoon · Post by **bluemoon** » Fri Jul 06, 2012 12:28 am

For a 64-bit recursive page directory, you only need to choose any number n in [0, 511] and set PML4[n] = physical address of the PML4.

If you choose n = 511, you end up with the structures at the addresses you mentioned (note that I have not verified those addresses).

I have macros to convert n into addresses (these I have verified and am using):
(Since I use -mcmodel=kernel, my kernel sits at -2GB, so I need somewhere else for the paging structures.)

Code:

// MMU recursive mapping: slot 510 covers -1TB ~ -512GB
// ----------------------------------------------
#define MMU_RECURSIVE_SLOT      (510UL)


// Convert an address into array index of a structure
// E.G. int index = MMU_PML4_INDEX(0xFFFFFFFFFFFFFFFF); // index = 511
#define MMU_PML4_INDEX(addr)    ((((uintptr_t)(addr))>>39) & 511)
#define MMU_PDPT_INDEX(addr)    ((((uintptr_t)(addr))>>30) & 511)
#define MMU_PD_INDEX(addr)      ((((uintptr_t)(addr))>>21) & 511)
#define MMU_PT_INDEX(addr)      ((((uintptr_t)(addr))>>12) & 511)

// Base address for paging structures
#define KADDR_MMU_PT            (0xFFFF000000000000UL + (MMU_RECURSIVE_SLOT<<39))
#define KADDR_MMU_PD            (KADDR_MMU_PT         + (MMU_RECURSIVE_SLOT<<30))
#define KADDR_MMU_PDPT          (KADDR_MMU_PD         + (MMU_RECURSIVE_SLOT<<21))
#define KADDR_MMU_PML4          (KADDR_MMU_PDPT       + (MMU_RECURSIVE_SLOT<<12))

// Structures for given address, for example
// uint64_t* pt = MMU_PT(addr)
// uint64_t pte = pt[MMU_PT_INDEX(addr)];   // physical address plus flag bits
// addr is cast to uintptr_t so that pointers can be passed as well
#define MMU_PML4(addr)          ((uint64_t*)  KADDR_MMU_PML4 )
#define MMU_PDPT(addr)          ((uint64_t*)( KADDR_MMU_PDPT + ((((uintptr_t)(addr))>>27) & 0x00001FF000) ))
#define MMU_PD(addr)            ((uint64_t*)( KADDR_MMU_PD   + ((((uintptr_t)(addr))>>18) & 0x003FFFF000) ))
#define MMU_PT(addr)            ((uint64_t*)( KADDR_MMU_PT   + ((((uintptr_t)(addr))>>9)  & 0x7FFFFFF000) ))
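
To make the values behind these macros visible, here is a self-contained sketch of the same arithmetic written as functions. The helper names (`rec_canonical`, `rec_pt_base` and so on) are hypothetical and not part of the code above. Unlike the macros, which add a fixed 0xFFFF000000000000 and so assume a slot in the upper half (256 to 511), this version sign-extends bit 47 and therefore also works for low slots.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; the arithmetic mirrors the KADDR_MMU_* macros. */

/* Build a canonical address from four 9-bit table indices: bits 47..12 come
   from the indices, bits 63..48 are copies of bit 47. */
static inline uint64_t rec_canonical(uint64_t i4, uint64_t i3,
                                     uint64_t i2, uint64_t i1)
{
    uint64_t a = (i4 << 39) | (i3 << 30) | (i2 << 21) | (i1 << 12);
    return (a & (1ULL << 47)) ? (a | 0xFFFF000000000000ULL) : a;
}

/* Base of each window when PML4[s] points back at the PML4 itself. */
static inline uint64_t rec_pt_base(uint64_t s)   { return rec_canonical(s, 0, 0, 0); }
static inline uint64_t rec_pd_base(uint64_t s)   { return rec_canonical(s, s, 0, 0); }
static inline uint64_t rec_pdpt_base(uint64_t s) { return rec_canonical(s, s, s, 0); }
static inline uint64_t rec_pml4_base(uint64_t s) { return rec_canonical(s, s, s, s); }

/* Linear address of the 8-byte PTE that maps vaddr: the 36 index bits of
   vaddr (bits 47..12) become a byte offset into the page-table window. */
static inline uint64_t rec_pte_addr(uint64_t s, uint64_t vaddr)
{
    return rec_pt_base(s) + ((vaddr >> 9) & 0x7FFFFFFFF8ULL);
}
```

With slot 510 this gives 0xFFFFFF0000000000 for the page tables, 0xFFFFFF7F80000000 for the page directories, 0xFFFFFF7FBFC00000 for the PDPTs and 0xFFFFFF7FBFDFE000 for the PML4.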

The setup code becomes fairly straightforward:

Code:

// For initializing, just add
    // Install recursive page directory (+3 sets the present and writable bits)
    k_PML4[MMU_RECURSIVE_SLOT] = KADDR_PMA(k_PML4) +3;

// For cloning page directory when creating new process:
// -------------------------------------------------
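MMU_PADDR MMU_clonepagedir(void) {
    uint64_t* clone = (uint64_t*)KADDR_CLONEPD;
    MMU_PADDR paddr = MMU_alloc();
    if ( paddr == 0 ) return 0;
    // Temporarily map the new PML4 page so it can be filled in
    MMU_mmap ( clone, paddr, 4096, MMU_MMAP_MAPPHY );
    // Lower half (entries 0-255): start empty
    memset ( clone, 0, 256*8 );
    // Upper half (entries 256-511): copy the kernel's entries
    memcpy ( &clone[256], &k_PML4[256], 256*8 );

    // Recursive: the clone's slot must point at the clone itself
    clone[MMU_RECURSIVE_SLOT] = paddr +3;

    MMU_munmap ( clone, 4096, MMU_MUNMAP_NORELEASE );
    return paddr;
}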

I noticed the wiki lacks a 64-bit recursive page directory example; I hope this helps.

xenos · Post by **xenos** » Fri Jul 06, 2012 1:12 am

Your idea looks reasonable to me (and in fact it seems to be identical to my own kernel's page table mapping in 64 bit mode). If you simply enter the physical address of the PML4T into the last entry of the PML4T (as bluemoon wrote above), i.e., your PML4T is the last PDP, you end up with the following layout at the top of linear memory:

Code:

0xfffffffffffff000 - PML4
0xffffffffffe00000 - PDPs
0xffffffffc0000000 - page directories
0xffffff8000000000 - page tables

So the addresses you calculated are correct. Be aware that this does not only consume 1GB, but in fact 512GB since all 4 paging levels are mapped to the top of linear memory, including the page tables.
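
The 512GB figure follows from counting tables. Below is a small compile-time sketch of that count, assuming the recursive entry sits in slot 511 as in the layout above; the macro names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names. Every paging structure is one 4KB page of 512 entries. */
#define TABLE_BYTES   4096ULL
#define ENTRIES       512ULL

/* Linear space occupied by each level's window under a recursive mapping. */
#define PML4_WINDOW   (TABLE_BYTES)                                /* 4KB   */
#define PDPT_WINDOW   (ENTRIES * TABLE_BYTES)                      /* 2MB   */
#define PD_WINDOW     (ENTRIES * ENTRIES * TABLE_BYTES)            /* 1GB   */
#define PT_WINDOW     (ENTRIES * ENTRIES * ENTRIES * TABLE_BYTES)  /* 512GB */

/* With slot 511 every window ends at the top of the address space, so its
   base is 2^64 minus its size (plain unsigned wraparound). */
#define TOP_BASE(size) (0ULL - (size))

static_assert(TOP_BASE(PML4_WINDOW) == 0xFFFFFFFFFFFFF000ULL, "PML4");
static_assert(TOP_BASE(PDPT_WINDOW) == 0xFFFFFFFFFFE00000ULL, "PDPs");
static_assert(TOP_BASE(PD_WINDOW)   == 0xFFFFFFFFC0000000ULL, "page directories");
static_assert(TOP_BASE(PT_WINDOW)   == 0xFFFFFF8000000000ULL, "page tables");
```

The windows nest inside each other (the 1GB of page directories is the last 1GB of the page-table window), so the total is 512GB rather than the sum of the four sizes.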

gerryg400 · Post by **gerryg400** » Fri Jul 06, 2012 1:57 am

Since there is so much more virtual memory than physical memory, why bother with the recursive memory 'trick'. Why not simply map all physical memory into the kernel and have all of it accessible all the time ? You can then have a simple phys2kvirt and kvirt2phys macros like we used to do when 32bit OSs had less than 1GB of physical RAM.
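
A minimal sketch of what such helpers could look like, assuming all physical memory is mapped linearly at one fixed offset. The base address `PHYS_MAP_BASE` is a made-up value for illustration, and the helpers are written as inline functions rather than macros.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration only: physical address 0 is mapped at
   PHYS_MAP_BASE and the mapping is linear from there. */
#define PHYS_MAP_BASE  0xFFFF800000000000ULL

/* Physical address -> kernel virtual address inside the linear map. */
static inline void *phys2kvirt(uint64_t paddr)
{
    return (void *)(uintptr_t)(PHYS_MAP_BASE + paddr);
}

/* Kernel virtual address inside the linear map -> physical address. */
static inline uint64_t kvirt2phys(const void *kvirt)
{
    return (uint64_t)(uintptr_t)kvirt - PHYS_MAP_BASE;
}
```

With a mapping like this, a page table at a known physical address can be edited through `phys2kvirt()` directly, without any recursive slot.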

Owen · Post by **Owen** » Fri Jul 06, 2012 1:59 am

Using the top slot in the PML4 will force you to compile your kernel with -mcmodel=large, which has a non-negligible efficiency decrease relative to -mcmodel=kernel. I generally would suggest picking a different PML4E for your fractal map.

gerryg400 wrote:Since there is so much more virtual memory than physical memory, why bother with the recursive memory 'trick'. Why not simply map all physical memory into the kernel and have all of it accessible all the time ? You can then have a simple phys2kvirt and kvirt2phys macros like we used to do when 32bit OSs had less than 1GB of physical RAM.

Presently shipping AMD64 CPUs have a 48-bit physical address space. This has obvious issues fitting around other items in your 48-bit virtual address space.

gerryg400 · Post by **gerryg400** » Fri Jul 06, 2012 3:50 am

You're correct. Currently my MM only supports 64TB of physical memory. I guess when that becomes a limitation I'll need to look at the recursive/fractal thing.

jbemmel · Post by **jbemmel** » Fri Jul 06, 2012 7:59 am

Owen wrote:Using the top slot in the PML4 will force you to compile your kernel with -mcmodel=large, which has a non-negligible efficiency decrease relative to -mcmodel=kernel. I generally would suggest picking a different PML4E for your fractal map.

XenOS wrote:Be aware that this does not only consume 1GB, but in fact 512GB since all 4 paging levels are mapped to the top of linear memory, including the page tables.

I should have added that I (was thinking to) use -mcmodel=kernel, so the kernel lives at -2GB ( 0xffffffff:80000000 ). I did not realize I would automatically be mapping all 4 levels; this may indeed make it necessary to pick another slot than the top one for mapping PML4 recursively, else the page tables contain random mappings ( i.e. kernel code bytes )?

bluemoon · Post by **bluemoon** » Fri Jul 06, 2012 9:05 am

If you use -mcmodel=kernel, you are already using slot[511]; you cannot assign two values to the same slot simultaneously. In my example above I used 510 instead.
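
The collision can be checked with the same index arithmetic as MMU_PML4_INDEX, repeated here under a different name so the sketch stands alone. `KERNEL_BASE` is an illustrative name for the -2GB address mentioned in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as MMU_PML4_INDEX, repeated so this sketch is self-contained. */
#define PML4_INDEX(addr)  ((((uint64_t)(addr)) >> 39) & 511)

/* Illustrative name: a kernel linked at -2GB. */
#define KERNEL_BASE       0xFFFFFFFF80000000ULL

static_assert(PML4_INDEX(KERNEL_BASE) == 511,
              "a kernel at -2GB lives in PML4 slot 511");
static_assert(PML4_INDEX(0xFFFFFF0000000000ULL) == 510,
              "slot 510 starts at -1TB");
```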

ps. fix your quote, I didn't said that.

jbemmel · Post by **jbemmel** » Fri Jul 06, 2012 9:47 am

jbemmel wrote:ps. fix your quote, I didn't said that.

[off topic] Nice feature, this quoting

You said what??!?[/off topic]

Cognition · Post by **Cognition** » Fri Jul 06, 2012 8:46 pm

Owen wrote:Using the top slot in the PML4 will force you to compile your kernel with -mcmodel=large, which has a non-negligible efficiency decrease relative to -mcmodel=kernel. I generally would suggest picking a different PML4E for your fractal map.

There's also the option of using -mcmodel=small with -fPIC. I don't know what kind of overhead it would add relative to kernel memory model, but I have to imagine it's much less than that of the large memory model.

Owen · Post by **Owen** » Fri Jul 06, 2012 9:06 pm

Cognition wrote:
Owen wrote:Using the top slot in the PML4 will force you to compile your kernel with -mcmodel=large, which has a non-negligible efficiency decrease relative to -mcmodel=kernel. I generally would suggest picking a different PML4E for your fractal map.
There's also the option of using -mcmodel=small with -fPIC. I don't know what kind of overhead it would add relative to kernel memory model, but I have to imagine it's much less than that of the large memory model.

-mcmodel=small is like -mcmodel=kernel, except in the region 0 to +2GB.

Virtlink · Post by **Virtlink** » Fri Apr 26, 2013 4:30 am

Owen wrote:Using the top slot in the PML4 will force you to compile your kernel with -mcmodel=large, which has a non-negligible efficiency decrease relative to -mcmodel=kernel. I generally would suggest picking a different PML4E for your fractal map.

I don't quite fully understand -mcmodel, but what I thought is this:

Regardless of the -mcmodel used, when your kernel (code, data and bss) is smaller than 2 GB, it surely always can use IP-relative addressing and 32-bit offsets for all defined symbols, right? And you can explicitly cast any 64-bit address into a pointer, so you can access your recursively mapped page tables regardless of the mcmodel and regardless of where they are placed in memory. So only for undefined or relocated symbols does the mcmodel matter. Or am I wrong?

bluemoon · Post by **bluemoon** » Fri Apr 26, 2013 4:44 am

-mcmodel tells the compiler the range of addresses your symbols can have.
For x86_64 the address can be sign-extended, so putting -1GB into a pointer means the top 1GB.

The choice of mcmodel usually matters when code refers to a symbol's address (e.g. mov rdi, my_var or mov eax, [my_var]):
1. small - 32-bit instructions are used (e.g. mov edi, my_var or mov eax, [my_var]); the linker reports an error if an address does not fit the relocation.
2. kernel - 32-bit instructions are used (e.g. mov edi, my_var or mov eax, [my_var]); the linker reports an error if an address does not fit the relocation. This also depends on the fact that the address is sign-extended.
3. large - 64-bit instructions are used (e.g. mov rdi, my_var then mov eax, [rdi]), since some instructions cannot take a 64-bit operand. This has overhead compared to the 32-bit forms.
4. medium - I don't care much about it :p

PIC is another matter: accesses are relative to RIP, but the range limit still applies.

It is also worth noting that you can access the full virtual address space in any model through an explicit address (e.g. int *p = (int *)0xFFFFFFFFFF000000; *p = 0;)
The model only affects the code generated for values (object addresses) that the linker patches in.
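
As a rough sketch of the two ranges discussed in this thread (small: 0 to +2GB, kernel: the top 2GB), the checks below test whether a symbol address would fit. The function names are hypothetical and this is not how the compiler or linker performs the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers: does a symbol address fall inside the range that
   each code model assumes? */

/* small: symbols live in the low 2GB. */
static inline bool fits_small(uint64_t addr)
{
    return addr < 0x80000000ULL;
}

/* kernel: symbols live in the top 2GB, i.e. the address is equal to its
   low 32 bits sign-extended to 64 bits. */
static inline bool fits_kernel(uint64_t addr)
{
    return addr >= 0xFFFFFFFF80000000ULL;
}
```

A recursive window in slot 510 starts at 0xFFFFFF0000000000 and fails `fits_kernel()`, but that only matters for linker-resolved symbols; reaching the window through an explicitly cast address works in any model.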

OSDev.org
