Page table addressing in 64-bit mode

jbemmel
Member
Posts: 53
Joined: Fri May 11, 2012 11:54 am

Page table addressing in 64-bit mode

Post by jbemmel »

I use the approach sketched in http://wiki.osdev.org/Paging#Manipulation for my 32-bit OS, and now I am implementing 64-bit support.

As many will know, in x86-64 long mode the paging structure is extended from 2 to 4 levels, using 64-bit PTEs and 512 entries per table. This means one can address 4*9+12 = 48 bits of linear address space, which is plenty.

In 32-bit mode, you can use the upper 4MB of linear space for mapping page tables. Counting how much space is required can be confusing: you have 1 page directory (PD) and 1024 page tables (PTs), so you might think you need 4MB + 4KB. However, the PD maps itself at 0xfffff000 and essentially becomes one of the PTs too.
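The 4MB figure can be checked with a little address arithmetic. This is only a minimal sketch, assuming the last PD entry points back at the PD itself (the layout from the wiki page linked above); the names RECURSIVE_BASE32, pt_vaddr32 and pde_vaddr32 are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: PD entry 1023 points at the PD itself, so the top 4MB of
 * linear space (0xFFC00000..0xFFFFFFFF) shows all 1024 page tables. */
#define RECURSIVE_BASE32 0xFFC00000u

/* Linear address of the page table that maps vaddr. */
static uint32_t pt_vaddr32(uint32_t vaddr)
{
    return RECURSIVE_BASE32 + (vaddr >> 22) * 4096u;
}

/* Linear address of the page-directory entry that covers vaddr. */
static uint32_t pde_vaddr32(uint32_t vaddr)
{
    /* The "page table" of the recursive window is the PD itself. */
    uint32_t pd = pt_vaddr32(RECURSIVE_BASE32);
    assert(pd == 0xFFFFF000u);
    return pd + (vaddr >> 22) * 4u;
}
```

The last of the 1024 page-table slots in the window is the PD, which is why no extra 4KB is needed.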

In 64-bit mode, I'm planning to reserve 1GB of linear space for the upper 3 levels of page tables. The PML4 appears at 0xffffffff:fffff000 (-4KB). I think one can apply the same logic as for 32-bit and put the 512 PDPTs at 0xffffffff:ffe00000 (-2MB), and the 512x512 PDs at 0xffffffff:c0000000 (-1GB) - does that make sense?
Nessphoro
Member
Posts: 308
Joined: Sat Apr 30, 2011 12:50 am

Re: Page table addressing in 64-bit mode

Post by Nessphoro »

bluemoon
Member
Posts: 1761
Joined: Wed Dec 01, 2010 3:41 am
Location: Hong Kong

Re: Page table addressing in 64-bit mode

Post by bluemoon »

For a 64-bit recursive page directory, you only need to choose any number n in [0, 511] and set PML4[n] = physical address of the PML4.

If you choose n = 511, you end up with the structures at the addresses you mentioned (note that I have not verified those addresses).

I have a set of macros to convert n into addresses (these I have verified and am using).
(Since I use -mcmodel=kernel, my kernel sits at -2GB, so I need somewhere else for the paging structures.)

Code:

// MMU recursive mapping (slot 510 covers -1TB ~ -512GB)
// ----------------------------------------------
#define MMU_RECURSIVE_SLOT      (510UL)

// Convert an address into array index of a structure
// E.G. int index = MMU_PML4_INDEX(0xFFFFFFFFFFFFFFFF); // index = 511
#define MMU_PML4_INDEX(addr)    ((((uintptr_t)(addr))>>39) & 511)
#define MMU_PDPT_INDEX(addr)    ((((uintptr_t)(addr))>>30) & 511)
#define MMU_PD_INDEX(addr)      ((((uintptr_t)(addr))>>21) & 511)
#define MMU_PT_INDEX(addr)      ((((uintptr_t)(addr))>>12) & 511)

// Base address for paging structures
#define KADDR_MMU_PT            (0xFFFF000000000000UL + (MMU_RECURSIVE_SLOT<<39))
#define KADDR_MMU_PD            (KADDR_MMU_PT         + (MMU_RECURSIVE_SLOT<<30))
#define KADDR_MMU_PDPT          (KADDR_MMU_PD         + (MMU_RECURSIVE_SLOT<<21))
#define KADDR_MMU_PML4          (KADDR_MMU_PDPT       + (MMU_RECURSIVE_SLOT<<12))

// Structures for given address, for example
// uint64_t* pt = MMU_PT(addr)
// uint64_t entry = pt[MMU_PT_INDEX(addr)];   // physical address | flags
// (addr is cast to uintptr_t so pointers can be passed as well as integers)
#define MMU_PML4(addr)          ((uint64_t*)  KADDR_MMU_PML4 )
#define MMU_PDPT(addr)          ((uint64_t*)( KADDR_MMU_PDPT + ((((uintptr_t)(addr))>>27) & 0x00001FF000) ))
#define MMU_PD(addr)            ((uint64_t*)( KADDR_MMU_PD   + ((((uintptr_t)(addr))>>18) & 0x003FFFF000) ))
#define MMU_PT(addr)            ((uint64_t*)( KADDR_MMU_PT   + ((((uintptr_t)(addr))>>9)  & 0x7FFFFFF000) ))
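To show what these macros work out to, here is a small host-side sketch that computes the virtual address of the PTE for a given address, using plain integers so nothing is dereferenced. It assumes recursive slot 510 as above; PT_WINDOW and pte_vaddr are illustrative names, not part of the code I posted.

```c
#include <assert.h>
#include <stdint.h>

#define RECURSIVE_SLOT 510ULL   /* same slot as MMU_RECURSIVE_SLOT */
#define PT_WINDOW      (0xFFFF000000000000ULL + (RECURSIVE_SLOT << 39))

static_assert(PT_WINDOW == 0xFFFFFF0000000000ULL, "slot 510 starts at -1TB");

/* Virtual address of the page-table entry that maps vaddr; this should be
 * the same value as &MMU_PT(vaddr)[MMU_PT_INDEX(vaddr)]. */
static uint64_t pte_vaddr(uint64_t vaddr)
{
    uint64_t table = PT_WINDOW + ((vaddr >> 9) & 0x7FFFFFF000ULL);
    uint64_t index = (vaddr >> 12) & 511;
    return table + index * sizeof(uint64_t);
}
```

For example, pte_vaddr(0x1000) is the second 8-byte entry of the very first page table in the window.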
The setup code becomes fairly straightforward:

Code:

// For initializing, just add
    // Install recursive page directory (+3 = present | writable)
    k_PML4[MMU_RECURSIVE_SLOT] = KADDR_PMA(k_PML4) +3;

// For cloning page directory when creating new process:
// -------------------------------------------------
MMU_PADDR MMU_clonepagedir(void) {
    uint64_t* clone = (uint64_t*)KADDR_CLONEPD;
    MMU_PADDR paddr = MMU_alloc();
    if ( paddr == 0 ) return 0;
    MMU_mmap ( clone, paddr, 4096, MMU_MMAP_MAPPHY );
    memset ( clone, 0, 256*8 );                    // user half starts empty
    memcpy ( &clone[256], &k_PML4[256], 256*8 );   // kernel half is shared

    // Recursive: set after the memcpy, which copied the parent's slot 510
    clone[MMU_RECURSIVE_SLOT] = paddr +3;

    MMU_munmap ( clone, 4096, MMU_MUNMAP_NORELEASE );
    return paddr;
}
I notice the wiki lacks a 64-bit recursive page directory example; I hope this helps.
xenos
Member
Posts: 1118
Joined: Thu Aug 11, 2005 11:00 pm
Libera.chat IRC: xenos1984
Location: Tartu, Estonia

Re: Page table addressing in 64-bit mode

Post by xenos »

Your idea looks reasonable to me (and in fact it seems to be identical to my own kernel's page table mapping in 64 bit mode). If you simply enter the physical address of the PML4T into the last entry of the PML4T (as bluemoon wrote above), i.e., your PML4T is the last PDP, you end up with the following layout at the top of linear memory:

Code:

0xfffffffffffff000 - PML4
0xffffffffffe00000 - PDPs
0xffffffffc0000000 - page directories
0xffffff8000000000 - page tables
So the addresses you calculated are correct. Be aware that this does not only consume 1GB, but in fact 512GB since all 4 paging levels are mapped to the top of linear memory, including the page tables.
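The four addresses can be reproduced for any recursive slot by repeating the slot index once per level and sign-extending bit 47. A rough sketch, meant to run on a host machine rather than in the kernel; canonical and recursive_base are names invented here.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend bit 47 so the result is a canonical x86-64 address. */
static uint64_t canonical(uint64_t addr)
{
    return (addr & (1ULL << 47)) ? (addr | 0xFFFF000000000000ULL)
                                 : (addr & 0x0000FFFFFFFFFFFFULL);
}

/* Base of the window for one level when PML4[slot] points at the PML4.
 * level 1 = page tables, 2 = page directories, 3 = PDPs, 4 = the PML4. */
static uint64_t recursive_base(unsigned slot, unsigned level)
{
    assert(slot < 512 && level >= 1 && level <= 4);
    uint64_t addr = 0;
    unsigned shift = 39;
    for (unsigned i = 0; i < level; i++, shift -= 9)
        addr |= (uint64_t)slot << shift;   /* repeat the slot index per level */
    return canonical(addr);
}
```

With slot 511 this gives the list above; the level 1 window alone spans 512 * 512 * 512 tables of 4KB each, which is where the 512GB comes from.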
Programmers' Hardware Database // GitHub user: xenos1984; OS project: NOS
gerryg400
Member
Posts: 1801
Joined: Thu Mar 25, 2010 11:26 pm
Location: Melbourne, Australia

Re: Page table addressing in 64-bit mode

Post by gerryg400 »

Since there is so much more virtual memory than physical memory, why bother with the recursive memory 'trick'? Why not simply map all physical memory into the kernel and have all of it accessible all the time? You can then have simple phys2kvirt and kvirt2phys macros like we used to do when 32-bit OSs had less than 1GB of physical RAM.
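Something along these lines is what I mean. It is just a sketch: PHYS_MAP_BASE and PHYS_MAP_SIZE are assumed values picked for illustration (any free range in the kernel half would do), and the helpers are written as inline functions instead of macros so the range checks are visible.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: all physical memory is mapped linearly from here.
 * Both values are assumptions, not taken from any particular kernel. */
#define PHYS_MAP_BASE 0xFFFF800000000000ULL
#define PHYS_MAP_SIZE (1ULL << 46)          /* 64TB window */

/* Physical address -> kernel virtual address inside the linear map. */
static inline void *phys2kvirt(uint64_t paddr)
{
    assert(paddr < PHYS_MAP_SIZE);
    return (void *)(uintptr_t)(PHYS_MAP_BASE + paddr);
}

/* Kernel virtual address inside the linear map -> physical address. */
static inline uint64_t kvirt2phys(const void *kvirt)
{
    uint64_t v = (uint64_t)(uintptr_t)kvirt;
    assert(v >= PHYS_MAP_BASE && v - PHYS_MAP_BASE < PHYS_MAP_SIZE);
    return v - PHYS_MAP_BASE;
}
```

A page table at physical address P is then simply (uint64_t *)phys2kvirt(P), with no recursive slot involved.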
If a trainstation is where trains stop, what is a workstation ?
Owen
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom

Re: Page table addressing in 64-bit mode

Post by Owen »

Using the top slot in the PML4 will force you to compile your kernel with -mcmodel=large, which has a non-negligible efficiency decrease relative to -mcmodel=kernel. I generally would suggest picking a different PML4E for your fractal map.

gerryg400 wrote:Since there is so much more virtual memory than physical memory, why bother with the recursive memory 'trick'. Why not simply map all physical memory into the kernel and have all of it accessible all the time ? You can then have a simple phys2kvirt and kvirt2phys macros like we used to do when 32bit OSs had less than 1GB of physical RAM.
Presently shipping AMD64 CPUs have a 48-bit physical address space. This has obvious issues fitting around other items in your 48-bit virtual address space.
gerryg400
Member
Posts: 1801
Joined: Thu Mar 25, 2010 11:26 pm
Location: Melbourne, Australia

Re: Page table addressing in 64-bit mode

Post by gerryg400 »

You're correct. Currently my MM only supports 64TB of physical memory. I guess when that becomes a limitation I'll need to look at the recursive/fractal thing.
If a trainstation is where trains stop, what is a workstation ?
jbemmel
Member
Posts: 53
Joined: Fri May 11, 2012 11:54 am

Re: Page table addressing in 64-bit mode

Post by jbemmel »

Owen wrote:Using the top slot in the PML4 will force you to compile your kernel with -mcmodel=large, which has a non-negligible efficiency decrease relative to -mcmodel=kernel. I generally would suggest picking a different PML4E for your fractal map.
XenOS wrote:Be aware that this does not only consume 1GB, but in fact 512GB since all 4 paging levels are mapped to the top of linear memory, including the page tables.
I should have added that I was planning to use -mcmodel=kernel, so the kernel lives at -2GB (0xffffffff:80000000). I did not realize I would automatically be mapping all 4 levels; this may indeed make it necessary to pick a slot other than the top one for mapping the PML4 recursively, or else the page tables would contain random mappings (i.e. kernel code bytes)?
Last edited by jbemmel on Fri Jul 06, 2012 9:46 am, edited 1 time in total.
bluemoon
Member
Posts: 1761
Joined: Wed Dec 01, 2010 3:41 am
Location: Hong Kong

Re: Page table addressing in 64-bit mode

Post by bluemoon »

If you use -mcmodel=kernel, you are already using slot[511] for the kernel; you cannot assign two values to the same slot simultaneously. In my example above I used 510 instead.
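A quick way to see the clash, as a sketch: KERNEL_BASE is the -2GB address mentioned in this thread, while pml4_index and recursive_slot_ok are helper names made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define KERNEL_BASE 0xFFFFFFFF80000000ULL   /* -2GB, kernel image with -mcmodel=kernel */

/* PML4 index that covers a given virtual address. */
static unsigned pml4_index(uint64_t vaddr)
{
    return (unsigned)((vaddr >> 39) & 511);
}

/* Nonzero if the recursive slot does not collide with the kernel's PML4 entry. */
static int recursive_slot_ok(unsigned slot)
{
    assert(slot < 512);
    return slot != pml4_index(KERNEL_BASE);
}
```

The kernel at -2GB lands in entry 511, so 511 is taken and any other free entry (510 in my case) can hold the recursive mapping.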

ps. fix your quote, I didn't say that.
jbemmel
Member
Posts: 53
Joined: Fri May 11, 2012 11:54 am

Re: Page table addressing in 64-bit mode

Post by jbemmel »

jbemmel wrote:ps. fix your quote, I didn't said that.
[off topic] Nice feature, this quoting :D You said what??!?[/off topic]
Cognition
Member
Posts: 191
Joined: Tue Apr 15, 2008 6:37 pm
Location: Gotham, Batmanistan

Re: Page table addressing in 64-bit mode

Post by Cognition »

Owen wrote:Using the top slot in the PML4 will force you to compile your kernel with -mcmodel=large, which has a non-negligible efficiency decrease relative to -mcmodel=kernel. I generally would suggest picking a different PML4E for your fractal map.
There's also the option of using -mcmodel=small with -fPIC. I don't know what kind of overhead it would add relative to kernel memory model, but I have to imagine it's much less than that of the large memory model.
Reserved for OEM use.
Owen
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom

Re: Page table addressing in 64-bit mode

Post by Owen »

Cognition wrote:
Owen wrote:Using the top slot in the PML4 will force you to compile your kernel with -mcmodel=large, which has a non-negligible efficiency decrease relative to -mcmodel=kernel. I generally would suggest picking a different PML4E for your fractal map.
There's also the option of using -mcmodel=small with -fPIC. I don't know what kind of overhead it would add relative to kernel memory model, but I have to imagine it's much less than that of the large memory model.
-mcmodel=small is like -mcmodel=kernel, except in the region 0 to +2GB.
Virtlink
Member
Posts: 34
Joined: Thu Jun 05, 2008 3:53 pm
Location: The Netherlands

Re: Page table addressing in 64-bit mode

Post by Virtlink »

Owen wrote:Using the top slot in the PML4 will force you to compile your kernel with -mcmodel=large, which has a non-negligible efficiency decrease relative to -mcmodel=kernel. I generally would suggest picking a different PML4E for your fractal map.
I don't quite fully understand -mcmodel, but what I thought is this:

Regardless of the -mcmodel used, when your kernel (code, data and bss) is smaller than 2 GB, it surely always can use IP-relative addressing and 32-bit offsets for all defined symbols, right? And you can explicitly cast any 64-bit address into a pointer, so you can access your recursively mapped page tables regardless of the mcmodel and regardless of where they are placed in memory. So only for undefined or relocated symbols does the mcmodel matter. Or am I wrong?
bluemoon
Member
Posts: 1761
Joined: Wed Dec 01, 2010 3:41 am
Location: Hong Kong

Re: Page table addressing in 64-bit mode

Post by bluemoon »

-mcmodel tells the compiler the range of addresses your symbols can have.
For x86_64 a 32-bit address can be sign-extended, so putting -1GB into a pointer means the top 1GB.

The code model matters when an instruction embeds the address of a symbol (i.e. mov rdi, my_var or mov eax, [my_var]):
1. small - a 32-bit address is encoded (i.e. mov edi, my_var or mov eax, [my_var]); the linker reports an error if the relocated address does not fit.
2. kernel - a 32-bit address is encoded as well (i.e. mov rdi, my_var with a sign-extended 32-bit immediate, or mov eax, [my_var]); the linker reports an error if the relocated address does not fit. This relies on the address being sign-extended.
3. large - a 64-bit address is encoded (i.e. mov rdi, my_var then mov eax, [rdi]), since most instructions cannot take a 64-bit operand. This has overhead compared to the 32-bit forms.
4. medium - I don't care much about it :p

PIC is another thing: accesses are relative to RIP, but the range limit still applies.

It is also worth noting that you can access the full virtual address space in any model through an explicit pointer value (i.e. int *p = (int*)0xFFFFFFFFFF000000; *p = 0;).
The code model only affects code generation where the linker has to patch in a value (an object's address) of limited size.
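To make the ranges concrete, here is a small sketch of which link-time addresses each model can encode, next to a run-time pointer that needs no relocation. The function names are invented for this post, and the checks only restate the 0 to +2GB and -2GB to 0 ranges mentioned in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* small model: a symbol address must fit in the low 2GB. */
static bool fits_small_model(uint64_t addr)
{
    return addr < 0x80000000ULL;
}

/* kernel model: a symbol address must lie in the top 2GB (-2GB .. -1),
 * i.e. be the sign extension of a negative 32-bit value. */
static bool fits_kernel_model(uint64_t addr)
{
    return addr >= 0xFFFFFFFF80000000ULL;
}

/* Any model: an address built at run time is just a 64-bit value in a
 * register, so no relocation and no range limit is involved. */
static volatile uint64_t *table_at(uint64_t vaddr)
{
    assert((vaddr & 7) == 0);   /* page-table entries are 8-byte aligned */
    return (volatile uint64_t *)(uintptr_t)vaddr;
}
```

So a recursive mapping in slot 510 is outside what the kernel model can encode for symbols, but table_at(0xFFFFFF7FBFDFE000) still works because the address never goes through the linker.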