Page table addressing in 64-bit mode
I use the approach sketched in http://wiki.osdev.org/Paging#Manipulation for my 32-bit OS, and now I am implementing 64-bit support.
As many will know, in x86-64 mode the paging structure is extended from 2 to 4 levels, using 64-bit PTEs and 512 entries per table. This means one can address 4*9+12 = 48 bits of linear address space, which is plenty.
In 32-bit mode, you can use the upper 4MB of linear space for mapping page tables. Counting how much space is required can be confusing: you have 1 page directory (PD) and 1024 page tables (PTs), so you might think you need 4MB + 4KB. However, the PD is mapped onto itself at 0xfffff000 and essentially doubles as the last PT.
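To make the 32-bit accounting concrete, here is a minimal sketch of the address arithmetic, assuming the last page directory entry points back at the directory itself. The names pte_addr32, pde_addr32 and check_layout32 are made up for illustration and do not come from the wiki article.

```c
#include <assert.h>
#include <stdint.h>

#define RECURSIVE_PT_BASE 0xFFC00000u /* top 4MB: window onto all page tables */
#define RECURSIVE_PD_BASE 0xFFFFF000u /* top 4KB: the page directory itself */

/* Linear address of the page table entry that maps vaddr. */
static uint32_t pte_addr32(uint32_t vaddr)
{
    return RECURSIVE_PT_BASE + (vaddr >> 12) * 4u;
}

/* Linear address of the page directory entry that covers vaddr. */
static uint32_t pde_addr32(uint32_t vaddr)
{
    return RECURSIVE_PD_BASE + (vaddr >> 22) * 4u;
}

/* Sanity checks on the layout. */
static void check_layout32(void)
{
    /* The first entry of the first page table sits at the window base. */
    assert(pte_addr32(0x00000000u) == RECURSIVE_PT_BASE);
    /* The "page table" that maps the window itself is the page directory. */
    assert(pte_addr32(RECURSIVE_PT_BASE) == RECURSIVE_PD_BASE);
    /* The last directory entry is the last dword of linear memory. */
    assert(pde_addr32(0xFFFFFFFFu) == 0xFFFFFFFCu);
}
```

The second check is the reason 4MB is enough: the directory occupies the slot where the 1024th page table would otherwise appear.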
In 64-bit mode, I'm planning to reserve 1GB of linear space for the upper 3 levels of page tables. The PML4 appears at 0xffffffff:fffff000 (-4KB). I think one can apply the same logic as for 32-bit and put the 512 PDPTs at 0xffffffff:ffe00000 (-2MB), and the 512x512 PDs at 0xffffffff:c0000000 (-1GB) - does that make sense?
Re: Page table addressing in 64-bit mode
For a 64-bit recursive page directory, you only need to choose any slot n in the range 0..511 and set PML4[n] to the physical address of the PML4 itself (plus the present and writable flags).
If you choose n = 511, you end up with the structures at the addresses you mentioned (note that I have not verified those addresses).
I notice there is no 64-bit recursive page directory example on the wiki; I hope this helps.
I have macros to convert n into addresses (these I have verified and am using); the setup code that follows them is fairly straightforward.
(Since I use -mcmodel=kernel, my kernel sits at -2GB, so I need somewhere else for the paging structures.)
Code:
// MMU recursive mapping (slot 510 covers -1TB ~ -512GB)
// ----------------------------------------------
#define MMU_RECURSIVE_SLOT (510UL)
// Convert an address into array index of a structure
// E.G. int index = MMU_PML4_INDEX(0xFFFFFFFFFFFFFFFF); // index = 511
#define MMU_PML4_INDEX(addr) ((((uintptr_t)(addr))>>39) & 511)
#define MMU_PDPT_INDEX(addr) ((((uintptr_t)(addr))>>30) & 511)
#define MMU_PD_INDEX(addr) ((((uintptr_t)(addr))>>21) & 511)
#define MMU_PT_INDEX(addr) ((((uintptr_t)(addr))>>12) & 511)
// Base address for paging structures
#define KADDR_MMU_PT (0xFFFF000000000000UL + (MMU_RECURSIVE_SLOT<<39))
#define KADDR_MMU_PD (KADDR_MMU_PT + (MMU_RECURSIVE_SLOT<<30))
#define KADDR_MMU_PDPT (KADDR_MMU_PD + (MMU_RECURSIVE_SLOT<<21))
#define KADDR_MMU_PML4 (KADDR_MMU_PDPT + (MMU_RECURSIVE_SLOT<<12))
// Structure covering a given address, for example
// uint64_t* pt = MMU_PT(addr);
// uint64_t pte = pt[MMU_PT_INDEX(addr)]; // physical address plus flag bits
#define MMU_PML4(addr) ((uint64_t*) KADDR_MMU_PML4 )
#define MMU_PDPT(addr) ((uint64_t*)( KADDR_MMU_PDPT + ((((uintptr_t)(addr))>>27) & 0x00001FF000) ))
#define MMU_PD(addr) ((uint64_t*)( KADDR_MMU_PD + ((((uintptr_t)(addr))>>18) & 0x003FFFF000) ))
#define MMU_PT(addr) ((uint64_t*)( KADDR_MMU_PT + ((((uintptr_t)(addr))>>9) & 0x7FFFFFF000) ))
Code:
// For initializing, just add
// Install recursive page directory (+3 sets the present and writable bits)
k_PML4[MMU_RECURSIVE_SLOT] = KADDR_PMA(k_PML4) +3;
// For cloning the page directory when creating a new process:
// -------------------------------------------------
MMU_PADDR MMU_clonepagedir(void) {
    uint64_t* clone = (uint64_t*)KADDR_CLONEPD;
    MMU_PADDR paddr = MMU_alloc();
    if ( paddr == 0 ) return 0;
    // Temporarily map the new PML4 so it can be filled in
    MMU_mmap ( clone, paddr, 4096, MMU_MMAP_MAPPHY );
    // Lower half (entries 0-255) starts out empty
    memset ( clone, 0, 256*8 );
    // Upper half (entries 256-511) is copied from k_PML4
    memcpy ( &clone[256], &k_PML4[256], 256*8 );
    // Recursive
    clone[MMU_RECURSIVE_SLOT] = paddr +3;
    MMU_munmap ( clone, 4096, MMU_MUNMAP_NORELEASE );
    return paddr;
}
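To see why a single self-referencing entry is enough, the walk can be simulated in user space. What follows is only a toy model under stated assumptions: the "physical memory" is a small array, and phys, set_entry and walk are hypothetical names that exist in no kernel mentioned in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE   4096u
#define PTE_PRESENT 0x1ULL
#define PTE_WRITE   0x2ULL
#define PTE_ADDR    0x000FFFFFFFFFF000ULL

/* Toy "physical memory": frame f sits at physical address f * PAGE_SIZE.
 * Frame 0 is left unused so that 0 can mean "not mapped". */
#define NFRAMES 4
static uint64_t phys[NFRAMES][512];

/* Point entry `index` of the table at physical address `table` to `target`. */
static void set_entry(uint64_t table, unsigned index, uint64_t target)
{
    assert(table / PAGE_SIZE < NFRAMES && index < 512);
    phys[table / PAGE_SIZE][index] = target | PTE_PRESENT | PTE_WRITE;
}

/* Software version of the 4-level walk the MMU performs for 4KB pages.
 * Returns the physical address of the page backing vaddr, or 0. */
static uint64_t walk(uint64_t pml4, uint64_t vaddr)
{
    uint64_t table = pml4;
    for (int shift = 39; shift >= 12; shift -= 9) {
        assert(table / PAGE_SIZE < NFRAMES);
        uint64_t entry = phys[table / PAGE_SIZE][(vaddr >> shift) & 511];
        if (!(entry & PTE_PRESENT))
            return 0;
        table = entry & PTE_ADDR;
    }
    return table;
}
```

With the PML4 in the frame at 0x1000, set_entry(0x1000, 510, 0x1000) installs the recursive slot. Walking 0xFFFFFF7FBFDFE000 then follows entry 510 four times and lands on the PML4's own frame. After set_entry(0x1000, 0, 0x2000), walking 0xFFFFFF7FBFC00000 lands on the PDPT at 0x2000, because the last step uses index 0.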
- xenos
Re: Page table addressing in 64-bit mode
Your idea looks reasonable to me (and in fact it seems to be identical to my own kernel's page table mapping in 64-bit mode). If you simply enter the physical address of the PML4T into the last entry of the PML4T (as bluemoon wrote above), i.e., your PML4T is the last PDP, you end up with the following layout at the top of linear memory:
Code:
0xfffffffffffff000 - PML4
0xffffffffffe00000 - PDPs
0xffffffffc0000000 - page directories
0xffffff8000000000 - page tables
So the addresses you calculated are correct. Be aware that this does not only consume 1GB, but in fact 512GB, since all 4 paging levels are mapped to the top of linear memory, including the page tables.
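The layout above can be double-checked with a few lines of C. This is only a sketch: recursive_base is a hypothetical helper, not something taken from either kernel, and it assumes 4-level paging with 48-bit canonical addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Base linear address of the window that exposes one level of the paging
 * structures through recursive PML4 entry `slot`.
 * depth 1: page tables, 2: page directories, 3: PDPs, 4: the PML4 itself. */
static uint64_t recursive_base(unsigned slot, unsigned depth)
{
    assert(slot < 512 && depth >= 1 && depth <= 4);

    uint64_t addr = 0;
    /* Repeat the slot index in the top `depth` index fields. */
    for (unsigned i = 0; i < depth; i++)
        addr |= (uint64_t)slot << (39 - 9 * i);

    /* Canonical form: bits 63..48 must be copies of bit 47. */
    if (addr & (1ULL << 47))
        addr |= 0xFFFF000000000000ULL;
    return addr;
}
```

For slot 511 and depths 1 to 4 this gives 0xffffff8000000000, 0xffffffffc0000000, 0xffffffffffe00000 and 0xfffffffffffff000. For slot 510 the PML4 shows up at 0xffffff7fbfdfe000 instead.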
Re: Page table addressing in 64-bit mode
Since there is so much more virtual memory than physical memory, why bother with the recursive mapping 'trick'? Why not simply map all physical memory into the kernel and have all of it accessible all the time? You can then have simple phys2kvirt and kvirt2phys macros, like we used to do when 32-bit OSes had less than 1GB of physical RAM.
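A minimal sketch of what such helpers could look like, written as inline functions rather than macros. The base address and the 64TB window size are assumptions chosen for illustration, not values from anyone's kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: all physical memory mapped linearly from this address. */
#define PHYS_MAP_BASE 0xFFFF800000000000ULL
#define PHYS_MAP_SIZE (64ULL << 40) /* 64TB window, an arbitrary choice */

/* Physical address -> kernel virtual address inside the linear window. */
static inline void *phys2kvirt(uint64_t paddr)
{
    assert(paddr < PHYS_MAP_SIZE);
    return (void *)(uintptr_t)(PHYS_MAP_BASE + paddr);
}

/* Kernel virtual address inside the linear window -> physical address. */
static inline uint64_t kvirt2phys(const void *vaddr)
{
    uint64_t v = (uint64_t)(uintptr_t)vaddr;
    assert(v >= PHYS_MAP_BASE && v - PHYS_MAP_BASE < PHYS_MAP_SIZE);
    return v - PHYS_MAP_BASE;
}
```

The translation is a single addition or subtraction, but the window has to be reserved in the kernel half of the address space up front, and its size caps the amount of physical memory that can be reached this way.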
- Owen
Re: Page table addressing in 64-bit mode
Using the top slot in the PML4 will force you to compile your kernel with -mcmodel=large, which has a non-negligible efficiency decrease relative to -mcmodel=kernel. I generally would suggest picking a different PML4E for your fractal map.
gerryg400 wrote: Since there is so much more virtual memory than physical memory, why bother with the recursive memory 'trick'. Why not simply map all physical memory into the kernel and have all of it accessible all the time? You can then have a simple phys2kvirt and kvirt2phys macros like we used to do when 32bit OSs had less than 1GB of physical RAM.
Presently shipping AMD64 CPUs have a 48-bit physical address space. This has obvious issues fitting around other items in your 48-bit virtual address space.
Re: Page table addressing in 64-bit mode
You're correct. Currently my MM only supports 64TB of physical memory. I guess when that becomes a limitation I'll need to look at the recursive/fractal thing.
Re: Page table addressing in 64-bit mode
Owen wrote: Using the top slot in the PML4 will force you to compile your kernel with -mcmodel=large, which has a non-negligible efficiency decrease relative to -mcmodel=kernel. I generally would suggest picking a different PML4E for your fractal map.
I should have added that I use (or was thinking of using) -mcmodel=kernel, so the kernel lives at -2GB (0xffffffff:80000000).
XenOS wrote: Be aware that this does not only consume 1GB, but in fact 512GB since all 4 paging levels are mapped to the top of linear memory, including the page tables.
I did not realize I would automatically be mapping all 4 levels; this may indeed make it necessary to pick a slot other than the top one for mapping the PML4 recursively, or else the page tables would contain random mappings (i.e. kernel code bytes)?
Last edited by jbemmel on Fri Jul 06, 2012 9:46 am, edited 1 time in total.
Re: Page table addressing in 64-bit mode
If you use -mcmodel=kernel, you are already using slot 511, and you cannot assign two values to the same slot simultaneously; in my example above I used 510 instead.
P.S. Fix your quote, I didn't say that.
Re: Page table addressing in 64-bit mode
jbemmel wrote: P.S. Fix your quote, I didn't say that.
[off topic] Nice feature, this quoting. You said what??!? [/off topic]
Re: Page table addressing in 64-bit mode
Owen wrote: Using the top slot in the PML4 will force you to compile your kernel with -mcmodel=large, which has a non-negligible efficiency decrease relative to -mcmodel=kernel. I generally would suggest picking a different PML4E for your fractal map.
There's also the option of using -mcmodel=small with -fPIC. I don't know what kind of overhead it would add relative to the kernel memory model, but I have to imagine it's much less than that of the large memory model.
- Owen
Re: Page table addressing in 64-bit mode
Owen wrote: Using the top slot in the PML4 will force you to compile your kernel with -mcmodel=large, which has a non-negligible efficiency decrease relative to -mcmodel=kernel. I generally would suggest picking a different PML4E for your fractal map.
Cognition wrote: There's also the option of using -mcmodel=small with -fPIC. I don't know what kind of overhead it would add relative to kernel memory model, but I have to imagine it's much less than that of the large memory model.
-mcmodel=small is like -mcmodel=kernel, except in the region 0 to +2GB.
Re: Page table addressing in 64-bit mode
Owen wrote: Using the top slot in the PML4 will force you to compile your kernel with -mcmodel=large, which has a non-negligible efficiency decrease relative to -mcmodel=kernel. I generally would suggest picking a different PML4E for your fractal map.
I don't fully understand -mcmodel, but what I thought is this:
Regardless of the -mcmodel used, when your kernel (code, data and bss) is smaller than 2GB, it can surely always use IP-relative addressing and 32-bit offsets for all defined symbols, right? And you can explicitly cast any 64-bit address into a pointer, so you can access your recursively mapped page tables regardless of the mcmodel and regardless of where they are placed in memory. So the mcmodel only matters for undefined or relocated symbols. Or am I wrong?
Re: Page table addressing in 64-bit mode
mcmodel tells the compiler the range of addresses your symbols can have.
On x86_64 a 32-bit address can be sign-extended, so putting -1GB into a pointer means the top 1GB.
The mcmodel usually matters when a symbol's address is encoded directly in an instruction (e.g. mov rdi, my_var or mov eax, [my_var]):
1. small - 32-bit addresses are used (e.g. mov edi, my_var or mov eax, [my_var]); the linker reports an error if an address doesn't fit the relocation.
2. kernel - 32-bit addresses are used as well (e.g. mov eax, [my_var]), and the linker reports an error if an address doesn't fit the relocation. This relies on the 32-bit address being sign-extended, so the symbols have to sit in the top 2GB.
3. large - 64-bit addresses are used (e.g. mov rdi, my_var then mov eax, [rdi]), since most instructions can't take a 64-bit operand. This has overhead compared to the 32-bit forms.
4. medium - I don't care much about it :p
PIC is another thing: accesses are relative to RIP, but the range limit still applies.
It is also worth noting that you can access the full virtual address space in any model through an explicit address (e.g. int *p = (int *)0xFFFFFFFFFF000000; *p = 0;).
The model only affects the code generated for values (object addresses) that are patched in by the linker.
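As a small illustration of the last point, here is a hedged sketch of reaching the recursively mapped PML4 through a plain constant. It assumes recursive slot 510, as in the macros posted earlier in this thread; PML4_VIA_SLOT_510 and recursive_pml4_entry are names invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: PML4 entry 510 points back at the PML4 itself. */
#define PML4_VIA_SLOT_510 0xFFFFFF7FBFDFE000ULL

/* Pointer to PML4 entry `index`, built from a literal address.
 * No symbol is involved, so there is no relocation for the linker to
 * range-check and the code model does not restrict the address. */
static inline volatile uint64_t *recursive_pml4_entry(unsigned index)
{
    assert(index < 512);
    return (volatile uint64_t *)(uintptr_t)(PML4_VIA_SLOT_510 + 8ULL * index);
}
```

Dereferencing the result only makes sense inside a kernel that has actually installed the recursive entry; in a normal user-space program the address is not mapped.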