Hi,
CRoemheld wrote:What is the idea behind sometimes using the 510th rather than the 511th entry in the PML4 table for this purpose?
Lots of instructions have "immediate data" - data that is built into the instruction itself, like the number 0x12345678 in the instruction "mov eax,[0x12345678 + ebx]". For 64-bit code, almost all immediate data is limited to 32 bits, because having 64 bits of immediate data in instructions makes the instructions huge (and makes it harder/slower for the CPU to decode them).
Because of this, for "addresses known at compile time" it's more efficient to use 32-bit addresses. For example, something like "mov rax,[0x12345678 + rbx]" is fine because the immediate data fits in 32 bits, but "mov rax,[0x123456789ABCDEF + rbx]" is not supported and would have to be split into 2 instructions (e.g. maybe "mov rax,0x123456789ABCDEF" and then "mov rax,[rax + rbx]").
This means that it's best to have your code in the first 2 GiB or the first 4 GiB of the virtual address space (where it can use "unsigned 32-bit" immediate data for addresses known at compile time); or in the last 2 GiB of the virtual address space (where it can use "signed 32-bit" immediate data, which the CPU sign-extends to 64 bits). For this reason, user-space processes typically use the first 2 GiB or 4 GiB of the virtual address space and the kernel uses the last 2 GiB. To cope with this, most compilers support different memory models for 64-bit code - e.g. a normal memory model (where everything is in the first 2 GiB of the virtual address space; "-mcmodel=small" in GCC and Clang), a large memory model (where the code is too large to use 32-bit addresses for addresses known at compile time; "-mcmodel=large"), and a kernel memory model (where everything is in the last 2 GiB of the virtual address space; "-mcmodel=kernel").
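To make the "fits in 32 bits" ranges concrete, here's a rough sketch in C. The function names are made up for illustration; the only thing it shows is which 64-bit addresses can be written as a 32-bit value that gets sign-extended (first 2 GiB or last 2 GiB) or zero-extended (first 4 GiB).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if addr survives being stored as a signed
   32-bit value and sign-extended back to 64 bits. That is the case for
   the first 2 GiB and the last 2 GiB of the address space. */
bool fits_signed32(uint64_t addr)
{
    return addr <= UINT64_C(0x000000007FFFFFFF) ||
           addr >= UINT64_C(0xFFFFFFFF80000000);
}

/* Hypothetical helper: true if addr survives being stored as an unsigned
   32-bit value and zero-extended back to 64 bits (the first 4 GiB). */
bool fits_unsigned32(uint64_t addr)
{
    return addr <= UINT64_C(0x00000000FFFFFFFF);
}
```

With these, 0x12345678 passes both checks, 0x123456789ABCDEF passes neither, and a typical "last 2 GiB" kernel address like 0xFFFFFFFF80000000 only passes the signed check.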
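If the kernel's code and data use the last 2 GiB of the virtual address space (for better efficiency), then you can't use the 511th entry in the PML4 table for the recursive mapping trick. The 511th entry covers the last 512 GiB of the virtual address space, which includes that last 2 GiB, so it'd affect the area used for the kernel's code and data. The 510th entry is the next one down and doesn't overlap it.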
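If you want to check which PML4 entry an address lands in, something like the following would do. It's only a sketch, assuming normal 4-level paging with 48-bit canonical addresses (where bits 39 to 47 select the PML4 entry); the function names are invented for the example.

```c
#include <stdint.h>

/* Hypothetical helper: index (0..511) of the PML4 entry that covers
   vaddr, assuming 4-level paging. Bits 39..47 select the entry. */
unsigned pml4_index(uint64_t vaddr)
{
    return (unsigned)((vaddr >> 39) & 0x1FF);
}

/* Hypothetical helper: lowest canonical virtual address covered by a
   given PML4 entry. Each entry covers 512 GiB (1 << 39 bytes). */
uint64_t pml4_entry_base(unsigned index)
{
    uint64_t base = (uint64_t)(index & 0x1FF) << 39;

    /* Canonical form: copy bit 47 into bits 48..63. */
    if (base & (UINT64_C(1) << 47))
        base |= UINT64_C(0xFFFF000000000000);
    return base;
}
```

For example, pml4_index(0xFFFFFFFF80000000) is 511 (the start of the last 2 GiB), while pml4_entry_base(510) is 0xFFFFFF0000000000, so a recursive mapping in entry 510 occupies 0xFFFFFF0000000000 to 0xFFFFFF7FFFFFFFFF and stays clear of the kernel.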
Cheers,
Brendan