What is the correct structure of 1 GiB pages?

Posted: Sun Dec 27, 2020 6:47 pm
by peachsmith
So I'm trying to set up my own preliminary paging in a 64-bit UEFI application, but according to QEMU's output, I appear to be getting a page fault when I set CR3.
I'm trying to identity map the first 512 GiB with 1 GiB pages just to get my own paging started, since that requires fewer paging structures.
I plan to move on to 2 MiB and 4 KiB pages once I thoroughly understand 1 GiB pages.

I could have sworn I built the PML4 and PDPT correctly by following Tables 4-14 and 4-15 from the Intel manual.
Is there some bit I'm forgetting to set in my paging structure entries?
Am I not setting the addresses in the correct bits in each entry?

I was pretty darn sure I was setting each bit of the paging structure entries correctly.

Here's the logical flow:
  1. Call ExitBootServices
  2. Disable Interrupts
  3. Load GDT
  4. Load IDT with ISRs
  5. Enable Paging
  6. Enable Interrupts

Here's the state of CR4 just before I attempt to enable paging:

Code: Select all

CR4.PAE:   1
CR4.PCIDE: 0
CR4.SMEP:  0
CR4.SMAP:  0
CR4.PKE:   0

My paging structures:

Code: Select all

// PML4
uint64_t pml4[512] __attribute__((aligned(0x1000)));

// PDPT
uint64_t pdpt[512] __attribute__((aligned(0x1000)));

My functions to create PML4 and PDPT entries:

Code: Select all

/**
 * Creates a PML4 entry to hold the address of a PDPT.
 * The resulting PML4 entry is marked as present, read/write,
 * and supervisor mode.
 * Page-level write through and page-level cache disable bits
 * are left at 0.
 *
 *
 * Params:
 *   uint64_t - the address of a PDPT
 *
 * Returns:
 *   pml4e - a PML4 entry
 */
pml4e make_pml4e(uint64_t pdpt_addr)
{
  pml4e p = 0;
  uint64_t bit_mask = 0x1; // single bit mask

  // Bit 0 is the present flag.
  // Mark the page as present.
  p |= (bit_mask << 0);

  // Bit 1 is the read/write flag.
  // Set this entry to be read/write.
  p |= (bit_mask << 1);

  // Bits [51:12] are the address of the PDPT entry.
  p |= (pdpt_addr << 12);

  return p;
}


/**
 * Creates a PDPT entry.
 * The resulting PDPT entry is marked as present, read/write,
 * and supervisor mode.
 * Page-level write through and page-level cache disable bits
 * are left at 0.
 * The PAT and global translation bits are left at 0.
 * The protection key bits [62:59] are left as 0 since CR4.PKE is 0.
 *
 * Params:
 *   uint64_t - base address of 1 GiB page
 *
 * Returns:
 *   pdpte - a PDPT entry
 */
pdpte make_pdpte(uint64_t page_base)
{
  pdpte p = 0;
  uint64_t bit_mask = 0x1; // single bit mask

  // Bit 0 is the present flag.
  // Mark the page as present.
  p |= (bit_mask << 0);

  // Bit 1 is the read/write flag.
  // Set this entry to be read/write.
  p |= (bit_mask << 1);

  // Bit 7 is the page size bit and must be set
  // in order for the PDPT entry to point to a 1 GiB page.
  p |= (bit_mask << 7);

  // Bits [51:30] are the base address of a 1 GiB page frame.
  p |= (page_base);

  return p;
}

And my function to actually populate the PML4 and PDPT and set CR3:

Code: Select all

void k_paging_init()
{
  // Identity map 512 GiB of address space in the PDPT.
  uint64_t phys = 0;
  for (uint64_t i = 0; i < 512; i++)
  {
    pdpt[i] = make_pdpte(i * 0x40000000);
  }

  // Put the PDPT in the PML4.
  pml4[0] = make_pml4e((uint64_t)(pdpt));

  // Mark the rest of the entries in the PML4 as not present.
  for (int i = 1; i < 512; i++)
  {
    pml4[i] = pml4[0] & ~((uint64_t)1);
  }

  // Create a new CR3 value.
  uint64_t cr3 = (((uint64_t)(pml4)) << 12);

  // Put the new CR3 value in CR3.
  k_set_cr3(cr3);
}

Re: What is the correct structure of 1 GiB pages?

Posted: Sun Dec 27, 2020 7:27 pm
by Octocontrabass

Code: Select all

  // Bits [51:12] are the address of the PDPT entry.
  p |= (pdpt_addr << 12);

Bits 51-12 of the PML4E are bits 51-12 of the address of the PDPT, so you shouldn't be shifting the address.

Code: Select all

  // Create a new CR3 value.
  uint64_t cr3 = (((uint64_t)(pml4)) << 12);

It's the same for CR3: bits 51-12 of CR3 are bits 51-12 of the address of the PML4, so you shouldn't be shifting the address.
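
For illustration, here is a minimal sketch of the entry helpers with the shifts removed. The `pml4e` and `pdpte` typedefs, the macro names, and `make_cr3` are assumptions made for this example (the thread never shows the typedefs), and `assert` stands in for whatever check a freestanding kernel would use instead.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed typedefs; the thread uses these names without showing them. */
typedef uint64_t pml4e;
typedef uint64_t pdpte;

#define ENTRY_PRESENT   (1ULL << 0)
#define ENTRY_WRITABLE  (1ULL << 1)
#define ENTRY_PAGE_SIZE (1ULL << 7)

/* Bits [51:12]: address of a 4 KiB-aligned paging structure. */
#define ADDR_MASK_4K 0x000FFFFFFFFFF000ULL
/* Bits [51:30]: base address of a 1 GiB page frame. */
#define ADDR_MASK_1G 0x000FFFFFC0000000ULL

/* PML4 entry: the PDPT address is ORed in unshifted. */
pml4e make_pml4e(uint64_t pdpt_addr)
{
  /* Reject addresses that are not 4 KiB aligned or exceed bit 51. */
  assert((pdpt_addr & ~ADDR_MASK_4K) == 0);
  return pdpt_addr | ENTRY_PRESENT | ENTRY_WRITABLE;
}

/* PDPT entry mapping a 1 GiB page: the page base is ORed in unshifted. */
pdpte make_pdpte(uint64_t page_base)
{
  /* Reject bases that are not 1 GiB aligned or exceed bit 51. */
  assert((page_base & ~ADDR_MASK_1G) == 0);
  return page_base | ENTRY_PRESENT | ENTRY_WRITABLE | ENTRY_PAGE_SIZE;
}

/* CR3 value with PCIDs disabled: the PML4 address, PWT and PCD left at 0. */
uint64_t make_cr3(uint64_t pml4_addr)
{
  assert((pml4_addr & ~ADDR_MASK_4K) == 0);
  return pml4_addr;
}
```

Because a 4 KiB-aligned address already has zeros in bits 11-0, the flag bits and the address never collide, which is why no shift is needed.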

You should also check CPUID for 1 GiB page support.
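
A rough sketch of that check, assuming the caller has already executed CPUID for leaves 80000000H and 80000001H and saved the registers. The `cpuid_regs` struct and the function name are made up for this example; the bit position (EDX bit 26, Page1GB) is from the Intel manual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the four registers CPUID returns. */
struct cpuid_regs {
  uint32_t eax, ebx, ecx, edx;
};

#define CPUID_LEAF_EXT_FEATURES 0x80000001u
#define CPUID_EDX_PAGE1GB       (1u << 26)

/*
 * leaf_max:      registers from CPUID leaf 80000000H (EAX = highest
 *                supported extended leaf)
 * leaf_features: registers from CPUID leaf 80000001H
 */
bool supports_1gib_pages(const struct cpuid_regs *leaf_max,
                         const struct cpuid_regs *leaf_features)
{
  assert(leaf_max != NULL && leaf_features != NULL);

  /* Leaf 80000001H must exist before its EDX can be trusted. */
  if (leaf_max->eax < CPUID_LEAF_EXT_FEATURES)
    return false;

  return (leaf_features->edx & CPUID_EDX_PAGE1GB) != 0;
}
```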

When the MTRRs indicate more than one memory type for a large page, the behavior is undefined. Do not use large pages unless the MTRRs will set the same memory type across the entire page.

Re: What is the correct structure of 1 GiB pages?

Posted: Sun Dec 27, 2020 9:43 pm
by peachsmith
Well this is awkward.
Removing the bit shifting did the trick and allowed me to set CR3 without causing a page fault, but when I do CPUID.80000001H:EDX.Page1GB, the Page1GB bit is not set.
Using 1 GiB pages didn't appear to cause any problems (I was able to handle a divide error with my ISR), but I guess that's purely coincidental since 1 GiB pages aren't officially supported for my CPU.

Can I just assume that 2 MiB pages are usable since I'm in 64-bit mode?
And if I end up having to allocate RAM from the UEFI memory map to hold my paging structures, can I assume that the map entries are page aligned (4096)?

For what it's worth, this is what's in my IA32_PAT:

Code: Select all

PA0: WB
PA1: WT
PA2: UC-
PA3: UC
PA4: WB
PA5: WT
PA6: UC-
PA7: UC

Re: What is the correct structure of 1 GiB pages?

Posted: Mon Dec 28, 2020 12:50 am
by Octocontrabass
peachsmith wrote:when I do CPUID.80000001H:EDX.Page1GB, the Page1GB bit is not set.
Using 1 GiB pages didn't appear to cause any problems
Are you using hardware virtualization to emulate a CPU without 1GiB page support on a CPU with 1GiB page support? I wouldn't be surprised if QEMU only masks the CPUID bit and doesn't trap attempts to use 1GiB pages.
peachsmith wrote:Can I just assume that 2 MiB pages are usable since I'm in 64-bit mode?
Yes.
peachsmith wrote:And if I end up having to allocate RAM from the UEFI memory map to hold my paging structures, can I assume that the map entries are page aligned (4096)?
The UEFI specification says yes. I say add some error handling to gracefully halt in case the firmware isn't following the spec.
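
One way that error handling could look, as a sketch only. The `mem_desc` struct is a stand-in that mirrors the field order of `EFI_MEMORY_DESCRIPTOR`; the function name is hypothetical. The walk uses the descriptor size reported by the firmware rather than `sizeof`, since the firmware's descriptors may be larger than the structure the application was compiled against.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in mirroring the field order of EFI_MEMORY_DESCRIPTOR. */
struct mem_desc {
  uint32_t type;
  uint64_t physical_start;
  uint64_t virtual_start;
  uint64_t number_of_pages;
  uint64_t attribute;
};

/*
 * Returns true if every descriptor in the map starts on a 4 KiB boundary.
 * map_size and desc_size are the values the firmware reported alongside
 * the memory map.
 */
bool memory_map_is_page_aligned(const void *map, size_t map_size,
                                size_t desc_size)
{
  assert(map != NULL);

  /* A descriptor smaller than the known fields cannot be parsed safely. */
  if (desc_size < sizeof(struct mem_desc))
    return false;

  const uint8_t *bytes = map;
  for (size_t off = 0; off + desc_size <= map_size; off += desc_size) {
    const struct mem_desc *d = (const struct mem_desc *)(bytes + off);
    if ((d->physical_start & 0xFFF) != 0)
      return false;
  }
  return true;
}
```

If this returns false, the kernel could print a message and halt instead of building page tables from a map it can't trust.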
peachsmith wrote:For what it's worth, this is what's in my IA32_PAT:
Yep, those are the defaults. The undefined behavior is when the MTRRs specify two (or more) different types for parts of a single large page.
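
For reference, a small sketch of how a raw IA32_PAT value decodes into the eight entries listed in the quoted post. The function names are invented for this example. Each entry occupies the low three bits of one byte, and the power-on default value is 0007040600070406H.

```c
#include <assert.h>
#include <stdint.h>

/* Extracts entry PA0..PA7 (index 0..7) from a raw IA32_PAT value. */
uint8_t pat_entry(uint64_t ia32_pat, unsigned index)
{
  assert(index < 8);
  /* Each PAT entry is the low 3 bits of byte number `index`. */
  return (uint8_t)((ia32_pat >> (index * 8)) & 0x7);
}

/* Maps a PAT memory type encoding to its conventional abbreviation. */
const char *pat_type_name(uint8_t encoding)
{
  switch (encoding & 0x7) {
  case 0x0: return "UC";
  case 0x1: return "WC";
  case 0x4: return "WT";
  case 0x5: return "WP";
  case 0x6: return "WB";
  case 0x7: return "UC-";
  default:  return "reserved"; /* encodings 2 and 3 */
  }
}
```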

Re: What is the correct structure of 1 GiB pages?

Posted: Mon Dec 28, 2020 10:24 am
by peachsmith
Octocontrabass wrote:Are you using hardware virtualization to emulate a CPU without 1GiB page support on a CPU with 1GiB page support?
Ah, I didn't think about that. I'm running in QEMU in VMware on an Intel Core i7 8700.

Looks like I'll need to implement some interfaces for control registers and MSRs to make my life easier.

Re: What is the correct structure of 1 GiB pages?

Posted: Mon Dec 28, 2020 9:56 pm
by peachsmith
Alrighty, I implemented an MTRR interface to read the values from the fixed and variable range MTRRs, and this is what I came up with.
I'm not quite clear on how to determine if a range has multiple types.
Should I check to see if two ranges of variable range MTRRs overlap?
Do I need to verify that all of the types in a single fixed range MTRR are the same?

My Fixed Range MTRRs

Code: Select all

IA32_MTRR_FIX64K_00000: WB, WB, WB, WB, WB, WB, WB, WB
IA32_MTRR_FIX16K_80000: WB, WB, WB, WB, WB, WB, WB, WB
IA32_MTRR_FIX16K_A0000: UC, UC, UC, UC, UC, UC, UC, UC
IA32_MTRR_FIX4K_C0000:  UC, UC, UC, UC, UC, UC, UC, UC
IA32_MTRR_FIX4K_C8000:  UC, UC, UC, UC, UC, UC, UC, UC
IA32_MTRR_FIX4K_D0000:  UC, UC, UC, UC, UC, UC, UC, UC
IA32_MTRR_FIX4K_D8000:  UC, UC, UC, UC, UC, UC, UC, UC
IA32_MTRR_FIX4K_E0000:  UC, UC, UC, UC, UC, UC, UC, UC
IA32_MTRR_FIX4K_E8000:  UC, UC, UC, UC, UC, UC, UC, UC
IA32_MTRR_FIX4K_F0000:  UC, UC, UC, UC, UC, UC, UC, UC
IA32_MTRR_FIX4K_F8000:  UC, UC, UC, UC, UC, UC, UC, UC

My Variable Range MTRRs
My IA32_MTRRCAP.VCNT is 8

Code: Select all

IA32_MTRR_PHYSBASE0         80000000 UC
IA32_MTRR_PHYSMASK0       FF80000000 VALID
IA32_MTRR_PHYSBASE1        800000000 UC
IA32_MTRR_PHYSMASK1       F800000000 VALID
IA32_MTRR_PHYSBASE2                0 UC
IA32_MTRR_PHYSMASK2                0 INVALID
IA32_MTRR_PHYSBASE3                0 UC
IA32_MTRR_PHYSMASK3                0 INVALID
IA32_MTRR_PHYSBASE4                0 UC
IA32_MTRR_PHYSMASK4                0 INVALID
IA32_MTRR_PHYSBASE5                0 UC
IA32_MTRR_PHYSMASK5                0 INVALID
IA32_MTRR_PHYSBASE6                0 UC
IA32_MTRR_PHYSMASK6                0 INVALID
IA32_MTRR_PHYSBASE7                0 UC
IA32_MTRR_PHYSMASK7                0 INVALID

Re: What is the correct structure of 1 GiB pages?

Posted: Mon Dec 28, 2020 11:12 pm
by Octocontrabass
peachsmith wrote:I'm not quite clear on how to determine if a range has multiple types.
Determine the type for each 4 KiB block of memory. Compare all of the 4 KiB blocks within a large page. If they're all the same type, then you can use the large page. (MTRRs specify memory type with 4 KiB granularity.)
peachsmith wrote:Should I check to see if two ranges of variable range MTRRs overlap?
Yes. Two or more variable range MTRRs can overlap, and the overlapping area is described by a combination of all of them.
peachsmith wrote:Do I need to verify that all of the types in a single fixed range MTRR are the same?
No, you need to verify that all of the types in all of the fixed range MTRRs are the same. They describe a 1MiB block of memory, and the smallest "large" page is 2MiB, so they must all be the same type for the entire 2MiB range to be the same type. Check the variable range MTRRs for the other 1MiB.
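
To make the fixed-range layout concrete, here is a sketch that finds the type field covering an address below 1 MiB. It assumes the eleven fixed-range MSR values have been read into an array in the same order as the listing earlier in the thread (FIX64K_00000, FIX16K_80000, FIX16K_A0000, then the eight FIX4K registers); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FIXED_MTRR_COUNT 11

/*
 * Returns a slot number 0..87 (MSR index * 8 + field index) for an
 * address below 1 MiB, or -1 if the fixed-range MTRRs don't cover it.
 */
int fixed_mtrr_slot(uint64_t addr)
{
  if (addr < 0x80000)  /* 1 MSR, eight 64 KiB fields */
    return (int)(addr >> 16);
  if (addr < 0xC0000)  /* 2 MSRs, sixteen 16 KiB fields */
    return 8 + (int)((addr - 0x80000) >> 14);
  if (addr < 0x100000) /* 8 MSRs, sixty-four 4 KiB fields */
    return 24 + (int)((addr - 0xC0000) >> 12);
  return -1;
}

/* Returns the memory type byte that covers addr (addr must be < 1 MiB). */
uint8_t fixed_mtrr_type(const uint64_t msrs[FIXED_MTRR_COUNT], uint64_t addr)
{
  int slot = fixed_mtrr_slot(addr);
  assert(slot >= 0);
  /* Field 0 is the lowest byte of the MSR and covers the lowest addresses. */
  return (uint8_t)(msrs[slot / 8] >> ((slot % 8) * 8));
}
```

Checking the first 2 MiB page then means comparing all 88 fields with each other and with the variable-range result for the second MiB.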
peachsmith wrote:My Variable Range MTRRs
Don't forget IA32_MTRR_DEF_TYPE.
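
Putting the variable ranges and the default type together, a sketch of the per-4 KiB comparison might look like the following. The struct and function names are invented for this example. It takes raw MSR values (so the valid flag is still bit 11 of the mask), it ignores the fixed-range MTRRs and is therefore only meaningful for pages entirely above 1 MiB, and it assumes MTRRs are enabled. The overlap rules it applies are the ones from the Intel manual: UC wins over everything, WT wins over WB, and any other mix is undefined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { MT_UC = 0, MT_WC = 1, MT_WT = 4, MT_WP = 5, MT_WB = 6 };

/* Raw values of one IA32_MTRR_PHYSBASEn / IA32_MTRR_PHYSMASKn pair. */
struct var_mtrr {
  uint64_t physbase; /* type in bits [7:0], base address in bits [..:12] */
  uint64_t physmask; /* valid flag in bit 11, mask in bits [..:12] */
};

#define MTRR_VALID     (1ULL << 11)
#define MTRR_ADDR_BITS (~0xFFFULL)

/* Memory type for one address, or -1 if the overlap is undefined. */
int variable_mtrr_type(const struct var_mtrr *r, size_t count,
                       uint8_t def_type, uint64_t addr)
{
  bool matched = false, undefined = false;
  int type = def_type; /* used when no range matches */

  assert(r != NULL || count == 0);
  for (size_t i = 0; i < count; i++) {
    if (!(r[i].physmask & MTRR_VALID))
      continue;
    uint64_t mask = r[i].physmask & MTRR_ADDR_BITS;
    if ((addr & mask) != (r[i].physbase & mask))
      continue;

    int t = (int)(r[i].physbase & 0xFF);
    if (t == MT_UC)
      return MT_UC; /* UC overrides any other matching range */
    if (!matched) {
      type = t;
      matched = true;
    } else if (t != type) {
      if ((t == MT_WT && type == MT_WB) || (t == MT_WB && type == MT_WT))
        type = MT_WT; /* WT overrides WB */
      else
        undefined = true;
    }
  }
  return undefined ? -1 : type;
}

/* True if every 4 KiB block of the page resolves to the same defined type. */
bool large_page_is_uniform(const struct var_mtrr *r, size_t count,
                           uint8_t def_type, uint64_t page_base,
                           uint64_t page_size)
{
  int first = variable_mtrr_type(r, count, def_type, page_base);
  if (first < 0)
    return false;
  for (uint64_t off = 0x1000; off < page_size; off += 0x1000) {
    if (variable_mtrr_type(r, count, def_type, page_base + off) != first)
      return false;
  }
  return true;
}
```

With the two valid ranges posted above and a WB default, every 1 GiB page comes out uniform, because both ranges start and end on 1 GiB boundaries.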

Re: What is the correct structure of 1 GiB pages?

Posted: Tue Dec 29, 2020 10:17 am
by peachsmith
Octocontrabass wrote:Don't forget IA32_MTRR_DEF_TYPE.
Here's what I get after reading IA32_MTRR_DEF_TYPE:

Code: Select all

Default Type:   WB
Fixed Enabled:  Y
MTRR Enabled:   Y

So since the PAT is available on Pentium III (released in 1999) and later, if I wanted to be super lazy about it, could I just disable MTRRs and use the PAT so that any individual page I map has the same memory type regardless of granularity?
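
For anyone following along, the three fields listed above can be pulled out of the raw IA32_MTRR_DEF_TYPE value (MSR 2FFH) roughly like this. The struct and function names are made up for this sketch; the bit positions are from the Intel manual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of IA32_MTRR_DEF_TYPE. */
struct mtrr_def_type {
  uint8_t default_type; /* bits [7:0]: type for addresses no MTRR covers */
  bool fixed_enabled;   /* bit 10 (FE): fixed-range MTRRs enabled */
  bool mtrr_enabled;    /* bit 11 (E): MTRRs enabled at all */
};

void decode_mtrr_def_type(uint64_t msr, struct mtrr_def_type *out)
{
  assert(out != NULL);
  out->default_type = (uint8_t)(msr & 0xFF);
  out->fixed_enabled = ((msr >> 10) & 1) != 0;
  out->mtrr_enabled = ((msr >> 11) & 1) != 0;
}
```

A raw value of C06H decodes to the output shown above: default type WB (6), fixed ranges enabled, MTRRs enabled.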

Re: What is the correct structure of 1 GiB pages?

Posted: Tue Dec 29, 2020 3:04 pm
by Octocontrabass
Unfortunately, no. Disabling the MTRRs sets all memory to UC, and you can't use the PAT to override it.

Re: What is the correct structure of 1 GiB pages?

Posted: Tue Dec 29, 2020 3:32 pm
by peachsmith
Well, can I set all the MTRRs to the same memory type?

Re: What is the correct structure of 1 GiB pages?

Posted: Tue Dec 29, 2020 8:23 pm
by Octocontrabass
You can, but it's not a good idea: the firmware's SMM code might depend on the MTRRs being set up a particular way.

Re: What is the correct structure of 1 GiB pages?

Posted: Wed Dec 30, 2020 10:07 am
by peachsmith
Octocontrabass wrote:the firmware's SMM code might depend on the MTRRs being set up a particular way
OK, so what if I left the fixed MTRRs alone and mapped their addresses with 4 KiB pages (since the fixed MTRRs only cover the first MiB of physical address space)?
Could I then get away with setting all of the variable MTRRs to the same type? It seems like the only downside would be performance. Why should that affect functionality?

Re: What is the correct structure of 1 GiB pages?

Posted: Wed Dec 30, 2020 1:41 pm
by Octocontrabass
peachsmith wrote:Could I then get away with setting all of the variable MTRRs to the same type?
If you mean using the variable MTRRs to set all addresses above 1MiB to the same type, no, that won't work. If you mean using the variable MTRRs to set all memory above 1MiB to the same type, the firmware should have already done that for you, but memory isn't guaranteed to evenly divide into large pages, and occasionally firmware is stupid.
peachsmith wrote:It seems like the only downside would be performance. Why should that affect functionality?
I know of at least two reasons: SMM MMIO and SMM cache poisoning. With SMM MMIO, the problem should be pretty apparent: when the SMM handler accesses MMIO, all of the reads and writes hit the cache instead of the device it's trying to access. SMM cache poisoning is more subtle: SMRAM is supposed to be accessible only while in SMM, but the protection is implemented outside the cache, so the cache may hold stale values from ordinary RAM before entry to SMM (and probably crash the SMM handler), or the cache may hold stale values from SMM after exiting the SMM handler (and probably crash your program).

Re: What is the correct structure of 1 GiB pages?

Posted: Wed Dec 30, 2020 2:06 pm
by peachsmith
Bummer. So the only way to avoid having to check every fixed and variable MTRR for every address range I want to map is to just use 4KiB pages?
I feel like this MTRR restriction defeats the purpose of having large pages in the first place.
And it seems like such a contrived restriction.

Re: What is the correct structure of 1 GiB pages?

Posted: Wed Dec 30, 2020 2:36 pm
by Octocontrabass
peachsmith wrote:Bummer. So the only way to avoid having to check every fixed and variable MTRR for every address range I want to map is to just use 4KiB pages?
Yes, but you only need to check once if you never modify the MTRRs. For example, if you ensure all RAM is configured as WB, then you only need to avoid large pages that would cross the boundaries of usable RAM.
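
As a sketch of that approach: given one region of usable RAM that is already known to be a single memory type, use 2 MiB pages only where a whole aligned 2 MiB block fits inside the region, and fall back to 4 KiB pages at the ragged edges. The names here are hypothetical, and the function only counts pages rather than writing page tables.

```c
#include <assert.h>
#include <stdint.h>

#define SIZE_4K 0x1000ULL
#define SIZE_2M 0x200000ULL

struct page_counts {
  uint64_t small; /* number of 4 KiB pages */
  uint64_t large; /* number of 2 MiB pages */
};

/*
 * Plans the mapping of [start, end), assumed to be one memory type.
 * Both bounds must be 4 KiB aligned.
 */
struct page_counts plan_region(uint64_t start, uint64_t end)
{
  struct page_counts c = { 0, 0 };

  assert(start <= end);
  assert((start & (SIZE_4K - 1)) == 0 && (end & (SIZE_4K - 1)) == 0);

  uint64_t addr = start;
  while (addr < end) {
    if ((addr & (SIZE_2M - 1)) == 0 && end - addr >= SIZE_2M) {
      /* Aligned and fully inside the region: a large page is safe. */
      c.large++;
      addr += SIZE_2M;
    } else {
      /* Would cross the region boundary: use a small page. */
      c.small++;
      addr += SIZE_4K;
    }
  }
  return c;
}
```

For a region running from 1 MiB to 5 MiB, this yields 256 small pages up to the 2 MiB boundary, one large page, and 256 small pages for the final MiB.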
peachsmith wrote:I feel like this MTRR restriction defeats the purpose of having large pages in the first place.
Large pages are intended to reduce the number of page table walks and TLB entries required for software accessing a lot of data at the same time. While it would be convenient to use them to map everything, they were never intended for that purpose.
peachsmith wrote:And it seems like such a contrived restriction.
It's pretty reasonable when you consider the cost of hardware that would allow it. Many CPUs track the memory type in the TLB entry. Large page TLB entries have no way to indicate that the page crosses the boundary between two different memory types. Adding support would mean either increasing the complexity of large page TLB entries or splitting large pages that cross memory type boundaries into multiple smaller TLB entries.