Cant for the life of me fix paging.

Posted: Fri Mar 29, 2024 9:32 pm
by maxtyson123
I've been implementing paging in my OS: https://github.com/maxtyson123/MaxOS (see the memory management branch).

However, I'm getting a page fault when trying to read the spurious interrupt vector register from the memory-mapped I/O (MMIO) region of the local APIC.
Note: I'm using 2MB pages with PAE enabled on a 64-bit OS.

What I've gathered:

- APIC base address has been mapped from its lower half address of 0xFEE00000 to the higher half where the kernel is.
- The resultant page (in memory) is as follows:

Code: Select all

present = {uint64_t} 1 [0x1]
write = {uint64_t} 1 [0x1]
user = {uint64_t} 0 [0x0]
write_through = {uint64_t} 0 [0x0]
cache_disabled = {uint64_t} 0 [0x0]
accessed = {uint64_t} 0 [0x0]
dirty = {uint64_t} 0 [0x0]
huge_page = {uint64_t} 0 [0x0]
global = {uint64_t} 0 [0x0]
available = {uint64_t} 0 [0x0]
physical_address = {uint64_t} 1043968 [0xfee00]
- This page has the physical address of 0x1200dc1 (which according to my memory map is a valid and previously unused region)
- When this page is being created a new pml3 is created at index 511 and pml2 at index 509
- The CR3 register shows that the pml4 root address is 0x00000000001bc000 and the higher half address my kernel uses is 0xffffffff801bc000
- Memory view shows that they are mapped correctly (their first entries are both of value: 23 D0 1B 00 00 00 00 00)
- I invalidate the TLB before returning so cache shouldn't be a problem
- When trying to read from this mapped address, interrupt 14 (page fault) is triggered with error code 0x0 (which decodes to a non-present page)
- Registers at the time of the error:

Code: Select all

rax = 0xffffffff7ee000f0 [-2166357776]
rbx = 0x0000000000281ce0 [2628832]
rcx = 0x0000000000000010 [16]
rdx = 0xffffffff7ee00000 [-2166358016]
rsi = 0x00000000000000f0 [240]
rdi = 0xffffffff801c0180 [-2145648256]
r8 = 0xffffffff7ee00000 [-2166358016]
r9 = 0xffffffff7ee00000 [-2166358016]
r10 = 0x0000000000000000 [0]
r11 = 0x0000000000000000 [0]
r12 = 0x0000000000000000 [0]
r13 = 0x0000000000000000 [0]
r14 = 0x0000000000000000 [0]
r15 = 0x0000000000000000 [0]
rip = 0xffffffff80114500 [0xffffffff80114500 <MaxOS::hardwarecommunication::InterruptManager::HandleInterrupt(MaxOS::system::cpu_status_t*)+12>]
rsp = 0xffffffff801c0160 [0xffffffff801c0160]
rbp = 0xffffffff801c0170 [0xffffffff801c0170]
eflags = 0x00200086 [ID IOPL=0 SF PF]
cs = 0x00000008 [8]
ds = 0x00000010 [16]
es = 0x00000010 [16]
ss = 0x00000010 [16]
fs = 0x00000010 [16]
gs = 0x00000010 [16]
fs_base = 0x0000000000000000 [0]
gs_base = 0x0000000000000000 [0]
k_gs_base = 0x0000000000000000 [0]
cr0 = 0x0000000080010011 [PG WP ET PE]
cr2 = 0xffffffff7ee000f0 [-2166357776]
cr3 = 0x00000000001bc000 [PDBR=444 PCID=0]
cr4 = 0x0000000000000020 [PAE]
cr8 = 0x0000000000000000 [0]
efer = 0x0000000000000500 [LMA LME]
So I have come to the conclusion that my page mapping is not correct. However, I can't figure out where I went wrong, and I have spent a couple of months on and off debugging every aspect I can think of. Any help would be appreciated.

What I've tried:
- Identity mapping 0xFEE00000 and reading that
- Using the lower half addresses for the PML4 root (0x00000000001bc000)
- Loading a new PML4 stored in the PhysicalMemoryManager class during initialisation
- Mapping other addresses (I didn't have a way to test whether it worked)
- Printing TLB info before and after mapping in the QEMU monitor (no change; this is how I came to my conclusion)

Re: Cant for the life of me fix paging.

Posted: Fri Mar 29, 2024 10:02 pm
by Octocontrabass
maxtyson123 wrote:Note, I'm using 2MB pages

Code: Select all

huge_page = {uint64_t} 0 [0x0]
You're not using 2MB pages.
maxtyson123 wrote:- This page has the physical address of 0x1200dc1
Are you sure? That address isn't aligned correctly.

Re: Cant for the life of me fix paging.

Posted: Sun Mar 31, 2024 4:10 pm
by maxtyson123
Ahh yes, I forgot to pass that flag. I must have missed it when I was doing a bit of refactoring.

The other part is page aligned is it not? The page table for that address has the base address of 0x1200000 and is just offset due to its index (making it 0x1200dc1)

Re: Cant for the life of me fix paging.

Posted: Sun Mar 31, 2024 4:40 pm
by Octocontrabass
maxtyson123 wrote:The other part is page aligned is it not? The page table for that address has the base address of 0x1200000 and is just offset due to its index (making it 0x1200dc1)
An offset of 0xDC1 is impossible because the offset added by the index must also be aligned. Something is still wrong.

Re: Cant for the life of me fix paging.

Posted: Mon Apr 01, 2024 3:07 am
by maxtyson123
Could you please explain this in a bit more depth for me? I understand that the page table has to be aligned to the page, but I thought that to get to that entry it would just be that page's index into that table?

Like this:

Code: Select all

pte_t* pte = &pml2->entries[PML2_GET_INDEX(virtual_address)];


Thanks

Re: Cant for the life of me fix paging.

Posted: Mon Apr 01, 2024 10:38 am
by Octocontrabass
maxtyson123 wrote:I understand that the page table has to be aligned to the page, but I thought that to get to that entry it would just be that page's index into that table?
Each entry is eight bytes, so the address of the entry will always be aligned to eight bytes. Since you're getting an address that isn't aligned to eight bytes, something is wrong.
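
To make the arithmetic concrete, here is a minimal sketch of how the entry address follows from the faulting address. The helper names are hypothetical (they are not taken from MaxOS); the CR2 value and the table base of 0x1200000 are the ones quoted earlier in the thread, and 4-level paging with 2MB pages is assumed.

```cpp
#include <cstdint>

// Hypothetical helpers: each level of the hierarchy is indexed by 9 bits
// of the virtual address.
constexpr std::uint64_t pml4_index(std::uint64_t va) { return (va >> 39) & 0x1FF; }
constexpr std::uint64_t pdpt_index(std::uint64_t va) { return (va >> 30) & 0x1FF; }
constexpr std::uint64_t pd_index(std::uint64_t va)   { return (va >> 21) & 0x1FF; }

// Every entry is 8 bytes, so an entry lives at base + index * 8.
constexpr std::uint64_t entry_address(std::uint64_t table_base, std::uint64_t index) {
    return table_base + index * 8;
}

// CR2 from the register dump in the first post.
constexpr std::uint64_t kFaultAddress = 0xffffffff7ee000f0ULL;

static_assert(pml4_index(kFaultAddress) == 511, "PML4 index");
static_assert(pdpt_index(kFaultAddress) == 509, "PDPT (pml3) index");
static_assert(pd_index(kFaultAddress) == 503, "PD (pml2) index");

// With a 4KB-aligned table base, the entry address is always a multiple of 8.
static_assert(entry_address(0x1200000, pd_index(kFaultAddress)) == 0x1200FB8,
              "503 * 8 = 0xFB8");
static_assert(entry_address(0x1200000, pd_index(kFaultAddress)) % 8 == 0,
              "entry address is 8-byte aligned");
```

The indices 511 and 509 match the ones reported in the first post, so the lookup itself appears sound; an address ending in 0xDC1 can only come from an entry stride other than 8 bytes.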

Re: Cant for the life of me fix paging.

Posted: Mon Apr 01, 2024 8:33 pm
by maxtyson123
Ah, I've found what was causing that. I was saying the physical address was 40 bits instead of 52:

Code: Select all

typedef struct PageTableEntry {
  uint64_t present : 1;
  uint64_t write : 1;
  uint64_t user : 1;
  uint64_t write_through : 1;
  uint64_t cache_disabled : 1;
  uint64_t accessed : 1;
  uint64_t dirty : 1;
  uint64_t huge_page : 1;
  uint64_t global : 1;
  uint64_t available : 3;
  uint64_t physical_address : 52;  // <--- HERE
} __attribute__((packed)) pte_t;
The new address is now 0x1000FB8, however the code still causes a page fault?

Re: Cant for the life of me fix paging.

Posted: Mon Apr 01, 2024 9:30 pm
by Octocontrabass
maxtyson123 wrote:Ah, I've found what was causing that. I was saying the physical address was 40 bits instead of 52:
No, that was correct, page table entries only contain (up to) 40 bits of the physical address. The remaining 12 bits belong to additional fields that are missing from your struct.
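
A sketch of why the field width affected the entry address at all, assuming GCC or Clang and their handling of `__attribute__((packed))`. With only 52 bits of fields, the packed struct occupies 7 bytes, so indexing an array of such structs uses a 7-byte stride; index 503 then lands at offset 0xDC1, which matches the misaligned address reported earlier. Declaring the missing upper 12 bits as separate fields restores the 8-byte size while keeping the address field at 40 bits. The struct and field names below are hypothetical, not the MaxOS definitions.

```cpp
#include <cstdint>

// 12 flag bits + 40 address bits = 52 bits: packed down to 7 bytes.
struct __attribute__((packed)) NarrowEntry {
    std::uint64_t present : 1;
    std::uint64_t write : 1;
    std::uint64_t user : 1;
    std::uint64_t write_through : 1;
    std::uint64_t cache_disabled : 1;
    std::uint64_t accessed : 1;
    std::uint64_t dirty : 1;
    std::uint64_t huge_page : 1;
    std::uint64_t global : 1;
    std::uint64_t available : 3;
    std::uint64_t physical_address : 40;
};

// Same layout plus the upper 12 bits as their own fields: 64 bits in total.
struct __attribute__((packed)) FullEntry {
    std::uint64_t present : 1;
    std::uint64_t write : 1;
    std::uint64_t user : 1;
    std::uint64_t write_through : 1;
    std::uint64_t cache_disabled : 1;
    std::uint64_t accessed : 1;
    std::uint64_t dirty : 1;
    std::uint64_t huge_page : 1;
    std::uint64_t global : 1;
    std::uint64_t available : 3;
    std::uint64_t physical_address : 40; // bits 12-51
    std::uint64_t available_high : 11;   // bits 52-62 (meaning depends on enabled CPU features)
    std::uint64_t no_execute : 1;        // bit 63, reserved unless EFER.NXE is set
};

static_assert(sizeof(NarrowEntry) == 7, "52 bits of fields pack into 7 bytes");
static_assert(sizeof(FullEntry) == 8, "64 bits of fields give an 8-byte entry");

// Offset of index 503 (the PD index of the faulting address) in each layout.
static_assert(503 * sizeof(NarrowEntry) == 0xDC1, "7-byte stride");
static_assert(503 * sizeof(FullEntry) == 0xFB8, "8-byte stride");
```

Note that the EFER value in the register dump (0x500) does not have NXE set, so under this layout `no_execute` would have to stay 0.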
maxtyson123 wrote:however the code still causes a page fault?
Did anything change, or is it still a fault caused by a non-present page trying to access the newly-mapped address?

Re: Cant for the life of me fix paging.

Posted: Mon Apr 01, 2024 10:22 pm
by maxtyson123
Ahh yes, the error code has changed to 0x8 (0b1000), meaning the page is now present and the fault was on a kernel read operation, correct?
Why would this cause an error when the page is set to be writable?

Code: Select all

pte = {MaxOS::memory::pte_t *} 0x1000fb8 
---------------------------------------------------
present = {uint64_t} 1 [0x1]
 write = {uint64_t} 1 [0x1]
 user = {uint64_t} 0 [0x0]
 write_through = {uint64_t} 0 [0x0]
 cache_disabled = {uint64_t} 0 [0x0]
 accessed = {uint64_t} 0 [0x0]
 dirty = {uint64_t} 0 [0x0]
 huge_page = {uint64_t} 1 [0x1]
 global = {uint64_t} 0 [0x0]
 available = {uint64_t} 0 [0x0]
 physical_address = {uint64_t} 1043968 [0xfee00]
Thank you for all your help by the way :)

Re: Cant for the life of me fix paging.

Posted: Mon Apr 01, 2024 10:48 pm
by nullplan
No, error code 8 means you set a reserved bit somewhere. That last PTE looks good, so the problem may be in one of the PTEs leading up to it.
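
For reference, a small sketch of how the low bits of the page-fault error code decode on x86-64. The constant and function names are made up for illustration; the bit positions are the architectural ones.

```cpp
#include <cstdint>

// Low bits of the error code pushed by the CPU for interrupt 14.
constexpr std::uint64_t kPresent          = 1ULL << 0; // 0: non-present page, 1: protection violation
constexpr std::uint64_t kWrite            = 1ULL << 1; // 0: read access, 1: write access
constexpr std::uint64_t kUser             = 1ULL << 2; // 0: supervisor mode, 1: user mode
constexpr std::uint64_t kReservedBit      = 1ULL << 3; // a reserved bit was set in a paging entry
constexpr std::uint64_t kInstructionFetch = 1ULL << 4; // fault on an instruction fetch

constexpr bool has_flag(std::uint64_t error_code, std::uint64_t flag) {
    return (error_code & flag) != 0;
}

// 0x0: supervisor read of a non-present page (the original fault).
static_assert(!has_flag(0x0, kPresent) && !has_flag(0x0, kWrite) && !has_flag(0x0, kUser),
              "0x0 is a non-present supervisor read");

// 0x8: only the reserved-bit flag is set; bit 3 is not the read/write flag.
static_assert(has_flag(0x8, kReservedBit) && !has_flag(0x8, kWrite),
              "0x8 reports a reserved bit violation");

// 0x9: reserved-bit flag plus the present flag.
static_assert(has_flag(0x9, kReservedBit) && has_flag(0x9, kPresent),
              "0x9 is present + reserved bit");
```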

Re: Cant for the life of me fix paging.

Posted: Tue Apr 02, 2024 2:56 am
by maxtyson123
I now clear the reserved bits:

Code: Select all

void clearBits(uint64_t *num, int start, int end) {
  // Create a mask with 1s from start to end and 0s elsewhere
  uint64_t mask = (~0ULL << start) ^ (~0ULL << (end + 1));

  // Apply the mask to the number to clear the desired bits
  *num &= ~mask;
}

pte_t PhysicalMemoryManager::create_page_table_entry(uintptr_t address, size_t flags) {

  pte_t page =  (pte_t){
    .present = 1,
    .write = (flags & WriteBit) != 0,
    .user = (flags & UserBit) != 0,
    .write_through = (flags & (1 << 7)) != 0,
    .cache_disabled = 0,
    .accessed = 0,
    .dirty = 0,
    .huge_page = 1,
    .global = 0,
    .available = 0,
    .physical_address = (uint64_t)address >> 12
  };

  // Clear the reserved bits
  clearBits((uint64_t*)&page, 40, 51);

  return page;
}
And am getting error code 0x9?

Re: Cant for the life of me fix paging.

Posted: Tue Apr 02, 2024 9:34 am
by Octocontrabass
maxtyson123 wrote:I now clear the reserved bits:
In the last level entry, yes, but what about the higher levels of the page tables? And anyway, it would be much easier to clear the entire entry before setting the bits you want to set instead of setting things first and clearing the reserved bits afterwards.
maxtyson123 wrote:

Code: Select all

(uint64_t*)&page
This smells like undefined behavior...
maxtyson123 wrote:And am getting error code 0x9?
That still means there's a reserved bit somewhere in a page table entry that would be used to translate the address in CR2.
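
If the raw 64-bit value of a bit-field struct is really needed, one commonly used alternative to the pointer cast is to copy the bytes with `std::memcpy`. The following is only a sketch: `Entry`, `to_raw` and `from_raw` are hypothetical names, and where each bit-field ends up inside the 64-bit value is implementation-defined (GCC and Clang on x86-64 place the first field in the lowest bit).

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical 64-bit entry described with bit-fields.
struct Entry {
    std::uint64_t present : 1;
    std::uint64_t write : 1;
    std::uint64_t user : 1;
    std::uint64_t write_through : 1;
    std::uint64_t cache_disabled : 1;
    std::uint64_t accessed : 1;
    std::uint64_t dirty : 1;
    std::uint64_t huge_page : 1;
    std::uint64_t global : 1;
    std::uint64_t available : 3;
    std::uint64_t physical_address : 40;
    std::uint64_t upper_bits : 12;
};

static_assert(sizeof(Entry) == sizeof(std::uint64_t), "Entry must be exactly 8 bytes");

// Copy the object representation out instead of reading it through a uint64_t*.
inline std::uint64_t to_raw(const Entry& entry) {
    std::uint64_t raw;
    std::memcpy(&raw, &entry, sizeof raw);
    return raw;
}

// Copy a raw 64-bit value back into the bit-field view.
inline Entry from_raw(std::uint64_t raw) {
    Entry entry;
    std::memcpy(&entry, &raw, sizeof entry);
    return entry;
}
```

Compilers typically reduce a fixed-size `memcpy` like this to a plain register move, so the copy should not cost anything over the cast.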

Re: Cant for the life of me fix paging.

Posted: Tue Apr 02, 2024 2:12 pm
by nullplan
Octocontrabass wrote:
maxtyson123 wrote:

Code: Select all

(uint64_t*)&page
This smells like undefined behavior...
It is, according to the strict aliasing rule. In C11 (that is, n1570.pdf), that is defined in §6.5P7 (to make it short, it is undefined behavior to access a pte_t as a uint64_t). I especially don't get the intent in this case. I mean, you already have your nice bitfield and then aren't going to use it? Why? In my OS, I don't use bitfields, and as a rule avoid them wherever possible. And for paging, do note that each page table entry is literally just the physical address of the next layer or the actual page, with some flag bits ORed in.
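
As a rough illustration of the "physical address with flag bits ORed in" view, here is a sketch that builds a 2MB-page entry as a plain 64-bit integer, starting from zero so that no stray bits survive. The names are hypothetical, and the mask assumes the full architectural 52-bit physical address width; a real CPU may implement fewer bits and treat the rest as reserved.

```cpp
#include <cstdint>

// Flag bits shared by all levels of the hierarchy.
constexpr std::uint64_t kPresent  = 1ULL << 0;
constexpr std::uint64_t kWritable = 1ULL << 1;
constexpr std::uint64_t kUser     = 1ULL << 2;
constexpr std::uint64_t kHugePage = 1ULL << 7; // PS bit: this PD entry maps a 2MB page

// Bits 21-51 hold the frame address of a 2MB page; bits 13-20 must stay zero.
constexpr std::uint64_t k2MBAddressMask = 0x000FFFFFFFE00000ULL;

// Hypothetical helper: the entry is built from nothing but the masked address
// and the requested flags, so every other bit is guaranteed to be 0.
constexpr std::uint64_t make_2mb_entry(std::uint64_t physical, std::uint64_t flags) {
    return (physical & k2MBAddressMask) | flags | kPresent | kHugePage;
}

// Mapping the local APIC base as a writable kernel page.
static_assert(make_2mb_entry(0xFEE00000ULL, kWritable) == 0xFEE00083ULL,
              "address | present | writable | huge");

// An address that is not 2MB-aligned loses its low bits instead of
// spilling into the reserved range.
static_assert(make_2mb_entry(0xFEE01234ULL, kWritable) == 0xFEE00083ULL,
              "low bits are masked off");
```

The same pattern applies to the higher levels: the table's physical address ORed with the present and writable flags, without the huge-page bit.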

Anyway, according to the AMD manuals, in a 2MB page translation, the reserved bits are bits 7 and 8 in the PML5E and PML4E, nothing in the PDPE, and bits 13-20 in the PDE. The bits you are clearing are part of the physical address, and there is nothing reserved here. That is, the manual doesn't say what happens when you set unimplemented bits there, maybe the processor treats them as reserved bits, too.

Re: Cant for the life of me fix paging.

Posted: Tue Apr 02, 2024 2:57 pm
by Octocontrabass
nullplan wrote:That is, the manual doesn't say what happens when you set unimplemented bits there, maybe the processor treats them as reserved bits, too.
AMD's manuals don't say, but Intel's manuals are very clear: those bits are reserved and must not be set.

In this case, we know the addresses won't have any reserved bits set unless they're completely wrong, so there's no need to worry about figuring out how many bits are reserved and masking them out.

Re: Cant for the life of me fix paging.

Posted: Tue Apr 02, 2024 3:05 pm
by nullplan
Octocontrabass wrote:AMD's manuals don't say, but Intel's manuals are very clear: those bits are reserved and must not be set.
That part is in the AMD manuals too. It just doesn't say what happens if you set them anyway.