Re: switching to new pagemap causes 0xe and more exceptions

Posted: Tue Oct 29, 2024 8:46 pm
by MichaelPetch
I do get a page fault with e=000a and CR2=ffff800100400000. Expanding on my previous comment: e=000a also tells us that the page fault happened because the page isn't present. This suggests there is something really wrong in an entry. In the QEMU monitor I ran `info tlb`. The output is long, so I have reduced it to a few entries, two of which are relevant:

Code:

...
ffff8000ffc00000: 00000000ffc00000 --P-----W
ffff8000ffe00000: 00000000ffe00000 --P-----W
ffff800100400000: 0003800000033000 X-------W    <------- Messed up
ffff800100401000: 0003800000036000 X-------W    <------- Messed up
ffffffff80000000: 0000000002125000 ----A---W
ffffffff80001000: 0000000002126000 ----A---W
ffffffff80002000: 0000000002127000 ----A---W
...
You can see the physical addresses are wrong, the pages aren't marked present, and the NX bit is set. Something has created those bogus entries. I would run `info tlb` on your build and look for the virtual address that is giving you a problem (it might be different from mine). I don't have time this evening to see why, but I would use the QEMU monitor to look at your page structures in physical memory to see at which level in the hierarchy the problem occurs; that might give a clue as to where in the code to look.
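
For reference, the error code e=000a quoted above can be decoded bit by bit. The following is only a minimal sketch in C; `struct pf_info` and `decode_pf` are hypothetical names introduced for illustration, while the bit positions are the architectural x86 page-fault error code bits. For 0x000a it reports a write access with the reserved-bit flag set and the present flag clear.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 page-fault error code bits. */
#define PF_P    (1u << 0) /* 0: non-present page, 1: protection violation */
#define PF_WR   (1u << 1) /* 1: the faulting access was a write */
#define PF_US   (1u << 2) /* 1: the access came from user mode */
#define PF_RSVD (1u << 3) /* 1: a reserved bit was set in a paging entry */
#define PF_ID   (1u << 4) /* 1: the access was an instruction fetch */

/* Hypothetical helper type, not part of the kernel discussed here. */
struct pf_info {
    bool present;
    bool write;
    bool user;
    bool reserved;
    bool ifetch;
};

/* Split a page-fault error code into its individual flags. */
static struct pf_info decode_pf(uint32_t err) {
    struct pf_info info = {
        .present  = (err & PF_P) != 0,
        .write    = (err & PF_WR) != 0,
        .user     = (err & PF_US) != 0,
        .reserved = (err & PF_RSVD) != 0,
        .ifetch   = (err & PF_ID) != 0,
    };
    return info;
}
```

Calling `decode_pf` from the page-fault handler and printing the fields next to CR2 makes it easier to tell a plain not-present fault (e=0002 on a write) from one that also reports a reserved bit (e=000a).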

Re: switching to new pagemap causes 0xe and more exceptions

Posted: Tue Oct 29, 2024 9:32 pm
by RayanMargham
    else if (page_table[idx] & PAGE2MB) {
      uint64_t *guy = (uint64_t *)((uint64_t)pmm_alloc());
      uint64_t old_phys = page_table[idx] & 0x000ffffffffff000;
      uint64_t old_flags = page_table[idx] & ~0x000ffffffffff000;
      for (int j = 0; j < 512; j++) {
        guy[j] = (old_phys + j * 4096) | (old_flags & ~PAGE2MB);
      }

Is inverting PAGE2MB the problem?

Re: switching to new pagemap causes 0xe and more exceptions

Posted: Wed Oct 30, 2024 7:18 pm
by RayanMargham
anyone know???

Re: switching to new pagemap causes 0xe and more exceptions

Posted: Wed Oct 30, 2024 10:36 pm
by RayanMargham
I still don't understand. I've been looking with gdb to find out why, but I don't get the issue

:c

Re: switching to new pagemap causes 0xe and more exceptions

Posted: Thu Oct 31, 2024 2:17 am
by MichaelPetch
I haven't had time to revisit it yet. In the next couple days I'll be able to look if someone hasn't been able to find it before then.

Re: switching to new pagemap causes 0xe and more exceptions

Posted: Thu Oct 31, 2024 7:51 pm
by RayanMargham
okii :c

Re: switching to new pagemap causes 0xe and more exceptions

Posted: Fri Nov 01, 2024 3:38 pm
by RayanMargham
Just an update

Still nothing found; I'm still trying to figure this out

Re: switching to new pagemap causes 0xe and more exceptions

Posted: Fri Nov 01, 2024 5:21 pm
by MichaelPetch
So previously I pointed out I was getting a page fault at CR2=0xffff800100400000 and that we had these entries in `info tlb`:

Code:

ffff800100400000: 0003800000033000 X-------W    <------- Messed up
ffff800100401000: 0003800000036000 X-------W    <------- Messed up
We want to find the page table (or page directory) for virtual address 0xffff800100400000. This address breaks down to:

Code:

PML4 IDX = 256 (0x100)
PML3 IDX =   4 (0x004)
PML2 IDX =   2 (0x002)
PML1 IDX =   0 (0x000)
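
The breakdown above can be reproduced with a few shifts and masks. This is a minimal sketch assuming 4-level paging with 4KiB pages; `struct pt_indices` and `split_vaddr` are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper type holding the four table indices and page offset. */
struct pt_indices {
    unsigned pml4;   /* bits 47:39 */
    unsigned pml3;   /* bits 38:30 */
    unsigned pml2;   /* bits 29:21 */
    unsigned pml1;   /* bits 20:12 */
    unsigned offset; /* bits 11:0  */
};

/* Split a virtual address into its 9-bit indices and 12-bit page offset. */
static struct pt_indices split_vaddr(uint64_t virt) {
    struct pt_indices ix;
    ix.pml4   = (unsigned)((virt >> 39) & 0x1ff);
    ix.pml3   = (unsigned)((virt >> 30) & 0x1ff);
    ix.pml2   = (unsigned)((virt >> 21) & 0x1ff);
    ix.pml1   = (unsigned)((virt >> 12) & 0x1ff);
    ix.offset = (unsigned)(virt & 0xfff);
    return ix;
}
```

For 0xffff800100400000 this gives 256, 4, 2 and 0, matching the values listed above.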
When I follow each page level using the QEMU monitor, starting at physical address 0x8000 (PML4), I end up at a PML1 (page table) that starts with these entries (the first 2 entries are pml1[0] and pml1[1]):

Code:

(qemu) xp/512g 0x35000
0000000000035000: 0xffff800000033003 0xffff800000036003
0000000000035010: 0x0000000000000000 0x0000000000000000
0000000000035020: 0x0000000000000000 0x0000000000000000
0000000000035030: 0x0000000000000000 0x0000000000000000
... the rest are 0x0000000000000000
Now look closely at those entries. Those are supposed to be physical addresses (of 4KiB page frames) but the upper 17 bits are all set! The lower bits do appear to be physical addresses though. I haven't looked at the code, but maybe you can review your code to find a place where the upper bits are being set (or not being cleared) when updating/setting the physical addresses of page table entries. I am basically providing this to you as a courtesy so that you can attempt to continue the bug hunt.
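
One way to catch entries like these early is to sanity-check the frame address of an entry before trusting it. The sketch below is an illustration only: the function names are hypothetical, and `PHYS_BITS` is an assumed placeholder (the real physical address width is reported by CPUID leaf 0x80000008). The address mask is the same 0x000ffffffffff000 used in the code posted in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_ADDR_MASK 0x000ffffffffff000ULL /* bits 51:12 of an entry */

/* Assumption: all usable physical memory lies below 2^PHYS_BITS.
   40 is a placeholder value, not something taken from the thread. */
#define PHYS_BITS 40

/* A page frame address must be 4KiB aligned and within the physical range. */
static bool frame_addr_is_sane(uint64_t phys) {
    bool aligned  = (phys & 0xfff) == 0;
    bool in_range = (phys >> PHYS_BITS) == 0;
    return aligned && in_range;
}

/* Check the address field of a page table entry, ignoring its flag bits. */
static bool pte_frame_is_sane(uint64_t entry) {
    return frame_addr_is_sane(entry & PTE_ADDR_MASK);
}
```

With these assumptions, `pte_frame_is_sane(0xffff800000033003)` is false while `pte_frame_is_sane(0x33003)` is true, so an assertion at the point where entries are written would flag the bad value at its source.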

Re: switching to new pagemap causes 0xe and more exceptions

Posted: Fri Nov 01, 2024 5:26 pm
by RayanMargham
could it be my allocation code?

Code:

uint64_t *find_pte_and_allocate(uint64_t *pt, uint64_t virt) {
  uint64_t shift = 48;
  for (int i = 0; i < 4; i++) {
    shift -= 9;
    uint64_t idx = (virt >> shift) & 0x1ff;
    uint64_t *page_table =
        (uint64_t *)((uint64_t)pt + hhdm_request.response->offset);
    if (i == 3) {
      return page_table + idx;
    }
    if (!(page_table[idx] & PRESENT)) {
      uint64_t *guy =
          (uint64_t *)((uint64_t)pmm_alloc() - hhdm_request.response->offset);
      page_table[idx] = (uint64_t)guy | PRESENT | RWALLOWED;
      pt = guy;
    } else if (page_table[idx] & PAGE2MB) {
      uint64_t *guy = (uint64_t *)((uint64_t)pmm_alloc());
      uint64_t old_phys = page_table[idx] & 0x000ffffffffff000;
      uint64_t old_flags = page_table[idx] & ~0x000ffffffffff000;
      for (int j = 0; j < 512; j++) {
        guy[j] = (old_phys + j * 4096) | (old_flags & ~PAGE2MB);
      }
      pt = (uint64_t *)((uint64_t)guy - hhdm_request.response->offset);
    } else {
      pt = (uint64_t *)(page_table[idx] & 0x000ffffffffff000);
    }
  }
  return 0;
}
uint64_t *find_pte_and_allocate2mb(uint64_t *pt, uint64_t virt) {
  uint64_t shift = 48;
  for (int i = 0; i < 4; i++) {
    shift -= 9;
    uint64_t idx = (virt >> shift) & 0x1ff;
    uint64_t *page_table =
        (uint64_t *)((uint64_t)pt + hhdm_request.response->offset);
    if (i == 2) {
      return page_table + idx;
    }
    if (!(page_table[idx] & PRESENT)) {
      uint64_t *guy =
          (uint64_t *)((uint64_t)pmm_alloc() - hhdm_request.response->offset);
      page_table[idx] = (uint64_t)guy | PRESENT | RWALLOWED;
      pt = guy;
    } else {
      pt = (uint64_t *)(page_table[idx] & 0x000ffffffffff000);
    }
  }
  return 0;
}
Or maybe my bitmask? I'm unsure

Re: switching to new pagemap causes 0xe and more exceptions

Posted: Fri Nov 01, 2024 9:45 pm
by MichaelPetch
Another possibility, though, is that you are using a virtual address returned by pmm_alloc and treating it as a physical address somewhere.
Your pmm_init stores virtual addresses (not physical addresses) in the free pages to create a linked list. Possibly you are eventually reading those virtual addresses back and treating them as physical addresses. I don't have time to go through your code at the moment; I'm trying to give you ideas so you can find the bug.

Re: switching to new pagemap causes 0xe and more exceptions

Posted: Fri Nov 01, 2024 11:11 pm
by MichaelPetch
In `kvmm_region_alloc` you have:

Code:

      for (uint64_t i = 0; i != amount_to_allocateinpages; i++) {
        void *page = pmm_alloc();
        map(ker_map.pml4, (uint64_t)page, new->base + (i * 4096), flags);
      }        
I believe that it should be:

Code:

      for (uint64_t i = 0; i != amount_to_allocateinpages; i++) {
        void *page = (void *)((uint64_t)pmm_alloc() - hhdm_request.response->offset);
        map(ker_map.pml4, (uint64_t)page, new->base + (i * 4096), flags);
      }
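
This kind of mix-up can also be made harder to write in the first place by giving physical and HHDM virtual addresses distinct types, so that passing one where the other is expected fails to compile. The following is a sketch under stated assumptions, not the kernel's actual code: all type and function names are hypothetical, and `hhdm_offset` is set to 0xffff800000000000 only for illustration (in the kernel it would come from `hhdm_request.response->offset`).

```c
#include <stdint.h>

/* Distinct wrapper types: mixing them up becomes a compile-time error. */
typedef struct { uint64_t value; } phys_addr; /* physical address */
typedef struct { uint64_t value; } hhdm_addr; /* address inside the HHDM */

/* Assumed value for illustration only. */
static uint64_t hhdm_offset = 0xffff800000000000ULL;

static phys_addr hhdm_to_phys(hhdm_addr v) {
    return (phys_addr){ v.value - hhdm_offset };
}

static hhdm_addr phys_to_hhdm(phys_addr p) {
    return (hhdm_addr){ p.value + hhdm_offset };
}

#define PTE_PRESENT 0x1ULL
#define PTE_WRITE   0x2ULL

/* Entries can only be built from a phys_addr, never from an HHDM pointer. */
static uint64_t make_pte(phys_addr frame, uint64_t flags) {
    return (frame.value & 0x000ffffffffff000ULL) | flags;
}
```

With a page at HHDM address 0xffff800000033000, `make_pte(hhdm_to_phys(page), PTE_PRESENT | PTE_WRITE)` yields 0x33003, whereas passing the `hhdm_addr` directly to `make_pte` does not compile.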

Re: switching to new pagemap causes 0xe and more exceptions

Posted: Sat Nov 02, 2024 7:47 am
by RayanMargham
That solved that issue, but as usual there are new bugs:

Code:

Page Fault! CR2 0xffff800100c001d8
RIP is 0xffffffff80010666
NYAUX Panic! Reason: Page Fault:c

Re: switching to new pagemap causes 0xe and more exceptions

Posted: Sat Nov 02, 2024 11:02 am
by MichaelPetch
I noticed a potential cause of this bug last night but was hoping you might be able to track it down. You likely had a page fault with error code e=0002, which is a write to a non-present page. In the QEMU monitor, `info tlb` and `info mem` confirm that this address (0xffff800100c001d8) is in a page that is not present. You have many bugs in your vmm, and one of the big ones I noticed is in `find_pte`. In particular, this function returns after descending only to the PML2 (page directory) level. You need to modify it to handle 2MiB pages and also to descend to the PML1 (page table) level when not using 2MiB pages.

You're going to need to start learning to debug these issues yourself. OSDev is hard, and proper debugging is a skill set that needs to be learned. Are you using GDB to debug, and do you use the QEMU monitor at all?
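
To illustrate the shape of such a walk, here is a rough sketch that stops at the PML2 level for 2MiB pages and otherwise descends to the PML1 level. `find_pte` itself is not shown in this thread, so this is not the poster's code: every name below is hypothetical, physical memory is simulated with a small array so the sketch can run in user space, and 1GiB pages at the PML3 level are ignored for brevity.

```c
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT   0x001ULL
#define PTE_HUGE      0x080ULL /* PS bit: 2MiB page when set in a PML2 entry */
#define PTE_ADDR_MASK 0x000ffffffffff000ULL

/* Stand-in for physical memory: "physical" address N * 4096 is table N.
   A kernel would add the HHDM offset to the physical address instead. */
#define SIM_TABLES 8
static uint64_t sim_mem[SIM_TABLES][512];

static uint64_t *phys_to_table(uint64_t phys) {
    return sim_mem[phys / 4096];
}

/* Return the entry that maps virt, or NULL if it is not mapped.
   *level_out is set to 2 for a 2MiB PML2 entry, 1 for a 4KiB PML1 entry. */
static uint64_t *find_pte_sketch(uint64_t pml4_phys, uint64_t virt,
                                 int *level_out) {
    uint64_t table_phys = pml4_phys;
    for (int level = 4; level >= 1; level--) {
        uint64_t *table = phys_to_table(table_phys);
        /* Shift is 39, 30, 21, 12 for levels 4, 3, 2, 1. */
        uint64_t *entry = &table[(virt >> (12 + 9 * (level - 1))) & 0x1ff];
        if (!(*entry & PTE_PRESENT))
            return NULL; /* nothing mapped below this point */
        if (level == 1 || (level == 2 && (*entry & PTE_HUGE))) {
            *level_out = level;
            return entry; /* final mapping entry */
        }
        table_phys = *entry & PTE_ADDR_MASK; /* descend one level */
    }
    return NULL; /* not reached: the level 1 iteration always returns */
}
```

Returning the level along with the entry lets the caller know whether the entry covers 2MiB or 4KiB, which matters when unmapping or changing flags.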

Re: switching to new pagemap causes 0xe and more exceptions

Posted: Sat Nov 02, 2024 10:19 pm
by RayanMargham
It is not find_pte, I'm afraid

Re: switching to new pagemap causes 0xe and more exceptions

Posted: Sun Nov 03, 2024 2:54 am
by RayanMargham
Also, yes, I'm using gdb and the QEMU monitor

I'm not great at using gdb, though