Setting up paging

Posted: Fri May 21, 2021 12:53 pm
by Bonfra
So my bootloader is responsible for setting up identity paging. Now, in the kernel, I want to replace the paging structures set up by the bootloader. I started by writing a function that allocates and zeroes the top-level table (the PML4):

Code: Select all

paging_data_t paging_create()
{
    uint64_t* pml4 = (uint64_t*)pfa_alloc_page();
    memset(pml4, 0, pfa_page_size());
    return (paging_data_t)pml4;
}
Then I need some defines and macros for the next function:

Code: Select all

#define PML_PRESENT (1ull << 0)
#define PML_READWRITE (1ull << 1)
#define PML_USER (1ull << 2)
#define PML_WRITETHROUGH (1ull << 3)
#define PML_CACHEDISABLE (1ull << 4)
#define PML_ACCESSED (1ull << 5)
#define PML_SIZE (1ull << 7)
#define PML_AVAILABLE (0b111ull << 9)
#define PML_ADDRESS (0xFFFFFFFFFFull << 12)
#define PML_EXECDISABLE (1ull << 63)

#define PML_CLEAR_AVAILABLE(entry) (entry &= ~PML_AVAILABLE)
#define PML_SET_AVAILABLE(entry, val) (entry |= ((val << 9) & PML_AVAILABLE))
#define PML_UPDATE_AVAILABLE(entry, val) (PML_CLEAR_AVAILABLE(entry), PML_SET_AVAILABLE(entry, val))

#define PML_CLEAR_ADDRESS(entry) (entry &= ~PML_ADDRESS)
#define PML_SET_ADDRESS(entry, val) (entry |= ((val << 12) & PML_ADDRESS))
#define PML_UPDATE_ADDRESS(entry, val) (PML_CLEAR_ADDRESS(entry), PML_SET_ADDRESS(entry, val))

#define PT_UPDATE_ADDRESS(entry, val) PML_UPDATE_ADDRESS(entry, val)
Finally, I have a function to attach a 2 MB page to the hierarchy:

Code: Select all

void paging_attach_2mb_page(paging_data_t data, void* physical_addr, void* virtual_addr)
{
    uint64_t pml4_offset = (uint64_t)virtual_addr & UINT64_C(0x01FF) >> 39;
    uint64_t pdp_offset = (uint64_t)virtual_addr & UINT64_C(0x01FF) >> 30;
    uint64_t pd_offset = (uint64_t)virtual_addr & UINT64_C(0x01FF) >> 21;

    uint64_t* pml4 = (uint64_t*)data;
    if((pml4[pml4_offset] & PML_PRESENT) == 0)
    {
        uint64_t* pdp = pfa_alloc_page();
        memset(pdp, 0, pfa_page_size());

        pml4[pml4_offset] = (PML_PRESENT | PML_READWRITE) & ~PML_USER;
        PML_UPDATE_ADDRESS(pml4[pml4_offset], (uint64_t)pdp);
    }

    uint64_t* pdp = (uint64_t*)((pml4[pml4_offset] & PML_ADDRESS) >> 12);
    if((pdp[pdp_offset] & PML_PRESENT) == 0)
    {
        uint64_t* pd = pfa_alloc_page();
        memset(pd, 0, pfa_page_size());

        pdp[pdp_offset] = (PML_PRESENT | PML_READWRITE) & ~PML_USER;
        PML_UPDATE_ADDRESS(pdp[pdp_offset], (uint64_t)pd);
    }

    uint64_t* pd = (uint64_t*)((pdp[pdp_offset] & PML_ADDRESS) >> 12);
    pd[pd_offset] = (PML_PRESENT | PML_READWRITE | PML_SIZE) & ~PML_USER;
    PML_UPDATE_ADDRESS(pd[pd_offset], (uint64_t)physical_addr);
}
With all of this code in place, I tried to use it like so:

Code: Select all

    static paging_data_t paging_data;
    paging_data = paging_create();
    for(uint64_t i = 0; i < memorySize / 1024; i += 2)
        paging_attach_2mb_page(paging_data, i * 0x200000, i * 0x200000);

    asm volatile("mov cr3, %[addr]" : : [addr]"r"(paging_data) : "memory"); //intel syntax here
But after this QEMU just pauses. running `info mem` in the monitor should tell me which memory is available but actually prints nothing.
I think paging is enabled correctly, since it works with the identity paging set up by the bootloader. Anyway, I'm also posting the code that enables paging:

Code: Select all

; Enable PAE paging.
    mov eax, cr4
    or eax, (1 << 5) | (1 << 4)   ; CR4.PAE | CR4.PSE
    mov cr4, eax
I even printed all the memory in binary with gdb to ensure that the paging hierarchy was right, and it is. Can you see anything weird here? I'm also posting my GitHub repo if you need to check something else.

Re: Setting up paging

Posted: Fri May 21, 2021 1:32 pm
by Octocontrabass
Bonfra wrote:But after this QEMU just pauses.
Sounds like a triple fault.
Bonfra wrote:running `info mem` in the monitor should tell me which memory is available but actually prints nothing.
Sounds like no pages are present.
Bonfra wrote:

Code: Select all

#define PML_SET_ADDRESS(entry, val) (entry |= ((val << 12) & PML_ADDRESS))
Why are you shifting the address left?

Re: Setting up paging

Posted: Fri May 21, 2021 1:33 pm
by nullplan

Code: Select all

    uint64_t pml4_offset = (uint64_t)virtual_addr & UINT64_C(0x01FF) >> 39;
    uint64_t pdp_offset = (uint64_t)virtual_addr & UINT64_C(0x01FF) >> 30;
    uint64_t pd_offset = (uint64_t)virtual_addr & UINT64_C(0x01FF) >> 21;
You are masking before the shift. Therefore, the only things you mask out are almost all of the offset bits, which are likely zero. Then you shift a zero somewhere, but the result of that is always zero.
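
To make the precedence concrete, here is a minimal sketch (the helper names are made up for illustration). In C, `>>` binds tighter than `&`, so the original expression shifts the mask rather than the address; the result is zero either way.

```c
#include <stdint.h>

/* How the compiler parses `va & UINT64_C(0x01FF) >> shift`:
 * the shift is applied to the mask, not to the address. */
static uint64_t index_mask_first(uint64_t va, unsigned shift)
{
    return va & (UINT64_C(0x01FF) >> shift);
}

/* Move the wanted 9-bit field down to bit 0 first, then mask it. */
static uint64_t index_shift_first(uint64_t va, unsigned shift)
{
    return (va >> shift) & 0x1FF;
}
```

With these, `index_shift_first(0x101212, 12)` gives 0x101, while the mask-first form returns 0 for every shift used in the question (39, 30 and 21).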

Here's my code for the initial page map:

Code: Select all

#define PAGE_PRESENT    1
#define PAGE_WRITABLE   2
#define PAGE_USER       4
#define PAGE_PS         128
#define PAGE_PTR_MASK   0xfffff000  /* we are in 32-bit mode still */

static uint64_t *pml4;
static void mmap(uint64_t va, uint64_t pa, uint64_t len, uint64_t flags)
{
    if (!pml4)
        pml4 = page_alloc_or_die();

    if ((va ^ pa) & 0xfff)
        die("Virtual and physical addresses misaligned!\n");
    len += va & 0xfff;
    va &= -0x1000ull;
    pa &= -0x1000ull;

    len = (len + 0xfff) & -0x1000ull;
    while (len)
    {
        /* how to choose page size?
         * Choose biggest size possible.
         */
        /* addr:
         *    6         5         4         3         2         1
         * 3210987654321098765432109876543210987654321098765432109876543210
         * `sign extens.  ´`pml4idx´`pdpidx ´`pdidx  ´`ptidx  ´`offset    ´
         */
        if (is_gb_paging_possible() && len >= 1 << 30 && !(va & ((1 << 30) - 1)) && !(pa & (1 << 30) - 1))
        {
            size_t pml4idx = (va >> 39) & 0x1ff;
            size_t pdpidx = (va >> 30) & 0x1ff;
            if (!(pml4[pml4idx] & PAGE_PRESENT))
                pml4[pml4idx] = (uint64_t)(uintptr_t)page_alloc_or_die() | PAGE_PRESENT | PAGE_WRITABLE;
            uint64_t *pdpt = (uint64_t*)(uintptr_t)(pml4[pml4idx] & PAGE_PTR_MASK);
            pdpt[pdpidx] = pa | flags | PAGE_PRESENT | PAGE_PS;
            va += 1 << 30;
            pa += 1 << 30;
            len -= 1 << 30;
        }
        else if (len >= 2 << 20 && !(va & (2 << 20) - 1) && !(pa & (2 << 20) - 1))
        {
            size_t pml4idx = (va >> 39) & 0x1ff;
            size_t pdpidx = (va >> 30) & 0x1ff;
            size_t pdidx = (va >> 21) & 0x1ff;
            if (!(pml4[pml4idx] & PAGE_PRESENT))
                pml4[pml4idx] = (uint64_t)(uintptr_t)page_alloc_or_die() | PAGE_PRESENT | PAGE_WRITABLE;
            uint64_t *pdpt, *pdt;
            pdpt = (void*)(uintptr_t)(pml4[pml4idx] & PAGE_PTR_MASK);
            if (!(pdpt[pdpidx] & PAGE_PRESENT))
                pdpt[pdpidx] = (uint64_t)(uintptr_t)page_alloc_or_die() | PAGE_PRESENT | PAGE_WRITABLE;
            pdt = (void*)(uintptr_t)(pdpt[pdpidx] & PAGE_PTR_MASK);
            pdt[pdidx] = pa | flags | PAGE_PRESENT | PAGE_PS;

            va += 1 << 21;
            pa += 1 << 21;
            len -= 1 << 21;
        }
        else
        {
            size_t pml4idx = (va >> 39) & 0x1ff;
            size_t pdpidx = (va >> 30) & 0x1ff;
            size_t pdidx = (va >> 21) & 0x1ff;
            size_t ptidx = (va >> 12) & 0x1ff;
            uint64_t *pdpt, *pdt, *pt;
            if (!(pml4[pml4idx] & PAGE_PRESENT))
                pml4[pml4idx] = (uint64_t)(uintptr_t)page_alloc_or_die() | PAGE_PRESENT | PAGE_WRITABLE;
            pdpt = (uint64_t*)(uintptr_t)(pml4[pml4idx] & PAGE_PTR_MASK);
            if (!(pdpt[pdpidx] & PAGE_PRESENT))
                pdpt[pdpidx] = (uint64_t)(uintptr_t)page_alloc_or_die() | PAGE_PRESENT | PAGE_WRITABLE;
            pdt = (uint64_t*)(uintptr_t)(pdpt[pdpidx] & PAGE_PTR_MASK);
            if (!(pdt[pdidx] & PAGE_PRESENT))
                pdt[pdidx] = (uint64_t)(uintptr_t)page_alloc_or_die() | PAGE_PRESENT | PAGE_WRITABLE;
            pt = (uint64_t*)(uintptr_t)(pdt[pdidx] & PAGE_PTR_MASK);
            pt[ptidx] = pa | flags | PAGE_PRESENT;

            va += 1 << 12;
            pa += 1 << 12;
            len -= 1 << 12;
        }
    }
}
Yeah, it's no beauty. I should probably refactor parts of it. But it gets the job done. page_alloc_or_die() will zero the page. is_gb_paging_possible() is a wrapper around is_gb_paging_possible_asm() that caches the answer (because the answer will not change), and is_gb_paging_possible_asm() queries the relevant CPUID bit (EDX bit 26 from function 0x80000001).

This code presumes identity mapped code for now. Inside the running kernel, things will look a bit different: First of all, we will be in 64-bit mode, we will not be identity mapped, so I actually have to translate from physical to virtual address. And we will likely have the NX bit available (which I'm not using here for simplicity). Also, my code is C while yours looks like C++, so take care of the differences. Although, if your code is C++, why are you using C-style casts and not the more fine grained ones given by C++?

Re: Setting up paging

Posted: Fri May 21, 2021 1:39 pm
by Bonfra
Octocontrabass wrote: Sounds like a triple fault.
It could be but gdb prints

Code: Select all

Program received signal SIGQUIT, Quit.
init (bootinfo=<error reading variable: Cannot access memory at address 0xcfff98>) at src/kernel.c:76
76	}
Octocontrabass wrote: Sounds like no pages are present.
Exactly what I thought
Octocontrabass wrote: Why are you shifting the address left?
I tried both ways and this side seems to set the bytes in the entry the correct way. I just tried shifting the other way around and this causes a page fault; also, `info mem` returns something useful now.

Re: Setting up paging

Posted: Fri May 21, 2021 1:44 pm
by Bonfra
nullplan wrote:

Code: Select all

    uint64_t pml4_offset = (uint64_t)virtual_addr & UINT64_C(0x01FF) >> 39;
    uint64_t pdp_offset = (uint64_t)virtual_addr & UINT64_C(0x01FF) >> 30;
    uint64_t pd_offset = (uint64_t)virtual_addr & UINT64_C(0x01FF) >> 21;
You are masking before the shift. Therefore, the only things you mask out are almost all of the offset bits, which are likely zero. Then you shift a zero somewhere, but the result of that is always zero.
Oh, operator precedence.. should it be like so?

Code: Select all

    uint64_t pml4_offset = (uint64_t)virtual_addr & (UINT64_C(0x01FF) >> 39);
    uint64_t pdp_offset = (uint64_t)virtual_addr & (UINT64_C(0x01FF) >> 30);
    uint64_t pd_offset = (uint64_t)virtual_addr & (UINT64_C(0x01FF) >> 21);
nullplan wrote: Although, if your code is C++, why are you using C-style casts and not the more fine grained ones given by C++?
It is C, not C++

Re: Setting up paging

Posted: Fri May 21, 2021 2:00 pm
by Octocontrabass
Bonfra wrote:
Octocontrabass wrote:Why are you shifting the address left?
I tried both ways and this side seems to set the bytes in the entry the correct way.
But you shouldn't be shifting the address at all. The low 12 bits of the address are always 0, so they're not stored at all, and that space is reused for 12 bits of flags instead.
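
A minimal sketch of that layout, assuming the usual x86-64 4-level entry format. The macro and function names here are hypothetical, not taken from the posted code:

```c
#include <stdint.h>

#define ENTRY_PRESENT   (UINT64_C(1) << 0)
#define ENTRY_WRITABLE  (UINT64_C(1) << 1)
#define ENTRY_ADDR_MASK (UINT64_C(0xFFFFFFFFFF) << 12)  /* bits 12..51 */

/* Build an entry: the page-aligned physical address goes in unshifted,
 * and the flags occupy the low 12 bits that are always zero in it. */
static uint64_t make_entry(uint64_t phys, uint64_t flags)
{
    return (phys & ENTRY_ADDR_MASK) | flags;
}

/* Recover the physical address from an entry: mask only, no shift. */
static uint64_t entry_address(uint64_t entry)
{
    return entry & ENTRY_ADDR_MASK;
}
```

For example, `make_entry(0xd08000, ENTRY_PRESENT | ENTRY_WRITABLE)` yields 0xd08003, and `entry_address(0xd08003)` gives back 0xd08000.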

Re: Setting up paging

Posted: Fri May 21, 2021 2:05 pm
by Bonfra
Octocontrabass wrote: But you shouldn't be shifting the address at all. The low 12 bits of the address are always 0, so they're not stored at all, and that space is reused for 12 bits of flags instead.
Hmm, yea makes sense. So how should I set the address bits of the PML entry? Just this?

Code: Select all

#define PML_ADDRESS (0xFFFFFFFFFFull << 12)
#define PML_SET_ADDRESS(entry, val) (entry |= (val & PML_ADDRESS))

Re: Setting up paging

Posted: Fri May 21, 2021 8:57 pm
by Octocontrabass
Bonfra wrote:Just this?
Yep, that looks right.

Re: Setting up paging

Posted: Fri May 21, 2021 9:38 pm
by nullplan
Bonfra wrote:Oh, operator precedence.. should it be like so?
No, the expressions are just wrong. You must do the shift first, then the mask. Like so:

Code: Select all

    uint64_t pml4_offset = ((uint64_t)virtual_addr >> 39) & 0x1ff;
    uint64_t pdp_offset = ((uint64_t)virtual_addr >> 30) & 0x1ff;
    uint64_t pd_offset = ((uint64_t)virtual_addr >> 21) & 0x1ff;
Further, there is no need for UINT64_C here, as the constant given is small enough to definitely fit into an int (even if int were a 16-bit type), and operands of binary arithmetic and comparison operators are converted to the same type beforehand anyway, and larger integer types have higher conversion rank than smaller ones.
Bonfra wrote:It is C, not C++
In that case there is no need to explicitly convert pointer-to-void to any other pointer type. Pointer-to-void implicitly converts to all other pointer types (regrettably, even including pointer-to-pointer-to-void) without cast.

Re: Setting up paging

Posted: Fri May 21, 2021 11:17 pm
by Bonfra
So all I changed in my code is the macros and the calculation of the table indices, like so:

Code: Select all

#define PML_SET_AVAILABLE(entry, val) (entry |= ((val) & PML_AVAILABLE))
#define PML_SET_ADDRESS(entry, val) (entry |= ((val) & PML_ADDRESS))

Code: Select all

uint64_t pml4_offset = ((uint64_t)virtual_addr >> 39) & 0x01FF;
uint64_t pdp_offset = ((uint64_t)virtual_addr >> 30) & 0x01FF;
uint64_t pd_offset = ((uint64_t)virtual_addr >> 21) & 0x01FF;
It does not give me explicitly a page fault but it's like before: QEMU pauses and GDB prints that that memory can't be accessed. Also `info mem` returns nothing.
nullplan wrote: In that case there is no need to explicitly convert pointer-to-void to any other pointer type
Oh, I thought it would've spit out a warning. Apparently it doesn't. Thanks for the advice, this will make my code a lot cleaner.

Re: Setting up paging

Posted: Sat May 22, 2021 12:11 am
by Octocontrabass
Bonfra wrote:It does not give me explicitly a page fault but it's like before: QEMU pauses and GDB prints that that memory can't be accessed. Also `info mem` returns nothing.
Your page fault handler won't run if it's not mapped properly. Add "-d int" to your QEMU command line if you want to see the page fault. Use the QEMU console if you want to dump memory.

Re: Setting up paging

Posted: Sat May 22, 2021 12:20 am
by Bonfra
This is the exception output from QEMU with `-d int`:

Code: Select all

check_exception old: 0xffffffff new 0xe
    38: v=0e e=0000 i=0 cpl=0 IP=0010:0000000000101212 pc=0000000000101212 SP=0008:0000000000cfff70 CR2=0000000000101212
RAX=0000000000d07000 RBX=0000000000001000 RCX=000000003fc00000 RDX=000000003fc00083
RSI=0000000000000ff0 RDI=0000000000d07000 RBP=0000000000cfffd0 RSP=0000000000cfff70
R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000023 R11=0000000000000000
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
RIP=0000000000101212 RFL=00000212 [----A--] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0008 0000000000000000 00000000 00009300 DPL=0 DS   [-WA]
CS =0010 0000000000000000 00000000 00209a00 DPL=0 CS64 [-R-]
SS =0008 0000000000000000 00000000 00009300 DPL=0 DS   [-WA]
DS =0008 0000000000000000 00000000 00009300 DPL=0 DS   [-WA]
FS =0008 0000000000000000 00000000 00009300 DPL=0 DS   [-WA]
GS =0008 0000000000000000 00000000 00009300 DPL=0 DS   [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT=     0000000000004500 00000027
IDT=     00000000009202a0 00001000
CR0=80000013 CR2=0000000000101212 CR3=0000000000d07000 CR4=00000630
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=00000000000001ff CCD=0000000000000001 CCO=SUBQ
EFER=0000000000000500
check_exception old: 0xe new 0xe
    39: v=08 e=0000 i=0 cpl=0 IP=0010:0000000000101212 pc=0000000000101212 SP=0008:0000000000cfff70 env->regs[R_EAX]=0000000000d07000
RAX=0000000000d07000 RBX=0000000000001000 RCX=000000003fc00000 RDX=000000003fc00083
RSI=0000000000000ff0 RDI=0000000000d07000 RBP=0000000000cfffd0 RSP=0000000000cfff70
R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000023 R11=0000000000000000
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
RIP=0000000000101212 RFL=00000212 [----A--] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0008 0000000000000000 00000000 00009300 DPL=0 DS   [-WA]
CS =0010 0000000000000000 00000000 00209a00 DPL=0 CS64 [-R-]
SS =0008 0000000000000000 00000000 00009300 DPL=0 DS   [-WA]
DS =0008 0000000000000000 00000000 00009300 DPL=0 DS   [-WA]
FS =0008 0000000000000000 00000000 00009300 DPL=0 DS   [-WA]
GS =0008 0000000000000000 00000000 00009300 DPL=0 DS   [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT=     0000000000004500 00000027
IDT=     00000000009202a0 00001000
CR0=80000013 CR2=0000000000920380 CR3=0000000000d07000 CR4=00000630
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=00000000000001ff CCD=0000000000000001 CCO=SUBQ
EFER=0000000000000500
check_exception old: 0x8 new 0xe
cr3 has the right address of the table and `xp /10 0xd07000` shows the table the right way

Re: Setting up paging

Posted: Sat May 22, 2021 1:49 am
by Octocontrabass
Bonfra wrote:cr3 has the right address of the table and `xp /10 0xd07000` shows the table the right way
CR2 says the faulting address is 0x101212. Walk the table and find the entry at each level that will be used for that address. Are all of those entries correct?

Re: Setting up paging

Posted: Sun May 23, 2021 12:55 am
by Bonfra
Octocontrabass wrote: CR2 says the faulting address is 0x101212. Walk the table and find the entry at each level that will be used for that address. Are all of those entries correct?
`info mem` returns nothing so I think that the page table is in fact incorrect. I tried to walk the page table manually but I went mad trying to understand all of those bits :(
Just to be sure, I dumped the first entry of the PML4 and this is what I got:

Code: Select all

cr3            0xd07000
	[ PDBR=0 PCID=0 ]

x /gt 0xd07000
0xd07000:	0000000000000000000000000000000000000000110100001000000000000011
0
00000000000
0000000000000000000000000000110100001000 => 0xD08
000
0
0
0
0
0
0
0
1 read/write
1 present
The flags are correct, but the address isn't; it should be something like 0xd07000 + 0x1000.

Re: Setting up paging

Posted: Sun May 23, 2021 9:11 am
by nullplan
Bonfra wrote:`info mem` returns nothing so I think that the page table is in fact incorrect. I tried to walk the page table manually but I went mad trying to understand all of those bits :(
Understandable, given that you actually are looking at the bits. A page table entry is essentially just the physical address of the next page table or the page with a few bits ORed in, so just print it in hex:

Code: Select all

0xd07000:   0000000000000000000000000000000000000000110100001000000000000011
group into quartets: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1101 0000 1000 0000 0000 0011
hexadecimal: 0x0000000000d08003
So the entry points to 0xd08000, with "present" and "writable" bits being set. Which seems to be what you want. Now you only need to follow the chain for the other parts of the address.

Remember, you are getting a page fault for 0x101212, so PML4 index 0, PDPT index 0, PDT index 0 and PT index 0x101 (257), if you get that far. Remember that bit 7 (mask 0x80) is the page size bit, and terminates the lookup. That bit is invalid in the PML4, and in the PDPT it is only valid when 1 GiB pages are supported.
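
Following the chain by hand can also be done in code. The sketch below is only an illustration under stated assumptions: 4-level paging, no permission or reserved-bit checks, and a hypothetical `read_phys` callback that returns the 64-bit value at a physical address. The `sim_tables` array and `sim_read_phys` exist only so the walk can be tried outside a kernel; they pretend three tables live at physical 0x1000, 0x2000 and 0x3000.

```c
#include <stdint.h>

#define ENTRY_PRESENT   (UINT64_C(1) << 0)
#define ENTRY_PS        (UINT64_C(1) << 7)              /* large page, ends the walk */
#define ENTRY_ADDR_MASK (UINT64_C(0xFFFFFFFFFF) << 12)  /* bits 12..51 */

/* Hypothetical callback: read one 64-bit entry from physical memory. */
typedef uint64_t (*read_phys_fn)(uint64_t paddr);

/* Walk PML4 -> PDPT -> PD -> PT for va.
 * Returns 1 and stores the physical address in *pa on success,
 * or 0 as soon as a non-present entry is found. */
static int walk(read_phys_fn read_phys, uint64_t cr3, uint64_t va, uint64_t *pa)
{
    static const unsigned shifts[4] = { 39, 30, 21, 12 };
    uint64_t table = cr3 & ENTRY_ADDR_MASK;

    for (int level = 0; level < 4; level++) {
        uint64_t index = (va >> shifts[level]) & 0x1FF;
        uint64_t entry = read_phys(table + index * 8);

        if (!(entry & ENTRY_PRESENT))
            return 0;

        /* PS only terminates the walk in PDPT (1 GiB) and PD (2 MiB) entries. */
        if ((level == 1 || level == 2) && (entry & ENTRY_PS)) {
            uint64_t page_mask = (UINT64_C(1) << shifts[level]) - 1;
            *pa = (entry & ENTRY_ADDR_MASK & ~page_mask) | (va & page_mask);
            return 1;
        }
        if (level == 3) {
            *pa = (entry & ENTRY_ADDR_MASK) | (va & 0xFFF);
            return 1;
        }
        table = entry & ENTRY_ADDR_MASK;
    }
    return 0;
}

/* Simulated physical memory: three 4 KiB tables at 0x1000, 0x2000, 0x3000. */
static uint64_t sim_tables[3][512];

static uint64_t sim_read_phys(uint64_t paddr)
{
    if (paddr < 0x1000 || paddr >= 0x4000)
        return 0;   /* nothing there: reads as a non-present entry */
    return sim_tables[paddr / 0x1000 - 1][(paddr % 0x1000) / 8];
}
```

In a kernel that is still identity mapped, `read_phys` could simply dereference the address. Printing the entry at each level inside the loop shows exactly where the lookup for 0x101212 stops.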