Speed of PML4 table creation

zerosum · Post by **zerosum** » Fri Apr 11, 2008 9:20 pm

Hi all,

I'm just wondering... how long (time-wise) does it take your code to create your long mode paging structures? I know that's a "how long is a piece of string" question, but I ask for a reason.

I've got some code written to identity map all of physical memory. At the moment it's running under Bochs, which is simulating 30 MB of physical memory.

It's taking about 4 seconds (I'm counting while it's running :-p) for it to zero out all of the structures required to identity map this. It was taking about 10-15 with my C memzero function, but I created an asm one which gave obvious speed improvements.

How long would it take if it had to identity map all 3 GB that I actually have?! I realise that things are going to be a bit slower under Bochs, but still... I'm wondering if I'm doing something seriously wrong.

Cheers,
Lee

Brendan · Post by **Brendan** » Fri Apr 11, 2008 11:06 pm

Hi,

zerosum wrote:It's taking about 4 seconds (I'm counting while it's running :-p) for it to zero out all of the structures required to identity map this. It was taking about 10-15 with my C memzero function, but I created an asm one which gave obvious speed improvements.

I'm not sure I've understood this correctly, but why does it zero out these structures in the first place (instead of just filling them with page table entries, page directory entries, etc)?

The fastest method would be to find an area of RAM large enough for all page table entries, then do something like this:

Code:

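    cld                         ; harmless here; the loops below use no string instructions

    mov edi,<address_for_all_page_tables>
    mov ecx,<total_pages>
    mov eax,<first_address_low> | <page_table_flags>
    mov ebx,<first_address_high>
.nextPT:
    mov [edi],eax               ; low dword: physical address bits 12-31 plus flags
    mov [edi+4],ebx             ; high dword: physical address bits 32 and up
    add eax,0x00001000          ; next 4 KB page
    adc ebx,0x00000000          ; carry into the high dword
    add edi,8
    sub ecx,1
    jne .nextPT

    test edi,0x00000FFF         ; did the last page table end exactly on a page boundary?
    je .donePT
.blankPT:
    mov dword [edi],0           ; mark the rest of the last page table "not present"
    mov dword [edi+4],0
    add edi,8
    test edi,0x00000FFF
    jne .blankPT
.donePT: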

You'd do something very similar to create page directories, page directory pointer tables, etc.
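As a rough illustration of "something very similar" for the next level up, here is a minimal C sketch that fills page directory entries pointing at a contiguous run of page tables and then blanks the unused tail. The name `fill_directory`, the flag constants and the assumption that the page tables sit back to back in memory are all hypothetical choices for this sketch, not something taken from the code above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRIES_PER_TABLE 512u
#define PAGE_SIZE         0x1000u
#define FLAG_PRESENT      0x1u
#define FLAG_WRITABLE     0x2u

/* Fill `table_count` page directory entries so that entry i points at the
 * i-th page table in a contiguous run starting at `first_pt_phys`, then zero
 * the rest of the final 512-entry directory so unused slots are not present.
 * Returns the number of entries written, including the zeroed tail. */
static size_t fill_directory(uint64_t *pd, uint64_t first_pt_phys,
                             size_t table_count)
{
    assert((first_pt_phys & (PAGE_SIZE - 1)) == 0);  /* tables are page aligned */

    size_t i = 0;
    for (; i < table_count; i++)
        pd[i] = (first_pt_phys + (uint64_t)i * PAGE_SIZE)
                | FLAG_PRESENT | FLAG_WRITABLE;

    /* Pad up to the next multiple of 512 entries with not-present entries. */
    while (i % ENTRIES_PER_TABLE != 0)
        pd[i++] = 0;

    return i;
}
```

For the 30 MB case from the first post this would be called with `table_count` of 15 (7680 pages / 512 entries per page table), so each entry is written exactly once and only the 497 unused slots get zeroed.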

Are you allocating pages from some sort of physical memory manager before using them in the paging structures? Do you allocate them one at a time? Do you test each page to see if it's faulty? If you are, then the simple/fast method above won't be suitable. In this case I'd have something like:

Code:

    /* Pseudocode: the *address variables are integers holding physical
       addresses, and "*x = y" means "store the 64-bit entry y at address x". */
    physical_address = 0;
    PML4address = get_page();
    do {
        PDPEaddress = get_page();
        *PML4address = PDPEaddress | PDPEflags;
        PML4address += 8;
        do {
            PDEaddress = get_page();
            *PDPEaddress = PDEaddress | PDEflags;
            PDPEaddress += 8;
            do {
                PTEaddress = get_page();
                *PDEaddress = PTEaddress | PTEflags;
                PDEaddress += 8;
                do {
                    *PTEaddress = physical_address | pageflags;
                    PTEaddress += 8;
                    physical_address += 0x1000;
                    if( physical_address >= max_physical_address) goto done;
                } while( (PTEaddress & 0x00000FFF) != 0);
            } while( (PDEaddress & 0x00000FFF) != 0);
        } while( (PDPEaddress & 0x00000FFF) != 0);
    }  while( (PML4address & 0x00000FFF) != 0);

done:
    /* Mark the unused remainder of each partly filled table "not present" */
    while( (PML4address & 0x00000FFF) != 0) {
         *PML4address = 0;
         PML4address += 8;
    }
    while( (PDPEaddress & 0x00000FFF) != 0) {
         *PDPEaddress = 0;
         PDPEaddress += 8;
    }
    while( (PDEaddress & 0x00000FFF) != 0) {
         *PDEaddress = 0;
         PDEaddress += 8;
    }
    while( (PTEaddress & 0x00000FFF) != 0) {
         *PTEaddress = 0;
         PTEaddress += 8;
    }

Of course there is a (very minor) problem with both of these methods: you can only use 32-bit physical addresses until you set up long mode. With 4 KB pages, the page tables alone take 8 bytes per page, i.e. 1/512 of the memory being mapped, so 1536 GB of RAM needs 3 GB of page tables. If the computer has 1536 GB of RAM (or more) then you'll probably only be able to use about 3 GB of RAM with 32-bit addressing (due to the PCI hole, ROM, APICs, etc), and you won't be able to access enough RAM (below 4 GB) to create the paging structures needed to identity map the full 1536 GB (or more) of RAM.
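The arithmetic behind those figures can be checked with a short C sketch. It assumes 4 KB pages, 512 eight-byte entries per table and a single PML4; the function name `paging_overhead_bytes` is made up for this illustration.

```c
#include <stdint.h>

#define PAGE_SIZE         4096ull
#define ENTRIES_PER_TABLE 512ull

/* Round-up division. */
static uint64_t div_up(uint64_t n, uint64_t d) { return (n + d - 1) / d; }

/* Bytes of paging structures needed to identity map `ram_bytes` of physical
 * memory using 4 KB pages (one 4 KB table per 512 entries at every level). */
static uint64_t paging_overhead_bytes(uint64_t ram_bytes)
{
    uint64_t pages = div_up(ram_bytes, PAGE_SIZE);
    uint64_t pts   = div_up(pages, ENTRIES_PER_TABLE);  /* page tables */
    uint64_t pds   = div_up(pts,   ENTRIES_PER_TABLE);  /* page directories */
    uint64_t pdpts = div_up(pds,   ENTRIES_PER_TABLE);  /* PDPTs */
    uint64_t pml4s = 1;
    return (pts + pds + pdpts + pml4s) * PAGE_SIZE;
}
```

Under these assumptions 30 MB needs 18 tables (72 KB), 3 GB needs 1541 tables (about 6 MB), and 1536 GB needs 787972 tables, of which the page tables alone are exactly 3 GB.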

This brings me to my first suggestion: sometimes the fastest way to do something is to pretend you've done it and actually do nothing.

For example, you could identity map your kernel and nothing else. When your kernel tries to access a page that should've been identity mapped but wasn't, then your page fault handler can create the necessary paging structures to identity map the page. It will look like everything was identity mapped (even though most of it wasn't). This will also solve the "more than 1536 GB of RAM" problem because you'd be allocating page tables, etc while in long mode (and not with 32-bit addressing).

This brings me to my second suggestion: sometimes the fastest way to do something is to do nothing (and not bother to pretend you've done it).

Why do you need to identity map everything? AFAIK most kernels don't. For example, Linux identity maps a relatively small area (16 MB?) and I never identity map anything. There are reasons to dynamically allocate pages used by the kernel instead of relying on a static identity mapping (e.g. NUMA and fault tolerance).

Cheers,

Brendan

zerosum · Post by **zerosum** » Fri Apr 11, 2008 11:36 pm

Hi Brendan,

Thank you for your detailed response.

The reason I'm zeroing out the structures is because all of the structures besides an actual page-table-entry are pointers to 512-entry tables. If I don't zero them out, marking them as not present, then they could theoretically point anywhere and if for some reason code tries to access one of these bad addresses, the result would be undefined... it could happen to point to something which appears to be a valid pte mapping to some of the kernel code, for instance.

I just realised I've been zeroing out all the PTEs as well, which is stupid... obviously it would only be beneficial to zero out any PTEs which are unused. This would be the part taking up so much time.

At the moment I have no device drivers, no filesystem drivers etc so I have no way of swapping a page, so I'm currently identity-mapping everything, which, as you say, is not entirely necessary and which I will not be doing in the future.

Is there anything wrong with the above reasoning? I'm quite new to all this, so I could easily misunderstand how many of these things (paging etc) work.

Cheers,
Lee

Brendan · Post by **Brendan** » Sat Apr 12, 2008 1:01 am

Hi,

zerosum wrote:Is there anything wrong with the above reasoning? I'm quite new to all this, so I could easily misunderstand how many of these things (paging etc) work.

You don't need to be able to swap a page for my first suggestion - you only need to do something like:

Code:

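page_fault_handler:
    if ( CR2 is within identity map area ) {
        identity_map_one_page_at(CR2);
        /* discard the error code the CPU pushed, then return */
        iretq;    /* long mode; this would be iretd in 32-bit protected mode */
    } else {
        /* something went wrong */
    }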
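To make the handler pseudocode above a bit more concrete, here is a minimal C sketch of what `identity_map_one_page_at` might do: walk the four levels, create any missing table on demand, and write one page table entry. Everything around it is an assumption for illustration: the static `pool` stands in for a real physical page allocator, `POOL_BASE` is an invented "physical" address for that pool, and `handle_page_fault` and `lookup` are hypothetical helpers. In a real kernel the new tables would also have to be reachable by the handler itself, which this user-space simulation sidesteps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE     0x1000ull
#define ENTRY_COUNT   512u
#define FLAG_PRESENT  0x1ull
#define FLAG_WRITABLE 0x2ull
#define ADDR_MASK     0x000FFFFFFFFFF000ull

/* Hypothetical stand-in for a physical page allocator: a small static pool
 * whose page i has the simulated physical address POOL_BASE + i * PAGE_SIZE. */
#define POOL_PAGES 16u
#define POOL_BASE  0x100000ull
static uint64_t pool[POOL_PAGES][ENTRY_COUNT];
static size_t   pool_used;

static uint64_t *table_at(uint64_t phys)
{
    return pool[(phys - POOL_BASE) / PAGE_SIZE];
}

static uint64_t alloc_table(void)
{
    assert(pool_used < POOL_PAGES);
    memset(pool[pool_used], 0, sizeof pool[pool_used]); /* nothing present yet */
    return POOL_BASE + (uint64_t)pool_used++ * PAGE_SIZE;
}

/* Return the next-level table referenced by *entry, creating it on demand. */
static uint64_t *next_level(uint64_t *entry)
{
    if (!(*entry & FLAG_PRESENT))
        *entry = alloc_table() | FLAG_PRESENT | FLAG_WRITABLE;
    return table_at(*entry & ADDR_MASK);
}

/* Identity map the 4 KB page containing fault_addr (the value read from CR2). */
static void identity_map_one_page_at(uint64_t *pml4, uint64_t fault_addr)
{
    uint64_t page  = fault_addr & ~(PAGE_SIZE - 1);
    uint64_t *pdpt = next_level(&pml4[(page >> 39) & 0x1FF]);
    uint64_t *pd   = next_level(&pdpt[(page >> 30) & 0x1FF]);
    uint64_t *pt   = next_level(&pd[(page >> 21) & 0x1FF]);
    pt[(page >> 12) & 0x1FF] = page | FLAG_PRESENT | FLAG_WRITABLE;
}

/* The decision made by the handler; returns 1 if the fault was handled. */
static int handle_page_fault(uint64_t *pml4, uint64_t cr2,
                             uint64_t identity_limit)
{
    if (cr2 >= identity_limit)
        return 0;                       /* something went wrong */
    identity_map_one_page_at(pml4, cr2);
    return 1;
}

/* Walk the tables without allocating; returns the PTE, or 0 if not mapped. */
static uint64_t lookup(uint64_t *pml4, uint64_t addr)
{
    uint64_t *t = pml4;
    for (int shift = 39; shift > 12; shift -= 9) {
        uint64_t e = t[(addr >> shift) & 0x1FF];
        if (!(e & FLAG_PRESENT))
            return 0;
        t = table_at(e & ADDR_MASK);
    }
    return t[(addr >> 12) & 0x1FF];
}
```

The first fault in a fresh address range costs up to three table allocations plus one entry write; later faults in the same 2 MB region only write a single entry.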

Now, try to think of reasons for identity mapping...

Possibly one of the most common reasons is that the tools used to build the kernel can't generate code that runs at 2 different addresses (e.g. at 1 MB before paging is started and somewhere else after paging is started). The simple solution is to have a separate "kernel setup" stage that sets up paging for the kernel before the kernel is started.

Another reason is to make physical memory management easier, but that depends on how you do physical memory management.

Another common reason is that some tutorial did it, but that isn't a valid reason IMHO.

I can't think of any other reasons.

Cheers,

Brendan

zerosum · Post by **zerosum** » Sat Apr 12, 2008 1:27 am

Hi again,

I realise I don't have to identity map the entire space and that I could do it your way, i.e. write a page fault handler to map the unmapped portion.

I've done it this way for the sake of simplicity. I'm only learning.

Basically, what happens in my setup is:

1. GRUB loads the 32-bit kernel stub.
2. The 32-bit stub loads the 64-bit kernel into memory.
3. The 32-bit stub identity maps all physical memory (which, as you pointed out, may be impossible while using 32-bit addressing), placing the paging structures after the 64-bit kernel.
4. The 32-bit stub enables paging and long mode and jumps to the 64-bit kernel.
5. The 64-bit kernel's entry point is actually 32-bit code. It sets up the new GDT, loads it and does a far jump.
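For step 5, a minimal sketch of the kind of entry the new GDT would hold: an 8-byte code segment descriptor with the L (long mode) bit set. The function name `make_code64_descriptor` and the constant names are invented for this illustration; the bit positions follow the standard x86 segment descriptor layout, with base and limit left at zero and the D bit clear, as a 64-bit code segment requires.

```c
#include <stdint.h>

/* Bit positions within an 8-byte segment descriptor. */
#define DESC_READABLE     (1ull << 41)  /* code segment may be read */
#define DESC_EXECUTABLE   (1ull << 43)  /* code (not data) segment */
#define DESC_CODE_OR_DATA (1ull << 44)  /* S bit: not a system descriptor */
#define DESC_PRESENT      (1ull << 47)
#define DESC_LONG_MODE    (1ull << 53)  /* L bit: 64-bit code segment */

/* Build a 64-bit code segment descriptor for privilege level `dpl` (0-3). */
static uint64_t make_code64_descriptor(unsigned dpl)
{
    return DESC_READABLE | DESC_EXECUTABLE | DESC_CODE_OR_DATA
         | DESC_PRESENT | DESC_LONG_MODE
         | ((uint64_t)(dpl & 3u) << 45);
}
```

For `dpl` 0 this yields `0x00209A0000000000`, a value that often appears as a literal in long mode GDTs.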

That's it. That's all I've got. I identity mapped everything so I knew where the 64-bit kernel ends, where the paging structures end and I know that I don't have to dynamically adjust the page tables, which would alter where they end in memory.

Of course, this is going to have to change. I know that and I'm not disputing it.

One thing I'm wondering though... when you're in 32-bit protected mode, can you actually load the 64-bit descriptor table entries? For some reason I previously assumed you couldn't, but now I'm thinking that they won't actually be touched until you do a far jump, after which you'll be in long mode anyway...

Thanks Brendan,
Lee

OSDev.org
