Hi,
zerosum wrote: It's taking about 4 seconds (I'm counting while it's running :-p) for it to zero out all of the structures required to identity map this. It was taking about 10-15 with my C memzero function, but I created an asm one which gave obvious speed improvements.
I'm not sure I've understood this correctly, but why does it zero out these structures in the first place (instead of just filling them with page table entries, page directory entries, etc)?
The fastest method would be to find an area of RAM large enough for all page table entries, then do something like this:
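```asm
    cld
    mov edi,<address_for_all_page_tables>   ; start of the page-aligned page table area
    mov ecx,<total_pages>                   ; number of 4 KiB pages to map
    mov eax,<first_address_low> | <page_table_flags>  ; low dword of the first entry
    mov ebx,<first_address_high>            ; high dword of the first entry
.nextPT:
    mov [edi],eax                           ; low half of the 64-bit entry
    mov [edi+4],ebx                         ; high half of the 64-bit entry
    add eax,0x00001000                      ; next 4 KiB physical page (flags in bits 0-11 untouched)
    adc ebx,0x00000000                      ; carry into the high dword
    add edi,8                               ; next 8-byte entry
    sub ecx,1
    jne .nextPT
    test edi,0x00000FFF                     ; did the last page table end exactly on a page boundary?
    je .donePT
.blankPT:
    mov dword [edi],0                       ; zero the unused entries of the last page table
    mov dword [edi+4],0                     ; (size specifier needed: no register operand)
    add edi,8
    test edi,0x00000FFF
    jne .blankPT
.donePT:
```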
You'd do something very similar to create page directories, page directory pointer tables, etc.
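The assembly above only fills the page tables. As a hedged sketch of the "very similar" step for the other levels, here is the same loop written in C as a helper that can be called once per level. It assumes, like the assembly, that all tables of a level sit in one contiguous area starting on a 4 KiB boundary, so each higher-level entry simply points 4 KiB past the previous one. `fill_entries` is a hypothetical name made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define ENTRIES_PER_TABLE 512  /* 4 KiB table / 8-byte entries */

/* Same idea as the assembly loop: write `count` consecutive 64-bit entries
   starting at `table` (assumed to start on a 4 KiB boundary). The first
   entry points at `first_address`, each following one `step` bytes further
   on, all OR'ed with `flags`. The rest of the last, partially used table is
   then zeroed. Returns the number of 4 KiB tables used, which is the number
   of entries the next level up needs. */
static size_t fill_entries(uint64_t *table, size_t count,
                           uint64_t first_address, uint64_t step,
                           uint64_t flags)
{
    size_t i;

    for (i = 0; i < count; i++)
        table[i] = (first_address + i * step) | flags;

    /* Blank the unused tail so it can't be mistaken for valid entries. */
    for (; i % ENTRIES_PER_TABLE != 0; i++)
        table[i] = 0;

    return i / ENTRIES_PER_TABLE;
}
```

It would be used bottom-up: first for the page tables (`first_address` = 0, `step` = 0x1000, page flags), then for the page directories with `first_address` set to the physical address of the first page table, then for the PDPTs and finally the PML4. Each call's return value is the `count` for the next level.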
Are you allocating pages from some sort of physical memory manager before using them in the paging structures? Do you allocate them one at a time? Do you test each page to see if it's faulty? If you are, then the simple/fast method above won't be suitable. In that case I'd have something like:
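```c
/* Assumes paging is still off (or this memory is identity mapped), so a
   physical address can be used directly as a pointer, and that get_page()
   returns the physical address of a free, page-aligned 4 KiB page.
   All entries are 8 bytes. */
uint64_t physical_address = 0;
uintptr_t PML4table = get_page();       /* keep this: it's what goes into CR3 */
uintptr_t PML4address = PML4table;
uintptr_t PDPEaddress, PDEaddress, PTEaddress;
do {
    PDPEaddress = get_page();
    *(uint64_t *)PML4address = PDPEaddress | PDPEflags;
    PML4address += 8;
    do {
        PDEaddress = get_page();
        *(uint64_t *)PDPEaddress = PDEaddress | PDEflags;
        PDPEaddress += 8;
        do {
            PTEaddress = get_page();
            *(uint64_t *)PDEaddress = PTEaddress | PTEflags;
            PDEaddress += 8;
            do {
                *(uint64_t *)PTEaddress = physical_address | pageflags;
                PTEaddress += 8;
                physical_address += 0x1000;
                if (physical_address >= max_physical_address) goto done;
            } while ((PTEaddress & 0x00000FFF) != 0);
        } while ((PDEaddress & 0x00000FFF) != 0);
    } while ((PDPEaddress & 0x00000FFF) != 0);
} while ((PML4address & 0x00000FFF) != 0);
done:
/* Zero the unused tail of every partially filled table. */
while ((PML4address & 0x00000FFF) != 0) {
    *(uint64_t *)PML4address = 0;
    PML4address += 8;
}
while ((PDPEaddress & 0x00000FFF) != 0) {
    *(uint64_t *)PDPEaddress = 0;
    PDPEaddress += 8;
}
while ((PDEaddress & 0x00000FFF) != 0) {
    *(uint64_t *)PDEaddress = 0;
    PDEaddress += 8;
}
while ((PTEaddress & 0x00000FFF) != 0) {
    *(uint64_t *)PTEaddress = 0;
    PTEaddress += 8;
}
```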
Of course there is a (very minor) problem with both of these methods: you can only use 32-bit physical addresses until you set up long mode. Each 4 KiB page needs its own 8-byte page table entry, so the page tables alone take 1/512 of the RAM being mapped, which is 3 GB for 1536 GB of RAM. If the computer has 1536 GB of RAM (or more) then you'll probably only be able to use about 3 GB of RAM with 32-bit addressing (due to the PCI hole, ROM, APICs, etc.), so you won't be able to access enough RAM (below 4 GB) to create the paging structures needed to identity map the full 1536 GB (or more) of RAM.
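To check the 1536 GB figure, here's a small C sketch (the function names are made up for this example) that counts the 4 KiB tables needed at each of the four long mode levels, assuming everything is mapped with 4 KiB pages (no large pages) and 512 eight-byte entries per table.

```c
#include <stdint.h>

/* Number of 4 KiB tables needed to hold `entries` 8-byte entries. */
static uint64_t tables_needed(uint64_t entries)
{
    return (entries + 511) / 512;
}

/* Total bytes of PML4 + PDPTs + PDs + PTs needed to identity map
   `ram_bytes` of RAM using only 4 KiB pages. Each level needs one entry
   per table (or page) in the level below, rounded up to whole tables. */
static uint64_t paging_overhead_bytes(uint64_t ram_bytes)
{
    uint64_t pages = (ram_bytes + 4095) / 4096;
    uint64_t pts   = tables_needed(pages);  /* page tables */
    uint64_t pds   = tables_needed(pts);    /* page directories */
    uint64_t pdpts = tables_needed(pds);    /* page directory pointer tables */
    uint64_t pml4s = tables_needed(pdpts);  /* 1 for anything up to 256 TiB */

    return (pts + pds + pdpts + pml4s) * 4096;
}
```

For 1536 GiB it counts 786432 page tables (exactly 3 GiB) plus 1536 page directories, 3 PDPTs and 1 PML4, roughly 3.006 GiB in total, which is more than the usable RAM below 4 GB.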
This brings me to my first suggestion: sometimes the fastest way to do something is to pretend you've done it and actually do nothing.
For example, you could identity map your kernel and nothing else. When your kernel tries to access a page that should have been identity mapped but wasn't, your page fault handler can create the necessary paging structures to identity map that page on demand. It will look like everything was identity mapped (even though most of it wasn't). This also solves the "more than 1536 GB of RAM" problem, because you'd be allocating page tables, etc. while in long mode (and not with 32-bit addressing).
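A rough user-space simulation of this "pretend you've done it" idea, under loud assumptions: a software four-level table using the long mode index split (9 bits per level above the 12-bit page offset) stands in for the real one, `alloc_table()` is a hypothetical stand-in for your physical memory manager, a table's pointer value doubles as its physical address, and `ram_limit` is a made-up parameter for rejecting addresses that aren't RAM. In a real kernel, something like `identity_map_on_fault()` would be called from the page fault handler with the faulting address from CR2, and the handler would then return so the faulting instruction is retried.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PRESENT    0x1ULL
#define WRITABLE   0x2ULL
#define NOT_MAPPED UINT64_MAX

/* Hypothetical stand-in for the physical memory manager: returns a zeroed,
   4 KiB aligned table. Its pointer value doubles as its "physical address". */
static uint64_t *alloc_table(void)
{
    uint64_t *t = aligned_alloc(4096, 4096);
    if (t != NULL) {
        assert(((uintptr_t)t & 0xFFF) == 0);  /* low 12 bits hold flags */
        memset(t, 0, 4096);
    }
    return t;
}

/* 9-bit index of addr at a level (3 = PML4, 2 = PDPT, 1 = PD, 0 = PT). */
static unsigned index_at(uint64_t addr, int level)
{
    return (unsigned)((addr >> (12 + 9 * level)) & 0x1FF);
}

static uint64_t *next_table(uint64_t entry)
{
    return (uint64_t *)(uintptr_t)(entry & ~0xFFFULL);
}

/* Lazily identity map the page containing fault_addr: create any missing
   PDPT/PD/PT on the way down, then map the page to itself.
   Returns 0 on success, -1 if the address isn't RAM or allocation fails. */
static int identity_map_on_fault(uint64_t *pml4, uint64_t fault_addr,
                                 uint64_t ram_limit)
{
    if (fault_addr >= ram_limit)
        return -1;  /* a genuine bad access, not a lazily mapped page */

    uint64_t *table = pml4;
    for (int level = 3; level > 0; level--) {
        uint64_t *entry = &table[index_at(fault_addr, level)];
        if (!(*entry & PRESENT)) {
            uint64_t *t = alloc_table();
            if (t == NULL)
                return -1;
            *entry = (uint64_t)(uintptr_t)t | PRESENT | WRITABLE;
        }
        table = next_table(*entry);
    }
    table[index_at(fault_addr, 0)] = (fault_addr & ~0xFFFULL) | PRESENT | WRITABLE;
    return 0;
}

/* Software page walk, used here only to check the result. */
static uint64_t translate(const uint64_t *pml4, uint64_t addr)
{
    const uint64_t *table = pml4;
    for (int level = 3; level >= 0; level--) {
        uint64_t entry = table[index_at(addr, level)];
        if (!(entry & PRESENT))
            return NOT_MAPPED;
        if (level == 0)
            return (entry & ~0xFFFULL) | (addr & 0xFFF);
        table = next_table(entry);
    }
    return NOT_MAPPED;  /* not reached */
}
```

Only the tables on the path to an address the kernel actually touches ever get allocated, so the cost grows with what the kernel uses rather than with the amount of installed RAM.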
This brings me to my second suggestion: sometimes the fastest way to do something is to do nothing (and not bother to pretend you've done it).
Why do you need to identity map everything? AFAIK most kernels don't. For example, Linux identity maps a relatively small area (16 MB?) and I never identity map anything. There are reasons to dynamically allocate pages used by the kernel instead of relying on a static identity mapping (e.g. NUMA and fault tolerance).
Cheers,
Brendan