
invlpg vs. complete flush

Posted: Mon Apr 13, 2009 8:08 pm
by earlz
Hi, the way my OS works, the kernel can access any physical address below 2G at any time through identity paging. I really don't want to waste a lot of memory (up to about 2M per process), so I figured I could just update the page tables on each context switch. Now the problem is flushing the TLB. Only the tables for the upper 2G change (and usually not all of them).

So is it faster to invlpg each changed page, or to just do a mov to cr3 to invalidate all the entries at once? Or possibly a hybrid: if fewer than X pages must be invalidated, use invlpg; otherwise, mov to cr3.
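A minimal sketch of the hybrid described above, assuming a made-up cut-over value FLUSH_THRESHOLD (the real value is exactly what is being asked, so treat it as a tunable guess). The privileged operations are replaced by counting stubs (fake_invlpg and fake_reload_cr3, hypothetical names) so the decision logic can run in user space; in a kernel they would be an `invlpg` on each changed page's address and a reload of CR3.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-over point; the right value depends on the CPU,
   RAM speed and TLB size, so this is only a placeholder to tune. */
#define FLUSH_THRESHOLD 32

enum flush_method { FLUSH_NONE, FLUSH_INVLPG, FLUSH_CR3 };

/* Stand-ins for the privileged operations. In a real kernel these would be
   `invlpg` on the page's address and a mov to CR3; here they only count
   calls so the decision logic can be tested in user space. */
static size_t invlpg_calls;
static size_t cr3_reloads;

static void fake_invlpg(uintptr_t va) { (void)va; invlpg_calls++; }
static void fake_reload_cr3(void) { cr3_reloads++; }

/* Invalidate the TLB entries for `count` changed pages: one INVLPG per page
   when few pages changed, otherwise a single full flush via CR3. */
enum flush_method flush_changed_pages(const uintptr_t *pages, size_t count)
{
    if (count == 0)
        return FLUSH_NONE;
    if (count < FLUSH_THRESHOLD) {
        for (size_t i = 0; i < count; i++)
            fake_invlpg(pages[i]);
        return FLUSH_INVLPG;
    }
    fake_reload_cr3();
    return FLUSH_CR3;
}
```

Note that without global pages, the CR3 path also throws away the kernel's identity-mapped entries, which is part of what makes the full flush more expensive than it first looks.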

Re: invlpg vs. complete flush

Posted: Mon Apr 13, 2009 9:10 pm
by Firestryke31
One way to avoid wasting 2MB per process when it's only going to use 1-2KB is to allocate only the page tables you need. Take advantage of the fact that a page table can be allocated on demand.

For instance, say a process uses only 8KB: 4KB for code and 4KB for data. You'd then need:

4KB for a process-specific page directory
4KB for a process-specific page table
4KB for the data
4KB for the code

That works out to a minimum of 16KB per process, including the pages actually used, instead of 2MB per process just for page tables and directories.
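The arithmetic above can be checked with a small sketch for 32-bit non-PAE x86 paging, where one page table maps 4MB and 512 tables cover the upper 2GB. The function names are hypothetical, and the sketch assumes the listed pages are distinct:

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE       4096u
#define PAGE_TABLE_SPAN (1024u * PAGE_SIZE)  /* one 32-bit page table maps 4MB */

/* Bytes needed for one page directory, the page tables covering the given
   pages, and the pages themselves. A page table is counted once per
   distinct 4MB region (distinct directory index). Assumes distinct pages. */
size_t process_memory_cost(const uint32_t *pages, size_t count)
{
    size_t tables = 0;
    for (size_t i = 0; i < count; i++) {
        uint32_t pde = pages[i] >> 22;   /* page directory index */
        int seen = 0;
        for (size_t j = 0; j < i; j++) {
            if ((pages[j] >> 22) == pde) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            tables++;
    }
    return PAGE_SIZE              /* page directory */
         + tables * PAGE_SIZE     /* page tables actually needed */
         + count * PAGE_SIZE;     /* code and data pages */
}

/* Cost of pre-allocating every page table for the upper 2GB: 512 tables. */
size_t upper_half_tables_cost(void)
{
    return (0x80000000u / PAGE_TABLE_SPAN) * PAGE_SIZE;
}
```

With code at 0x80000000 and data at 0x80001000, both pages share one table, giving 4 x 4KB = 16KB, versus 2MB for the full set of upper-half tables.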

Then mark any unused entries in the page directory as "not present", so you don't need to allocate pages for those tables at all.
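A minimal user-space sketch of allocating page tables on demand, assuming 32-bit non-PAE entries. Physical memory is simulated by a small static array of frames, and alloc_frame, phys_to_ptr, map_page and present_tables are hypothetical names rather than any real kernel's API. A directory entry with the present bit clear simply has no table behind it:

```c
#include <stdint.h>
#include <string.h>

#define PG_PRESENT  0x001u
#define PG_WRITABLE 0x002u
#define PG_USER     0x004u
#define FRAME_MASK  0xFFFFF000u

/* Simulated physical memory: frame n lives at "physical address" n << 12.
   This stands in for a real physical frame allocator (hypothetical). */
#define NUM_FRAMES 64
static uint32_t frames[NUM_FRAMES][1024];
static uint32_t next_free_frame = 1;   /* frame 0 is reserved as "no frame" */

static uint32_t alloc_frame(void)
{
    if (next_free_frame >= NUM_FRAMES)
        return 0;                                    /* out of memory */
    uint32_t phys = next_free_frame++ << 12;
    memset(frames[phys >> 12], 0, sizeof frames[0]); /* all entries not present */
    return phys;
}

static uint32_t *phys_to_ptr(uint32_t phys) { return frames[phys >> 12]; }

/* Map one 4KB page, allocating its page table only on first use.
   Returns 0 on success, -1 if no frame was available for a new table. */
int map_page(uint32_t pd_phys, uint32_t va, uint32_t phys, uint32_t flags)
{
    uint32_t *pd = phys_to_ptr(pd_phys);
    uint32_t pdi = va >> 22;             /* which 4MB region */
    uint32_t pti = (va >> 12) & 0x3FFu;  /* which page inside it */

    if (!(pd[pdi] & PG_PRESENT)) {
        uint32_t table = alloc_frame();
        if (table == 0)
            return -1;
        pd[pdi] = table | PG_PRESENT | PG_WRITABLE | PG_USER;
    }
    uint32_t *pt = phys_to_ptr(pd[pdi] & FRAME_MASK);
    pt[pti] = (phys & FRAME_MASK) | (flags & 0xFFFu) | PG_PRESENT;
    return 0;
}

/* Count how many page tables a directory actually has. */
unsigned present_tables(uint32_t pd_phys)
{
    const uint32_t *pd = phys_to_ptr(pd_phys);
    unsigned n = 0;
    for (int i = 0; i < 1024; i++)
        if (pd[i] & PG_PRESENT)
            n++;
    return n;
}
```

Mapping a code page and a data page in the same 4MB region allocates just one table; a page in the next 4MB region allocates a second.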

Also, keep in mind that the same page table can be referenced from more than one page directory. For the kernel's table(s), just point each process's page directory at the same shared tables, so you don't duplicate data that shouldn't differ between processes.
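A sketch of sharing the kernel's page tables, assuming the kernel occupies the lower 2GB (directory entries 0 to 511) as in the layout from the first post. Copying the directory entries makes every process point at the same physical page tables; init_process_directory is a hypothetical name:

```c
#include <stdint.h>
#include <string.h>

#define PDE_COUNT        1024
#define KERNEL_PDE_COUNT 512   /* assumption: kernel owns entries 0..511 (lower 2GB) */

/* Fill a new process's page directory: the kernel half is copied from the
   master kernel directory, so both reference the same page tables; the
   user half starts out empty (every entry not present). */
void init_process_directory(uint32_t *new_pd, const uint32_t *kernel_pd)
{
    memcpy(new_pd, kernel_pd, KERNEL_PDE_COUNT * sizeof new_pd[0]);
    memset(new_pd + KERNEL_PDE_COUNT, 0,
           (PDE_COUNT - KERNEL_PDE_COUNT) * sizeof new_pd[0]);
}
```

One consequence of this design: a change inside an existing kernel page table is seen by every process at once, but if the kernel later adds a new page table, that new directory entry has to be copied into every existing page directory.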

Then, to do a context switch, just reload CR3 with the physical address of the new process's page directory.

Re: invlpg vs. complete flush

Posted: Mon Apr 13, 2009 9:52 pm
by earlz
OK, that makes a lot of sense, but it still doesn't quite answer my question. In that small case, it would be fastest to just invlpg the two pages used for code and data. But where exactly is the threshold? When is it more efficient to flush the whole TLB rather than use invlpg?

Also, I'm opting not to use the global-page feature. I'd rather my OS be able to run on 386s...

Re: invlpg vs. complete flush

Posted: Tue Apr 14, 2009 6:23 am
by Brendan
Hi,
earlz wrote: "Ok, well that makes a lot of sense.. but still doesn't quite answer my question. In that small case, it would be fastest to just invlpg the 2 page entries corresponding to the code and data. But what exactly is the threshold? When is it more efficient to completely flush the TLB rather than using invlpg?"
This is hard to estimate: a fast CPU with slow RAM isn't the same as a slow CPU with fast RAM, and it also depends on access patterns, TLB contents and TLB size.
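To make the dependence concrete, here is a toy cost model. Every parameter (per-INVLPG cost, CR3 reload cost, miss penalty, number of reusable TLB entries) is hypothetical and in arbitrary units; real numbers would have to be measured on each CPU. It only shows why the break-even point moves: slower RAM raises the miss penalty, and a bigger TLB with good locality raises the number of useful entries a full flush throws away.

```c
#include <stdbool.h>

/* Toy cost model; all values are hypothetical, in arbitrary "cycles". */
struct tlb_cost_model {
    unsigned long invlpg_cost;   /* cost of one INVLPG (must be nonzero) */
    unsigned long cr3_cost;      /* cost of the MOV to CR3 itself */
    unsigned long miss_penalty;  /* page walk after a TLB miss; grows with slow RAM */
    unsigned long live_entries;  /* unchanged entries that would be reused; grows with TLB size */
};

/* Changed pages must be re-walked under either strategy, so only the
   extra costs are compared here. */
unsigned long cost_invlpg(const struct tlb_cost_model *m, unsigned long changed)
{
    return changed * m->invlpg_cost;
}

unsigned long cost_full_flush(const struct tlb_cost_model *m)
{
    return m->cr3_cost + m->live_entries * m->miss_penalty;
}

bool prefer_invlpg(const struct tlb_cost_model *m, unsigned long changed)
{
    return cost_invlpg(m, changed) < cost_full_flush(m);
}

/* Smallest number of changed pages at which a full flush is at least as cheap. */
unsigned long break_even_pages(const struct tlb_cost_model *m)
{
    unsigned long full = cost_full_flush(m);
    return (full + m->invlpg_cost - 1) / m->invlpg_cost;  /* ceiling division */
}
```

With invlpg_cost = 100, cr3_cost = 200, miss_penalty = 50 and live_entries = 30, the break-even point is 17 pages; quadrupling the miss penalty (slower RAM relative to the CPU) moves it to 62.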
earlz wrote: "Also, I am opting to not use the global feature. I had rather my OS be able to run on 386s..."
The 80386 doesn't support the INVLPG instruction either (it was introduced with the 80486).

It's probably a good idea to find out which features the CPU supports, and use INVLPG and global pages when they are available. On modern CPUs (which are much faster than RAM and have larger TLBs), INVLPG and global pages matter a lot more...
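A sketch of turning probe results into a feature choice. It assumes the probes themselves happen elsewhere: the EFLAGS.AC (bit 18) toggle test to tell a 386 from a 486 or later, the EFLAGS.ID (bit 21) toggle test to see whether CPUID exists (early 486s lack it), and CPUID with EAX=1 to read EDX. The bit positions are the documented ones (CPUID.01H:EDX bit 13 for global pages, CR4 bit 7 to enable them, bit 8 of a page-table entry for the G flag); the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_EDX_PGE (1u << 13)  /* CPUID leaf 1, EDX bit 13: global pages supported */
#define CR4_PGE       (1u << 7)   /* CR4 bit 7: must be set for the G bit to take effect */
#define PTE_GLOBAL    (1u << 8)   /* G bit in a 4KB page-table entry */

struct tlb_features {
    bool has_invlpg;
    bool has_global_pages;
};

/* Inputs come from probes not shown here (hypothetical): the AC-flag test,
   the ID-flag test, and the EDX output of CPUID with EAX = 1. */
struct tlb_features detect_tlb_features(bool is_486_or_later, bool has_cpuid,
                                        uint32_t leaf1_edx)
{
    struct tlb_features f = { false, false };
    f.has_invlpg = is_486_or_later;   /* INVLPG first appeared on the 80486 */
    if (has_cpuid)                    /* never trust leaf1_edx without CPUID */
        f.has_global_pages = (leaf1_edx & CPUID_EDX_PGE) != 0;
    return f;
}

/* Mark kernel mappings global only when the CPU supports it. */
uint32_t kernel_pte_flags(const struct tlb_features *f, uint32_t base_flags)
{
    return f->has_global_pages ? (base_flags | PTE_GLOBAL) : base_flags;
}
```

On a 386 both features come back false and the kernel falls back to reloading CR3; on a CPU reporting PGE, kernel entries gain the G bit so a CR3 reload no longer evicts them (once CR4.PGE is set).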


Cheers,

Brendan

Re: invlpg vs. complete flush

Posted: Tue Apr 14, 2009 9:25 pm
by earlz
I don't intend for it to run on a 386, so CPUID should work for that. I'll consider a hybrid solution like that.