invlpg vs. complete flush


Post by earlz »

Hi, the way my OS works, the kernel can access any physical address below 2GB at any time through identity paging. I really don't want to waste a lot of memory on per-process page tables (up to about 2MB per process), so I figured I could just update the page tables on each context switch. The problem then is flushing the TLB. Only the tables for the upper 2GB change (and usually not all of them).

So is it faster to invlpg each changed page, or to just do a mov to cr3 and invalidate all the pages at once? Or possibly a hybrid solution: if fewer than X pages must be invalidated, use invlpg, otherwise mov to cr3?
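The hybrid idea could be sketched roughly as below. This is only an illustration, not a measured answer: `INVLPG_THRESHOLD`, `choose_flush`, `flush_changed` and the `KERNEL_BUILD` guard are all made-up names, the cutoff of 32 is an arbitrary placeholder, and the inline assembly assumes GCC/Clang syntax on x86 running in ring 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cutoff; the right value has to be measured on real hardware. */
#define INVLPG_THRESHOLD 32u

typedef enum { FLUSH_NONE, FLUSH_PER_PAGE, FLUSH_FULL } flush_strategy;

/* Decide how to flush, given how many 4KB pages changed their mapping. */
flush_strategy choose_flush(size_t changed_pages, size_t threshold)
{
    if (changed_pages == 0)
        return FLUSH_NONE;
    return (changed_pages < threshold) ? FLUSH_PER_PAGE : FLUSH_FULL;
}

#ifdef KERNEL_BUILD /* privileged instructions: ring 0 only */
/* Invalidate the TLB entry for the page containing vaddr (486 and later). */
static inline void invlpg(uintptr_t vaddr)
{
    __asm__ volatile("invlpg (%0)" : : "r"(vaddr) : "memory");
}

/* Rewrite CR3 with its current value, flushing all non-global TLB entries. */
static inline void reload_cr3(void)
{
    uintptr_t cr3;
    __asm__ volatile("mov %%cr3, %0" : "=r"(cr3));
    __asm__ volatile("mov %0, %%cr3" : : "r"(cr3) : "memory");
}

/* Flush the given virtual addresses using whichever strategy was chosen. */
void flush_changed(const uintptr_t *vaddrs, size_t n)
{
    switch (choose_flush(n, INVLPG_THRESHOLD)) {
    case FLUSH_PER_PAGE:
        for (size_t i = 0; i < n; i++)
            invlpg(vaddrs[i]);
        break;
    case FLUSH_FULL:
        reload_cr3();
        break;
    case FLUSH_NONE:
        break;
    }
}
#endif
```

Keeping the decision in a plain function makes it easy to tune the threshold later without touching the privileged code.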

Post by Firestryke31 »

One way to avoid wasting 2MB per process when it's only going to use 1-2KB is to map only the page tables you need. Take advantage of the fact that a page table can be allocated only when it is needed.

For instance, say a process only uses 8KB: 4KB for code and 4KB for data. You'd be able to do this:

4KB for a process-specific page directory
4KB for a process-specific page table
4KB for the data
4KB for the code

That works out to a minimum of 16KB per process, including actual used pages, instead of 2MB per process just for page tables and directories.

Then just mark any unused tables in the directory as "not present" so you don't need to allocate pages for all of those tables.
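As a rough illustration of allocating tables on demand, here is a hosted (user-space) model, not real kernel code. The names `address_space`, `map_page` and `paging_overhead` are invented for this sketch, a NULL pointer stands in for a directory entry with the Present bit clear, and `calloc` stands in for whatever physical page allocator the kernel has.

```c
#include <stdint.h>
#include <stdlib.h>

#define ENTRIES     1024u
#define PAGE_SIZE   4096u
#define PTE_PRESENT 0x1u

/* Hosted model of a 32-bit x86 address space: a NULL slot stands in for a
   page directory entry whose Present bit is clear. */
typedef struct {
    uint32_t *tables[ENTRIES];   /* one slot per 4MB of virtual space */
    unsigned  tables_allocated;
} address_space;

/* Map one 4KB page, allocating the covering page table only on first use.
   Returns 0 on success, -1 if the allocation fails. */
int map_page(address_space *as, uint32_t vaddr, uint32_t paddr)
{
    uint32_t dir_index   = vaddr >> 22;             /* top 10 bits  */
    uint32_t table_index = (vaddr >> 12) & 0x3FFu;  /* next 10 bits */

    if (as->tables[dir_index] == NULL) {
        /* calloc zeroes the table, so every entry starts out not present */
        uint32_t *table = calloc(ENTRIES, sizeof *table);
        if (table == NULL)
            return -1;
        as->tables[dir_index] = table;
        as->tables_allocated++;
    }
    as->tables[dir_index][table_index] = (paddr & ~0xFFFu) | PTE_PRESENT;
    return 0;
}

/* Paging overhead in bytes: one directory plus the tables actually allocated. */
unsigned paging_overhead(const address_space *as)
{
    return PAGE_SIZE * (1u + as->tables_allocated);
}
```

Mapping one code page and one data page that fall in the same 4MB region allocates a single table, so the overhead is 8KB (directory plus one table); adding the two 4KB pages themselves gives the 16KB figure above.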

Also, keep in mind that the same page table can be referenced from more than one page directory, so for the kernel's table(s) just put them into each process's page directory. That avoids duplicating data that shouldn't change between processes.

Then, to do a context switch, just reload CR3 with the new process' page directory.
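A minimal sketch of sharing the kernel's tables, assuming the layout from the first post (kernel in the lower 2GB, so the first 512 directory entries). `init_process_directory`, `switch_address_space`, `KERNEL_ENTRIES` and the `KERNEL_BUILD` guard are hypothetical names, and the CR3 write assumes GCC/Clang inline assembly on 32-bit x86 in ring 0.

```c
#include <stdint.h>
#include <string.h>

#define ENTRIES        1024u
#define KERNEL_ENTRIES 512u   /* lower 2GB = first 512 directory entries */

/* Copy the kernel's directory entries into a new process directory.
   Both directories then point at the same page tables, so the tables
   themselves are not duplicated. */
void init_process_directory(uint32_t *proc_dir, const uint32_t *kernel_dir)
{
    memcpy(proc_dir, kernel_dir, KERNEL_ENTRIES * sizeof *proc_dir);
    /* The upper 2GB starts out entirely not present (Present bit clear). */
    memset(proc_dir + KERNEL_ENTRIES, 0,
           (ENTRIES - KERNEL_ENTRIES) * sizeof *proc_dir);
}

#ifdef KERNEL_BUILD /* privileged instruction: ring 0 only */
/* Context switch: load the physical address of the new page directory.
   This also flushes all non-global TLB entries. */
static inline void switch_address_space(uint32_t dir_phys)
{
    __asm__ volatile("mov %0, %%cr3" : : "r"(dir_phys) : "memory");
}
#endif
```

Copying entries only shares tables that exist at copy time; if the kernel later adds a new table, every process directory would need the new entry too.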

Post by earlz »

OK, that makes a lot of sense, but it still doesn't quite answer my question. In that small case, it would be fastest to just invlpg the 2 page entries corresponding to the code and data. But what exactly is the threshold? When is it more efficient to completely flush the TLB rather than using invlpg?

Also, I am opting not to use the global pages feature. I would rather my OS be able to run on 386s...

Post by Brendan »

Hi,
earlz wrote: OK, that makes a lot of sense, but it still doesn't quite answer my question. In that small case, it would be fastest to just invlpg the 2 page entries corresponding to the code and data. But what exactly is the threshold? When is it more efficient to completely flush the TLB rather than using invlpg?
This is a hard thing to estimate - a fast CPU with slow RAM isn't the same as a slow CPU with fast RAM, and it would depend on access patterns, TLB contents and TLB size.
earlz wrote: Also, I am opting not to use the global pages feature. I would rather my OS be able to run on 386s...
The 80386 doesn't support the INVLPG instruction either; it was introduced with the 80486.

It's probably a good idea to find out which features the CPU supports, and use INVLPG and global pages if they are supported. For modern CPUs (where CPUs are a lot faster than RAM and the TLB is larger) INVLPG and global pages are a lot more important...
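A sketch of the feature check, with some assumptions spelled out. INVLPG has no CPUID flag of its own; it arrived with the 80486, and any CPU that implements CPUID is a 486 or later, so the presence of CPUID is used as the test here. Global pages are reported in CPUID leaf 1, EDX bit 13 (PGE). `tlb_features` and `decode_tlb_features` are names invented for this example, and the caller is assumed to have already probed for CPUID and read leaf 1.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_EDX_PGE (1u << 13)  /* CPUID leaf 1, EDX bit 13: global pages */

typedef struct {
    bool has_invlpg;
    bool has_global_pages;
} tlb_features;

/* Decode TLB-related features.
   cpuid_available: whether the CPUID instruction exists on this CPU.
   leaf1_edx:       EDX from CPUID leaf 1 (ignored if CPUID is unavailable). */
tlb_features decode_tlb_features(bool cpuid_available, uint32_t leaf1_edx)
{
    tlb_features f = { false, false };

    if (cpuid_available) {
        /* Any CPU with CPUID is at least a 486, so INVLPG is present. */
        f.has_invlpg = true;
        f.has_global_pages = (leaf1_edx & CPUID_EDX_PGE) != 0;
    }
    /* Without CPUID this sketch conservatively assumes neither feature,
       even though early 486s without CPUID do have INVLPG. */
    return f;
}
```

If global pages are reported, they still have to be enabled through the PGE bit in CR4 before the Global flag in page table entries has any effect.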


Cheers,

Brendan

Post by earlz »

I don't intend for it to run on a 386, so CPUID should work for that, and I'll consider such a hybrid solution.