TLB quick question

Post by dc0d32 »

Is there no way to save the contents of the TLB on a 32-bit Intel machine? AFAIK the TLB is flushed on every change of CR3, which seems like nonsense to me: after the new task comes in, the TLB has to start filling again from scratch.

Post by bkilgore »

prashant wrote: because i think it is really nonsense to flush out the TLB on change in CR3, afaik, which means after the new task comes in, TLB starts filling again.
It would be a very Bad Thing if the system did not flush the TLB on a CR3 change. Unless every single page is mapped exactly the same way (in which case you shouldn't be changing CR3 to begin with), you will eventually get a TLB hit for an address that is mapped differently in the old and new page directories. Yes, you'll probably have many addresses mapped the same (such as those in the kernel area), but if even a single address is mapped differently you need to flush the TLB so that you don't do an incorrect memory translation. The processor (rightfully) assumes that if you're changing CR3 there must be something different between the two virtual address spaces.

Post by dc0d32 »

I definitely know and understand the importance of flushing the TLB. It _IS_ inevitable.

I just want to save and restore the cache context instead of flushing it.

Post by gaf »

Hello everybody
bkilgore wrote:Yes, you'll probably have many addresses mapped the same (such as those in the kernel area) but if even a single address is mapped differently you need to flush the TLB so that you dont do an incorrect memory translation.
From what I know, you can use the global flag to keep entries that are shared by all address spaces from being flushed.
prashant wrote:I definitely know and understand the importance of page TLB cache flush. It _IS_ inevitable.
It's only inevitable on the x86 architecture, not as a concept. Other systems use a tagged TLB, in which each entry is assigned a task ID, so that flushes can be avoided altogether. Systems that task-switch often (µ-kernels, basically) benefit most from such a design.
On IA-32 processors the behaviour can be emulated, to some extent, by using segmentation in conjunction with the paging mechanism. You might have a look at this L4 paper if you're interested in details.
prashant wrote:I just want to save n restore the cache context, instead of flushing it.
How would storing and reloading the whole TLB be any faster than rebuilding it from the page tables?

regards,
gaf

Post by nick8325 »

There's no way of doing it :(

But there are two things:

* Some versions of x86-64 (ones with Pacifica, IIRC, though I'm not sure if anyone's making that yet) have a tagged TLB. That means that each TLB entry has an address-space ID, and the processor ignores entries that have a different ID from the current one. So when you change address spaces, you can leave the old entries in and just change the ID. When you change the ID back, the old entries will still be there :)

* You can simulate a tagged TLB using segmentation. Search for "small address spaces L4". You can put many processes in the same address space, each at a different base address, and to switch processes you change the base and limit of CS (and of the data and stack segments) so that they cover only that process. That way each process will still think it's running in a flat address space.
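
To make the tagged-TLB idea from the first point concrete, here is a toy software model in C. It is only a sketch: the struct layout, the 8-entry size, the round-robin replacement and all names are assumptions made for illustration, not how any real CPU organises its TLB. The point is just that a lookup matches on the address-space ID as well as the page number, so a switch changes one field and flushes nothing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 8   /* arbitrary size for the toy model */

/* One entry of a toy, fully associative, tagged TLB. */
struct tlb_entry {
    bool     valid;
    uint16_t asid;   /* address-space ID that owns this entry */
    uint32_t vpn;    /* virtual page number */
    uint32_t pfn;    /* physical frame number */
};

struct tlb {
    struct tlb_entry entries[TLB_ENTRIES];
    uint16_t current_asid;   /* ID of the running address space */
    size_t   next;           /* round-robin replacement cursor */
};

/* Switching address spaces only changes the current ID; nothing is flushed. */
static void tlb_switch(struct tlb *t, uint16_t asid)
{
    t->current_asid = asid;
}

/* Insert a translation for the current address space (what a page walk would do). */
static void tlb_fill(struct tlb *t, uint32_t vpn, uint32_t pfn)
{
    struct tlb_entry *e = &t->entries[t->next];
    t->next = (t->next + 1) % TLB_ENTRIES;
    e->valid = true;
    e->asid  = t->current_asid;
    e->vpn   = vpn;
    e->pfn   = pfn;
}

/* Hit only if both the page number and the address-space ID match. */
static bool tlb_lookup(const struct tlb *t, uint32_t vpn, uint32_t *pfn)
{
    for (size_t i = 0; i < TLB_ENTRIES; i++) {
        const struct tlb_entry *e = &t->entries[i];
        if (e->valid && e->asid == t->current_asid && e->vpn == vpn) {
            *pfn = e->pfn;
            return true;
        }
    }
    return false;
}
```

If address space 1 fills an entry, a switch to address space 2 makes that entry invisible (a miss), and switching back to 1 makes it hit again without any refill.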

EDIT: I see gaf got there before me, and gave a better answer :)

Post by mystran »

Yeah, it would be cool if you could save and restore TLBs. It would also mean that you could manually modify TLB entries. And that would mean you could forget about the PDE/PTE stuff if you wanted: just map the kernel in a single context, and handle all userspace faults by modifying the TLB directly.

If you wanted to, of course.

Some (especially RISC) processors do have such software TLBs. Unfortunately, not so for Intel.

Would be pretty cool though.

Post by mystran »

The L4 trick (also used in other systems, I think) is actually a pretty nice scheme, especially in a microkernel, where you have lots of different small driver processes. If you keep (say) the TCP/IP stack and the network card driver, or the filesystem and the disk driver, in separate processes, then you'd have an awful amount of context switching, so small address spaces are cool.

You can map one 2GB big address space, and use the remaining 2GB for kernel + small spaces, and essentially keep all the drivers (and other smallies) always in current context. If you limit the size of "small" to say 4MB, and use 1GB total for them (leaving 1GB for kernel's use), you can have 256 small processes always mapped in any context, right? Or set limit to 16MB, and you can still have 64 smallies.
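
The slot arithmetic above, plus the segment trick it relies on, can be sketched in C. Everything named here is hypothetical: the 2 GiB base for the small-space region, the helper names, and the choice of a ring-3 code segment are assumptions for illustration only. The descriptor packing follows the standard IA-32 GDT entry layout.

```c
#include <stdint.h>

#define MiB (1024u * 1024u)
#define GiB (1024u * MiB)

/* Hypothetical layout: small spaces occupy 1 GiB starting at the 2 GiB mark. */
#define SMALL_REGION_BASE 0x80000000u
#define SMALL_REGION_SIZE (1u * GiB)

/* How many small spaces of a given size fit into the region. */
static uint32_t small_space_count(uint32_t slot_size)
{
    return SMALL_REGION_SIZE / slot_size;
}

/* Linear base address of small space number `index`. */
static uint32_t small_space_base(uint32_t slot_size, uint32_t index)
{
    return SMALL_REGION_BASE + index * slot_size;
}

/* Pack base, 20-bit limit, access byte and 4-bit flags into a GDT entry. */
static uint64_t make_descriptor(uint32_t base, uint32_t limit,
                                uint8_t access, uint8_t flags)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);              /* limit bits 0..15  */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;       /* base bits 0..23   */
    d |= (uint64_t)access << 40;                   /* access byte       */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;   /* limit bits 16..19 */
    d |= (uint64_t)(flags & 0xFu) << 52;           /* G, D/B, L, AVL    */
    d |= (uint64_t)(base >> 24) << 56;             /* base bits 24..31  */
    return d;
}

/* Ring-3 code segment that covers exactly one small space. */
static uint64_t small_space_code_segment(uint32_t slot_size, uint32_t index)
{
    uint32_t limit_pages = slot_size / 4096u - 1u; /* limit in 4 KiB units */
    return make_descriptor(small_space_base(slot_size, index), limit_pages,
                           0xFA,   /* present, DPL 3, code, readable */
                           0xC);   /* 4 KiB granularity, 32-bit      */
}
```

With 4 MiB slots `small_space_count` gives 256, and with 16 MiB slots it gives 64, matching the numbers in the post. A real kernel would build matching data and stack descriptors the same way and reload the segment registers on a switch.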

Post by dc0d32 »

gaf wrote: From what I know you can use the global-flag to avoid that entries used by all address-spaces get flushed..
Like the page tables for the kernel, mapped at the same address in all user-mode processes?

Post by nick8325 »

prashant wrote: like the page tables for kernel mapped at the same address in all usermode processes?
Yep. It's not supported on all processors though (Pentium Pro upwards, IIRC). You can detect whether it's supported using CPUID - http://sandpile.org/ia32/cpuid.htm gives it as bit 13 of EDX (PGE) for CPUID function 1.
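
A minimal sketch of that check in C, assuming a GCC or Clang toolchain (the `<cpuid.h>` header and its `__get_cpuid` helper are compiler-provided, not part of standard C). The names `edx_has_pge` and `cpu_has_pge` are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang helper header; an assumption about the toolchain */
#endif

#define CPUID_EDX_PGE (1u << 13)   /* CPUID function 1, EDX bit 13 */

/* Pure bit test, kept separate so it can be checked without running CPUID. */
static bool edx_has_pge(uint32_t edx)
{
    return (edx & CPUID_EDX_PGE) != 0;
}

/* Ask the CPU itself; reports false on non-x86 builds or if leaf 1 is missing. */
static bool cpu_has_pge(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return edx_has_pge(edx);
#else
    return false;
#endif
}
```

Detection alone is not enough: the G bit in a page-table entry only takes effect once the PGE bit (bit 7) in CR4 has been set.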

Post by mystran »

Two things to remember:

1) The global flag is only a hint: what it does is tell the processor that this page is global, so the entry is not flushed from the TLB when CR3 is changed. The entry is still subject to normal TLB replacement, so you still need the same information in the page tables.

2) I haven't seen/heard any problems in setting the global flag on processors that don't support it, so I don't think detecting it is really necessary. If somebody disagrees on this point, please speak.

There's a related trick though: map kernel using 4MB large pages set to global. Those have a separate set of TLBs. So if all application code is using only 4kB pages, then the large global TLBs holding kernel page entries will never get replaced, and you never take TLB misses for kernel.

I personally use this for kernel image (code, data, bss sections), but my kernel heap is mapped normally. Since physical memory aliasing is allowed, I don't even use all of the 4MB region, and unused parts of it are freed as normal 4kB pages in memory manager initialization.
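
As a rough illustration of the large-global-page trick, this is how such a page-directory entry could be composed in C for plain 32-bit (non-PAE) paging. The macro and function names are invented for this sketch; the bit positions are the architectural ones (P = bit 0, R/W = bit 1, PS = bit 7, G = bit 8).

```c
#include <stdint.h>

#define PDE_PRESENT  (1u << 0)   /* P: entry is valid                */
#define PDE_WRITABLE (1u << 1)   /* R/W: writes allowed              */
#define PDE_PS       (1u << 7)   /* PS: entry maps a 4 MB page       */
#define PDE_GLOBAL   (1u << 8)   /* G: keep in TLB across CR3 loads  */

#define LARGE_PAGE_MASK 0xFFC00000u   /* 4 MB alignment of the frame */

/* Build a supervisor-only, writable, global 4 MB page-directory entry. */
static uint32_t make_kernel_large_pde(uint32_t phys_addr)
{
    return (phys_addr & LARGE_PAGE_MASK)
         | PDE_PRESENT | PDE_WRITABLE | PDE_PS | PDE_GLOBAL;
}

/* Index of the page-directory slot that covers a virtual address. */
static uint32_t pde_index(uint32_t virt_addr)
{
    return virt_addr >> 22;
}
```

For example, a kernel linked at 0xC0000000 and loaded at physical 4 MB would store `make_kernel_large_pde(0x00400000)` in slot `pde_index(0xC0000000)` of every page directory. CR4.PSE (bit 4) and CR4.PGE (bit 7) both need to be set for the PS and G bits to be honoured.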

Post by Brendan »

Hi,
mystran wrote:There's a related trick though: map kernel using 4MB large pages set to global. Those have a separate set of TLBs. So if all application code is using only 4kB pages, then the large global TLBs holding kernel page entries will never get replaced, and you never take TLB misses for kernel.
That depends on which CPU - newer CPUs share the TLBs for both 4 KB pages and large pages.


Cheers,

Brendan

Post by mystran »

Brendan wrote:That depends on which CPU - newer CPUs share the TLBs for both 4 KB pages and large pages.
Hmmh. How new? I must have missed something, then...

This sounds pretty unfortunate.

Not that it matters much: it's still a good idea to map the kernel into a large page if the TLB logic is intelligent enough to use only a single TLB entry per large page; that will still reduce TLB pollution when switching to the kernel.

Post by ti_mo_n »

mystran wrote:Hmmh. How new? I must have missed something, then...
Pentium 4 and Xeon, for example. They have only one type of TLB entry (for 4K pages). Large pages get split into 4K pages.

Post by mystran »

ti_mo_n wrote: Pentium4 and Xeon, for example. They have only 1 type of TLB entries (for 4K-pages). Large pages get split into 4K-pages.
Btw, does this also mean that you have to invalidate each 4k part of a large global page separately?

Post by ti_mo_n »

No, they're handled internally. The "G" (global) flag is propagated as well.

At least according to "The Manuals"™, that is ;D

Large pages are still cool, because they save a bit of RAM (one 2MB/4MB entry, instead of 512/1024 4k-entries).