Hi,
bewing wrote:The smallest cache you have is the TLB.
It's small, but (for 4 KB pages) for a typical CPU with 8192 TLB entries those TLB entries cover 32 MB of linear address space, which is much larger than a "little" 2 MB L2 data cache will cover...
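For anyone who wants to check the arithmetic, here's a minimal sketch in C. The helper name `tlb_reach_bytes` is made up for illustration, and it assumes every entry maps one page of the same size (no large pages mixed in):

```c
#include <assert.h>
#include <stdint.h>

/* Linear address space covered by a TLB, in bytes.
 * Hypothetical helper: assumes every entry maps one page of 'page_size' bytes. */
uint64_t tlb_reach_bytes(uint64_t entries, uint64_t page_size)
{
    assert(page_size != 0);
    return entries * page_size;
}
```

With the figures above, `tlb_reach_bytes(8192, 4096)` gives 33554432 bytes (32 MB), which is 16 times the size of a 2 MB L2 cache.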
bewing wrote:When you switch from usermode to kernelmode to handle a syscall, if the kernel uses virtual memory, you will toast the entire TLB (or at least a significant fraction) in the process of handling the syscall.
No. If you switch from user-mode to kernel-mode then the kernel's code might use several TLB entries (but those TLB entries may have already been present in the TLB). If the kernel accesses many MB of data (which IMHO is extremely unlikely with a sane kernel) it would end up getting rid of all the "least recently used" user-mode TLB entries; or, if the kernel does a task switch, the user-mode TLB entries will be flushed.
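As a rough illustration of the "least recently used" point, here's a toy model in C. Everything in it is an assumption for the sake of the example: a tiny fully-associative 8-entry TLB with strict LRU replacement (real TLBs are much larger, usually set-associative, and tend to use pseudo-LRU), and made-up names like `lru_tlb_access` and `KERNEL_BASE_PAGE`:

```c
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES      8      /* toy size, purely for illustration */
#define KERNEL_BASE_PAGE 1000   /* hypothetical: pages >= this belong to the kernel */

struct lru_tlb {
    uint64_t page[TLB_ENTRIES];       /* virtual page number held in each slot */
    uint64_t last_used[TLB_ENTRIES];  /* access time stamp, 0 means "slot empty" */
    uint64_t clock;                   /* incremented on every access */
};

void lru_tlb_init(struct lru_tlb *t)
{
    for (size_t i = 0; i < TLB_ENTRIES; i++) {
        t->page[i] = 0;
        t->last_used[i] = 0;
    }
    t->clock = 0;
}

/* Returns 1 on a TLB hit. On a miss, returns 0 and fills an empty slot if
 * there is one, otherwise replaces the least recently used entry. */
int lru_tlb_access(struct lru_tlb *t, uint64_t page)
{
    size_t victim = 0;

    t->clock++;
    for (size_t i = 0; i < TLB_ENTRIES; i++) {
        if (t->last_used[i] != 0 && t->page[i] == page) {
            t->last_used[i] = t->clock;   /* hit: refresh the time stamp */
            return 1;
        }
        if (t->last_used[i] < t->last_used[victim])
            victim = i;                   /* oldest (or empty) slot so far */
    }
    t->page[victim] = page;
    t->last_used[victim] = t->clock;
    return 0;
}

/* Number of valid entries that map user-mode pages. */
size_t lru_tlb_user_entries(const struct lru_tlb *t)
{
    size_t n = 0;
    for (size_t i = 0; i < TLB_ENTRIES; i++)
        if (t->last_used[i] != 0 && t->page[i] < KERNEL_BASE_PAGE)
            n++;
    return n;
}
```

In this model, a syscall that touches a couple of kernel pages costs at most that many user-mode entries (and none at all if the kernel's entries are already present), so the rest of the user-mode working set is still in the TLB when the syscall returns.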
However, if the kernel disables paging, all TLB entries will be flushed (including entries for "global" pages), regardless of whether or not a task switch is done. This would cause far worse performance problems than leaving paging enabled - even a simple/fast kernel API function would cause a huge number of TLB misses after it returns to user-mode.
bewing wrote:And it is completely unnecessary -- it is not the tiniest bit difficult to write a kernel that knows how to live inside the restrictions of physical memory.
For a toy kernel like Linux, I agree. When you start doing NUMA optimizations (or trying to do fault tolerance for faulty RAM) you can't assume that any specific area in the physical address space will be suitable, and something common like "the kernel's code starts at 0x00100000 in the physical address space" becomes far too restrictive.
bewing wrote:If you turn vmem off immediately "on receipt" of a syscall, then all the usermode cached vmem stuff never gets dumped out of the TLB.
Hehe - from Intel's manual, section 10.9. Invalidating the Translation Lookaside Buffers (TLBs):
"
The following operations invalidate all TLB entries, irrespective of the setting of the G flag:
* Asserting or de-asserting the FLUSH# pin.
* (Pentium 4, Intel Xeon, and P6 family processors only.) Writing to an MTRR (with a WRMSR instruction).
* Writing to control register CR0 to modify the PG or PE flag.
* (Pentium 4, Intel Xeon, and P6 family processors only.) Writing to control register CR4 to modify the PSE, PGE or PAE flag."
If you turn vmem off immediately "on receipt" of a syscall (which involves writing to control register CR0 to modify the PG flag), then you'll be completely flushing all TLB entries for every syscall.
proxy wrote:This is why the Global bit exists. Basically the entries marked global will not get flushed on CR3 write. This makes the User->Sys->User transition a lot less expensive.
Um, no - the global bit makes address space switches less expensive (e.g. process switches), not privilege level switches (CPL=3 -> CPL=0 -> CPL=3).
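To put the three cases side by side, here's a small model in C. It's only a sketch of the behaviour described above and in the quoted section of Intel's manual, not of any real hardware interface; the names (`privilege_switch`, `write_cr3`, `modify_cr0_pg`) are made up, and `write_cr3` assumes global pages are enabled (CR4.PGE set):

```c
#include <stdbool.h>
#include <stddef.h>

#define TLB_SIZE 8   /* toy size, purely for illustration */

struct tlb_entry {
    bool valid;
    bool global;     /* copy of the G flag from the page table entry */
};

struct tlb {
    struct tlb_entry e[TLB_SIZE];
};

/* Mark every entry valid; the first 'n_global' entries are global pages. */
void tlb_fill(struct tlb *t, size_t n_global)
{
    for (size_t i = 0; i < TLB_SIZE; i++) {
        t->e[i].valid = true;
        t->e[i].global = (i < n_global);
    }
}

size_t tlb_valid_count(const struct tlb *t)
{
    size_t n = 0;
    for (size_t i = 0; i < TLB_SIZE; i++)
        if (t->e[i].valid)
            n++;
    return n;
}

/* CPL=3 -> CPL=0 (or back): no TLB entries are invalidated. */
void privilege_switch(struct tlb *t)
{
    (void)t;
}

/* Address space switch (CR3 write, assuming CR4.PGE is set):
 * only non-global entries are invalidated. */
void write_cr3(struct tlb *t)
{
    for (size_t i = 0; i < TLB_SIZE; i++)
        if (!t->e[i].global)
            t->e[i].valid = false;
}

/* Modifying CR0.PG (e.g. disabling paging): every entry is invalidated,
 * irrespective of the G flag. */
void modify_cr0_pg(struct tlb *t)
{
    for (size_t i = 0; i < TLB_SIZE; i++)
        t->e[i].valid = false;
}
```

Starting from 8 valid entries of which 3 are global: `privilege_switch()` leaves all 8, `write_cr3()` leaves the 3 global ones, and `modify_cr0_pg()` leaves none.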
Cheers,
Brendan