Hi,
Ok, I was too lazy/busy to read all the previous replies and figure out what has been answered so far and what hasn't, so I covered everything asked. I apologise for any/all redundant answers.
cwillems wrote: - in general a TLB entry is created after a successful address translation. However, does this also happen if the resulting memory is not accessible due to access protections (=> page fault)? I mean: the address translation itself worked properly ...
In general the CPU may cache any address translation. For all 80x86 CPUs this is likely to include translations that failed due to permission problems (e.g. if you write to a read-only page, then the CPU will remember the translation for the read-only page).
Intel CPUs (and I assume AMD CPUs too) will not remember address translations that result in "page not present". However, modern CPUs (with higher level TLB caches) may still remember where to find the page directory or page directory pointer table, even though they don't remember that the page table itself refers to a "not present" page. In addition, CPUs from other manufacturers (specifically old Cyrix CPUs) would remember that a page wasn't present.
cwillems wrote: - is an existing TLB entry invalidated if a page fault occurs for that address? Is there different behavior for a) no memory mapped or b) an access protection occurs?
A page fault won't cause any TLB invalidation by itself. As an optimisation, some OSs do something called "lazy TLB invalidation", where they don't invalidate the TLB in some cases and then (if a page fault occurs due to a stale TLB entry) invalidate the TLB in the page fault handler instead.
cwillems wrote: - besides the TLB entries: are the PDE/PTEs also stored (as regular data) in the L1/L2/LLC caches? Or is there an additional cache for the paging structures?
Yes and yes. The paging structures are also cached in the normal L1/L2/L3 data caches; and modern CPUs may have higher level TLB caches as well (e.g. the normal TLB entries that cache normal translations, plus a higher level TLB cache that remembers which page directory or page directory pointer table to use for different areas of the virtual address space). This means that (for example) if you access the virtual address 0x00000000, the CPU might remember which physical page corresponds to virtual addresses from 0x00000000 to 0x00000FFF; and also remember which page directory corresponds to virtual addresses from 0x00000000 to 0x3FFFFFFF. If you then access the virtual address 0x22334455, the CPU doesn't need to check the PML4 and PDPT a second time (and only needs to look at the page directory and page table).
cwillems wrote:- if so: are there any circumstances under which the MMU flushes already cached PDE/PTE entries from the L1/L2/LLC caches?
The CPU may flush anything from L1/L2/L3 (or from any TLB) whenever it feels like it. Most CPUs use (a variation of) a "least recently used" eviction strategy, where things that haven't been used recently are removed from the cache to make room for recently used things. This means that (for example) a simple loop that reads every 64th byte of a large enough area (as large as the cache itself) can completely fill the L1/L2/L3 data cache/s, and all previously cached data will be evicted. This problem is called cache pollution, and it is the reason why modern CPUs have things like the CLFLUSH instruction (so that software can explicitly flush a cache line, to try to minimise cache pollution).
Note: Due to the way caches are designed, even with CLFLUSH the effects of cache pollution can't be entirely avoided - e.g. with an "8-way associative" cache, a loop like the one I described (flushing each line after use) would still wipe out 12.5% of the cache, because it evicts one of the 8 ways in each set.
cwillems wrote: - is a paging structure cached under the same circumstances as TLB entries created, i.e. only if "present bit set" and also cached "on valid address translation but resulting access violation"?
No. L1/L2/L3 can cache anything the CPU touches, regardless of what it is (and regardless of whether or not it's a page table entry for a "not present" page). More specifically, what a CPU caches in L1/L2/L3 is determined only by access patterns, physical addresses and the MTRR/PAT settings.
cwillems wrote: - is there any public information regarding the specific implementation details of current Intel CPUs (Sandy Bridge/Nehalem), such as the size, associativity or replacement strategy of those caches?
CPUID (for modern CPUs) will tell you the size, associativity and how many (logical) CPUs share each level of cache. For older CPUs you may only be able to determine size and associativity. For even older CPUs you may not be able to determine anything (without using "vendorID model" as an index into your own lookup table/s derived from hours of searching through datasheets). I don't think there's any way to determine the replacement strategy of any cache (but I'd assume "least recently used"). Also note that some CPUs use inclusive caches (mostly Intel) and some use exclusive caches (mostly AMD). With an inclusive cache, a specific piece of data may be in multiple levels at the same time (e.g. in the L1, L2 and L3); with an exclusive cache it can only be in one level at a time (e.g. if something is in the L1 cache then it can't also be in the L2 or L3 cache).
Cheers,
Brendan