Hi,
I hope this is the right place to ask the following (the same question has been posted on other forums).
Consider a processor/CPU with support for Secondary Level address translation (SLAT) technology (Intel EPT / AMD RVI). TLB caching is used to improve translation performance, especially when SLAT is enabled.
Now consider the case in which the guest page size differs from the host page size (to fix ideas, suppose the guest page size is 4KB while the (SLAT) host page size is 2MB). In this scenario, to support guest-virtual -> host-physical translation, I guess TLB entries need to cache both levels of translation: guest virtual -> guest physical (GVA->GPA) and guest physical -> host physical (GPA->HPA). Otherwise, with different page sizes, how can the TLB lookup be implemented to match a direct GVA->HPA mapping?
Thanks.
Secondary Level address translation (SLAT) TLB structure
Re: Secondary Level address translation (SLAT) TLB structure
The processor TLB never holds GVA->GPA mappings; it holds only GVA->HPA mappings.
The final page size is the minimum of the two, i.e. in your example the final translation will be saved in a min(2M, 4K) = 4K TLB entry.
The page miss handler (which serves TLB misses and performs the actual page walks) also has its own intermediate walk-level caches, such as PML4, PDPTE and PDE caches.
There are separate caches for regular and EPT translations, so the PML4 cache actually keeps GVA -> guest PML4 mappings while the EPML4 cache keeps GPA -> EPT PML4 mappings.
These are translation caches only, holding mappings that do not point to HPA directly, and as you can see page size is not relevant for these caches.
Stanislav
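To illustrate the min-of-two-sizes point above, here is a toy sketch (all function names, field names and addresses are invented for illustration, not a real TLB format): composing a 4KB GVA->GPA mapping with a 2MB GPA->HPA mapping can only yield a combined GVA->HPA entry at the smaller (4KB) granularity, because beyond that 4KB the guest mapping is no longer guaranteed contiguous.

```python
# Toy model of forming a combined GVA->HPA TLB entry from the two
# translation levels. The cached entry can only cover the smaller of
# the two page sizes: min(2M, 4K) = 4K.

KB, MB = 1024, 1024 * 1024

def combined_tlb_entry(gva, guest_page_size, gpa_base, host_page_size, hpa_base):
    """Compose GVA->GPA (guest_page_size granularity) with GPA->HPA
    (host_page_size granularity) into one GVA->HPA entry."""
    entry_size = min(guest_page_size, host_page_size)
    # Where the guest page sits inside the (larger) host page:
    offset_in_host_page = gpa_base % host_page_size
    return {
        "vpn": gva // entry_size,                    # tag: virtual page number
        "hpa_base": hpa_base + offset_in_host_page,  # host frame of this 4K page
        "size": entry_size,
    }

entry = combined_tlb_entry(
    gva=0x7F0000123000, guest_page_size=4 * KB,
    gpa_base=0x40123000, host_page_size=2 * MB,
    hpa_base=0x80000000,
)
print(hex(entry["hpa_base"]), entry["size"])  # 0x80123000 4096
```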
Re: Secondary Level address translation (SLAT) TLB structure
Thanks Stanislav,
my question was about the benefit, in terms of less frequent TLB misses, of employing host-level large pages (2MB) to back guest OS pages.
In the previous example (4KB guest and 2MB host page size, respectively), considering that the number of available TLB entries is fixed, I do not see any benefit. If I understand correctly, we get a benefit (from the TLB-miss point of view) only when the guest OS and the hypervisor (VMM) use the same page size (in the example, only when the guest OS itself employs large 2MB pages).
Does that sound correct?
Re: Secondary Level address translation (SLAT) TLB structure
Speaking about the TLB only - you are right.
But in case of a TLB miss, the large-page walk is one level shorter, and this also has some benefit.
BTW, one more observation: the number of large-page entries cached in the TLB is usually much smaller.
For example, on early Core processors large pages were not cached in the 2nd-level TLB, which is much bigger than the 1st-level one.
So if accesses are sparse enough and don't exploit the locality of a 2M page, it can even be better to use 4K pages, for TLB-capacity reasons as well.
Stanislav
Re: Secondary Level address translation (SLAT) TLB structure
Thanks - just to make sure I got it right.

stlw wrote:
> If speaking about TLB only - you are right.
> But in case of TLB miss large page walk is 1-level shorter and this also would have some benefits.

The benefit here - due to the one-level-shorter page-table walk - applies to each of the two translation levels (GVA->GPA and GPA->HPA), right?

stlw wrote:
> BTW, some observation too - amount of large page entries cached in the TLB is much smaller usually.
> For example on early Core processors large pages were not cached in the 2nd level TLB which is much bigger than 1st level.
> So if accesses are sparse enough and don't utilize locality of 2M page - it would be even better to use 4K pages also for TLB capacity reasons.

Here - given sparse enough memory accesses - the benefit is due just to the higher number of 4KB TLB entries available in the 1st- and 2nd-level TLBs (as opposed to when min(guest page size, host page size) = 2MB), right?

Thanks
Re: Secondary Level address translation (SLAT) TLB structure
> the benefits here - due to the one-level-shorter page-table walk - apply to each of the two translation levels (GVA->GPA and GPA->HPA), right?

Of course - it depends on where you have the 2MB pages.

> here - given sparse enough memory accesses - the benefit is due just to the higher number of 4KB TLB entries available in the 1st- and 2nd-level TLBs, right?

Exactly.
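The capacity argument above can be demonstrated with a toy LRU TLB simulation. The entry counts (32 large-page entries, 64 + 512 4K entries) are hypothetical round numbers in the spirit of the early-Core split mentioned earlier, not figures from any datasheet: with a sparse hot set larger than the large-page TLB, the bigger pool of 4K entries wins despite each entry covering less memory.

```python
import random

def tlb_misses(addresses, page_size, tlb_entries):
    """Count misses for a toy fully-associative LRU TLB."""
    tlb = []  # most recently used at the end
    misses = 0
    for addr in addresses:
        vpn = addr // page_size
        if vpn in tlb:
            tlb.remove(vpn)       # hit: refresh LRU position
        else:
            misses += 1
            if len(tlb) >= tlb_entries:
                tlb.pop(0)        # evict least recently used
        tlb.append(vpn)
    return misses

# Sparse workload: 100 hot locations, 4 MB apart (each on its own 2M page).
hot = [i * 4 * 1024 * 1024 for i in range(100)]
random.seed(0)
accesses = [random.choice(hot) for _ in range(10_000)]

# Hypothetical capacities: 32 large-page entries vs 576 (L1 + L2) 4K entries.
print(tlb_misses(accesses, 2 * 1024 * 1024, 32))   # thousands of misses
print(tlb_misses(accesses, 4 * 1024, 64 + 512))    # only compulsory misses
```

With 100 hot pages against 32 large-page entries the TLB thrashes, while 576 4K entries hold the whole hot set after the first touch of each page.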