Paging and Cache Look Ahead
Posted: Sat Apr 18, 2009 9:30 am
by johnsa
Hey all,
Here's a quick question regarding what happens to the look-ahead caching when paging is enabled.
Let's say we have physical memory laid out as follows:
| free | used | free |
| 128kb | 64kb | 128kb |
and in a process we request 256kb of memory, so we map it in from the two 128kb blocks as one sequential virtual range from 0 - 256kb (for example).
Let's say that memory was allocated and used for a single 256kb array that we scan through. Normally (assuming no paging) the CPU would cache ahead as we scan through that array/memory in a forward direction. Now with paging, does the CPU cache ahead from physical memory based on the paging structures?
IE: mapping in the 2 blocks with a physical gap between them wouldn't cause any cache stall? Or does it still look at the actual physical memory layout to cache ahead, thus causing a stall in the middle of the array?
My logic tells me the CPU is smart enough to cache based on the virtual -> physical lookup, seeing as it knows paging is enabled... but I couldn't find a direct answer to this in the manuals.
Re: Paging and Cache Look Ahead
Posted: Sat Apr 18, 2009 9:43 am
by Colonel Kernel
You might be getting caching confused with prefetching.
For caching, when you read even a single byte from memory, the CPU will load the entire cache line to which that byte belongs. So if you're reading sequentially, the pattern will look like:
miss, hit, hit, hit, ..., miss, hit, hit, hit, ...
Cache lines are aligned such that they won't cross page boundaries, so it shouldn't matter whether caching is done based on physical or virtual addresses. AFAIK on x86 caching is based on physical addresses, but this is not always true (e.g. -- I think the Pentium 4's trace cache used virtual addresses... not sure if I remember correctly). In any case, it shouldn't matter from the point of view of paging.
Prefetching happens a little "closer" to the CPU than caching, and I'm pretty sure it's based on virtual addresses when paging is enabled, because I think prefetching can easily cross page boundaries (e.g. -- fetching the next instruction which happens to start on the next page). Otherwise, the principle of locality wouldn't really apply and performance would be terrible.
Re: Paging and Cache Look Ahead
Posted: Sat Apr 18, 2009 9:45 am
by Hyperdrive
johnsa wrote:does the CPU cache ahead from the physical memory based on the paging values
IE: paging in the 2 blocks with a physical gap in them wouldn't cause any cache stall, or does it still look at the actual physical memory layout to cache ahead, thus causing a stall in the middle of the array?
As far as I know there are caches both before and after the virtual-to-physical address translation (i.e. caching blocks at virtual addresses and caching blocks at physical addresses). Just think a moment about that: Which cache (L1/L2/...) should do what and why? ... So the answer is: You don't know. You have to detect it (if that's possible...).
--TS
Re: Paging and Cache Look Ahead
Posted: Sat Apr 18, 2009 9:51 am
by Combuster
I think the question mainly deals with the stream detector - the thing that detects serial memory accesses and then speculatively bursts lots of memory into the cache just before it is needed (i.e. lines of memory that aren't being referenced anywhere in the pipeline).
I know it exists, but I can't tell off the top of my head whether it works on virtual or physical addresses, sorry
Re: Paging and Cache Look Ahead
Posted: Sat Apr 18, 2009 1:15 pm
by johnsa
It's an interesting one. Combuster is right: there is a specific component of the CPU designed to detect that sort of memory access pattern and optimize prefetching/cache loading of data ahead. If that were based on physical addressing rather than virtual addressing, it would be nice to know, because even if you plan to use a full paging system in your OS, I'd add an option to the allocator to provide better performance by not splitting memory allocation requests across non-contiguous physical blocks. Memory is cheap and plentiful, especially in 64-bit.
I can afford to waste a little bit here and there due to fragmentation for performance's sake.
Re: Paging and Cache Look Ahead
Posted: Sat Apr 18, 2009 5:59 pm
by Brendan
Hi,
The L1/L2/L3 caches (at least on Intel CPUs) operate on physical addresses only, and are completely unaware of linear addresses or paging.
The hardware prefetcher (at least on Netburst/Pentium 4) detects sequential cache line accesses (in both directions), and is capable of handling several streams at once, but it will not cross a page boundary (even if paging is disabled), and takes several accesses before it will begin prefetching.
For example, if you're reading dwords sequentially starting from physical address 0x00000000 (with paging disabled), then it might go like this:
Code: Select all
0x00000000: Cache miss - CPU fetches the cache line
0x00000004 to 0x0000003C: Cache hits (same cache line as last fetch)
0x00000040: Cache miss - CPU fetches the cache line
0x00000044 to 0x0000007C: Cache hits (same cache line as last fetch)
0x00000080: Cache miss - CPU fetches the cache line. Hardware prefetcher notices sequential accesses and prefetches the next line.
0x00000084 to 0x000000BC: Cache hits (same cache line as last fetch)
0x000000C0: Cache hit because hardware prefetcher has prefetched the cache line
0x000000C4 to 0x00000FFC: Cache hits (either prefetched, or same cache line as last prefetch)
0x00001000: Cache miss, because the hardware prefetcher stopped caring (crossed a page boundary)
0x00001004 to 0x0000103C: Cache hits (same cache line as last fetch)
0x00001040: Cache miss - CPU fetches the cache line
0x00001044 to 0x0000107C: Cache hits (same cache line as last fetch)
0x00001080: Cache miss - CPU fetches the cache line. Hardware prefetcher notices sequential accesses again and prefetches the next line.
0x00001084 to 0x000010BC: Cache hits (same cache line as last fetch)
0x000010C0: Cache hit because hardware prefetcher has prefetched the cache line
In this case you get 3 cache misses per page, and 61 hardware prefetches per page.
For certain benchmarks, CPU manufacturers disable hardware prefetching and do the prefetching manually with software prefetch instructions. This helps to avoid those 3 cache misses per page, and also helps to avoid prefetching the cache line after the last cache line you need - e.g. if you stop reading in the middle of a page, the hardware prefetcher will keep going and prefetch one extra cache line you don't need.
The hardware prefetcher stops on page boundaries because different pages may have different cache types. For example, if you read data sequentially from 0x0009F000 to 0x0009FFFC (write-back cache type) then you don't want the hardware prefetcher to prefetch from 0x000A0000 (which should be uncached).
Cheers,
Brendan
Re: Paging and Cache Look Ahead
Posted: Sun Apr 19, 2009 5:36 pm
by Love4Boobies
Brendan's becoming annoying
Re: Paging and Cache Look Ahead
Posted: Sun Apr 19, 2009 7:41 pm
by JackScott
How on earth can you say that? If we could somehow convince Brendan to let us download his brain onto a web server, the Wiki would be finished.
Re: Paging and Cache Look Ahead
Posted: Sun Apr 19, 2009 7:46 pm
by Love4Boobies
Lol
I obviously don't think that way about him...