Hi all,
I am running some benchmark tests, and for those I want only the RDTSC instruction to be in cache and nothing else, so that only this instruction is fetched from cache and all the others from main memory. Another requirement is that this applies to both cores of my Intel Atom 330 target.
One way I thought of doing this is to run the following on the BSP (before it boots the AP):
- Invalidate cache lines using WBINVD
- Set the cache to "Normal cache mode"
- Execute a "RDTSC" instruction
- Set the cache to "No-fill mode"
After the end of the above procedure, I will just keep the cache enabled as it is by default.
My understanding is that only the RDTSC instruction will then be cached on both cores, and everything else will be fetched from main memory.
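For reference, the two cache modes in the steps above come down to two bits in CR0. Below is a minimal sketch that only computes the new CR0 value; the helper names are made up for illustration, and actually loading CR0 (and executing WBINVD) needs ring 0 and is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* CR0 cache-control bits as documented in the Intel SDM, Vol. 3A. */
#define CR0_NW (UINT32_C(1) << 29) /* Not Write-through */
#define CR0_CD (UINT32_C(1) << 30) /* Cache Disable */

static_assert((CR0_CD | CR0_NW) == UINT32_C(0x60000000),
              "CD and NW are CR0 bits 30 and 29");

/* Hypothetical helper: "normal cache mode" is CD = 0, NW = 0. */
static inline uint32_t cr0_normal_cache(uint32_t cr0)
{
    return cr0 & ~(CR0_CD | CR0_NW);
}

/* Hypothetical helper: "no-fill mode" is CD = 1, NW = 0.
 * Lines already in the cache can still hit; misses no longer
 * allocate new lines. */
static inline uint32_t cr0_no_fill(uint32_t cr0)
{
    return (cr0 | CR0_CD) & ~CR0_NW;
}
```

A kernel would read CR0, pass it through one of these helpers, and write the result back; all other CR0 bits are left untouched.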
What do you guys think? Do you think that this will work as expected and, if yes, do you have a more efficient way of doing it in mind?
I look forward to your comments/suggestions.
Regards,
limp
An efficient way to fetch a single instruction from cache
Re: An efficient way to fetch a single instruction from cach
CPU doesn't cache single instructions, cache consists of cache-lines.
You're doing it wrong, think twice about what you really want to achieve.
Re: An efficient way to fetch a single instruction from cach
Thanks for your reply Nable,
Nable wrote: CPU doesn't cache single instructions, cache consists of cache-lines.
You're doing it wrong, think twice about what you really want to achieve.

I know that the CPU caches cache-lines and what I want is to have only a cache-line cached which will contain the RDTSC instruction. When you just say "you're doing it wrong", you're not really helping. Which part seems wrong to you?
- Owen
- Member
- Posts: 1700
- Joined: Fri Jun 13, 2008 3:21 pm
- Location: Cambridge, United Kingdom
- Contact:
Re: An efficient way to fetch a single instruction from cach
You set the cache to "normal cache mode" and the CPU will issue a bunch of speculative prefetch requests for unpredictable addresses. There's no way to say precisely "just have this address in cache".
I don't see what you're trying to do here, and I see no benchmark which requires this kind of setup. That said, if I wanted to do it, I would:
- Disable the CPU caches
- Execute a WBINVD to flush them completely
- Completely rewrite the MTRRs to mark the whole of RAM as uncacheable
- Set one MTRR to cache the page of RAM which contained my RDTSCP
- Re-enable the CPU caches
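As a rough illustration of the two MTRR steps in this list, here is a sketch that computes the register values for one write-back variable range on top of an uncacheable default. The MSR indices and bit positions are the architectural ones from the Intel SDM; the helper names, the example address and the 36-bit physical address width are assumptions (the real width is reported by CPUID leaf 0x80000008). The WRMSR writes themselves need ring 0 and are not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural MSR indices and fields (Intel SDM, Vol. 3A, MTRR chapter). */
#define MSR_MTRR_PHYSBASE0  0x200u
#define MSR_MTRR_PHYSMASK0  0x201u
#define MSR_MTRR_DEF_TYPE   0x2FFu
#define MTRR_TYPE_UC        0x00u               /* uncacheable */
#define MTRR_TYPE_WB        0x06u               /* write-back  */
#define MTRR_PHYSMASK_VALID (UINT64_C(1) << 11)
#define MTRR_DEF_TYPE_E     (UINT64_C(1) << 11) /* MTRRs enabled */

struct mtrr_pair { uint64_t physbase; uint64_t physmask; };

/* Hypothetical helper: encode one variable-range MTRR.
 * phys_bits is the physical address width (an input, not probed here). */
static inline struct mtrr_pair
mtrr_encode(uint64_t base, uint64_t size, unsigned type, unsigned phys_bits)
{
    assert(phys_bits > 12 && phys_bits < 64);
    assert(size >= 4096 && (size & (size - 1)) == 0); /* power of two */
    assert((base & (size - 1)) == 0);                 /* naturally aligned */

    uint64_t addr_mask = ((UINT64_C(1) << phys_bits) - 1) & ~UINT64_C(0xFFF);
    struct mtrr_pair p;
    p.physbase = (base & addr_mask) | type;
    p.physmask = (~(size - 1) & addr_mask) | MTRR_PHYSMASK_VALID;
    return p;
}

/* Hypothetical helper: IA32_MTRR_DEF_TYPE value with MTRRs enabled,
 * default type UC, and the fixed-range MTRRs (FE, bit 10) left off. */
static inline uint64_t mtrr_def_type_uc(void)
{
    return MTRR_DEF_TYPE_E | MTRR_TYPE_UC;
}
```

The remaining variable-range MTRRs would also need their valid bits cleared, and the whole update has to happen between the "disable caches" and "re-enable caches" steps, on each core.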
Re: An efficient way to fetch a single instruction from cach
Thanks for your reply Owen,
I assumed that by setting the cache to "No-fill mode", no speculative execution will take place.
Will your workaround guarantee that only the RDTSC is used or speculative execution prefetches may still occur?
By the way, I am not using paging so I guess that the size of the region that contains the RDTSC instruction can be quite small...do you happen to know the absolute minimum for an MTRR region?
Thanks in advance.
Re: An efficient way to fetch a single instruction from cach
Hi,
limp wrote: I assumed that by setting the cache to "No-fill mode", no speculative execution will take place.
Will your workaround guarantee that only the RDTSC is used or speculative execution prefetches may still occur?

In theory, speculative execution would still take place. The only difference is that instructions would be fetched from RAM instead of being fetched from cache. However, I don't think Intel's Atom does very much speculative execution anyway (it's a very simple core designed for low power not high performance; where hyper-threading is used to try to hide stalls).

limp wrote: By the way, I am not using paging so I guess that the size of the region that contains the RDTSC instruction can be quite small...do you happen to know the absolute minimum for an MTRR region?

Minimum size for variable range MTRRs is 4096 bytes. The smallest fixed range MTRR is 16 KiB.
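To make the 4096-byte minimum concrete, here is a small sketch that finds the smallest power-of-two, naturally aligned region of at least 4 KiB covering a given instruction. The function and struct names are invented for illustration, and addresses are assumed to be far below the 64-bit limit so the arithmetic cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

#define MTRR_MIN_SIZE UINT64_C(4096) /* variable-range minimum */

struct region { uint64_t base; uint64_t size; };

/* Hypothetical helper: smallest aligned power-of-two region (>= 4 KiB)
 * that covers the bytes [addr, addr + len). */
static inline struct region covering_region(uint64_t addr, uint64_t len)
{
    assert(len > 0);
    struct region r = { addr & ~(MTRR_MIN_SIZE - 1), MTRR_MIN_SIZE };

    /* Grow until the end of the byte range fits inside the region. */
    while (addr + len > r.base + r.size) {
        r.size <<= 1;
        r.base = addr & ~(r.size - 1);
    }
    return r;
}
```

RDTSC is a two-byte opcode (0F 31), so it normally fits in a single 4 KiB region; if it straddles a 4 KiB boundary the region has to grow, which is one more reason to align the instruction.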
Also note that if you execute a lot of code (e.g. 1 million instructions) plus one RDTSC, then the time taken by the RDTSC is going to be negligible regardless of what you do. I'd consider executing a lot of code (e.g. ensure that the ratio of "RDTSC" to other instructions is tiny) and disable all caches without bothering with MTRRs at all.
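The arithmetic behind that ratio argument can be sketched as follows. The cycle counts are placeholders and not measured values for any CPU, and the function name is made up.

```c
#include <assert.h>

/* Hypothetical helper: fraction of the total measured time that is
 * spent in the RDTSC itself, for a run of n_insns other instructions. */
static inline double rdtsc_share(double rdtsc_cycles, double n_insns,
                                 double cycles_per_insn)
{
    assert(rdtsc_cycles >= 0 && n_insns > 0 && cycles_per_insn > 0);
    double body_cycles = n_insns * cycles_per_insn;
    return rdtsc_cycles / (rdtsc_cycles + body_cycles);
}
```

With an assumed 100 cycles for the RDTSC and one million instructions at one cycle each, the share is about 0.01%; with caches disabled each instruction costs far more, so the share only gets smaller.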
Finally, Nable was entirely correct - whatever you think you're doing doesn't make any sense. Essentially you'd be using instruction fetch to benchmark RAM speed (and not benchmarking anything to do with CPU performance at all); and if you actually wanted to benchmark RAM speed properly there are far better (and much more accurate) ways to do that.
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: An efficient way to fetch a single instruction from cach
Hi Brendan,
Thanks for your reply.
Brendan wrote: The only difference is that instructions would be fetched from RAM instead of being fetched from cache.

You mean that this will be the only difference if I have cache to "no-fill mode" rather than "normal cache mode"?
That is, if I have declared all memory as uncached, apart from a single page that contains only my RDTSC instruction (and the rest of it is empty), then with the cache in "normal cache mode" speculative prefetching will fetch from cache, and with "no-fill mode" enabled it will fetch from RAM? Is that what you're saying here?
Thanks
Re: An efficient way to fetch a single instruction from cach
Hi,
Brendan wrote: The only difference is that instructions would be fetched from RAM instead of being fetched from cache.
limp wrote: You mean that this will be the only difference if I have cache to "no-fill mode" rather than "normal cache mode"?

Yes. If the caches are enabled the CPU will use caches. If the caches are disabled (e.g. "no-fill mode" with any previous cache contents flushed via WBINVD or something) then the CPU won't use caches.
There is no way to disable speculative execution (regardless of what you do with caches), because there's no sane reason to disable speculative execution.
Cheers,
Brendan
- Owen
Re: An efficient way to fetch a single instruction from cach
Also note that disabling the instruction cache will have other unwanted effects: for example, on recent Intel CPUs it also disables the microcode trace cache, and on most AMD CPUs it will dramatically cut the instruction decode rate (because the ICache contains instruction boundary markers).