An efficient way to fetch a single instruction from cache

limp
Member
Posts: 90
Joined: Fri Jun 12, 2009 7:18 am

An efficient way to fetch a single instruction from cache

Post by limp »

Hi all,

I am running some benchmark tests, and for that I want only the RDTSC instruction in cache and nothing else, so that only this instruction is fetched from cache and everything else from main memory. This also has to apply to both cores of my Intel Atom 330 target.

One way I thought of doing it is the following, run from the BSP (before it boots the AP):
- Invalidate cache lines using WBINVD
- Set the cache to "Normal cache mode"
- Execute a "RDTSC" instruction
- Set the cache to "No-fill mode"

After the end of the above procedure, I will just keep the cache enabled as it is by default.
My understanding is that only the RDTSC instruction will then be cached by both cores and everything else will be fetched from main memory.
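
For reference, the two modes named in the steps above correspond to the CD (bit 30) and NW (bit 29) bits of CR0. What follows is only a minimal sketch of how the CR0 values could be computed. The helper names are made up for illustration, and actually loading CR0 (a privileged `mov` in ring 0) and the WBINVD itself are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* CR0 cache-control bits. */
#define CR0_NW (UINT32_C(1) << 29) /* Not Write-through */
#define CR0_CD (UINT32_C(1) << 30) /* Cache Disable */

/* Hypothetical helper: CR0 value for "normal cache mode" (CD=0, NW=0). */
static uint32_t cr0_normal_cache_mode(uint32_t cr0)
{
    return cr0 & ~(CR0_CD | CR0_NW);
}

/* Hypothetical helper: CR0 value for "no-fill mode" (CD=1, NW=0). */
static uint32_t cr0_no_fill_mode(uint32_t cr0)
{
    return (cr0 | CR0_CD) & ~CR0_NW;
}

/* CD=0 together with NW=1 is an invalid combination (loading it faults). */
static int cr0_cache_bits_valid(uint32_t cr0)
{
    return !(!(cr0 & CR0_CD) && (cr0 & CR0_NW));
}
```

Note that this only covers the bit arithmetic. It says nothing about which lines the CPU decides to fill while the cache is in normal mode.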

What do you guys think? Do you think that this will work as expected and, if so, do you have a more efficient way of doing it in mind?

I look forward to your comments/suggestions.

Regards,
limp
Nable
Member
Posts: 453
Joined: Tue Nov 08, 2011 11:35 am

Re: An efficient way to fetch a single instruction from cach

Post by Nable »

CPU doesn't cache single instructions, cache consists of cache-lines.
You're doing it wrong, think twice about what you really want to achieve.
limp
Member
Posts: 90
Joined: Fri Jun 12, 2009 7:18 am

Re: An efficient way to fetch a single instruction from cach

Post by limp »

Thanks for your reply Nable,
Nable wrote:CPU doesn't cache single instructions, cache consists of cache-lines.
You're doing it wrong, think twice about what you really want to achieve.
I know that the CPU caches cache-lines, and what I want is to have only one cache-line cached: the one that contains the RDTSC instruction. When you just say "you're doing it wrong", you're not really helping. Which part seems wrong to you?
Owen
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom

Re: An efficient way to fetch a single instruction from cach

Post by Owen »

You set the cache to "normal cache mode" and the CPU will issue a bunch of speculative prefetch requests for unpredictable addresses. There's no way to say precisely "just have this address in cache".

I don't see what you're trying to do here, and I see no benchmark which requires this kind of setup. That said, if I wanted to do it, I would:
  • Disable the CPU caches
  • Execute a WBINVD to flush them completely
  • Completely rewrite the MTRRs to mark the whole of RAM as uncacheable
  • Set one MTRR to make the page of RAM which contains my RDTSC cacheable
  • Re-enable the CPU caches
Still, I see no meaningful benchmark which will come of this
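
The MTRR part of the steps above boils down to computing a few MSR values. Here is a rough sketch of that arithmetic only, under these assumptions: the struct and function names are hypothetical, the cacheable type is taken to be write-back, and the number of physical address bits (36 in the usage below) is just an example that would really come from CPUID leaf 0x80000008. Writing the MSRs (WRMSR in ring 0) and the cache disable/flush/enable steps are not shown.

```c
#include <assert.h>
#include <stdint.h>

#define IA32_MTRR_PHYSBASE0  0x200u /* MSR index of the first variable range */
#define IA32_MTRR_PHYSMASK0  0x201u
#define IA32_MTRR_DEF_TYPE   0x2FFu
#define MTRR_TYPE_UC         0x00u  /* uncacheable */
#define MTRR_TYPE_WB         0x06u  /* write-back */
#define MTRR_PHYSMASK_VALID  (UINT64_C(1) << 11)
#define MTRR_DEF_TYPE_ENABLE (UINT64_C(1) << 11) /* E bit: MTRRs enabled */

/* Hypothetical container for one PHYSBASE/PHYSMASK register pair. */
struct mtrr_pair {
    uint64_t physbase;
    uint64_t physmask;
};

/* Values for one variable-range MTRR covering [base, base + size). */
static struct mtrr_pair mtrr_var_range(uint64_t base, uint64_t size,
                                       uint8_t type, unsigned phys_bits)
{
    /* Size must be a power of two of at least 4 KiB, base aligned to it. */
    assert(size >= 4096 && (size & (size - 1)) == 0);
    assert((base & (size - 1)) == 0);
    assert(phys_bits >= 32 && phys_bits <= 52);

    uint64_t addr_mask = (UINT64_C(1) << phys_bits) - 1;
    struct mtrr_pair p;
    p.physbase = (base & addr_mask) | type;
    p.physmask = (~(size - 1) & addr_mask) | MTRR_PHYSMASK_VALID;
    return p;
}

/* IA32_MTRR_DEF_TYPE value: default type UC, MTRRs on, fixed ranges off. */
static uint64_t mtrr_def_type_uncached(void)
{
    return MTRR_DEF_TYPE_ENABLE | MTRR_TYPE_UC;
}
```

For example, `mtrr_var_range(0x100000, 4096, MTRR_TYPE_WB, 36)` describes a single write-back 4 KiB page at 1 MiB, with everything else falling back to the uncacheable default type.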
limp
Member
Posts: 90
Joined: Fri Jun 12, 2009 7:18 am

Re: An efficient way to fetch a single instruction from cach

Post by limp »

Thanks for your reply Owen,

I assumed that by setting the cache to "No-fill mode", no speculative execution will take place.
Will your workaround guarantee that only the RDTSC is cached, or may speculative execution prefetches still occur?

By the way, I am not using paging so I guess that the size of the region that contains the RDTSC instruction can be quite small...do you happen to know the absolute minimum for an MTRR region?

Thanks in advance.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: An efficient way to fetch a single instruction from cach

Post by Brendan »

Hi,
limp wrote:I assumed that by setting the cache to "No-fill mode", no speculative execution will take place.
Will your workaround guarantee that only the RDTSC is cached, or may speculative execution prefetches still occur?
In theory, speculative execution would still take place. The only difference is that instructions would be fetched from RAM instead of being fetched from cache. However, I don't think Intel's Atom does very much speculative execution anyway (it's a very simple core designed for low power not high performance; where hyper-threading is used to try to hide stalls).
limp wrote:By the way, I am not using paging so I guess that the size of the region that contains the RDTSC instruction can be quite small...do you happen to know the absolute minimum for an MTRR region?
Minimum size for variable range MTRRs is 4096 bytes. The fixed range MTRRs only cover the first 1 MiB, in sub-ranges of 64 KiB, 16 KiB and 4 KiB.
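
To make the 4096-byte minimum for variable ranges concrete: the region also has to be a power of two in size and aligned to that size, so the smallest region that covers a given instruction depends on where the instruction sits. A small sketch, with a made-up helper name and made-up addresses in the usage note:

```c
#include <assert.h>
#include <stdint.h>

#define MTRR_VAR_MIN_SIZE UINT64_C(4096) /* minimum variable-range size */

struct mtrr_region {
    uint64_t base; /* aligned to size */
    uint64_t size; /* power of two, at least 4 KiB */
};

/* Hypothetical helper: smallest aligned power-of-two region that
 * contains the bytes [addr, addr + len). */
static struct mtrr_region mtrr_smallest_region(uint64_t addr, uint64_t len)
{
    /* Keep the arithmetic well inside 64 bits. */
    assert(len > 0);
    assert(addr < (UINT64_C(1) << 52) && len < (UINT64_C(1) << 52));

    uint64_t last = addr + len - 1;
    uint64_t size = MTRR_VAR_MIN_SIZE;

    /* Grow until the first and last byte fall in the same aligned block. */
    while ((addr & ~(size - 1)) != (last & ~(size - 1)))
        size <<= 1;

    struct mtrr_region r = { addr & ~(size - 1), size };
    return r;
}
```

RDTSC is a two-byte instruction (0F 31), so `mtrr_smallest_region(0x101234, 2)` gives a single 4 KiB page at 0x101000. If the two bytes straddled a page boundary the region would have to grow, which is a good reason to keep the instruction well inside one page.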

Also note that if you execute a lot of code (e.g. 1 million instructions) plus one RDTSC, then the time taken by the RDTSC is going to be negligible regardless of what you do. I'd consider executing a lot of code (e.g. ensuring that the ratio of "RDTSC" to other instructions is tiny) and disabling all caches without bothering with MTRRs at all.
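
The "negligible" point is simple arithmetic. A tiny sketch, where the function name is invented and the cycle counts in the usage note are placeholders rather than measurements from any CPU:

```c
#include <assert.h>

/* Fraction of the total run time spent in a single RDTSC, given its cost
 * in cycles, the number of other instructions executed and their average
 * cost in cycles. All inputs are caller-supplied estimates. */
static double rdtsc_share(double rdtsc_cycles, double n_instr,
                          double cycles_per_instr)
{
    assert(rdtsc_cycles >= 0.0 && n_instr >= 0.0 && cycles_per_instr > 0.0);

    double total = rdtsc_cycles + n_instr * cycles_per_instr;
    return total > 0.0 ? rdtsc_cycles / total : 0.0;
}
```

With placeholder numbers such as `rdtsc_share(30.0, 1e6, 1.0)` the share comes out at roughly 0.003%, and with caches disabled the per-instruction cost only goes up, which shrinks the share further.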

Finally, Nable was entirely correct - whatever you think you're doing doesn't make any sense. Essentially you'd be using instruction fetch to benchmark RAM speed (and not benchmarking anything to do with CPU performance at all); and if you actually wanted to benchmark RAM speed properly there are far better (and much more accurate) ways to do that.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
limp
Member
Posts: 90
Joined: Fri Jun 12, 2009 7:18 am

Re: An efficient way to fetch a single instruction from cach

Post by limp »

Hi Brendan,

Thanks for your reply.
Brendan wrote:The only difference is that instructions would be fetched from RAM instead of being fetched from cache.
You mean that this will be the only difference if I set the cache to "no-fill mode" rather than "normal cache mode"?

That is, if I have declared all memory as uncached, apart from a single page that contains only my RDTSC instruction (the rest of the page being empty), then with the cache in "normal cache mode" speculative prefetching will fetch from cache, and with "no-fill mode" enabled it will fetch from RAM? Is that what you're saying here?

Thanks
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: An efficient way to fetch a single instruction from cach

Post by Brendan »

Hi,
limp wrote:
Brendan wrote:The only difference is that instructions would be fetched from RAM instead of being fetched from cache.
You mean that this will be the only difference if I set the cache to "no-fill mode" rather than "normal cache mode"?
Yes. If the caches are enabled the CPU will use caches. If the caches are disabled (e.g. "no-fill mode" with any previous cache contents flushed via WBINVD or something) then the CPU won't use caches.

There is no way to disable speculative execution (regardless of what you do with caches), because there's no sane reason to disable speculative execution.


Cheers,

Brendan
Owen
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom

Re: An efficient way to fetch a single instruction from cach

Post by Owen »

Also note that disabling the instruction cache will have other unwanted effects. For example, on recent Intel CPUs it also disables the microcode trace cache; on most AMD CPUs it dramatically cuts the instruction decode rate (because the ICache contains instruction boundary markers).