A question regarding RDTSC operation

limp
Member
Posts: 90
Joined: Fri Jun 12, 2009 7:18 am

Post by limp »

Hi all,

I have a question about how the processor behaves when RDTSC is executed. I would like to take some timing measurements using RDTSC, both with the cache enabled and with it disabled, and I want to find the best way to keep the overhead of reading the TSC roughly the same in both cases.

That is, if the cache is enabled, the RDTSC instruction will be fetched from cache, so its overhead will be much smaller than in the measurements with the cache disabled, where it is fetched from main memory.

So, my question is:

When the cache is disabled, does the processor fetch the RDTSC instruction from main memory, or is this avoided because RDTSC is hard-coded into the processor?

In other words, which of the following is true when RDTSC is executed (with the cache disabled)?

1)
- Fetch RDTSC instruction from main memory
- Read TSC from CPU register

OR

2)
- No need to fetch, read directly from CPU register

Thanks in advance for your help.

P.S. If you guys want to share a technique that can help me achieve what I am looking for, I would really appreciate it.
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am

Re: A question regarding RDTSC operation

Post by Combuster »

You are making the very big error of thinking that execution patterns magically change because a specific opcode is there. Specifically, if the processor is not allowed to cache code, how could it suddenly start caching because an RDTSC instruction appears?


The fundamentals of decoding haven't really changed over time. Typically the processor maintains a small buffer that was once called the prefetch queue. Whenever the processor didn't need the bus for a data cycle, it would instead try to add bytes from somewhere ahead of the instruction pointer to this queue. To maximize throughput, a full bus width of data was pulled in at once. Instructions can span several bytes and have no alignment requirement, so several such reads may be needed to get one whole instruction in before it can get past the decoding stage (try that with a 16-bit bus and a long instruction). On the upside, one read might pull in several instructions at once. With caching disabled, the processor simply does not store code in the cache; it fetches blocks of (partial) instructions from main memory only, and it also loses access to a dedicated code bus. All in all, adding an RDTSC in such a situation means you lose roughly an eighth of a memory cycle on code.
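
To make the fetch arithmetic concrete, here is a minimal sketch that counts how many bus-width reads are needed to cover one instruction. The helper name `fetches_needed` and the model of aligned, fixed-width reads are assumptions made for illustration, not a description of any particular CPU's front end. RDTSC is a 2-byte instruction (opcode 0F 31), so the "eighth" figure would fit a 16-byte fetch block; that width is a guess.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of reads needed to cover an instruction of `length` bytes that
 * starts at byte address `addr`, assuming every read pulls in one aligned
 * block of `width` bytes (a simplified, hypothetical fetch model).
 */
static unsigned fetches_needed(uint32_t addr, unsigned length, unsigned width)
{
    assert(length > 0 && width > 0);

    uint32_t first_block = addr / width;                 /* block holding the first byte */
    uint32_t last_block  = (addr + length - 1) / width;  /* block holding the last byte  */

    return (unsigned)(last_block - first_block + 1);
}
```

Under this model, `fetches_needed(0x1000, 2, 16)` is 1 (the two RDTSC bytes sit inside one 16-byte block), `fetches_needed(0x100F, 2, 16)` is 2 (the instruction straddles a block boundary), and a maximum-length 15-byte x86 instruction at an odd address on a 16-bit bus, `fetches_needed(0x1001, 15, 2)`, needs 8 reads.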
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am

Re: A question regarding RDTSC operation

Post by Brendan »

Hi,

I'd also add that for accurate results you want to measure "nothing" (the cost of measuring with RDTSC and nothing else) and then measure "something" (the cost of measuring with RDTSC plus the cost of doing some other work). The cost of doing the "something" is "(RDTSC_cost + something_cost) - RDTSC_cost".

This way it doesn't matter how long RDTSC itself takes.
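
As a rough sketch of what the two measurements could look like in C: `measure_nothing()` and `measure_something()` are the names used in this post, but their bodies here are assumptions, as are `read_tsc()` and `do_something()`. The sketch relies on the `__rdtsc()` intrinsic from `<x86intrin.h>` (GCC and Clang) and falls back to `clock()` on other targets only so that it still compiles there. No serialising instruction is placed around the reads, which a careful measurement may well want.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>            /* __rdtsc() on GCC and Clang */
static uint64_t read_tsc(void) { return __rdtsc(); }
#else
/* Not a TSC at all: a stand-in so the sketch builds on non-x86 targets. */
static uint64_t read_tsc(void) { return (uint64_t)clock(); }
#endif

/* Hypothetical work to be timed; volatile keeps the loop from being removed. */
static volatile uint32_t sink;

static void do_something(void)
{
    for (uint32_t i = 0; i < 1000; i++) {
        sink += i;
    }
}

/* Cost of the measurement alone: two back-to-back counter reads. */
static uint64_t measure_nothing(void)
{
    uint64_t start = read_tsc();
    uint64_t end = read_tsc();
    assert(end >= start);         /* a counter going backwards makes the delta meaningless */
    return end - start;
}

/* Cost of the measurement plus the work. */
static uint64_t measure_something(void)
{
    uint64_t start = read_tsc();
    do_something();
    uint64_t end = read_tsc();
    assert(end >= start);
    return end - start;
}
```

The cost of the work alone is then `measure_something() - measure_nothing()`, with both taken under the same cache setting.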

Of course you'd have to do each pair of tests twice - measure nothing with caches enabled, measure something with caches enabled, and calculate the cost of something with caches enabled; then measure nothing with caches disabled, measure something with caches disabled, and calculate the cost of something with caches disabled.

You'd also want to do each test many times and discard any dodgy results. For example:

Code:

    /* cost and the minimum_* variables are assumed to be unsigned int;
       UINT_MAX comes from <limits.h> */

    enable_caches();

    minimum_nothing_cost = UINT_MAX;      // Start high so the first sample becomes the minimum
    minimum_something_cost = UINT_MAX;
    for(i = 0; i < 10000; i++) {
        cost = measure_nothing();
        if(minimum_nothing_cost > cost) minimum_nothing_cost = cost;
    }
    for(i = 0; i < 10000; i++) {
        cost = measure_something();
        if(cost < minimum_nothing_cost) {
            /* "minimum_nothing_cost" must be dodgy (!) */
        } else {
            cost -= minimum_nothing_cost;    // Cost is the actual cost of "something" alone
            if(minimum_something_cost > cost) minimum_something_cost = cost;
        }
    }
    printf("%u with caches enabled\n", minimum_something_cost);

    disable_caches();

    minimum_nothing_cost = UINT_MAX;      // Reset, or the "caches enabled" minimums would carry over
    minimum_something_cost = UINT_MAX;
    for(i = 0; i < 10000; i++) {
        cost = measure_nothing();
        if(minimum_nothing_cost > cost) minimum_nothing_cost = cost;
    }
    for(i = 0; i < 10000; i++) {
        cost = measure_something();
        if(cost < minimum_nothing_cost) {
            /* "minimum_nothing_cost" must be dodgy (!) */
        } else {
            cost -= minimum_nothing_cost;    // Cost is the actual cost of "something" alone
            if(minimum_something_cost > cost) minimum_something_cost = cost;
        }
    }
    printf("%u with caches disabled\n", minimum_something_cost);
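
Since the two halves of that example are identical apart from the cache setting, the loop could be factored into a helper. The following is a simplified variant, not the code above: it takes the minimum of each measurement and subtracts once, clamping to zero when the baseline looks dodgy. The names `measure_fn`, `minimum_cost`, `net_cost`, `fake_nothing` and `fake_something` are all hypothetical; the two fake functions only exist to show the helper working without real hardware timing.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t (*measure_fn)(void);

/* Smallest cost seen over `runs` calls of `measure`. */
static uint64_t minimum_cost(measure_fn measure, unsigned runs)
{
    assert(runs > 0);

    uint64_t minimum = UINT64_MAX;
    for (unsigned i = 0; i < runs; i++) {
        uint64_t cost = measure();
        if (cost < minimum) {
            minimum = cost;
        }
    }
    return minimum;
}

/* Minimum cost of "something" with the minimum cost of "nothing" removed.
 * Returns 0 if the baseline is larger than the measurement. */
static uint64_t net_cost(measure_fn nothing, measure_fn something, unsigned runs)
{
    uint64_t baseline = minimum_cost(nothing, runs);
    uint64_t total = minimum_cost(something, runs);

    return total > baseline ? total - baseline : 0;
}

/* Deterministic stand-ins: a fixed cost plus a small repeating jitter. */
static uint64_t fake_nothing(void)
{
    static unsigned n;
    return 30 + (n++ % 5);
}

static uint64_t fake_something(void)
{
    static unsigned n;
    return 130 + (n++ % 7);
}
```

With the stand-ins above, `net_cost(fake_nothing, fake_something, 100)` evaluates to 100. In real use the helper would be called once after `enable_caches()` and once after `disable_caches()`, passing the real measurement functions.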

Cheers,

Brendan