A question regarding RDTSC operation

Posted: Fri Oct 12, 2012 6:08 am
by limp
Hi all,

I have a question about how the processor behaves when an RDTSC is executed. I would like to take some timing measurements using RDTSC, once with the cache enabled and once with it disabled, and I want to find the best way to make the overhead of reading the TSC roughly the same in both cases.

That is, if the cache is enabled, the RDTSC instruction will be fetched from cache, so its overhead will be much lower than when it is fetched from main memory in the measurements with the cache disabled.

So, my question is:

When cache is disabled, does the processor fetch the RDTSC instruction from main memory or is this avoided because the RDTSC is hard-coded to the processor?

In other words, which one of the following is true when an RDTSC is executed (with the cache disabled)?

1)
- Fetch RDTSC instruction from main memory
- Read TSC from CPU register

OR

2)
- No need to fetch, read directly from CPU register

Thanks in advance for your help.

P.S. If you guys want to share a technique that can help me achieve what I am looking for, I would really appreciate it.

Re: A question regarding RDTSC operation

Posted: Fri Oct 12, 2012 3:12 pm
by Combuster
You are making the very big error of thinking that execution patterns magically change because a specific opcode is there. Specifically, if the processor is not allowed to cache code, how could it suddenly start caching because an RDTSC instruction appears?


The fundamentals of decoding haven't really changed over time. Typically the processor maintains a small buffer that was once called the prefetch queue. Whenever the processor didn't need the bus for a data cycle, it would instead try to add bytes from somewhere ahead of the instruction pointer to this queue. To maximize throughput, a full bus-width of data was pulled in at once. Instructions can span several bytes and have no alignment requirement, so several such reads may be needed to get one whole instruction in before it can get past the decoding stage (try that with a 16-bit bus and a long instruction). On the upside, one read might pull in several instructions at once.

With caching disabled, the processor simply doesn't store code in the cache and fetches blocks of (partial) instructions from main memory only, and it also loses access to a dedicated code bus. All in all, adding an RDTSC in such a situation means you lose roughly an eighth of a memory cycle on code.
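
To make the arithmetic concrete: RDTSC encodes as two bytes (0F 31), so if code is pulled in 16 bytes at a time (an assumption inferred from the "an eighth" figure, not something stated in the post), one RDTSC accounts for 2/16 of a fetch. The sketch below counts how many bus-width reads an instruction needs given its start address and length. `fetches_needed` is a hypothetical helper for illustration, not a model of any specific CPU.

```c
#include <stdint.h>

/* Hypothetical helper: number of aligned bus-width reads needed to bring in
   an instruction of `length` bytes that starts at byte address `start`.
   `bus_bytes` is the width of one read in bytes (2 for a 16-bit bus).
   Both `length` and `bus_bytes` must be non-zero. */
unsigned fetches_needed(uint32_t start, uint32_t length, uint32_t bus_bytes)
{
    uint32_t first_block = start / bus_bytes;                /* block holding the first byte */
    uint32_t last_block  = (start + length - 1) / bus_bytes; /* block holding the last byte  */
    return (unsigned)(last_block - first_block + 1);
}
```

For example, a 2-byte RDTSC at offset 0 of a 16-byte block needs one read, the same instruction straddling a block boundary (offset 15) needs two, and a 15-byte instruction starting at an odd address on a 16-bit bus needs eight.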

Re: A question regarding RDTSC operation

Posted: Fri Oct 12, 2012 3:41 pm
by Brendan
Hi,

I'd also add that for accurate results you want to measure "nothing" (the cost of measuring with RDTSC and nothing else) and then measure "something" (the cost of measuring with RDTSC plus the cost of doing some other work). The cost of doing the "something" is "(RDTSC_cost + something_cost) - RDTSC_cost".

This way it doesn't matter how long RDTSC itself takes.
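
As a rough sketch of what the two measurement routines could look like in C: `read_timestamp`, `do_something`, `sink` and `subtract_overhead` are illustrative names that do not appear in this thread, `__rdtsc()` is the GCC/Clang intrinsic from `<x86intrin.h>`, and the `clock()` fallback for non-x86 builds is purely an assumption so that the sketch compiles elsewhere. RDTSC is not a serializing instruction, so a careful measurement would also have to deal with out-of-order execution, which this sketch ignores.

```c
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Read the time stamp counter via the compiler intrinsic. */
uint64_t read_timestamp(void) { return __rdtsc(); }
#else
/* Assumption: on non-x86 builds fall back to clock() so this still compiles. */
uint64_t read_timestamp(void) { return (uint64_t)clock(); }
#endif

/* Hypothetical workload standing in for "something". */
volatile uint32_t sink;
void do_something(void)
{
    for (uint32_t n = 0; n < 1000; n++) sink += n;
}

/* Cost of the measurement alone: two back-to-back timestamp reads. */
uint64_t measure_nothing(void)
{
    uint64_t start = read_timestamp();
    uint64_t end = read_timestamp();
    return end - start;
}

/* Cost of the measurement plus the work being timed. */
uint64_t measure_something(void)
{
    uint64_t start = read_timestamp();
    do_something();
    uint64_t end = read_timestamp();
    return end - start;
}

/* (RDTSC_cost + something_cost) - RDTSC_cost, clamped at zero so that a
   dodgy overhead sample cannot wrap around to a huge unsigned value. */
uint64_t subtract_overhead(uint64_t total, uint64_t overhead)
{
    return total > overhead ? total - overhead : 0;
}
```

The cost of "something" alone is then `subtract_overhead(measure_something(), measure_nothing())`, ideally using the minimum over many runs of each rather than a single sample.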

Of course you'd have to do each pair of tests twice - measure nothing with caches enabled, measure something with caches enabled, and calculate the cost of something with caches enabled; then measure nothing with caches disabled, measure something with caches disabled, and calculate the cost of something with caches disabled.

You'd also want to do each test many times and discard any dodgy results. For example:

Code:

    /* Assumes cost, minimum_nothing_cost and minimum_something_cost are
       unsigned int, and that UINT_MAX is available from <limits.h> */
    enable_caches();

    /* Start the minimums at the largest value so the first sample always wins */
    minimum_nothing_cost = UINT_MAX;
    minimum_something_cost = UINT_MAX;

    for(i = 0; i < 10000; i++) {
        cost = measure_nothing();
        if(minimum_nothing_cost > cost) minimum_nothing_cost = cost;
    }
    for(i = 0; i < 10000; i++) {
        cost = measure_something();
        if(cost < minimum_nothing_cost) {
            /* "minimum_nothing_cost" must be dodgy (!) */
        } else {
            cost -= minimum_nothing_cost;    // Cost is the actual cost of "something" alone
            if(minimum_something_cost > cost) minimum_something_cost = cost;
        }
    }
    printf("%u with caches enabled\n", minimum_something_cost);

    disable_caches();

    /* Reset both minimums so the cached results don't leak into this run */
    minimum_nothing_cost = UINT_MAX;
    minimum_something_cost = UINT_MAX;

    for(i = 0; i < 10000; i++) {
        cost = measure_nothing();
        if(minimum_nothing_cost > cost) minimum_nothing_cost = cost;
    }
    for(i = 0; i < 10000; i++) {
        cost = measure_something();
        if(cost < minimum_nothing_cost) {
            /* "minimum_nothing_cost" must be dodgy (!) */
        } else {
            cost -= minimum_nothing_cost;    // Cost is the actual cost of "something" alone
            if(minimum_something_cost > cost) minimum_something_cost = cost;
        }
    }
    printf("%u with caches disabled\n", minimum_something_cost);

Cheers,

Brendan