using TSC to measure time returns different values

Question about which tools to use, bugs, the best way to implement a function, etc should go here. Don't forget to see if your question is answered in the wiki first! When in doubt post here.
Post Reply
giovanig
Member
Member
Posts: 29
Joined: Tue Mar 13, 2012 12:03 pm

using TSC to measure time returns different values

Post by giovanig »

Hello,

I am running our own OS. I want to measure OS overhead using the TSC on an intel i7-2600 multicore processor.

I have the following application:

Code: Select all

int main()
{
    for(int i = 0; i < REP; i++)
        run(i)
}

void run(int test)
{
    create threads
    join threads
    delete threads
    print the collected TSC values
}
I am measuring the overhead as follows:

Code: Select all

void Thread::sleep(..)
{
    //disables interrupts and gets a spin lock
    TSC::Time_Stamp start = TSC::time_stamp();
    //performs some operations
    _wakeup_tsc[_wakeup_counter] = TSC::time_stamp() - start;
    _wakeup_counter++;
   //enables interrupts and releases the spin lock
}

static TSC::Time_Stamp time_stamp() {
	Time_Stamp ts;
	ASMV("rdtsc" : "=A" (ts) : );
	return ts;
}
When I run the application with only 1 repetition, I got very low TSC values, as expected (there are only 15 threads in the system). If I change to 10 repetitions, for example, the TSC values increase a lot (about 10 times more even in the first loop).

I understand that if I change the repetitions, the compiler (g++) will change the memory addresses in the generated system image. Is there a reasonable explanation for this behavior? Maybe, something related to memory banks accesses? I cannot understand how the read TSC values change so much.

Best regards,
Giovani
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: using TSC to measure time returns different values

Post by Brendan »

Hi,
giovanig wrote:When I run the application with only 1 repetition, I got very low TSC values, as expected (there are only 15 threads in the system). If I change to 10 repetitions, for example, the TSC values increase a lot (about 10 times more even in the first loop).
When you measure how long it takes to execute "run()" 10 times you found that it takes about 10 times longer than just executing "run()" once? In similar news, scientists have discovered that 10 elephants are roughly 10 times heavier than a single elephant, it takes 10 times longer to walk 10 Km than it takes to walk 1 Km, and 10 pizzas cost about 10 times more than one pizza. ;)

I don't see what the problem is.

Were you expecting that caches would make the second, third, ..., tenth time faster (e.g. so that doing it 10 times only takes 8 times as long as doing it once)? If you were, then maybe a lot of the code/data (e.g. most of the kernel) is already in caches before your test starts.

Were you expecting that doing it 10 times would use more CPUs and be a lot faster than 10 times longer? In this case; the "join threads" part (where you wait for the created threads to stop) would prevent anything extra from happening in parallel.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
giovanig
Member
Member
Posts: 29
Joined: Tue Mar 13, 2012 12:03 pm

Re: using TSC to measure time returns different values

Post by giovanig »

Hi Brendan, thank you for your answer.

I guess I was not clear enough. I am not measuring how long the program takes to execute. Obviously, the time would be 10x slower. I am measuring how long some OS activities take to execute (time to choose a thread to run for example). These activities do not depend on the loop and they are executed with interrupts disabled and inside a mutual exclusion code protected by a spin lock.

What I would expect is a very similar time independently of the loop size. The problem is when I measure the time running the application with more than 1 iteration (I reset the internal counters before starting a new iteration), I got a TSC value that is extremely higher than executing the same application and the same TSC and Thread code with a loop with 1 iteration. There is no filesystem, all data is kept in memory buffers.

The only explanation I see, is that when I increase the loop size, the code and data of the OS (scheduling lists for example) and application are placed in another memory region and the time to access the data in this new position is higher, for some reason related to the architecture.
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: using TSC to measure time returns different values

Post by Brendan »

Hi,
giovanig wrote:I guess I was not clear enough. I am not measuring how long the program takes to execute. Obviously, the time would be 10x slower. I am measuring how long some OS activities take to execute (time to choose a thread to run for example). These activities do not depend on the loop and they are executed with interrupts disabled and inside a mutual exclusion code protected by a spin lock.

What I would expect is a very similar time independently of the loop size. The problem is when I measure the time running the application with more than 1 iteration (I reset the internal counters before starting a new iteration), I got a TSC value that is extremely higher than executing the same application and the same TSC and Thread code with a loop with 1 iteration. There is no filesystem, all data is kept in memory buffers.

The only explanation I see, is that when I increase the loop size, the code and data of the OS (scheduling lists for example) and application are placed in another memory region and the time to access the data in this new position is higher, for some reason related to the architecture.
This sounds far too unlikely. More likely is that you're not measuring what you think your measuring; or there's serious flaws in your OS (e.g. code to choose a thread to run that wastes time checking threads that have been terminated), or something else is going on.

I'd suggest trying many different loop counts - e.g. 1, 10, 100, 1000, ..., 10000000. This should give you a better idea of the symptoms.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Post Reply