Accurate timing measurements with HPET

Posted: Sun Aug 29, 2010 8:44 am
by limp
Hi all,

I am using HPET to take some timing measurements on an Intel Celeron processor. My question is this: should I issue a CPUID instruction before reading the current value of the timer (just as you do when reading the TSC) in order to get accurate measurements, or does it not matter? My processor has OoOE (Out-of-Order Execution), so I guess that I should issue a CPUID in this case.

Since I read the current value of the timer from a memory mapped location (i.e. I am not issuing a serializing instruction), I guess it is possible that some of the instructions which follow it have already been issued before the timer is read, or existing instructions have not yet completed, thus resulting in an inaccurate measurement of the time interval.

What do you guys think?

Thanks in advance.

Re: Accurate timing measurements with HPET

Posted: Sun Aug 29, 2010 12:42 pm
by Brendan
Hi,
limp wrote:What do you guys think?
HPET is typically running at about 10 MHz. This means (assuming it's exactly 10 MHz) that if you read 123456 from the HPET counter then 12345600.00 ns may have passed since the counter was zeroed, or 12345699.99 ns may have passed, or anything in between, and you won't know exactly. The "best case" accuracy would be +/- 100 ns.
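
The arithmetic above can be sketched in C. The HPET reports its tick period in femtoseconds in the upper 32 bits of its General Capabilities register. The helper names below (`hpet_ticks_to_ns_min`, `hpet_ticks_to_ns_max`) are hypothetical, and the 100,000,000 fs period is just the 10 MHz figure assumed in this post, not a value read from real hardware.

```c
#include <assert.h>
#include <stdint.h>

#define FS_PER_NS 1000000ULL  /* femtoseconds per nanosecond */

/* Earliest elapsed time (whole ns) consistent with a counter value:
 * the counter had only just ticked over to `ticks`.
 * Note: ticks * period_fs can overflow 64 bits for very large tick
 * counts; a real implementation would need to handle that. */
uint64_t hpet_ticks_to_ns_min(uint64_t ticks, uint32_t period_fs)
{
    assert(period_fs != 0);
    return ticks * period_fs / FS_PER_NS;
}

/* Latest elapsed time (whole ns, rounded down) consistent with the
 * same value: the counter was just about to tick over to ticks + 1. */
uint64_t hpet_ticks_to_ns_max(uint64_t ticks, uint32_t period_fs)
{
    assert(period_fs != 0);
    return ((ticks + 1) * period_fs - 1) / FS_PER_NS;
}
```

With a period of 100,000,000 fs, a reading of 123456 maps to somewhere between 12345600 ns and 12345699 ns, which is the range described above.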

There are other delays too, like the latency of the front-side bus, PCI controller and PCI bus (which would also be affected by other load - e.g. if the PCI bus is going flat out copying data to/from disk and network, then your attempt to read the HPET counter may take longer); and maybe even delays caused by things like your IRQ handlers. Because of this it'd probably be safer to assume you can't achieve "best case" accuracy, and the accuracy you get in practice might be more like +/- 250 ns.

CPUs are typically running several hundred times faster than HPET. For example, in 250 ns the TSC of a 2 GHz CPU would increase by 500 cycles, and (assuming the CPU averages about 2 instructions per cycle) the CPU might retire 1000 instructions in this time.

I guess what I'm saying is that if using a serialising instruction does improve accuracy, then because the HPET counter is "relatively less accurate", any improvement would probably be insignificant anyway.


Cheers,

Brendan

Re: Accurate timing measurements with HPET

Posted: Sun Jan 16, 2011 7:45 am
by limp
Hi, and sorry for re-opening this thread after so long, but I have some additional questions:
Brendan wrote: CPUs are typically running several hundred times faster than HPET. For example, in 250 ns the TSC of a 2 GHz CPU would increase by 500 cycles, and (assuming the CPU averages about 2 instructions per cycle) the CPU might retire 1000 instructions in this time.
I know, but my CPU doesn't have an invariant TSC, so it can't be used for getting accurate timing measurements. I discovered some other instructions like MFENCE, which I think is a much more beneficial way of serialising. So by issuing an MFENCE instruction before reading HPET, I think I can get more accurate measurements than by issuing CPUID or by not issuing a serialising instruction at all.
Brendan wrote: I guess what I'm saying is that if using a serialising instruction does improve accuracy, then because the HPET counter is "relatively less accurate", any improvement would probably be insignificant anyway.
So what's the most accurate way of getting timing measurements on x86? Is it by using a PCI DAQ card? Don't you get the PCI bus latency in that case as well? Any comments/suggestions are much appreciated.

Best regards

Re: Accurate timing measurements with HPET

Posted: Sun Jan 16, 2011 1:21 pm
by Combuster
You can time the timing code itself, so you can correct for any deviation caused by the timing code itself.
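
A minimal sketch of that idea in C, under some assumptions: `read_counter_fn` stands in for whatever reads your timer (an HPET main-counter read, for instance), and `sim_counter_read` and `sim_work` are made-up stand-ins that simulate a counter whose read costs a fixed number of ticks, so the sketch runs without hardware. The overhead is estimated as the smallest difference between back-to-back reads and then subtracted from each measurement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*read_counter_fn)(void);

/* Smallest delta between back-to-back reads: an estimate of what
 * the act of reading the counter costs, in counter ticks. */
uint64_t estimate_read_overhead(read_counter_fn read_counter, size_t samples)
{
    assert(read_counter != NULL && samples > 0);
    uint64_t best = UINT64_MAX;
    for (size_t i = 0; i < samples; i++) {
        uint64_t start = read_counter();
        uint64_t end = read_counter();
        if (end - start < best)
            best = end - start;
    }
    return best;
}

/* Time `work` and subtract the read overhead, clamping at zero. */
uint64_t measure_corrected(read_counter_fn read_counter, uint64_t overhead,
                           void (*work)(void))
{
    assert(read_counter != NULL && work != NULL);
    uint64_t start = read_counter();
    work();
    uint64_t raw = read_counter() - start;
    return raw > overhead ? raw - overhead : 0;
}

/* --- Simulated hardware, for illustration only (hypothetical) --- */
static uint64_t sim_ticks;

/* Pretend each read takes 3 ticks and returns the value afterwards. */
uint64_t sim_counter_read(void) { sim_ticks += 3; return sim_ticks; }

/* Pretend the code under test takes 100 ticks. */
void sim_work(void) { sim_ticks += 100; }
```

Taking the minimum rather than the average is a design choice of this sketch: on real hardware some samples will be inflated by interrupts or bus contention, and the minimum is the least disturbed one.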

Re: Accurate timing measurements with HPET

Posted: Mon Jan 17, 2011 3:53 am
by Brendan
Hi,
limp wrote: I know, but my CPU doesn't have an invariant TSC, so it can't be used for getting accurate timing measurements. I discovered some other instructions like MFENCE, which I think is a much more beneficial way of serialising. So by issuing an MFENCE instruction before reading HPET, I think I can get more accurate measurements than by issuing CPUID or by not issuing a serialising instruction at all.
First, what do you mean by "accurate"? Do you mean "precise" (e.g. like a counter that counts 1 nanosecond ticks, that may not be very good at tracking real time) or do you actually mean "accurate" (e.g. something that keeps track of real time without drift, that may not be very precise)?

If you need "accurate", then something like the (once per second) IRQ from the RTC/CMOS combined with a drift adjustment (potentially derived from NTP) is as accurate as you can get. If you need "precise" then the TSC is as precise as you can get (followed in order by the local APIC timer's count, then HPET, then PIT).

If you need both "accurate" and "precise", then you need to combine timers. For example, use the TSC (for its precision) and also use the RTC/CMOS (with drift adjustment) to regularly correct the TSC and keep it calibrated to real time.
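
One way to picture the combination is a two-point calibration: sample the fast counter and the slow but accurate reference together at two moments, derive a rate, then convert later counter readings to reference time. The sketch below is an illustration in C under stated assumptions, not how any particular OS does it. `timer_calibration`, `calibrate` and `counter_to_ns` are hypothetical names, the sample values would come from the TSC and an RTC-derived clock on real hardware, and the counter is assumed to run at a constant rate between calibrations.

```c
#include <assert.h>
#include <stdint.h>

struct timer_calibration {
    uint64_t base_count;   /* fast counter value at the first sample */
    uint64_t base_ns;      /* reference time (ns) at the first sample */
    double ns_per_count;   /* derived rate of the fast counter */
};

/* Derive the counter's rate from two paired samples of
 * (fast counter value, reference time in ns). */
struct timer_calibration calibrate(uint64_t count0, uint64_t ns0,
                                   uint64_t count1, uint64_t ns1)
{
    assert(count1 > count0 && ns1 > ns0);
    struct timer_calibration cal;
    cal.base_count = count0;
    cal.base_ns = ns0;
    cal.ns_per_count = (double)(ns1 - ns0) / (double)(count1 - count0);
    return cal;
}

/* Convert a later counter reading to reference time in ns.
 * A double loses precision for deltas beyond 2^53 counts; a real
 * implementation would more likely use fixed-point arithmetic. */
uint64_t counter_to_ns(const struct timer_calibration *cal, uint64_t count)
{
    assert(cal != 0 && count >= cal->base_count);
    return cal->base_ns +
           (uint64_t)((double)(count - cal->base_count) * cal->ns_per_count);
}
```

Calling `calibrate` again at regular intervals with fresh sample pairs is what keeps the fast counter aligned with real time in this sketch.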

Next, what are you doing with the time value? Imagine a perfect "get_current_time()" function - you call it, it gets the exact time, then a task switch or IRQ occurs before you actually use the value you got. To avoid that you'd need to disable task switching and IRQs until after you've used the value you get from the "get_current_time()" function. Now think about Out-of-Order-Execution - you'd need to serialise before you get the current time and disable Out-of-Order-Execution until after you do something with the value you get, and maybe account for the time spent after you get the exact time but before it was used. Of course this would be insane. If you can avoid worrying about Out-of-Order-Execution after you've got the time, then you can also stop worrying about Out-of-Order-Execution while you are getting the time.

In practice, there are no sane cases where you must have the exact time, and therefore there are no cases where there's a reason to care about Out-of-Order-Execution while you are getting the time.

Most OSs actually use something like "10 ms between ticks" without any problem; but if "100 ns between ticks" isn't precise enough then it's likely that you shouldn't be using time at all.


Cheers,

Brendan