profiling CPU speed and precise timing

ivannz

profiling CPU speed and precise timing

Post by ivannz »

Hello everyone!

I have a few questions (answers using integer arithmetic preferred):
0) What is the maximum possible resolution of the TSC?
1) Is there a way to determine the ratio of TSC counts (if the TSC is supported) to PIT ticks, at any PIT rate (for instance 100 ticks per second)?
2) Is there a way to detect the CPU speed by, so to say, "direct probing"?

Also, how do I account for CPUs whose internal clock speed depends on the power state (or is simply variable)? For instance, my notebook's CPU slows down when the battery is low.

Can the following code be used for item #1 of the list above?

Code:

%define FREQ      ( 1000 / 50 )   ;; PIT rate in Hz (20 Hz, i.e. one tick every 50 ms)
%define CLATCH      ( ( 1193180 + FREQ / 2 ) / FREQ )  ;; reload value written to PIT channel 0 for calibration
%define ITICKS      ( 40 )   ;; length of the calibration window in PIT ticks
%define SCALE      ( ( 1000000 / FREQ ) * ITICKS )   ;; the same window in microseconds
...
calibrate_RDTSC:
;; So our CPU is either a 486 with CPUID (newer 486)
;; or a 586+ with definite CPUID support (Pentiums and higher);
;; either way, check whether RDTSC is supported
  mov eax, 0x00000001
  cpuid
  test dl, 00010000b   ;; TSC feature flag, EDX bit 4
  jz calibrate_RDTSC_END

;; wait for a new PIT tick so the measurement starts on a tick boundary
  mov ebx, dword [gs:GS_timestamp]
.wait_start:
  cmp ebx, dword [gs:GS_timestamp]
  je .wait_start

;; START
  rdtsc
  mov dword [ebp + calibrate_RDTSC_low], eax
  mov dword [ebp + calibrate_RDTSC_high], edx


;; TICKS interval: ebx still holds the tick before the boundary,
;; so the window ends at ebx + 1 + ITICKS
  add ebx, ( ITICKS + 1 )

.wait_end:
  cmp ebx, dword [gs:GS_timestamp]
  jne .wait_end

;; END
  rdtsc
  sub eax, dword [ebp + calibrate_RDTSC_low]
  sbb edx, dword [ebp + calibrate_RDTSC_high]

;; edx:eax = TSC counts in the window; divide by its length in microseconds
  mov ebx, SCALE
  div ebx

  mov dword [gs:GS_TSC_COUNT], eax   ;; TSC counts per microsecond (i.e. MHz)

calibrate_RDTSC_END:
...
Brendan

Re:profiling CPU speed and precise timing

Post by Brendan »

Hi,
ivannz wrote:I have a few questions (answers using integer arithmetic preferred):
0) What is the maximum possible resolution of the TSC?
1) Is there a way to determine the ratio of TSC counts (if the TSC is supported) to PIT ticks, at any PIT rate (for instance 100 ticks per second)?
2) Is there a way to detect the CPU speed by, so to say, "direct probing"?
There is no maximum resolution for the TSC, although Intel has guaranteed that it won't wrap around within ten years of being reset to 0 (for current CPUs and future CPUs). If you do the math for a 64-bit counter, this works out to an "architecturally guaranteed" maximum CPU clock speed of (roughly) 58 GHz, or a resolution of (roughly) 0.017 ns. You can do the same assuming a minimum clock speed of 25 MHz to get a resolution of 40 ns. I can't remember what the slowest CPU that supported the TSC actually was (it'd be possible to under-clock a CPU for power saving in embedded systems), but 25 MHz is a fairly "safe" assumption.
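
A quick way to redo this arithmetic is sketched below. It assumes a 64-bit counter that increments once per clock and a year of 365.25 days; the function names are made up for illustration. Under these assumptions the upper bound comes out at roughly 58 GHz (about 17 ps per count), and 25 MHz gives 40 ns per count.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds in ten years, assuming 365.25 days per year (an assumption;
   the exact calendar length barely changes the result). */
#define TEN_YEARS_SECONDS 315576000ULL

/* Highest constant clock rate (in Hz) at which a 64-bit counter that
   increments once per clock still cannot wrap within `seconds`. */
static uint64_t max_hz_without_wrap(uint64_t seconds)
{
    assert(seconds > 0);
    return UINT64_MAX / seconds;
}

/* Length of one count in picoseconds at a given clock rate (rounded down). */
static uint64_t tick_picoseconds(uint64_t hz)
{
    assert(hz > 0);
    return 1000000000000ULL / hz;
}
```

For example, max_hz_without_wrap(TEN_YEARS_SECONDS) gives the clock-speed bound, and tick_picoseconds(25000000) gives the resolution at 25 MHz.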

The code you posted should be a quite accurate method of measuring TSC speed (slower PIT frequencies would improve accuracy), and is the only real way of detecting CPU speed with "direct probing".
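
For the integer-arithmetic side of the question, here is a minimal C sketch of the same calculation. The posted code divides the TSC delta by the window length in microseconds, which gives whole MHz; the hypothetical helpers below keep more precision by working in Hz and rounding to nearest. The names and parameters are illustrative, not taken from the posted code.

```c
#include <assert.h>
#include <stdint.h>

/* TSC frequency in Hz, given the TSC delta counted over `ticks` timer
   interrupts at `timer_hz` interrupts per second. Rounds to nearest. */
static uint64_t tsc_hz_from_calibration(uint64_t tsc_delta, uint32_t ticks,
                                        uint32_t timer_hz)
{
    assert(ticks > 0 && timer_hz > 0);
    /* The product must fit in 64 bits; this holds for any realistic
       calibration window (seconds, not years). */
    assert(tsc_delta <= (UINT64_MAX - ticks / 2) / timer_hz);
    return (tsc_delta * timer_hz + ticks / 2) / ticks;
}

/* TSC counts per timer tick, i.e. the TSC-to-PIT ratio from item 1. */
static uint64_t tsc_per_tick(uint64_t tsc_delta, uint32_t ticks)
{
    assert(ticks > 0);
    return (tsc_delta + ticks / 2) / ticks;
}
```

With the constants from the posted code (40 ticks at 20 Hz, a 2 second window), a delta of 3200000000 counts corresponds to 1600000000 Hz and 80000000 counts per tick.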
ivannz wrote:Also, how do I account for CPUs whose internal clock speed depends on the power state (or is simply variable)? For instance, my notebook's CPU slows down when the battery is low.
That will be the real problem, depending on what you intend to use the TSC for. It may be possible to use the local APIC thermal sensor interrupt and/or the "IA32_THERM_STATUS_MSR" and other MSRs to determine when the CPU is using "Intel(R) SpeedStep(R)", but they probably won't help when the CPU's "DPSLP#" pin is asserted (i.e. when the CPU is in "deep sleep" mode). If you ever support hyper-threading you could have more problems, because the TSC is shared by both logical CPUs and doesn't have anything to do with the work done by any single logical CPU.

In any case, if you're using the TSC to work out how much CPU time each thread has used then it can work (if you don't support hyper-threading, or adjust for it). If you're hoping the TSC can be used for reliably measuring real time, you'll have problems.

If you just want to display the CPU speed so the user can see it, use the word "current" (i.e. "Current CPU speed: 1234 MHz" rather than "CPU speed: 1234 MHz"). An alternative might be the SMBIOS functions, especially as incorrect results won't make your OS crash (and you can blame someone else for incorrect results). Another alternative would be to use the CPUID family/model/revision information and a pile of tables to figure out what the CPU is designed for (which won't help if it's been overclocked or underclocked). AFAIK Intel's newer CPUs also report the intended CPU speed as part of the CPUID brand string (e.g. "Intel Pentium 4 1600MHZ").

If you want to use the CPU speed as the basis for calculating the time slice length the scheduler uses, or the frequency for other timing related things, then don't. CPU speed is largely meaningless and doesn't directly correspond to how fast the CPU can do actual work. For this purpose measure how many times the CPU can do <insert_some_code_here> within a fixed time limit.
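
A minimal C sketch of that kind of measurement follows. It counts how many times a workload runs within a fixed number of timer ticks, starting on a tick boundary. The tick source and workload are passed in as function pointers; in a kernel the tick source would read a counter bumped by the timer IRQ handler. All names here are hypothetical, and the sim_* functions are only simulated stand-ins so the sketch can be tried in user space.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t (*tick_source_fn)(void);  /* returns the current tick count */
typedef void (*workload_fn)(void);         /* one unit of "real work" */

/* Number of times `work` completes within `ticks` timer ticks. */
static uint64_t count_work_in_ticks(tick_source_fn now, workload_fn work,
                                    uint32_t ticks)
{
    assert(now && work && ticks > 0);

    /* Wait for the tick counter to change so the window starts on a boundary. */
    uint32_t start = now();
    uint32_t t;
    while ((t = now()) == start) { }
    start = t;

    uint64_t iterations = 0;
    /* Unsigned subtraction keeps the comparison valid across wraparound. */
    while ((uint32_t)(now() - start) < ticks) {
        work();
        iterations++;
    }
    return iterations;
}

/* Simulated stand-ins: the fake timer advances one tick per ten reads. */
static uint32_t sim_calls;
static uint32_t sim_work_done;
static uint32_t sim_now(void) { return sim_calls++ / 10; }
static void sim_work(void) { sim_work_done++; }
```

Calling count_work_in_ticks(sim_now, sim_work, 3) exercises the loop with the simulated timer. On real hardware the result depends on the workload chosen, which is the point: it measures the work the CPU actually gets done rather than its clock speed.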


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Cjmovie

Re:profiling CPU speed and precise timing

Post by Cjmovie »

Why measure by how fast it can do something anyway? The user only cares that everything seems to be done all at once, meaning that within about 1/2 a second everything should have had a chance to run. Or more. For me, I set the PIT to 100 Hz and I'll have all processes use 1 slice until I have 50 processes. If I have more I'll increase the frequency that the PIT uses. Or maybe even slow it down for fewer processes to reduce the overhead of switching.
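
One possible reading of this policy, sketched in C: keep the timer at 100 Hz until giving every process one slice would take longer than half a second, then raise the rate just enough. The function name and the exact rounding are assumptions made for illustration, and the optional slow-down for few processes is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Timer rate (Hz) such that `nproc` processes, one slice each,
   all get to run within a 500 ms window; never below 100 Hz. */
static uint32_t timer_hz_for_processes(uint32_t nproc)
{
    const uint32_t base_hz = 100;    /* the 100 Hz default from the post */
    const uint32_t window_ms = 500;  /* "about 1/2 a second" */

    assert(nproc <= 2000000000u);    /* keeps the result within 32 bits */

    /* Slices needed per second, rounded up. */
    uint64_t needed = ((uint64_t)nproc * 1000 + window_ms - 1) / window_ms;
    return needed > base_hz ? (uint32_t)needed : base_hz;
}
```

With 50 processes or fewer this stays at 100 Hz; 80 processes would need 160 Hz.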
Brendan

Re:profiling CPU speed and precise timing

Post by Brendan »

Hi,
Cjmovie wrote:Why measure by how fast it can do something anyway? The user only cares that everything seems to be done all at once, meaning that within about 1/2 a second everything should have had a chance to run. Or more. For me, I set the PIT to 100 Hz and I'll have all processes use 1 slice until I have 50 processes. If I have more I'll increase the frequency that the PIT uses. Or maybe even slow it down for fewer processes to reduce the overhead of switching.
I use IRQ8 (the RTC periodic interrupt) to keep track of real time, including measuring how much time each thread has used, waking sleeping threads, etc. For this, higher frequencies mean more accurate timing (for sleeping threads, etc) and more accurate measurement of how much time a thread used, but also more overhead. For my scheduler there's the "base time slice length". Short values make the scheduling "smoother" but also increase overhead.

For both of these I measure how quickly the CPU can do real work and then calculate the IRQ8 frequency and "base time slice length" from this. The end result is that for slow computers you get less accurate time delays, less accurate time measurement, "chunky" scheduling and acceptable overhead. For fast computers you get very accurate timing, very smooth scheduling and acceptable overhead.

There's always a compromise. For your OS how did you decide on 100 Hz? Is 100 Hz too fast for slow computers, is it too slow for fast computers, or is it somewhere in between? What about in 10 years time, when the fast computers of today look slow compared to newer computers?


Cheers,

Brendan
Cjmovie

Re:profiling CPU speed and precise timing

Post by Cjmovie »

As I said, it adjusts. I'll probably add a part, then, that measures the time overhead of a task switch on THAT CPU, and base the time slice on that.
ivannz

Accurate & Precise Timing

Post by ivannz »

Hello people!
My question is about the i8253 Programmable Interval Timer.

When I was deciding which mode to use, I thought that mode 3 would satisfy all my needs for precise timing and multitasking (moreover, it was used by the BIOS for accurate timing).
But recently I found that Linux version 2.0.40 uses mode 2 for this.
mode 2 - rate generator
mode 3 - square wave generator.
Why does Linux use mode 2 and not mode 3?
If they differ only in the waveform generated, how is the system performance affected?

I've read the following:
http://www.csee.umbc.edu/~cpatel2/links/310/slides/chap11_lect09_IO2.pdf
pp. 13 - 19

P.S.: When does the PIT generate IRQ0 in mode 2 and in mode 3? Is it when its internal counter reaches ZERO?
Brendan

Re:Accurate & Precise Timing

Post by Brendan »

Hi,
ivannz wrote:But recently I found that Linux version 2.0.40 uses mode 2 for this.
mode 2 - rate generator
mode 3 - square wave generator.
Why does Linux use mode 2 and not mode 3?
If they differ only in the waveform generated, how is the system performance affected?

I've read the following:
http://www.csee.umbc.edu/~cpatel2/links/310/slides/chap11_lect09_IO2.pdf
pp. 13 - 19
The short answer is that it doesn't matter which mode you use, as long as you set the timer count accordingly.

Because it doesn't make much difference, there's no guarantee which mode a BIOS uses (could be either depending on manufacturer).

The main difference between the modes is the shape of the output. In mode 2 the output is normally high and goes low for one input clock each time the count runs out; when it goes high again the PIC chip notices the rising edge and generates IRQ0. For mode 3, the same signal is fed to an internal "flip flop", which causes the output to change state every time the count runs out. To compensate, the count is decremented by two on each input clock in mode 3, so it runs out twice per output cycle. The PIC still only sees one rising edge per cycle, which is every second time the count runs out (not every time, like in mode 2), but the IRQ0 frequency ends up the same in both modes.

For example:
[tt]_____..._____...
.....|_|.....|_| <-output for mode 2, frequency = 1.193/count MHz

___.....___.....
...|___|...|___| <-output for mode 3, frequency = 1.193/count MHz[/tt]

Due to the way the count works, both modes cover the same range: the largest count (65536, written to the PIT as 0) gives about 18.2 Hz, which is probably way too slow, and very small counts give rates of hundreds of kHz, which are probably way too fast anyway.

ivannz wrote:P.S.: When does the PIT generate IRQ0 in mode 2 and in mode 3? Is it when its internal counter reaches ZERO?
For mode 2, yes (once per count). For mode 3, only every second time, when the output goes from low to high.

Note: The exact frequency used by the PIT chip is 3579545/3 Hz (or about 1193181.66666666666667 Hz) and not 1.193 MHz, but most motherboards aren't exact anyway (temperature drift, cheap motherboards, etc) so there's not too much point being this precise.
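
As a rough illustration of "setting the timer count accordingly", the C sketch below picks the reload value closest to a requested rate using only integer arithmetic, and reports the rate that value actually produces. It works from the exact 3579545/3 Hz input clock mentioned above. The function names are hypothetical, and the lower limit of 2 for the count is my understanding of the 8254 datasheet for modes 2 and 3, so treat it as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Three times the PIT input clock: the clock itself is 3579545/3 Hz. */
#define PIT_INPUT_HZ_X3 3579545ULL

/* Reload value (2..65536) whose output rate is closest to target_hz.
   A value of 65536 is written to the PIT as 0. */
static uint32_t pit_reload_for_hz(uint32_t target_hz)
{
    assert(target_hz > 0);
    uint64_t den = 3ULL * target_hz;
    /* round(3579545 / (3 * target_hz)) without floating point */
    uint64_t reload = (PIT_INPUT_HZ_X3 + den / 2) / den;
    if (reload < 2) reload = 2;          /* assumed minimum count */
    if (reload > 65536) reload = 65536;  /* slowest possible rate */
    return (uint32_t)reload;
}

/* Actual output rate in millihertz for a given reload value. */
static uint32_t pit_actual_millihz(uint32_t reload)
{
    assert(reload >= 2 && reload <= 65536);
    uint64_t den = 3ULL * reload;
    return (uint32_t)((PIT_INPUT_HZ_X3 * 1000ULL + den / 2) / den);
}
```

For a requested 100 Hz this picks a reload value of 11932, which really runs at just under 100 Hz; that small error is the same order as the drift mentioned in the note, so the actual rate is worth feeding back into any time-keeping code.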


Cheers,

Brendan