Performance Monitoring Counters on Intel Celeron M

Posted: Tue Jun 14, 2011 7:08 am
by limp
Hi all,

I am having a number of problems getting performance counters to work on an Intel Celeron M 440 processor.

First of all, I am having a hard time classifying the processor into one of the families listed in the tables of performance monitoring events in the Intel manuals. Specifically, I don't know whether my processor is in the P6 family or in the Pentium M family, so I can't tell which event table to look at.

I've managed to get the counters working: if I measure "DCU_MISS_OUTSTANDING" or "INST_RETIRED" I get a non-zero value, but when I measure "CPU_CLK_UNHALTED" I always get zero. Is it possible that some events (like "CPU_CLK_UNHALTED") are not supported by my processor (because it's a Celeron), and if so, how can I find out which ones?

Thank you all for your help.

Re: Performance Monitoring Counters on Intel Celeron M

Posted: Wed Jun 15, 2011 3:04 am
by light

Re: Performance Monitoring Counters on Intel Celeron M

Posted: Wed Jun 15, 2011 3:48 am
by JamesM
light wrote:LibCPUID
http://sourceforge.net/projects/libcpu/

Intel cpuid documentation ( january 2011 )
http://www.intel.com/Assets/PDF/appnote/241618.pdf
People like you are the reason we get a bad rap for glib, stupid answers. The question was pointed, researched, and exact. The answer should be similarly so.

A simple RTFM for such a well-phrased question is insulting, please don't do it. He mentions specifically that he has read the manuals.

limp: I can't help you, but hopefully someone here can.

Re: Performance Monitoring Counters on Intel Celeron M

Posted: Wed Jun 15, 2011 8:45 am
by Brendan
Hi,
limp wrote: I am having a number of problems getting performance counters to work on an Intel Celeron M 440 processor.

First of all, I am having a hard time classifying the processor into one of the families listed in the tables of performance monitoring events in the Intel manuals. Specifically, I don't know whether my processor is in the P6 family or in the Pentium M family, so I can't tell which event table to look at.
As far as I can tell, it's a "Yonah", and you should probably use the "Performance Monitoring Events for Intel Core Solo and Intel Core Duo Processors" part of the appendix.
limp wrote: I've managed to get the counters working: if I measure "DCU_MISS_OUTSTANDING" or "INST_RETIRED" I get a non-zero value, but when I measure "CPU_CLK_UNHALTED" I always get zero. Is it possible that some events (like "CPU_CLK_UNHALTED") are not supported by my processor (because it's a Celeron), and if so, how can I find out which ones?
I doubt that's the problem. Typically Intel designs a chip (which is expensive) and then produces many variations of that chip with different features enabled/disabled, different cache sizes, etc. (which is cheap). There are two reasons for this: partly because it's cheaper than designing lots of different chips (for product differentiation), and partly to sell CPUs that had defects (e.g. if there's a problem in a specific part of the cache, then disable that part of the cache and sell it as a lower-end CPU rather than throwing it away and losing money on it). The features that are disabled are large/noticeable features (e.g. half the cache, an entire core, hyper-threading, virtualisation, etc.), not minor details.


Cheers,

Brendan

Re: Performance Monitoring Counters on Intel Celeron M

Posted: Wed Jun 15, 2011 12:31 pm
by jnc100
Agreed. It's a 32-bit Yonah, essentially an enhanced Pentium M. It should support the CPU_CLK_UNHALTED counter (http://oprofile.sourceforge.net/docs/in ... events.php). Note that there are some errata related to the use of INST_RETIRED on this chip (see http://download.intel.com/design/mobile ... 300303.pdf - W79 - the M 440 is stepping D-0), but CPU_CLK_UNHALTED should work fine.

If you have access to Linux on it, then OProfile is good at identifying which performance counters are available.

Regards,
John.

Re: Performance Monitoring Counters on Intel Celeron M

Posted: Wed Jun 15, 2011 4:26 pm
by Gigasoft
JamesM wrote:People like you are the reason we get a bad rap for glib, stupid answers. The question was pointed, researched, and exact. The answer should be similarly so.

A simple RTFM for such a well-phrased question is insulting, please don't do it. He mentions specifically that he has read the manuals.
Are you sure that light is the source of your bad reputation? It wasn't the same manual. I am thinking that he must feel very invalidated now, feeling unwanted and incompetent, failing to live up to the expectations of what he should and shouldn't have said. Perhaps, in times when we feel underappreciated, and are looking to find a cause outside ourselves, something or someone on which to place responsibility, we should think about it and consider if the public identification of another person as the sole source of one's feeling of being unloved really makes one happier.

Re: Performance Monitoring Counters on Intel Celeron M

Posted: Thu Jun 16, 2011 1:29 am
by JamesM
Gigasoft wrote:Are you sure that light is the source of your bad reputation?
JamesM wrote:A simple RTFM for such a well-phrased question is insulting
Gigasoft wrote:I am thinking that he must feel very invalidated now, feeling unwanted and incompetent,
He shouldn't have been so rude to a newcomer on a public forum then. That is what an admonishment is meant to achieve.

Re: Performance Monitoring Counters on Intel Celeron M

Posted: Thu Jun 16, 2011 11:36 am
by DavidCooper
It's fully possible that no rudeness was intended, even though it looks rude - Light may have posted it by mistake while intending to do a print preview in order to test the links, and then he may have lost his Web connection before getting a chance to complete it. Probably fairest to give him the benefit of the doubt and drop the matter.

Re: Performance Monitoring Counters on Intel Celeron M

Posted: Thu Jun 16, 2011 12:35 pm
by JamesM
DavidCooper wrote:Probably fairest to give him the benefit of the doubt and drop the matter.
I have done, haven't I?

Re: Performance Monitoring Counters on Intel Celeron M

Posted: Thu Jun 16, 2011 12:54 pm
by quok
Personally I think going off-topic by ranting about etiquette isn't helping anyone here. Let's stop it now and get back to helping the OP.

Re: Performance Monitoring Counters on Intel Celeron M

Posted: Thu Jun 16, 2011 3:58 pm
by limp
Hi guys,

Thank you all for your help. In particular, I would like to thank Brendan for his excellent answer (all the extra info you gave was really cool)!
Brendan wrote:As far as I can tell, it's a "Yonah", and you should probably use the "Performance Monitoring Events for Intel Core Solo and Intel Core Duo Processors" part of the appendix.
That was very useful. From what I'd read in the Intel manuals, and without knowing the exact family of my processor, I was using 0x79 as the event code for "CPU_CLK_UNHALTED", whereas according to the table Brendan pointed to, "CPU_CLK_UNHALTED" is selected on my processor using 0x3C as the event code with a UMASK of 0x01.
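As a rough illustration of what that selection looks like as a register value, here is a minimal sketch that packs an event code and UMASK into an event select word. It assumes the architectural performance monitoring layout (event select in bits 0-7, UMASK in bits 8-15, USR/OS in bits 16/17, INT in bit 20, EN in bit 22) and the usual MSR addresses; the helper name `perfevtsel` is made up for this example, and the code only computes the value, it does not write any MSR.

```python
# Bit layout of an event select register, assuming the architectural
# performance monitoring layout (an assumption, check it for your CPU).
UMASK_SHIFT = 8        # unit mask occupies bits 8-15
USR_BIT = 1 << 16      # count while running in ring 3
OS_BIT = 1 << 17       # count while running in ring 0
INT_BIT = 1 << 20      # request a local APIC interrupt on overflow
EN_BIT = 1 << 22       # enable the counter

IA32_PERFEVTSEL0 = 0x186  # MSR address of the first event select register
IA32_PMC0 = 0xC1          # MSR address of the first counter


def perfevtsel(event, umask, usr=True, os=True, interrupt=False):
    """Pack an event code and UMASK into an event select value (hypothetical helper)."""
    value = (event & 0xFF) | ((umask & 0xFF) << UMASK_SHIFT)
    if usr:
        value |= USR_BIT
    if os:
        value |= OS_BIT
    if interrupt:
        value |= INT_BIT
    return value | EN_BIT


# Event 0x3C with UMASK 0x01, counting in both user and kernel mode.
value = perfevtsel(0x3C, 0x01)
print(hex(value))
```

The resulting value would then be written to the event select MSR with WRMSR from ring 0, which is outside what this sketch covers.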

I was really hoping to find an event that counts all bus cycles regardless of the sleep/HLT state of the processor, but it seems the closest thing to bus cycle counting is "CPU_CLK_UNHALTED".

Also, I would like to ask (if anyone knows) about the accuracy of the performance counters. I know that they are 40-bit counters, but can they be considered as accurate as the LAPIC timer (or even more so, as the LAPIC timer is only 32-bit)? Furthermore, is an interrupt caused by these counters triggered directly on the processor, or on the processor system bus, for example?

Cheers

P.S. Peace to the people ;)

Re: Performance Monitoring Counters on Intel Celeron M

Posted: Thu Jun 16, 2011 10:46 pm
by Brendan
Hi,
limp wrote: That was very useful. From what I'd read in the Intel manuals, and without knowing the exact family of my processor, I was using 0x79 as the event code for "CPU_CLK_UNHALTED", whereas according to the table Brendan pointed to, "CPU_CLK_UNHALTED" is selected on my processor using 0x3C as the event code with a UMASK of 0x01.
Cool :)
limp wrote: I was really hoping to find an event that counts all bus cycles regardless of the sleep/HLT state of the processor, but it seems the closest thing to bus cycle counting is "CPU_CLK_UNHALTED".
The closest thing to bus cycle counting would be the local APIC timer (e.g. see how quickly the local APIC timer count decreases). It'd be easier to use than performance monitoring counters too (same code for all CPUs with local APICs, rather than different performance monitoring code for each different CPU model).
limp wrote: Also, I would like to ask (if anyone knows) about the accuracy of the performance counters. I know that they are 40-bit counters, but can they be considered as accurate as the LAPIC timer (or even more so, as the LAPIC timer is only 32-bit)?
The performance monitoring counters are intended to be used for tuning/optimising code (e.g. profilers, etc.), and they're reasonably accurate for that purpose. They're not accurate for measuring time though, as the duration of a CPU cycle can vary due to power management (anything that affects CPU frequency, including "SpeedStep" and "TurboBoost"). The local APIC timer is more accurate for measuring time, as it runs at (fixed frequency) bus speed rather than (variable frequency) CPU speed.

However, even though it's more accurate the local APIC timer would be less precise. For example, if the CPU's (nominal) frequency is 1.87 GHz and the bus speed is 533 MHz, then you'd get (up to) 0.535 ns precision from performance monitoring counters and 1.876 ns precision from the local APIC timer count.
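The precision figures above are just the tick period of each clock. A small sketch of the arithmetic, using the example frequencies from this post (the function name is only for illustration):

```python
def tick_period_ns(frequency_hz):
    """Duration of one counter tick in nanoseconds."""
    return 1e9 / frequency_hz


cpu_ns = tick_period_ns(1.87e9)  # performance monitoring counter at nominal CPU clock
bus_ns = tick_period_ns(533e6)   # local APIC timer at the example bus speed

print(f"PMC tick:   {cpu_ns:.3f} ns")
print(f"LAPIC tick: {bus_ns:.3f} ns")
```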

Depending on what you're doing, if you actually need extreme precision then I'd consider the TSC. On older CPUs it has the same "variable frequency CPU clock" problem as the performance monitoring counters (but is easier to set up and use). On newer CPUs it runs at the CPU's nominal clock frequency rather than the CPU's actual clock speed, and therefore (unlike performance monitoring counters) doesn't have the "variable frequency CPU clock" problem on these CPUs. Despite this, it's very unlikely that you'd need (rather than just want) such high precision.

Note: It's also possible to use both - e.g. use the local APIC count to continually synchronise the TSC or performance monitoring counter/s with real time, to get the accuracy of the local APIC timer with the precision of the TSC or performance monitoring counter/s. This is potentially over-complicated and likely to be messy though.

For maximum time between roll-over, for a bus speed of 533 MHz a 32-bit local APIC count would roll over after 8.06 seconds, for a 1.87 GHz CPU clock a 40-bit performance monitoring counter (measuring "all cycles") would roll over after 587.974 seconds, and for a 1.87 GHz CPU clock a 64-bit time stamp counter (e.g. RDTSC) would roll over after 9864569023.374 seconds (about 312 years).
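The roll-over times follow from dividing the counter range by the counting frequency. A minimal sketch of that calculation, with the counter widths and frequencies taken from the paragraph above and a hypothetical helper name:

```python
def rollover_seconds(counter_bits, frequency_hz):
    """Time for a free-running counter of the given width to wrap around."""
    return (1 << counter_bits) / frequency_hz


lapic_s = rollover_seconds(32, 533e6)   # 32-bit local APIC count at 533 MHz
pmc_s = rollover_seconds(40, 1.87e9)    # 40-bit performance counter at 1.87 GHz
tsc_s = rollover_seconds(64, 1.87e9)    # 64-bit TSC at 1.87 GHz

print(f"LAPIC: {lapic_s:.2f} s, PMC: {pmc_s:.3f} s, TSC: {tsc_s:.3f} s")
```

Passing a different frequency (for example a lower bus clock) shows how strongly the roll-over time depends on what the counter is actually clocked by.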
limp wrote:Furthermore, an interrupt caused from these counters is triggered directly on the processor or on the Processor System Bus for example?
The performance monitoring counters (and the local APIC timer) tell the CPU's local APIC to send an IRQ to the processor. It wouldn't be visible on the bus, and there's no way to (e.g.) make a performance monitoring counter (or local APIC timer) send an IRQ to a different CPU.


Cheers,

Brendan

Re: Performance Monitoring Counters on Intel Celeron M

Posted: Thu Jun 16, 2011 11:42 pm
by limp
Hi there,
Brendan wrote: The closest thing to bus cycle counting would be the local APIC timer (e.g. see how quickly the local APIC timer count decreases). It'd be easier to use than performance monitoring counters too (same code for all CPUs with local APICs, rather than different performance monitoring code for each different CPU model).
My main goal is not “bus cycle counting” but triggering a performance counter interrupt periodically after some fixed time (e.g. 1000 ms). As the bus frequency should remain pretty much constant during system operation (i.e. 133 MHz), it should be quite straightforward to set up the performance counter to overflow after a fixed time (it should work the same way we set up the LAPIC timer, shouldn’t it?).
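One difference from the LAPIC timer is that a performance counter counts up, so the usual approach is to preload it with a value that is the desired number of ticks below the overflow point. The sketch below only does that arithmetic. It assumes a 40-bit counter and, as a labelled assumption, that a counter write only takes a 32-bit value that is sign-extended, which would limit the period to fewer than 2^31 ticks; the function name is hypothetical.

```python
COUNTER_BITS = 40           # counter width mentioned earlier in the thread
MAX_TICKS = (1 << 31) - 1   # assumption: counter writes are sign-extended from 32 bits


def preload_for_interval(frequency_hz, interval_s, counter_bits=COUNTER_BITS):
    """Counter start value so that overflow happens after interval_s seconds."""
    ticks = round(frequency_hz * interval_s)
    if not 0 < ticks <= MAX_TICKS:
        raise ValueError("interval does not fit in one counter period")
    # Equivalent to -ticks in two's complement at the counter's width.
    return (1 << counter_bits) - ticks


# 1000 ms at a 133 MHz counting rate, as in the example above.
start = preload_for_interval(133e6, 1.0)
print(hex(start), hex(start & 0xFFFFFFFF))
```

The interrupt handler would have to reload the same start value each time to keep the interrupt periodic.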
Brendan wrote: They're not accurate for measuring time though, as the duration of a CPU cycle can vary due to power management (anything that affects CPU frequency, including "SpeedStep" and "TurboBoost"). The local APIC timer is more accurate for measuring time, as it runs at (fixed frequency) bus speed rather than (variable frequency) CPU speed.
If I have the above-mentioned features disabled (and also prevent the system from going into a sleep/HLT state), I should get pretty much the same accuracy as the LAPIC timer, am I right (in terms of making the performance counter overflow after a fixed time interval has passed)?
Brendan wrote: Depending on what you're doing, if you actually need extreme precision then I'd consider the TSC.
In my CPU, TSC counting varies according to CPU frequency. Furthermore, I have to use the performance counters as I would like to use a different delivery mode for the interrupts rather than fixed.
Brendan wrote: Note: It's also possible to use both - e.g. use the local APIC count to continually synchronise the TSC or performance monitoring counter/s with real time, to get the accuracy of the local APIC timer with the precision of the TSC or performance monitoring counter/s. This is potentially over-complicated and likely to be messy though.
You mention “performance monitoring counter/s with real time”. Can you specify this a bit further please?
Brendan wrote: For maximum time between roll-over, for a bus speed of 533 MHz a 32-bit local APIC count would roll over after 8.06 seconds…
I think you’ve made a small mistake here; the bus speed should be 533 MHz / 4 (quad-pumped), i.e. roughly 133 MHz, so the local APIC count would roll over after about 32.29 seconds.

Thanks again.

Re: Performance Monitoring Counters on Intel Celeron M

Posted: Fri Jun 17, 2011 12:56 am
by Brendan
Hi,
limp wrote:Furthermore, I have to use the performance counters as I would like to use a different delivery mode for the interrupts rather than fixed.
Hrm - wasn't expecting that..

Of all the delivery modes, using SMI, ExtINT or INIT for a timer makes no sense at all. That only leaves Fixed and NMI. If you don't want to use Fixed then NMI is the only one left.
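For reference, a minimal sketch of how a delivery mode is encoded in a local APIC LVT entry such as the performance counter one. It assumes the usual local APIC layout (vector in bits 0-7, delivery mode in bits 8-10, mask in bit 16); the helper name `lvt_entry` is made up for illustration, and the code only computes the value rather than writing it to the APIC.

```python
# Delivery mode encodings for a local APIC LVT entry (bits 8-10).
DELIVERY_FIXED = 0b000
DELIVERY_SMI = 0b010
DELIVERY_NMI = 0b100
DELIVERY_INIT = 0b101
DELIVERY_EXTINT = 0b111

LVT_MASKED = 1 << 16  # set to suppress delivery of the interrupt


def lvt_entry(delivery_mode, vector=0, masked=False):
    """Build the 32-bit value for an LVT entry (hypothetical helper)."""
    value = (vector & 0xFF) | ((delivery_mode & 0b111) << 8)
    if masked:
        value |= LVT_MASKED
    return value


nmi_entry = lvt_entry(DELIVERY_NMI)                    # the vector field is not used for NMI
fixed_entry = lvt_entry(DELIVERY_FIXED, vector=0x40)   # 0x40 is an arbitrary example vector
print(hex(nmi_entry), hex(fixed_entry))
```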

There are only two reasons to use NMI for a timer: watchdog timers and "poor man's profiling" (e.g. sample EIP at regular intervals to estimate where the CPU might be spending its time). It would seem strange to use performance monitoring counters (which are intended for very thorough and accurate profiling) to implement "poor man's profiling", so that only leaves the watchdog timer. The slow speed you're looking for (e.g. "some fixed time (e.g. 1000 ms)") would also seem consistent with the watchdog timer idea.

For a watchdog timer, I'd get each CPU to increment an "I'm still alive" variable regularly (e.g. in the scheduler's timer's IRQ handler or something, so that the CPU's "I'm still alive" variable is incremented regularly). For multi-CPU systems you could have 2 or more CPUs doing the monitoring (checking if other CPUs have incremented their "I'm still alive" variable) to avoid the need for any NMI at all. For single-CPU you could use IO APIC (and PIT, RTC or HPET) or performance monitoring counters; but even in these cases you wouldn't need to care how accurate or precise the timing is.
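The "I'm still alive" scheme can be sketched as plain bookkeeping. This is only a simulation of the logic in ordinary Python (no real CPUs, IRQs or shared memory), and the class and method names are invented for the example:

```python
class HeartbeatMonitor:
    """Tracks one "I'm still alive" counter per CPU (illustrative only)."""

    def __init__(self, cpu_count):
        self.alive = [0] * cpu_count       # incremented by each CPU's timer IRQ handler
        self._last_seen = [0] * cpu_count  # snapshot taken at the previous check

    def tick(self, cpu):
        """What a CPU would do regularly, e.g. from its scheduler timer IRQ."""
        self.alive[cpu] += 1

    def check(self):
        """Return the CPUs whose counter has not moved since the last check."""
        stalled = [cpu for cpu, (now, before)
                   in enumerate(zip(self.alive, self._last_seen))
                   if now == before]
        self._last_seen = list(self.alive)
        return stalled


monitor = HeartbeatMonitor(cpu_count=2)
monitor.tick(0)
monitor.tick(1)
first = monitor.check()    # both CPUs made progress
monitor.tick(0)
second = monitor.check()   # CPU 1 did not increment its counter
print(first, second)
```

Because the check only compares counters between two checks, the interval between checks does not need to be accurate or precise, only longer than the interval between ticks.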
limp wrote:
Brendan wrote: They're not accurate for measuring time though, as the duration of a CPU cycle can vary due to power management (anything that affects CPU frequency, including "SpeedStep" and "TurboBoost"). The local APIC timer is more accurate for measuring time, as it runs at (fixed frequency) bus speed rather than (variable frequency) CPU speed.
If I have the above-mentioned features disabled (and also prevent the system from going into a sleep/HLT state), I should get pretty much the same accuracy as the LAPIC timer, am I right (in terms of making the performance counter overflow after a fixed time interval has passed)?
When the CPU gets hot it enters a "thermal throttling" mode to cool down. You can't really disable that safely, and if you do disable it the CPU will probably end up doing a thermal shutdown instead (basically, the CPU locks up completely until the temperature drops).

It is possible to disable TurboBoost (there's a flag in the "IA32_MISC_ENABLE" MSR for this on CPUs that support TurboBoost). It's also possible to avoid using any software-controlled throttling.
limp wrote:
Brendan wrote: Note: It's also possible to use both - e.g. use the local APIC count to continually synchronise the TSC or performance monitoring counter/s with real time, to get the accuracy of the local APIC timer with the precision of the TSC or performance monitoring counter/s. This is potentially over-complicated and likely to be messy though.
You mention “performance monitoring counter/s with real time”. Can you specify this a bit further please?
Accuracy = the ability of a time source to keep track of time (e.g. and avoid problems like drift). If a timer doesn't have good accuracy, then you can synchronise that timer with a more accurate timer (or, synchronise that timer with the "real" time) to compensate.

You could even cascade this. For example, have a system of extremely accurate atomic clocks on the internet, and use them (and NTP) to synchronise other servers on the internet, and use these other servers (and NTP) to synchronise your RTC/CMOS; then use the RTC/CMOS to synchronise your local APIC timer, and use the local APIC timer to synchronise the CPU's TSC.
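A minimal sketch of one such synchronisation step: a fast but drifting counter is periodically compared against a more accurate reference, and the measured tick rate is then used to convert counter readings into time. The class name, method names and sample numbers are all made up for illustration; real code would read the TSC or a performance counter and the local APIC timer instead of taking values as arguments.

```python
class CalibratedCounter:
    """Converts ticks of a precise-but-drifting counter into reference time."""

    def __init__(self, ref_time_s, ticks):
        self.ref_time_s = ref_time_s  # reference time at the last synchronisation
        self.ticks = ticks            # counter value at the last synchronisation
        self.hz = None                # measured tick rate, unknown until first resync

    def resync(self, ref_time_s, ticks):
        """Re-measure the tick rate against the more accurate time source."""
        elapsed = ref_time_s - self.ref_time_s
        if elapsed > 0:
            self.hz = (ticks - self.ticks) / elapsed
        self.ref_time_s, self.ticks = ref_time_s, ticks

    def now(self, ticks):
        """Estimate the current reference time from a raw counter reading."""
        if self.hz is None:
            raise RuntimeError("resync() must be called at least once")
        return self.ref_time_s + (ticks - self.ticks) / self.hz


clock = CalibratedCounter(ref_time_s=0.0, ticks=0)
clock.resync(ref_time_s=1.0, ticks=1_870_000_000)  # one second later by the reference
estimate = clock.now(1_870_000_000 + 935_000_000)   # half a second of ticks after that
print(estimate)
```

Resynchronising more often limits how far the estimate can drift when the counter's frequency changes between synchronisations.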


Cheers,

Brendan

Re: Performance Monitoring Counters on Intel Celeron M

Posted: Fri Jun 17, 2011 2:20 am
by limp
Hi,
Brendan wrote: It would seem strange to use performance monitoring counters (which are intended for very thorough and accurate profiling) to implement "poor man's profiling", so that only leaves the watchdog timer. The slow speed you're looking for (e.g. "some fixed time (e.g. 1000 ms)") would also seem consistent with the watchdog timer idea.
Well, I am trying to investigate this as an option. I can't see anything that would stop us from using performance monitoring counters (PMCs) for profiling. For example, if I wanted to measure very small intervals (microsecond precision) with a performance monitoring counter, in theory I could get even better accuracy than if I used the LAPIC timer, provided I ensure that every frequency-changing CPU feature (e.g. "SpeedStep", etc.) is disabled (please correct me if I am missing something).
Brendan wrote: They're not accurate for measuring time though, as the duration of a CPU cycle can vary due to power management (anything that affects CPU frequency, including "SpeedStep" and "TurboBoost"). The local APIC timer is more accurate for measuring time, as it runs at (fixed frequency) bus speed rather than (variable frequency) CPU speed.
Wait a minute. I thought that both the LAPIC timer and the PMCs are driven by the processor bus frequency. How can the LAPIC timer run at a fixed frequency while the PMCs run at a variable frequency? It would be really helpful if you could clarify this.

Edit: What I mean is that when a PMC is monitoring a "CPU_CLK_UNHALTED" event, the rate at which the PMC increments should be equal to the rate at which the LAPIC timer decrements (in this case, both the PMC and the LAPIC timer are driven by the bus frequency).

Something from a previous post of yours:
Brendan wrote: The performance monitoring counters (and the local APIC timer) tell the CPU's local APIC to send an IRQ to the processor. It wouldn't be visible on the bus, and there's no way to (for e.g.) make a performance monitoring counter (or local APIC timer) send an IRQ to a different CPU.
So that means that the PMCs are located inside the CPU core, right? Are they using a dedicated line to connect to the LAPIC's PMC pins? I am a bit confused about what is actually connected to the processor bus (any hint on that would be quite useful).

Thanks a lot for all your help so far.

Kind regards