Hi,
bwat wrote:
Brendan wrote:
Of course for most OS's, both "global CPU load" and "per process CPU load" are no better than a crude estimate, and are only ever accurate if you're lucky (in a "stopped clock is right twice a day" way).
Do you not think that with a timer with a period that was both relatively short and relatively prime to the system clock that the global CPU load result would be very accurate? On 32-bit machines with floats or with fixed-point reals, accuracy would be fairly good too.
Let's start by defining what you mean by "global CPU load". In my opinion, global CPU load is the actual (measured) amount of work the CPUs did in a specific period of time divided by the maximum amount of work the CPUs could have done in that same specific period of time. For a simple example, if there's only one CPU that was able to execute 2 billion instructions in a second but it only executed 1 billion instructions in that second, then the CPU was at 50% load for that second.
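To put that definition in code; a minimal sketch in C (the function name and the "scaled by 10000" fixed-point convention are just things I made up for illustration), using 64-bit intermediate maths so the scaled division doesn't lose precision or overflow on 32-bit machines:

Code:
#include <stdint.h>

/* CPU load as a fixed-point value scaled by 10000, so 5000 means 50.00%.
   "work_done" and "work_max" must be in the same units of work and cover
   the same period of time. */
static uint32_t cpu_load_x10000(uint64_t work_done, uint64_t work_max)
{
    if (work_max == 0) {
        return 0;                   /* avoid division by zero */
    }
    return (uint32_t)((work_done * 10000) / work_max);
}

For the example above, cpu_load_x10000(1000000000, 2000000000) gives 5000, or 50.00%.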
Let's define "maximum amount of work the CPU could have done". Is it constant? Surely if a CPU is temporarily over-clocked (e.g. Intel's "Turbo Boost") then the maximum amount of work the CPU can do increases; and if a CPU overheats and goes into hardware-enforced thermal throttling mode then the maximum amount of work it can do decreases.
Now; let's look at measuring the amount of work the CPU/s did. Given that the maximum amount of work that a CPU can do in a fixed length of time varies; it makes absolutely no sense to measure time and use that to estimate the amount of work done. Measuring "instructions retired" is a much better way to estimate work done, but different instructions take different amounts of work, so this isn't ideal either. Measuring "micro-ops retired" is a better way, but that still isn't 100% accurate (as different micro-ops also represent different amounts of actual work). The most accurate way might be to determine the amount of work each instruction does in advance, then (as instructions are being executed) add the pre-determined "amount of work" for each individual instruction to a running total. Sadly, this is not practical (the overhead would be extreme). So; this leaves "micro-ops retired" as the most accurate way to estimate work done that is actually practical.
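For example (80x86, and assuming the kernel has already programmed general purpose performance monitoring counter 0 to count retired micro-ops via the IA32_PERFEVTSELx MSRs - the exact event encoding is model specific, so that setup isn't shown); reading the counter might look like this:

Code:
#include <stdint.h>

/* Read performance monitoring counter 0, which we assume was programmed
   to count retired micro-ops. RDPMC takes the counter number in ECX and
   returns the count in EDX:EAX. */
static inline uint64_t read_uops_retired(void)
{
    uint32_t lo, hi;

    __asm__ __volatile__("rdpmc" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((uint64_t)hi << 32) | lo;
}

The kernel would read this counter at the start and end of each period and use the difference as its "micro-ops retired" figure.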
If "micro-ops retired" is being used to estimate (not measure) work done; then to avoid mixing different units, it makes a sense if "maximum number of micro-ops the CPU could've executed" is used instead of "the maximum amount of work the CPU could have done".
Finally; let's look at granularity. More correctly, let's look in the opposite direction to granularity. Instead of estimating average CPU load at regular (or irregular) intervals of time, you only need to estimate CPU load when the maximum amount of work the CPU can do changes or when CPU load is being reported to user-space. Basically, you want some sort of log that might look like this:
- for the first 1234 nanoseconds, 567 micro-ops were retired and a maximum of 890 micro-ops could've been retired (63.70% CPU load for 1234 ns), then the CPU got "turboboost" (ending this period)
- while in "turboboost", 1000 micro-ops were retired and a maximum of 2000 micro-ops could've been retired (50% CPU load for 2345 ns), then the CPU overheated and got throttled (ending this period)
- while throttled, for the next 3333 nanoseconds 670 micro-ops were retired and a maximum of 670 micro-ops could've been retired (100% CPU load for 3333 ns), then some process asked for CPU load (ending this period)
- while still throttled, for the next 111 nanoseconds 8 micro-ops were retired and a maximum of 22 micro-ops could've been retired (36.36% CPU load for 111 ns), then the CPU returned to nominal speed (ending this period).
From a log like this, you can calculate the average load for any specific period of time without any accuracy loss caused by granularity. Of course you'd probably want a log like that for each CPU, so you can find the CPU load for individual CPUs and also find the total CPU load for all CPUs combined.
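To make that concrete; a rough sketch of what a per-CPU log and the "average load" calculation could look like (plain C with names I've invented - not code from any existing kernel):

Code:
#include <stdint.h>

#define MAX_LOG_ENTRIES 256     /* arbitrary - a real kernel would probably use a ring buffer */

/* One entry covers the time between two "events" (speed change, CPU load
   reported to user-space, etc.) during which the CPU's maximum speed was
   constant. */
struct load_log_entry {
    uint64_t length_ns;         /* length of the period, in nanoseconds */
    uint64_t uops_retired;      /* micro-ops actually retired during the period */
    uint64_t uops_possible;     /* micro-ops that could've been retired during the period */
};

struct load_log {
    struct load_log_entry entry[MAX_LOG_ENTRIES];
    int count;
};

/* Average load (scaled by 10000, so 6267 means 62.67%) over all logged
   periods for one CPU. "Work done" and "work possible" are summed
   separately before dividing, so there's no accuracy loss caused by
   granularity; the same loop restricted to the entries that overlap a
   specific window of time gives the load for that window. */
static uint32_t average_load_x10000(const struct load_log *log)
{
    uint64_t retired = 0;
    uint64_t possible = 0;

    for (int i = 0; i < log->count; i++) {
        retired  += log->entry[i].uops_retired;
        possible += log->entry[i].uops_possible;
    }
    if (possible == 0) {
        return 0;
    }
    return (uint32_t)((retired * 10000) / possible);
}

For the example log above, that works out to (567 + 1000 + 670 + 8) / (890 + 2000 + 670 + 22) = 2245 / 3582, or about 62.67% average load over the whole 7023 ns.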
bwat wrote:Do you not think that with a timer with a period that was both relatively short and relatively prime to the system clock that the global CPU load result would be very accurate?
No, this would fail to meet my definition of "very accurate" by several orders of magnitude. It's what I called a "crude estimate" previously.
Of course I should point out that a crude estimate is often good enough to satisfy the requirements - it's hard to know how accurate the estimate needs to be without knowing what the requirements are. Often the only requirement is "make the stupid end-user think they've been given a useful statistic so they can feel good and ignore it", where any method of generating a number that seems to fluctuate would probably be good enough, even if it has nothing to do with CPU load at all (e.g. maybe just use CPU temperature and pretend it's CPU load - "Wow - our computer does less work on a cold night!").
Cheers,
Brendan