Re: Calculate CPU Load

Posted: Tue Oct 29, 2013 12:05 am
by bwat
Brendan wrote: No, this would fail to meet my definition of "very accurate" by several orders of magnitude. It's what I called a "crude estimate" previously.
Without measurement we'll never know. My hunch is that it would be surprisingly accurate.
Brendan wrote: Of course I should point out that a crude estimate is often good enough to satisfy the requirements - it's hard to know how accurate the estimate needs to be without knowing what the requirements are. Often the only requirement is "make the stupid end-user think they've been given a useful statistic so they can feel good and ignore it", where any method of generating a number that seems to fluctuate would probably be good enough, even if it has nothing to do with CPU load at all (e.g. maybe just use CPU temperature and pretend it's CPU load - "Wow - our computer does less work on a cold night!" ;) ).
Actually, if it is being used for verifying the sort of things in this document:
http://www.treewhimsy.com/TECPB/Article ... sights.pdf
i.e., you've planned the capacity, built the thing, and you need to verify that it behaves as planned, then it doesn't have to be very precise; fractions of a percent would be overkill. Accuracy is always good, I suppose.

We'll all pretend you didn't suggest a metric that had nothing to do with CPU load at all!

Re: Calculate CPU Load

Posted: Tue Oct 29, 2013 12:57 am
by Brendan
Hi,
bwat wrote:
Brendan wrote:No, this would fail to meet my definition of "very accurate" by several orders of magnitude. It's what I called a "crude estimate" previously.
Without measurement we'll never know. My hunch is that it would be surprisingly accurate.
Then your hunch is wrong.
bwat wrote:We'll all pretend you didn't suggest a metric that had nothing to do with CPU load at all!
I'm not sure why you'd think that; but if you're happy with some half-assed metric that assumes CPUs have constant performance (despite the fact that CPUs like that haven't existed for desktop/server 80x86 since the introduction of the "turbo button" in the 1980s), then I'll be happy knowing that whatever you implement will be inferior.


Cheers,

Brendan

Re: Calculate CPU Load

Posted: Tue Oct 29, 2013 2:16 am
by bwat
Brendan wrote:
bwat wrote:
Brendan wrote:No, this would fail to meet my definition of "very accurate" by several orders of magnitude. It's what I called a "crude estimate" previously.
Without measurement we'll never know. My hunch is that it would be surprisingly accurate.
Then your hunch is wrong.
Do you have measurements to back this up? I'd be happy to see them. Note I stated that I had a hunch and I'm quite open to evidence to the contrary. Do you have this evidence?
Brendan wrote:
bwat wrote:We'll all pretend you didn't suggest a metric that had nothing to do with CPU load at all!
I'm not sure why you'd think that;
I used your very own words. You wrote:
Brendan wrote: where any method of generating a number that seems to fluctuate would probably be good enough, even if it has nothing to do with CPU load at all (e.g. maybe just use CPU temperature and pretend it's CPU load
By your own words, it "has nothing to do with CPU load at all". Why would you suggest a metric that is, by your own definition, irrelevant? All I did was state that I think it is quite reasonable to discard any CPU load metric which "has nothing to do with CPU load". Frankly, "a number that seems to fluctuate" isn't good enough.

Re: Calculate CPU Load

Posted: Tue Oct 29, 2013 4:05 pm
by Brendan
Hi,
bwat wrote:
Brendan wrote:
bwat wrote:Without measurement we'll never know. My hunch is that it would be surprisingly accurate.
Then your hunch is wrong.
Do you have measurements to back this up? I'd be happy to see them. Note I stated that I had a hunch and I'm quite open to evidence to the contrary. Do you have this evidence?
Why would I need "evidence" when I've already shown the basis for my conclusion? Do you have evidence to show that evidence is necessary?

Here's some test cases for you to consider:

a) A task ran for 1 second on one logical CPU while the other logical CPU in the core was also busy. One logical CPU was doing a huge memory copy and was starved for RAM bandwidth, while the other was doing a heavily optimised mixture of integer and floating point calculations. How much CPU load did the task create during that 1 second period? In this case your method would decide "100% CPU load" for the task, even though the load it actually created was a fraction of that.

b) A task ran for 1 second on a CPU (that doesn't support hyper-threading). It was a very low priority task and the computer was a laptop running on battery power, so the OS decided to save power by voluntarily throttling the CPU down to 25% of its maximum performance. How much CPU load did the task create during that 1 second period? In this case your method would decide "100% CPU load" for the task, even though the load it actually created was a fraction of that.

c) A task is a server for a multiplayer game. In one second it handled 1000 packets from the network. For each packet the task unblocks, runs for an average of 100 µs, and then blocks. How much CPU load did the task create during that 1 second period? In this case your method would probably decide "0% CPU load", because the task wasn't running whenever the timer happened to fire.

For all these cases your method is wrong by a significant amount. Over time (e.g. "average total CPU load for a 5 day period") you might be hoping that the sum of inaccurate values is accurate, but "worst case accuracy" remains the same regardless of how many inaccurate values you average.
bwat wrote:
Brendan wrote:
bwat wrote:We'll all pretend you didn't suggest a metric that had nothing to do with CPU load at all!
I'm not sure why you'd think that;
I used your very own words. You wrote:
Brendan wrote: where any method of generating a number that seems to fluctuate would probably be good enough, even if it has nothing to do with CPU load at all (e.g. maybe just use CPU temperature and pretend it's CPU load
By your own words, it "has nothing to do with CPU load at all". Why would you suggest a metric that is, by your own definition, irrelevant? All I did was state that I think it is quite reasonable to discard any CPU load metric which "has nothing to do with CPU load". Frankly, "a number that seems to fluctuate" isn't good enough.
I understand now - I wrote a relatively detailed proposal for an accurate method of measuring CPU load (that happened to end with a silly comment about using something like temperature if accuracy doesn't matter at all). I thought you were criticising the detailed proposal, but you were talking about the small and irrelevant silly comment.


Cheers,

Brendan

Re: Calculate CPU Load

Posted: Tue Oct 29, 2013 5:20 pm
by Owen
I'd personally define a concept of load derived from the (empirically useful) load average measure that Unix uses. This isn't a measure of utilization efficiency but of work. It doesn't attempt to model things like the interactions between two processes on a hyper-threaded processor (in part because those interactions are useful for scheduling - if thread 1 is saturating the cache, then thread 1 or 2 may wish to shed memory-intensive work to a less memory-saturated core).

We'll start with instantaneous load: instantaneous load is the number of tasks which could be run at a given instant. If no tasks are runnable, load is zero; if the only task is "for(;;);", then load is always going to be pegged at 1; if two of these tasks are running, 2; etc.

Load is then defined as the weighted average of these values over time; if the processor had two tasks runnable for 1 second, then one task runnable for 1 second, the load value would be 1.5. Assuming no tasks have deadlines, the load tells you how much more time the tasks took than they would have taken with infinite cores. This admittedly isn't a good diagnostic for (soft) realtime tasks.

Load average then becomes the moving average of these load values over time.

We would also want to introduce weighting: in a system with frequency scaling, the load values would be weighted by the processors' current clock speeds relative to their maximum (obviously things like Turbo Boost cause some potential issues here in measuring the maximum performance). In an AMP system like ARM big.LITTLE, we also need to consider how to weight the smaller cores against the larger ones.

These are not necessarily measures for the user, but for the scheduler - they can at least partially inform the scheduler about the conditions which exist, but may not tell the user what they want or expect.

Fortunately, they are measures which can be calculated inexpensively and opportunistically - by definition, on most systems a change in the instantaneous load can only happen at moments when the scheduler is activated (i.e. because a process just got blocked or unblocked).

Re: Calculate CPU Load

Posted: Wed Oct 30, 2013 2:25 am
by bwat
Brendan wrote:Hi,
bwat wrote:
Brendan wrote: Then your hunch is wrong.
Do you have measurements to back this up? I'd be happy to see them. Note I stated that I had a hunch and I'm quite open to evidence to the contrary. Do you have this evidence?
Why would I need "evidence" when I've already shown the basis for my conclusion? Do you have evidence to show that evidence is necessary?
Yes. My hunch is based on the results shown in section 5.2 in this paper
http://citeseerx.ist.psu.edu/viewdoc/su ... .1.50.3208
That tells me that sampling the program counter is a useful method. With more samples, the results will only get more accurate.

As for
Brendan wrote: Over time (e.g. "average total CPU load for a 5 day period") you might be hoping that the sum of inaccurate values is accurate, but "worst case accuracy" remains the same regardless of how many inaccurate values you average.
Please look at section 2 which reasons with the Law of Large Numbers. As the number of trials increases, the difference between the expectation and the actual value tends to zero.


P.S. Choose the PostScript version of the paper; the PDF version is missing some text.

Re: Calculate CPU Load

Posted: Wed Oct 30, 2013 3:45 am
by Combuster
bwat wrote:As for
Brendan wrote: Over time (e.g. "average total CPU load for a 5 day period") you might be hoping that the sum of inaccurate values is accurate, but "worst case accuracy" remains the same regardless of how many inaccurate values you average.
Please look at section 2 which reasons with the Law of Large Numbers. As the number of trials increases, the difference between the expectation and the actual value tends to zero.
And a scheduler in the scientific literature only has the property of being "fair" in the sense that "all processes will eventually get run". It is free to time everything in such a fashion that it always causes the worst case - and mind you, I have seen enough real cases of this happening in practice (because they bite).

The paper falsely applies the law of large numbers to a variable that is nondeterministic, rather than the required random, and its conclusion is therefore theoretically invalid.

Re: Calculate CPU Load

Posted: Wed Oct 30, 2013 5:06 am
by bwat
Combuster wrote: The paper falsely applies the law of large numbers to a variable that is nondeterministic, rather than the required random, and it's conclusion is therefore theoretically invalid.
A random variable is nondeterministic. See the following link: http://mathworld.wolfram.com/Stochastic.html

Now, either you have very deep knowledge, which you simply must share with us all, or, your misconceptions are just confusing the issue.

Re: Calculate CPU Load

Posted: Wed Oct 30, 2013 5:54 am
by bwat
Another point.
Combuster wrote: And a scheduler in scientific literature only has the property of being "fair" as in "all processes will eventually get run"
We could also define fairness as each process getting an equal proportion of CPU time. That would not permit schedules that your definition would permit. A quote from Fairness and Scheduling in Single Server Queues (http://www.cs.caltech.edu/~adamw/papers/survey.pdf):
But, the concept of ‘fairness’ is amorphous and difficult to define, in large part because it can have a very different meaning depending on the setting considered.

Re: Calculate CPU Load

Posted: Wed Oct 30, 2013 4:53 pm
by Brendan
Hi,
bwat wrote:
Brendan wrote:
bwat wrote:Do you have measurements to back this up? I'd be happy to see them. Note I stated that I had a hunch and I'm quite open to evidence to the contrary. Do you have this evidence?
Why would I need "evidence" when I've already shown the basis for my conclusion? Do you have evidence to show that evidence is necessary?
Yes. My hunch is based on the results shown in section 5.2 in this paper
http://citeseerx.ist.psu.edu/viewdoc/su ... .1.50.3208
That tells me that a sampling of the program counter is a useful method. With more samples, the results will only be more accurate.
It doesn't work like that. If one sample may be 50% wrong, then if you're unlucky 2 samples will both be 50% wrong, and if you continue to be very unlucky for a long time then millions of samples can all be 50% wrong. The worst case does not improve - more samples only reduce the chance of the worst case occurring. Basically, if one sample isn't accurate then worst case accuracy will suck, and average case accuracy for very short time periods will also suck.

Also note that the paper is a "preprint to be presented at a conference in January 1993" (i.e. it was probably written before Intel released Pentium CPUs). A lot of the problems that exist for modern CPUs didn't exist back then (e.g. hyper-threading, turbo-boost) or were very rarely used (power management). Basically, back when the paper was written it wasn't unreasonable to assume the CPU did a fixed amount of work in a fixed amount of time; that assumption no longer holds, and therefore any claim about accuracy in that paper is obsolete.
bwat wrote:As for
Brendan wrote: Over time (e.g. "average total CPU load for a 5 day period") you might be hoping that the sum of inaccurate values is accurate, but "worst case accuracy" remains the same regardless of how many inaccurate values you average.
Please look at section 2 which reasons with the Law of Large Numbers. As the number of trials increases, the difference between the expectation and the actual value tends to zero.
They call it an estimate (rather than a measurement), which implies that they never assumed it was accurate and knew that worst case accuracy (rather than just average case accuracy) sucked. They also say "Moreover, the estimates should get better as we make more observations", where "should get better" is very different from "will get better" (again implying that they knew worst case accuracy sucked), and where "more observations" implies they knew average case accuracy sucked for short time periods (few samples).

Also note that the first paragraph of that section says "precise timing of every interrupt and system call is prohibitive", which (for Pentium and later CPUs that support RDTSC and performance monitoring counters) is mostly untrue now. This helps to confirm that the paper was written before all the problems of modern CPUs existed.


Mostly, your hunch may have been correct several decades ago (on ancient CPUs), but time has passed and CPUs have changed, making your hunch wrong now. A smarter person would have realised this when I clearly detailed the problems for modern CPUs in my previous posts, and probably wouldn't have continued trying to defend an old (now inaccurate) method. Note: an inaccurate method may still be fine if you don't need accuracy - I'm not saying these inaccurate methods are bad, only that they're not accurate.


Cheers,

Brendan

Re: Calculate CPU Load

Posted: Thu Oct 31, 2013 2:45 am
by bwat
Brendan wrote:
bwat wrote: Yes. My hunch is based on the results shown in section 5.2 in this paper
http://citeseerx.ist.psu.edu/viewdoc/su ... .1.50.3208
That tells me that a sampling of the program counter is a useful method. With more samples, the results will only be more accurate.
It doesn't work like that. If one sample may be 50% wrong, then if you're unlucky 2 samples will both be 50% wrong, and if you continue to be very unlucky for a long time then millions of samples can all be 50% wrong.
From section 2 we see that the sample space is {UserMode, SystemMode, InterruptMode}, that is to say, the outcome of a sample is one of those states. How can such a result be, as you say, 50% wrong?
Brendan wrote: Worst case does not improve - more samples only reduces the chance of worst case occurring. Basically; if one sample isn't accurate then worst case accuracy will suck, and average case accuracy for very short time periods will also suck.
You don't have to take my word on the Law of Large Numbers; there are proofs all over the web. If you have a problem with the law then you should take it up with the statistics department of your nearest university. You'll no doubt find someone to walk you through it much better than I can.

Re: Calculate CPU Load

Posted: Thu Oct 31, 2013 3:42 am
by Combuster
bwat wrote:A random variable is nondeterministic.
Thus, if a pig is an animal, you conclude that a given animal is a pig.
Please, either cite something that actually supports your claim, or learn to read.
bwat wrote: You don't have to take my word on the Law of Large Numbers, there are proofs all over the web.
And the problem with probabilities is that they provide no guarantees.

Shamefully, the paper is down right now, but the proof opened with the following construct:
- A is a variable representing an observation of CPU state
- B is an independent Bernoulli variable
- Sampling B repeatedly allows the invocation of the law of large numbers
- Therefore, repeatedly sampling A gives a good approximation of CPU state.
It never demonstrates that A shares the properties of B - or even the subset of properties that are mandatory for invoking the law of large numbers.

I can't look up the rest of the paper right now (it's down) to see whether they make any attempt at mitigating this, or construct a counterexample tailored to their implementation; but if it's nothing more than timing at regular intervals, you can write a task that runs periodically at the same interval, and the sampler will always return the same answer, yielding 0% or 100% regardless of the actual load.

Re: Calculate CPU Load

Posted: Thu Oct 31, 2013 4:20 am
by bwat
Combuster wrote: I can't look up the rest of the paper now (it's down) to see if they make any attempts at mitigating this, and construct a counterexample tailored to their implementation, but if it's nothing but timing at regular intervals, you can write a task to run periodically at the same interval, and the sampler would always return the same answer, yielding 0% or 100% regardless of the actual load.
It is not sampling at regular intervals! That's the whole point of the paper! It describes randomized aperiodic sampling! The title of the paper is "A Randomized Sampling Clock for CPU Utilization Estimation and Code Profiling". You missed the entire point of the paper as set out in the title, yet you earlier stated that "its conclusion is therefore theoretically invalid". How can you make such a statement on such a flawed reading?

You criticise the output of professional researchers without taking the time to read what you criticise! Either advance everyone's knowledge by demonstrating the errors in the paper, or just have the decency not to make statements about their work which you cannot back up. These people have careers, and God help us if someone makes judgements about their abilities based on your blind analysis.

P.S. You state that you are a "univerity staff member (Teacher's assistant - giving work colleges)" (http://wiki.osdev.org/User:Combuster). Wouldn't you have the professional courtesy to properly criticise the work of another academic?

Re: Calculate CPU Load

Posted: Thu Oct 31, 2013 5:46 am
by Combuster
bwat wrote: without taking the time to read what you criticise
If you bothered to read the exact formulation, I wouldn't need to call you a hypocrite.

So what gives? I'd make a process that uses the same source of randomness, simply to further my evil cause (because true randomness does not exist in a computer). Or I'd just increase the administrative load in the EOI sequence to give a real-world demonstration that there are locations that can't be sampled, which do add bias to whatever observations you make.

Bottom line is, you can observe the observer and act on it. You can modify quicksort to prevent typical inputs from making it run worst-case, but you can always find a new input that yields the exact same result until you change the fundamental principles that make quicksort what it is.

But for now, your alleged research is unavailable for scrutiny, which makes anything you claim about it unscientific.