Hi,
01000101 wrote:I still can not get past a way to figure out a ballpark WCET. Once I can figure some timings out then I can give those numbers at task creation time for the scheduler. I've been looking into tools that figure this out but I doubt any would work as they would only give the timings reported on the host (compiler) machine. Those timings could be completely wrong on a faster/slower/emulated machine.
If a task takes 5 seconds on a fast computer, how long would it take on a computer that's half as fast?
This is just a theory, and I'm not too sure how you'd use it in practice, but...
I'm thinking that maybe you could benchmark some of the computer's more important performance characteristics (general/integer instruction performance, FPU performance, SSE performance, RAM bandwidth, disk speed, etc). Then you could come up with a general formula for each task that looks something like "time = a * v + b * w + c * x + d * y + e * z", where "a = general/integer instruction performance", "b = FPU performance", "c = SSE performance", etc, and where the developer needs to provide values for v, w, x, y and z. To get these values you'd need a way to mess with the corresponding characteristics. For example, you could run the task once and measure how long it takes, then run the task again while using delays to artificially halve the disk speed, and then use both of these measurements to determine how much disk performance affects the task.
For example, if disk access has been benchmarked and gets a score of 10 normally ("e = 10"), and a score of 5 when it's being artificially slowed down ("e = 5"); and if the task takes 5 seconds with normal disk access and 6 seconds with "half speed" disk access, then:
5 = a * v + b * w + c * x + d * y + 10 * z
And:
6 = a * v + b * w + c * x + d * y + 5 * z
So:
0 = a * v + b * w + c * x + d * y + 10 * z - 5
And:
0 = a * v + b * w + c * x + d * y + 5 * z - 6
Therefore:
a * v + b * w + c * x + d * y + 10 * z - 5 = a * v + b * w + c * x + d * y + 5 * z - 6
10 * z - 5 = 5 * z - 6
10 * z - 5 * z = 5 - 6
5 * z = -1
z = -1/5 = -0.2
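In other words, if the two runs take "t1" and "t2" seconds while the disk score is "e1" and "e2" (and everything else is left alone), all the other terms cancel out and you get "z = (t1 - t2) / (e1 - e2)", which for the numbers above is (5 - 6) / (10 - 5) = -0.2.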
To make SSE performance worse, you could disable SSE so that it causes an exception when the task tries to use an SSE instruction; the exception handler could enable SSE and enable "single step" debugging, then when the debugging exception occurs you'd disable SSE and disable "single step" debugging again. This means that for every SSE instruction you're artificially adding 2 exceptions, and you'd be able to do your measurements and calculate "x". The same method could be used to mess with FPU performance to find "w". Once you know "w" and "x" you could use "software controlled clock modulation" to slow down the entire CPU. This would affect "v", "w" and "x"; but because you already know "w" and "x" you'd be able to calculate "v".
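For what it's worth, here's a rough, untested sketch of that trap-and-single-step trick (assuming 32-bit x86, using CR0.TS as the "disable" bit, which traps both FPU and SSE instructions with #NM; clearing CR4.OSFXSR instead would limit it to SSE via #UD; the trap frame layout and handler names are made up):

Code:
#include <stdint.h>

#define CR0_TS     (1u << 3)   /* "task switched" flag: FPU/SSE instructions raise #NM while set */
#define EFLAGS_TF  (1u << 8)   /* trap flag: raise #DB (single step) after the next instruction   */

/* Made-up trap frame layout - whatever your interrupt stubs actually push. */
struct trap_frame { uint32_t eip, cs, eflags; };

static volatile uint64_t fpu_sse_traps;   /* number of FPU/SSE instructions that were trapped */

static inline uint32_t read_cr0(void) {
    uint32_t value;
    __asm__ volatile("mov %%cr0, %0" : "=r"(value));
    return value;
}

static inline void write_cr0(uint32_t value) {
    __asm__ volatile("mov %0, %%cr0" : : "r"(value));
}

/* #NM handler: the task hit an FPU/SSE instruction while CR0.TS was set. */
void nm_exception_handler(struct trap_frame *frame)
{
    write_cr0(read_cr0() & ~CR0_TS);   /* let the instruction execute this time...           */
    frame->eflags |= EFLAGS_TF;        /* ...but single step so we get control straight back */
    fpu_sse_traps++;
}

/* #DB handler: the trapped instruction has now executed, so re-arm the trap. */
void db_exception_handler(struct trap_frame *frame)
{
    write_cr0(read_cr0() | CR0_TS);    /* the next FPU/SSE instruction will fault again */
    frame->eflags &= ~EFLAGS_TF;       /* stop single stepping                          */
}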
I can't think of a way to make RAM bandwidth worse (not without something chipset specific), but if you know all the other values you don't need to make RAM bandwidth worse to calculate its effect on the task. For example, if all the other values are known the formula might become "123 = 1 * 2 + 3 * 4 + 5 * 6 + 7 * y + 9 * 10", where 7 is the machine's RAM bandwidth score and y is the only unknown left.
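Solving that for the one unknown gives "y = (123 - 1 * 2 - 3 * 4 - 5 * 6 - 9 * 10) / 7 = -11/7", so the same sort of elimination works even for the characteristic you can't artificially slow down (the numbers are only made up to show the shape of the calculation).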
Also, the developer may be able to do these tests on several machines and average the results to improve the accuracy of the final v, w, x, y, z values.
Once you've found a way to calculate all of these (v, w, x, y, z) values for a task, you'd be able to estimate fairly accurately how much time the task will take on any computer: benchmark each characteristic on that computer, then plug those benchmark results and the task's pre-determined v, w, x, y, z values into the "time = a * v + b * w + c * x + d * y + e * z" formula.
Of course it's not this simple. The time a task takes may depend on how much data it needs to process, so you'd need to take that into account too ("time = (a * v + b * w + c * x + d * y + e * z) * amount_of_data"?).
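To show what I mean, here's a rough sketch of the scheduler-side estimate once a task's values are known (plain C, untested, all names made up; "amount_of_data" is whatever unit makes sense for the task):

Code:
#include <stddef.h>

#define NUM_CHARACTERISTICS 5   /* integer, FPU, SSE, RAM bandwidth, disk */

/*
 * machine_score[] = this computer's benchmark results (a, b, c, d, e)
 * task_coeff[]    = the task's pre-determined values  (v, w, x, y, z)
 * amount_of_data  = how much data the task has to process
 */
double estimate_time(const double machine_score[NUM_CHARACTERISTICS],
                     const double task_coeff[NUM_CHARACTERISTICS],
                     double amount_of_data)
{
    double time = 0.0;

    for (size_t i = 0; i < NUM_CHARACTERISTICS; i++)
        time += machine_score[i] * task_coeff[i];

    return time * amount_of_data;
}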
I should also point out that I used the term "task" because its meaning is ambiguous - for everything above, "task" might mean one specific operation that the application performs (e.g. the time between receiving an event and sending a reply/response), but "task" could also mean anything else.
01000101 wrote:I think Brenden is on to something (shocking?) but I can not even fathom how to implement such a system. I hate to ask for it but maybe a little more detail of the design about the task messaging system you were describing?
I'm used to micro-kernels, where there are many tasks sending and receiving messages to get anything done. For example: the user presses a key causing an IRQ, so the kernel sends an "IRQ occurred" message to the keyboard driver, the keyboard driver sends a message to the GUI, the GUI sends a message to an application, the application sends a message back to the GUI, the GUI sends a message to the video driver, and the video driver updates the screen.
In all of these cases the kernel can measure the time it takes between a task receiving a message and that task asking to receive the next message. For example, imagine that the kernel's system timer says that "t = 10000" when the keyboard driver receives the "IRQ occurred" message, then the keyboard driver handles this message, sends another message to the GUI, then calls "getMessage()" again when the system timer says that "t = 10123". 10123 - 10000 = 123, so the kernel knows it took the keyboard driver 123 system timer ticks to process the "IRQ occurred" message.
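The kernel side of that measurement could be a few lines in the getMessage() system call - something like this rough sketch (made-up structure and function names, with "timer_ticks" standing in for whatever your system timer counts):

Code:
#include <stdint.h>

/* Made-up per-task accounting kept by the kernel. */
struct task {
    uint64_t msg_received_at;       /* tick when the current message was handed over (0 = none yet) */
    uint64_t total_handling_ticks;  /* total ticks spent handling messages                           */
    uint64_t messages_handled;
};

extern volatile uint64_t timer_ticks;   /* incremented by the system timer IRQ */
extern struct task *current_task;       /* the task that made this system call */
extern int dequeue_message(struct task *task, void *buffer);   /* blocks until a message arrives */

/* Kernel side of getMessage(): account for the time spent handling the previous message. */
int sys_get_message(void *buffer)
{
    struct task *task = current_task;
    uint64_t now = timer_ticks;

    if (task->msg_received_at != 0) {
        /* e.g. 10123 - 10000 = 123 ticks to handle the "IRQ occurred" message */
        task->total_handling_ticks += now - task->msg_received_at;
        task->messages_handled++;
    }

    int status = dequeue_message(task, buffer);   /* time spent blocked here isn't counted     */
    task->msg_received_at = timer_ticks;          /* handling of the new message starts now    */
    return status;
}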
Cheers,
Brendan