I still cannot find a way to figure out even a ballpark WCET. Once I can work some timings out, I can give those numbers to the scheduler at task creation time. I've been looking into tools that figure this out, but I doubt any would work, as they would only give the timings measured on the host (compiler) machine. Those timings could be completely wrong on a faster/slower/emulated machine.
I think Brendan is on to something (shocking?), but I cannot even fathom how to implement such a system. I hate to ask, but could you give a little more detail about the design of the task messaging system you were describing?
I've almost completely ruled out static analysis, which really only leaves run-time analysis. My scheduler is already flexible: if a task exceeds its deadline it will still finish the task and continue as normal, but that task runs at an extremely high priority until it is finished.
Real-Time OS
Re: Real-Time OS
Website: https://joscor.com
Re: Real-Time OS
01000101 wrote: I still can not get past a way to figure out a ballpark WCET.
WCET is a terribly difficult thing to work out. Companies exist solely to provide WCET estimates for code - they use proprietary algorithms that aren't open, and the likelihood of you actually getting a good WCET estimate is remarkably slim. WCET estimation requires analysis of looping constructs etc., along with emulation of the full machine state, including caches and pipelines. It's basically a massively complicated task.
Cheating is your best option.
Re: Real-Time OS
Hi,
01000101 wrote: I still can not get past a way to figure out a ballpark WCET. Once I can figure some timings out then I can give those numbers at task creation time for the scheduler. I've been looking into tools that figure this out but I doubt any would work as they would only give the timings reported on the host (compiler) machine. Those timings could be completely wrong on a faster/slower/emulated machine.
If a task takes 5 seconds on a fast computer, how long would it take on a computer that's half as fast?
This is just a theory, and I'm not too sure how you'd use it in practice, but...
I'm thinking that maybe you could benchmark some of the computer's more important performance characteristics (general/integer instruction performance, FPU performance, SSE performance, RAM bandwidth, disk speed, etc.). Then you could come up with a general formula for each task that looks something like "time = a * v + b * w + c * x + d * y + e * z", where "a = general/integer instruction performance", "b = FPU performance", "c = SSE performance", etc.; the developer would need to provide values for v, w, x, y and z. To get these values you'd need a way to mess with the corresponding characteristics. For example, you could run the task once and measure how long it takes, then run the task again while using delays to artificially halve the disk speed, and then use both measurements to determine how much disk performance affects the task.
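In C-ish form the model is nothing more than a weighted sum. A minimal sketch (the struct and field names here are mine, not anything standardised):
Code:
/* Illustrative sketch only - struct and field names are made up.
 * Benchmark scores are measured on the machine the task will actually run on;
 * the per-task coefficients are worked out in advance by the developer. */
struct bench_scores { double a_int, b_fpu, c_sse, d_ram, e_disk; };
struct task_coeffs  { double v, w, x, y, z; };

/* time = a*v + b*w + c*x + d*y + e*z */
double estimated_time(struct bench_scores s, struct task_coeffs t)
{
    return s.a_int * t.v + s.b_fpu * t.w + s.c_sse * t.x
         + s.d_ram * t.y + s.e_disk * t.z;
}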
For example, if disk access has been benchmarked and gets a score of 10 normally ("e = 10"), and a score of 5 when it's being artificially slowed down ("e = 5"); and if the task takes 5 seconds with normal disk access and 6 seconds with "half speed" disk access, then:
5 = a * v + b * w + c * x + d * y + 10 * z
And:
6 = a * v + b * w + c * x + d * y + 5 * z
So:
0 = a * v + b * w + c * x + d * y + 10 * z - 5
And:
0 = a * v + b * w + c * x + d * y + 5 * z - 6
Therefore:
a * v + b * w + c * x + d * y + 10 * z - 5 = a * v + b * w + c * x + d * y + 5 * z - 6
10 * z - 5 = 5 * z - 6
10 * z - 5 * z = 5 - 6
5 * z = -1
z = -1/5 = -0.2
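The same subtraction can be done mechanically for any one characteristic. A minimal sketch using the numbers above (function name is mine):
Code:
#include <stdio.h>

/* Solve for the disk coefficient z from two runs of the same task:
 *   time1 = other_terms + e1 * z   (normal disk, benchmark score e1)
 *   time2 = other_terms + e2 * z   (slowed disk, benchmark score e2)
 * Subtracting cancels the unchanged terms, so z = (time1 - time2) / (e1 - e2). */
static double solve_coeff(double time1, double e1, double time2, double e2)
{
    return (time1 - time2) / (e1 - e2);
}

int main(void)
{
    /* Numbers from the worked example above: 5s at score 10, 6s at score 5. */
    printf("z = %f\n", solve_coeff(5.0, 10.0, 6.0, 5.0)); /* prints z = -0.200000 */
    return 0;
}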
To make SSE performance worse, you could disable SSE so that an exception occurs when the task tries to use an SSE instruction. The exception handler could then enable SSE and enable "single step" debugging; when the debug exception occurs you'd disable SSE and disable "single step" debugging again. This means that for every SSE instruction you're artificially adding 2 exceptions, and you'd be able to do your measurements and calculate "x". The same method could be used to mess with FPU performance to find "w". Once you know "w" and "x" you could use "software controlled clock modulation" to slow down the entire CPU. This would affect "v", "w" and "x"; but because you already know "w" and "x" you'd be able to calculate "v".
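In case it helps to see the shape of it, here is a very rough, untested C sketch of that exception ping-pong. The CR0/EFLAGS accessors are stubs, and using CR0.TS for the fault and EFLAGS.TF for the single step is just one way of getting the two exceptions described above; real code would also have to coexist with lazy FPU switching:
Code:
/* Rough sketch only - hardware access is stubbed out, nothing here is tested. */
void set_cr0_ts(void);             /* stub: set CR0.TS -> next FPU/SSE op raises #NM */
void clear_cr0_ts(void);           /* stub: clear CR0.TS                             */
void set_eflags_tf(void *frame);   /* stub: set TF in the interrupted EFLAGS image   */
void clear_eflags_tf(void *frame); /* stub: clear TF again                           */

/* #NM (device not available): the task just hit an FPU/SSE instruction. */
void nm_handler(void *frame)
{
    clear_cr0_ts();        /* let this one instruction execute normally */
    set_eflags_tf(frame);  /* ...but trap again straight after it       */
}

/* #DB (debug / single step): that instruction has now completed. */
void db_handler(void *frame)
{
    set_cr0_ts();          /* re-arm the fault for the next FPU/SSE instruction */
    clear_eflags_tf(frame);
}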
I can't think of a way to make RAM bandwidth worse (not without something chipset specific), but if you know all the other values you don't need to make RAM bandwidth worse to calculate its effect on the task. For example, if all the other values are known the formula might become "123 = 1 * 2 + 3 * 4 + 5 * 6 + 7 * y + 9 * 10" (with the RAM benchmark score being 7), which leaves only one unknown: solving gives y = (123 - 134) / 7, or about -1.6.
Also, the developer may be able to do these tests on several machines and average the result to improve the accuracy of the final results.
Once you've found a way to calculate all of these (v, w, x, y, z) values for a task, you'd be able to estimate fairly accurately how much time the task will take on any computer, by benchmarking each characteristic on that computer and plugging those benchmark results, together with the task's pre-determined v, w, x, y, z values, into your "time = a * v + b * w + c * x + d * y + e * z" formula.
Of course it's not this simple. The time a task takes may depend on how much data it needs to process, so you'd need to take that into account too ("time = (a * v + b * w + c * x + d * y + e * z) * amount_of_data"?).
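Putting those pieces together, the estimate for a given computer might be computed something like this. All of the numbers below are invented purely for illustration, and the units are whatever the benchmarks happen to report:
Code:
#include <stdio.h>

/* Sketch of the final estimate: benchmark scores measured on this computer,
 * per-task coefficients worked out beforehand, scaled by how much data the
 * task has to process. Every number here is made up for illustration. */
int main(void)
{
    double score[5] = { 8.0, 4.0, 6.0, 7.0, 10.0 }; /* a..e: this machine's benchmarks */
    double coeff[5] = { 0.5, 0.1, 0.1, 0.2, 0.3 };  /* v..z: this task's coefficients  */
    double amount_of_data = 250.0;                  /* e.g. records to process         */

    double per_unit = 0.0;
    for (int i = 0; i < 5; i++)
        per_unit += score[i] * coeff[i];            /* a*v + b*w + c*x + d*y + e*z */

    printf("estimated time: %f\n", per_unit * amount_of_data);
    return 0;
}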
I should also point out that I used the term "task" because its meaning is ambiguous - for everything above, "task" might mean one specific operation that the application performs (e.g. the time between receiving an event and sending a reply/response), but "task" could also mean anything else.
01000101 wrote: I think Brendan is on to something (shocking?) but I can not even fathom how to implement such a system. I hate to ask for it but maybe a little more detail of the design about the task messaging system you were describing?
I'm used to micro-kernels, where there are many tasks that send and receive messages to get anything done. For example, the user presses a key causing an IRQ, so the kernel sends an "IRQ occurred" message to the keyboard driver, the keyboard driver sends a message to the GUI, the GUI sends a message to an application, the application sends a message back to the GUI, the GUI sends a message to the video driver, and the video driver updates the screen.
In all of these cases the kernel can measure the time it takes between a task receiving a message and that task asking to receive the next message. For example, imagine that the kernel's system timer says that "t = 10000" when the keyboard driver receives the "IRQ occurred" message, then the keyboard driver handles this message, sends another message to the GUI, then calls "getMessage()" again when the system timer says that "t = 10123". 10123 - 10000 = 123, so the kernel knows it took the keyboard driver 123 system timer ticks to process the "IRQ occurred" message.
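As a sketch of that bookkeeping (the timer and message-queue functions below are stand-ins for whatever the kernel actually provides; the only point is where the two timestamps are taken):
Code:
/* Sketch only - current_tick() and deliver_next_message() are stubs. */
typedef unsigned long long tick_t;

struct task {
    tick_t msg_delivered_at;  /* when the last message was handed to the task */
    tick_t worst_handling;    /* longest observed handling time, in ticks     */
};

tick_t current_tick(void);                   /* stub: system timer tick count */
void   deliver_next_message(struct task *t); /* stub: dequeue + copy to task  */

/* Called when the kernel hands a message to the task. */
void on_message_delivered(struct task *t)
{
    t->msg_delivered_at = current_tick();    /* e.g. t = 10000 in the example */
}

/* Called when the task asks for its next message (getMessage()). */
void on_get_message(struct task *t)
{
    tick_t handling = current_tick() - t->msg_delivered_at; /* 10123 - 10000 = 123 */
    if (handling > t->worst_handling)
        t->worst_handling = handling;        /* feed this back into scheduling */
    deliver_next_message(t);
}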
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: Real-Time OS
Again, all you're providing there is a way to determine (experimentally) the average-case execution time of the code.
Worst-case execution time is just that - worst case. This involves looking at all the code paths and looking at all possible input combinations to see where the longest reachable path is, and when (or if) it terminates.
I know how companies do this, but I can't tell you much because I'm under NDA. Needless to say, it involves some sort of experimental stage to get timing data, but it then applies this to an analytical search of the code to find the longest path.
Average execution time should do for 01000101's solution because nothing there is hard real-time. Everything's soft, so take the average execution time and add on a bit. Hey presto.
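One possible reading of "average plus a bit" - a smoothed per-task average with some headroom on top. The smoothing factor and the 50% margin below are arbitrary choices, not anything from this thread:
Code:
/* Sketch: soft real-time budget = smoothed average execution time plus margin.
 * Assumes the struct starts zeroed. */
struct exec_budget {
    double avg_ticks;  /* exponentially smoothed execution time */
};

static void record_run(struct exec_budget *b, double measured_ticks)
{
    const double alpha = 0.1;                /* smoothing factor (arbitrary) */
    if (b->avg_ticks == 0.0)
        b->avg_ticks = measured_ticks;       /* first sample seeds the average */
    else
        b->avg_ticks += alpha * (measured_ticks - b->avg_ticks);
}

static double budget_ticks(const struct exec_budget *b)
{
    return b->avg_ticks * 1.5;               /* "add on a bit": 50% headroom */
}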
Re: Real-Time OS
I do this and I'm not under NDA. You pretty much have to build the back end of an optimizing compiler, but in reverse - you go from binary to control flow graph. If you find a loop in the flow graph, you have to get user input on how often the loop is traversed in the worst case. Then you have to figure out all the memory latencies and pipeline stalls and all that jazz. It isn't easy. Programming languages can make this easy or hard. Setjmp/longjmp in C, as well as non-local exits or continuations in Lisp-like languages, are pretty hard if not impossible to figure out. To do it completely automatically would be akin to solving the halting problem, and we won't be doing that in a hurry.
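To make the "longest path" part concrete, here is a toy sketch of just the structural step: basic blocks with a worst-case cost and a user-supplied repeat bound, and a longest-path search over the (already loop-collapsed) graph. It deliberately ignores everything that makes the real job hard - caches, pipelines, indirect branches and so on:
Code:
#include <stdio.h>

#define MAX_SUCCS 2

struct block {
    double cost;            /* worst-case cost of one execution of the block  */
    int    repeat;          /* user-supplied bound (1 for straight-line code) */
    int    succ[MAX_SUCCS]; /* indices of successor blocks, -1 = none         */
};

/* Longest path from block i to the end; the block graph is assumed acyclic
 * because loops have already been collapsed into single bounded blocks. */
static double wcet_from(const struct block *b, int i)
{
    if (i < 0)
        return 0.0;                          /* fell off the end of the graph */
    double worst_succ = 0.0;
    for (int s = 0; s < MAX_SUCCS; s++) {
        double t = wcet_from(b, b[i].succ[s]);
        if (t > worst_succ)
            worst_succ = t;
    }
    return b[i].cost * b[i].repeat + worst_succ;
}

int main(void)
{
    /* entry -> bounded loop -> exit, plus a cheap path that skips the loop */
    struct block cfg[] = {
        { 10.0,   1, {  1,  2 } },  /* 0: entry, branches to 1 or 2         */
        {  5.0, 100, {  2, -1 } },  /* 1: loop, user says <= 100 iterations */
        {  2.0,   1, { -1, -1 } },  /* 2: exit                              */
    };
    printf("WCET estimate: %f\n", wcet_from(cfg, 0)); /* 10 + 5*100 + 2 = 512 */
    return 0;
}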
The rule of thumb is that the measured "peak load"* response time (Deadline Monotonic) or processor demand (Earliest Deadline First) is about 50% less than what's calculated with the above technique.
* It is never really peak load - do you know how hard it is to generate peak load input?
Barry
barrywatson.se
Every universe of discourse has its logical structure --- S. K. Langer.
Re: Real-Time OS
bwat wrote: I do this and I'm not under NDA. You pretty much have to build the back end of an optimizing compiler but in reverse - you go from binary to control flow graph. [...]
Exactly what I wanted to say but couldn't.