Benchmarking and Speed profiling
Hi, I've been working on improving my OS but have a few questions. How do you guys benchmark your OS? I mean, after you implement or change something, how do you verify that it has improved the performance of the OS? Also, how do you check the speed of execution, memory references, and time wasted on unnecessary code?
Thanks for your replies
Re:Benchmarking and Speed profiling
You could stress-test it by writing code that performs the operation 1000 times or more (the idea is to magnify the speed difference between implementations so that the actual speed increase is more noticeable). Have the tester code use the real-time clock to time both implementations.
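For instance, here is a minimal sketch in C of that repeat-and-time idea. It reads the CPU's timestamp counter with rdtsc rather than the real-time clock, and old_impl()/new_impl() are just placeholders for the two implementations you want to compare:
Code: Select all
/* Example sketch only: time two implementations of the same operation by
 * repeating each many times and reading the timestamp counter around the loop.
 * old_impl() and new_impl() are placeholders for your own routines. */

static inline unsigned long long rdtsc(void)
{
    unsigned int lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((unsigned long long)hi << 32) | lo;
}

#define ITERATIONS 100000   /* arbitrary; big enough to magnify the difference */

unsigned long long time_it(void (*op)(void))
{
    unsigned long long start = rdtsc();
    int i;

    for (i = 0; i < ITERATIONS; i++)
        op();                          /* run the operation many times */

    return rdtsc() - start;            /* total cycles for ITERATIONS runs */
}

/* Usage:
 *   unsigned long long old_cycles = time_it(old_impl);
 *   unsigned long long new_cycles = time_it(new_impl);
 * Compare the totals (or divide by ITERATIONS for a per-call figure). */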
Re:Benchmarking and Speed profiling
You'll need to profile your code. In brief, add code at the top and bottom of each function to measure the time spent within that function. Run your OS and do some typical stuff. Look at the list of total times spent within each function to gauge where to optimize, and whether your optimizations are taking effect. The results may surprise you.
Note: the function taking up the most time will probably be your kernel's idle loop. There is no point trying to optimize this.
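A rough sketch of what that kind of manual instrumentation could look like in C. Everything here (PROF_ENTER/PROF_LEAVE, prof_table, read_timer(), the slot ids) is a hypothetical name; read_timer() stands in for whatever time source your kernel already has, such as a PIT tick counter or the TSC:
Code: Select all
/* Hypothetical manual profiler: each instrumented function gets a slot in
 * prof_table[] that accumulates the total time spent inside that function. */

#define MAX_PROF_SLOTS 64

struct prof_slot {
    const char *name;            /* function name, for the report */
    unsigned long long total;    /* accumulated time units */
    unsigned long calls;         /* number of invocations */
};

static struct prof_slot prof_table[MAX_PROF_SLOTS];

extern unsigned long long read_timer(void);   /* your kernel's time source */

#define PROF_ENTER(id, fname)                              \
    unsigned long long prof_start_ = read_timer();         \
    prof_table[id].name = (fname)

#define PROF_LEAVE(id)                                     \
    do {                                                   \
        prof_table[id].total += read_timer() - prof_start_;\
        prof_table[id].calls++;                            \
    } while (0)

/* Usage inside a kernel function (PROF_SCHEDULE is an index you define):
 *
 *   void schedule(void)
 *   {
 *       PROF_ENTER(PROF_SCHEDULE, "schedule");
 *       ...the real work...
 *       PROF_LEAVE(PROF_SCHEDULE);
 *   }
 *
 * Dump prof_table[] on demand to see where the time is really going. */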
Re:Benchmarking and Speed profiling
I want to know how to go about implementing an idle loop. Any related info would be helpful, as I don't completely understand what this is and how it is done.
Only Human
Re:Benchmarking and Speed profiling
The easiest way (though perhaps not the best) is simple: after your interrupt and exception handlers are set up and your scheduler is running, create as your first task a kernel task with code somewhat like the following (assuming NASM):
I can't promise that this code will work as written, but it should give you an idea of how it should look. Note that this version must be a kernel process, as it uses two privileged instructions ([tt]sti[/tt] and [tt]hlt[/tt]); however, the null process should have the lowest priority of any process, running only when all other processes are idle (which happens more often than you'd expect). This is about as optimal as it can get without busy-waiting or risking instability, AFAICT. C&CW, as always.
Code: Select all
[bits 32]
; null_loop - an example of a kernel null process
; This code is an example only, and has not been tested
null_loop:
        sti                     ; ensure that interrupts are enabled, or you'll have
                                ; a looong wait for the loop to finish ;-)
        hlt                     ; shut down the processor until the next interrupt
        jmp short null_loop     ; when hlt returns, loop back to the beginning
        ret                     ; can't happen, but put it here anyway in case
                                ; things go very, very wrong
Re:Benchmarking and Speed profiling
Here's an idle loop:
Code: Select all
while (true)
    __asm__("hlt");
The idle thread's code consists solely of an idle loop. The idle thread is only scheduled if there are no other threads ready to run, and its purpose is to soak up CPU time until another thread is ready to run. Clearly you can't run no thread at all, so if you're going to run something, you need to run something that does nothing.
The HLT instruction here stops the processor until the next interrupt arrives. This is the same as waiting for a new thread to be ready to run. Why? Because if there are no threads that are ready, they must all be waiting on something. What can threads wait on? Hardware devices (for read(), write() etc.) and timers (sleep() etc.). What have these got in common? They're all triggered by an IRQ. So waiting for an IRQ is enough to wait for an event which causes another thread to run.
You could also write an idle loop like this:
Code: Select all
while (true)
    ; /* do nothing */
However, this causes the CPU to loop as fast as it can, for no reason. Putting HLT in there allows the CPU to shut down, which keeps it cool and saves power on laptops.
Re:Benchmarking and Speed profiling
Uh oh, it looks like we experienced race conditions in those postings. But I must admit, Robinson-sensei did a better job of explaining than I did.
Re:Benchmarking and Speed profiling
See true realtime multithreading in action at Mega-Tokyo!
Re:Benchmarking and Speed profiling
What is the wake-up time from a HLT?
Could it make sense to busy-idle for some time within the scheduler, expecting some thread to become ready soon, before "powering down" by using a HLT?
Every good solution is obvious once you've found it.
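A minimal sketch of that "busy-idle a little, then power down" idea, assuming hypothetical ready_queue_empty() and schedule() hooks into your own scheduler and an arbitrary SPIN_BUDGET; whether it ever pays off depends on how long the HLT wake-up latency really is:
Code: Select all
/* Hypothetical hybrid idle loop: poll the ready queue for a short while
 * before falling back to HLT. */

extern int ready_queue_empty(void);   /* your scheduler's ready-queue test */
extern void schedule(void);           /* hand the CPU to a ready thread */

#define SPIN_BUDGET 1000              /* arbitrary number of polls */

void idle_thread(void)
{
    for (;;) {
        int spins = SPIN_BUDGET;

        /* busy-idle: maybe a thread becomes ready very soon */
        while (spins-- && ready_queue_empty())
            __asm__ __volatile__("pause");    /* relax the CPU while spinning */

        if (!ready_queue_empty()) {
            schedule();               /* something showed up during the spin */
            continue;
        }

        /* nothing showed up: halt until the next interrupt */
        __asm__ __volatile__("sti; hlt");
    }
}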
Re:Benchmarking and Speed profiling
It would make more sense to have the idle thread wake up the swap thread or some other bookkeeping daemon which lingers around while other processes are busy, and then, when all the work is done, have the idle thread drop into the HLT state.
This would make more sense to me.
Wake-up time from HLT? Hm... I didn't think about that. An interrupt arrives, the processor continues work. I don't have the slightest clue.
Re:Benchmarking and Speed profiling
Some people have this idea that you could use the idle loop to process previously recorded data to train decision trees, perceptrons, etc to define system policies on scheduling, swapping, etc instead of halting. I don't know how good of an idea that is, but someone might want to look into it.
Re:Benchmarking and Speed profiling
kaos wrote: Some people have this idea that you could use the idle loop to process previously recorded data to train decision trees, perceptrons, etc to define system policies on scheduling, swapping, etc instead of halting. I don't know how good of an idea that is, but someone might want to look into it.
While applying unused cycles to such activities may be a worthwhile practice, you wouldn't use the null process for it. Even very low-priority processes like those have occasional need to wait on something (e.g., when saving a result). The null process exists solely to run when there is no other process active. Even the lowest-priority background process will run in preference to the null process. The null process should never be more than a backstop for times when every other process is waiting on something - period. You can spawn all the low-priority threads you wish to try and ensure that every available cycle is used, but they won't be the null process.
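To make that backstop role concrete, a scheduler's pick-next routine might look roughly like this; pick_ready_thread() and idle_thread are hypothetical names, not anyone's actual implementation:
Code: Select all
/* Hypothetical: the null/idle thread never sits in the ready queues;
 * it is chosen only when every queue is empty. */

struct thread;

extern struct thread *pick_ready_thread(void);  /* highest-priority ready thread, or NULL */
extern struct thread *idle_thread;              /* the null process, created at boot */

struct thread *pick_next(void)
{
    struct thread *t = pick_ready_thread();

    if (t != 0)
        return t;           /* any real thread, however low its priority, wins */

    return idle_thread;     /* backstop: everything else is waiting on something */
}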
Re:Benchmarking and Speed profiling
As Schol-R-Lea says, there are lots of things the kernel can be doing during idle time, but not by the idle thread. The idle thread is there to soak up CPU cycles when there really is nothing else to do. Useful work can be done by separate low-priority threads, which can be scheduled as normal.
For instance, in Windows, you have:
- Balance set manager, for trimming processes' working sets
- Modified page writer threads (two), for writing modified pages to disk
- Zero page threads (one per CPU), for zeroing recently-freed pages so that they can be reallocated
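Such housekeeping is spawned as ordinary low-priority kernel threads rather than folded into the idle thread. A rough sketch of the idea; create_kernel_thread(), PRIORITY_LOWEST and the worker entry points are hypothetical names, not the Windows API:
Code: Select all
/* Hypothetical: background housekeeping runs as normal threads with a very
 * low (but real) priority, scheduled like anything else -- never as the
 * idle/null thread itself. */

extern void create_kernel_thread(void (*entry)(void), int priority);

#define PRIORITY_LOWEST 1   /* still ahead of the idle thread, which only runs
                               when nothing at all is ready */

extern void working_set_trimmer(void);
extern void modified_page_writer(void);
extern void zero_page_daemon(void);

void start_housekeeping_threads(void)
{
    create_kernel_thread(working_set_trimmer,  PRIORITY_LOWEST);
    create_kernel_thread(modified_page_writer, PRIORITY_LOWEST);
    create_kernel_thread(zero_page_daemon,     PRIORITY_LOWEST);
}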
Re:Benchmarking and Speed profiling
And what speaks *against* having the idle thread do some wake-up stuff, so that these low-level processes can come into play?
Somewhere they *have* to be placed when their work is done. Usually, they would sleep or block. For example, who shall set a *page-out* event for the swap thread? Who notifies it that work is to be done? A timer? Why not wake it up when there is really nothing to do except waiting for disk I/O or the user?
Any execution other than waking up some threads I can't imagine in the idle thread.
Re:Benchmarking and Speed profiling
BI lazy wrote: And what speaks *against* having the idle thread do some wake-up stuff, so that these low-level processes can come into play?
As Schol-R-Lea said, what speaks against doing things in the idle thread is that the idle thread should *never* be blocked (I mean unable to run because it waits for something). The best way to make sure that is the case is not to do anything in it.
BI lazy wrote: Why not wake it up when there is really nothing to do except waiting for disk I/O or the user?
Well, if you just want a task to be done in the background when there is nothing else to do, the best way would be to have a 'background task' queue, or to assign those tasks a priority such that it will always be lower than any user operation thread.
BI lazy wrote: Any execution other than waking up some threads I can't imagine in the idle thread.
No need to wake them up: they were not sleeping (just waiting for CPU).
Now, look at the problem of zeroing page frames: it's not sufficient that the CPU is unused to run that task: you also need pages to zero! Thus the zeroing-page daemon will block until new pages are freed and need to be zeroed.
The same goes for things like training neural networks: training is only needed when new input can feed the network.
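As a sketch of that last point, such a daemon is an ordinary thread that blocks until the frame allocator gives it something to do; wait_for_event(), pop_freed_frame(), zero_frame() and add_to_zeroed_list() below are all hypothetical names:
Code: Select all
/* Hypothetical zero-page daemon: a low-priority thread that blocks (rather
 * than spinning in the idle loop) until there are actually frames to zero. */

extern void wait_for_event(int event);        /* block until the event is signalled */
extern void *pop_freed_frame(void);           /* NULL when no freed frames remain */
extern void zero_frame(void *frame);          /* fill one page frame with zeros */
extern void add_to_zeroed_list(void *frame);  /* make it available for fast allocation */

#define EVT_FRAME_FREED 1                     /* signalled by the frame allocator */

void zero_page_daemon(void)
{
    void *frame;

    for (;;) {
        wait_for_event(EVT_FRAME_FREED);      /* no pages to zero -> block, don't spin */

        while ((frame = pop_freed_frame()) != 0) {
            zero_frame(frame);
            add_to_zeroed_list(frame);
        }
    }
}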