Batch Processing?
Posted: Wed Apr 18, 2007 4:24 pm
Hi,
I've been thinking about scheduling again, and had a strange thought....
If it takes you 1 hour to paint a fence, how long would it take to paint 3 similar fences?
If you do one fence at a time, then you'd finish the first fence after 1 hour, the second fence after 2 hours and the third fence after 3 hours. The average amount of time between knowing the fences need painting and completing a fence works out to 2 hours.
Imagine if you did 10% of the first fence, 10% of next fence, 10% of the third fence, then went back to the first fence and did the next 10% (and so on until all fences were painted). In this case the first fence would be done after 2 hours and 48 minutes, the second fence would be done after 2 hours and 54 minutes and the third fence would be done after 3 hours. The average amount of time between knowing the fences need painting and completing a fence works out to 2 hours and 54 minutes, but...
It'd actually take longer. Switching from one fence to another costs time (cleaning your brushes, travelling time, etc). If it costs 1 minute to switch between fences, then it'd take 29 minutes longer to complete all fences. The first fence would be done after 3 hours and 15 minutes, the second fence would be done after 3 hours and 22 minutes and the third fence would be done after 3 hours and 29 minutes. The average amount of time between knowing the fences need painting and completing a fence works out to 3 hours and 22 minutes - it's 82 minutes slower than doing the fences one at a time.
Then there's resources. If the painter does one fence at a time he can buy the paint for one fence, paint the fence, then get his money back from the customer before buying the paint for the next fence. If he was doing small parts of each fence in turn, then he'd need enough money to buy all the paint for all the fences - for 3 fences he'd need 3 times as much money to do it.
Doing 10% of each fence increases the average time before a fence is completed and uses more resources.
Modern schedulers pretend to run multiple threads at the same time by giving threads time slices and switching to a different thread if the entire time slice is used. This is perfect for some things, but (just like the painter doing 10% of each fence) this would increase the average time a task takes to complete and use more resources.
If you're compiling 10 large files, is it better to use 100 MB of memory and after 11 seconds have 10 half compiled files, or is it better to use 20 MB of memory and after 10 seconds have 5 compiled files and 5 files that haven't been started yet?
I'm thinking of having a special "batch processing" thread type, which tries to get jobs that take a finite amount of time completed in the least average time.
Basically (for threads of this type at the same priority) the scheduler would have a FIFO queue. The first thread on the queue that can run will run until it completes or blocks, and if a thread closer to the start of the queue unblocks it pre-empts other threads in the queue.
Any ideas or comments?
Cheers,
Brendan
I've been thinking about scheduling again, and had a strange thought....
If it takes you 1 hour to paint a fence, how long would it take to paint 3 similar fences?
If you do one fence at a time, then you'd finish the first fence after 1 hour, the second fence after 2 hours and the third fence after 3 hours. The average amount of time between knowing the fences need painting and completing a fence works out to 2 hours.
Imagine if you did 10% of the first fence, 10% of next fence, 10% of the third fence, then went back to the first fence and did the next 10% (and so on until all fences were painted). In this case the first fence would be done after 2 hours and 48 minutes, the second fence would be done after 2 hours and 54 minutes and the third fence would be done after 3 hours. The average amount of time between knowing the fences need painting and completing a fence works out to 2 hours and 54 minutes, but...
It'd actually take longer. Switching from one fence to another costs time (cleaning your brushes, travelling time, etc). If it costs 1 minute to switch between fences, then it'd take 29 minutes longer to complete all fences. The first fence would be done after 3 hours and 15 minutes, the second fence would be done after 3 hours and 22 minutes and the third fence would be done after 3 hours and 29 minutes. The average amount of time between knowing the fences need painting and completing a fence works out to 3 hours and 22 minutes - it's 82 minutes slower than doing the fences one at a time.
Then there's resources. If the painter does one fence at a time he can buy the paint for one fence, paint the fence, then get his money back from the customer before buying the paint for the next fence. If he was doing small parts of each fence in turn, then he'd need enough money to buy all the paint for all the fences - for 3 fences he'd need 3 times as much money to do it.
Doing 10% of each fence increases the average time before a fence is completed and uses more resources.
Modern schedulers pretend to run multiple threads at the same time by giving threads time slices and switching to a different thread if the entire time slice is used. This is perfect for some things, but (just like the painter doing 10% of each fence) this would increase the average time a task takes to complete and use more resources.
If you're compiling 10 large files, is it better to use 100 MB of memory and after 11 seconds have 10 half compiled files, or is it better to use 20 MB of memory and after 10 seconds have 5 compiled files and 5 files that haven't been started yet?
I'm thinking of having a special "batch processing" thread type, which tries to get jobs that take a finite amount of time completed in the least average time.
Basically (for threads of this type at the same priority) the scheduler would have a FIFO queue. The first thread on the queue that can run will run until it completes or blocks, and if a thread closer to the start of the queue unblocks it pre-empts other threads in the queue.
Any ideas or comments?
Cheers,
Brendan