Hi,
kenna wrote: Which of these two algorithms would be more efficient? Or would it simply be a matter of taste?
The first algorithm would be more efficient (fewer context switches), although I'd use "A A A A B B C C". This is what I call "variable time slice scheduling", where all threads get one time slice each and a thread's priority determines the length of its time slice. The algorithm used in the scheduler for this can easily be O(1).
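Here's a rough sketch of the core of it (the names and structures are made up for illustration, not my actual code):

    #include <stddef.h>

    typedef struct thread {
        struct thread *next;
        int priority;          /* 1..256; higher = longer slice */
        int slice_ms;          /* length of the slice it was just given */
    } thread_t;

    static thread_t *ready_head = NULL, *ready_tail = NULL;  /* FIFO queue */

    /* O(1): pop the next thread and give it a slice based on priority. */
    thread_t *pick_next(void) {
        thread_t *t = ready_head;
        if (!t)
            return NULL;               /* nothing ready */
        ready_head = t->next;
        if (!ready_head)
            ready_tail = NULL;
        t->slice_ms = t->priority;     /* e.g. priority 1 => 1 ms slice */
        return t;
    }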
You are correct about delta time though - with 100 running threads and an average of 20 ms per thread, each thread would have to wait for almost 2 seconds before it gets CPU time again. This makes it very bad for higher priority threads (e.g. threads that deal with the GUI).
The second algorithm is what I call "variable frequency scheduling" - each thread gets the same length time slices (e.g. 1 ms) but higher priority threads get time slices more often. This is worse for efficiency but better for high priority threads. However, it's extremely difficult (impossible?) to implement this as an "O(1)" algorithm - you end up searching through a list of threads to find out which thread should be run next.
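To see why, here's a sketch using stride-scheduler-style selection (just one way of doing variable frequency, with invented names). With a plain linked list you end up scanning every ready thread to find the one to run next:

    #define STRIDE_BIG 1000000

    typedef struct vf_thread {
        struct vf_thread *next;
        int priority;            /* higher = picked more often */
        long pass;               /* advances by STRIDE_BIG / priority */
    } vf_thread_t;

    /* Pick the thread with the lowest "pass" value - an O(n) scan. */
    vf_thread_t *pick_next_vf(vf_thread_t *list) {
        vf_thread_t *best = list;
        for (vf_thread_t *t = list; t; t = t->next)
            if (t->pass < best->pass)
                best = t;
        if (best)
            best->pass += STRIDE_BIG / best->priority;
        return best;
    }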
There are also other algorithms. For example, the "highest priority thread that can run does run" algorithm is much better for very high priority threads (e.g. IRQ handling threads in my OS, and perhaps the GUI and threads that handle user interfaces). This algorithm is also easy to implement as "O(1)".
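A sketch of the usual trick for making this O(1) - one queue per priority plus a bitmap of non-empty queues (again, invented names; __builtin_clz is GCC/Clang-specific, a BSR instruction does the same thing on 80x86):

    #include <stdint.h>

    static struct thread *queue[256];   /* one FIFO head per priority */
    static uint32_t nonempty[8];        /* bit set => that queue has threads */

    /* O(1): at most 8 words to test, regardless of how many threads exist. */
    struct thread *pick_highest(void) {
        for (int w = 7; w >= 0; w--)          /* highest priorities first */
            if (nonempty[w]) {
                int bit = 31 - __builtin_clz(nonempty[w]);
                return queue[w * 32 + bit];   /* head of that queue */
            }
        return 0;    /* no threads ready at this level */
    }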
So, which algorithm do I think is best? Honestly, none of them (or more specifically, all of them).
What I do is split things up into 4 groups (or "scheduling policies"). Policy 0 is for very high priority threads and uses the "highest priority thread that can run does run" algorithm. Policy 1 is for high priority threads and uses the "variable frequency" algorithm. Policy 2 is for normal threads and uses the "variable time slice" algorithm. Policy 3 is for idle threads and also uses the "variable time slice" algorithm.
Each thread has a policy, and a priority within that policy. You could think of this as a 10-bit number, where the highest 2 bits determine the policy and the lowest 8 bits determine the priority within that policy.
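In code, that could look something like:

    /* The combined 10-bit value: top 2 bits = policy, low 8 bits = priority. */
    #define SCHED_VALUE(policy, priority)  (((policy) << 8) | (priority))
    #define SCHED_POLICY(v)                (((v) >> 8) & 0x3)
    #define SCHED_PRIORITY(v)              ((v) & 0xFF)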
The scheduler makes sure that it's always running threads from the highest priority policy that has threads ready to run. For example, if it's running a thread that belongs to Policy 2 and a thread from Policy 1 becomes ready to run, then the currently running thread is pre-empted.
Within a policy, it depends on which policy it is. If a Policy 0 thread is running and a higher priority Policy 0 thread becomes ready to run, then the currently running thread will be pre-empted. However, if a Policy 1, 2 or 3 thread is running and a higher priority thread in the same policy becomes ready to run, then the currently running thread won't be pre-empted.
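The whole pre-emption rule fits in a few lines - something like this sketch (plain ints instead of real thread structures):

    /* Lower policy number = more important. Returns non-zero if the
       thread that just woke up should pre-empt the running thread. */
    int should_preempt(int cur_policy, int cur_priority,
                       int new_policy, int new_priority) {
        if (new_policy < cur_policy)
            return 1;    /* always pre-empt for a more important policy */
        if (new_policy == cur_policy && new_policy == 0)
            return new_priority > cur_priority;   /* only within Policy 0 */
        return 0;
    }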
To implement this I use 259 linked lists. The first 256 linked lists are used for Policy 0, with a separate linked list for each thread priority (note: if there are 2 or more Policy 0 threads with the same priority, then they take turns in a "round robin" way). The 257th linked list is for all Policy 1 threads, the 258th linked list is for all Policy 2 threads and the 259th linked list is for all Policy 3 threads.
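Selecting the next thread is then just a matter of checking the lists in order - a sketch, where pick_highest() is the Policy 0 bitmap lookup from above and the other helper names are invented:

    extern struct thread *pick_highest(void);   /* lists 1-256 (Policy 0)  */
    extern struct thread *pick_policy1(void);   /* list 257 (var. frequency) */
    extern struct thread *pick_policy2(void);   /* list 258 (var. time slice) */
    extern struct thread *pick_policy3(void);   /* list 259 (idle threads)  */

    struct thread *schedule_next(void) {
        struct thread *t;
        if ((t = pick_highest())) return t;   /* Policy 0 always wins */
        if ((t = pick_policy1())) return t;
        if ((t = pick_policy2())) return t;
        if ((t = pick_policy3())) return t;
        return 0;   /* nothing ready at all - idle the CPU */
    }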
The other thing to consider is where threads are inserted into the "ready to run" list/s. For example, for "variable time slice" scheduling, if a thread becomes ready to run you could insert it at the end of the list so it waits for all other threads to get their time slice before it gets one, or you could insert it at the start of the list so that it's the next thread to get a time slice. I always put the new thread at the end of the list - otherwise a thread could hog almost all CPU time by doing something like sleeping for 1 ms at the end of its time slice.
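As a sketch (reusing the queue from the first example), making a thread ready is just:

    /* Waking threads always go on the tail, so a thread that sleeps just
       before its slice expires goes to the back of the queue like
       everyone else. */
    void make_ready(thread_t *t) {
        t->next = NULL;
        if (ready_tail)
            ready_tail->next = t;
        else
            ready_head = t;
        ready_tail = t;
    }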
Lastly, there's time slice lengths. First I work out a performance rating for the computer - e.g. the number of instructions that all CPUs could execute in 1 second, divided by "(number_of_CPUs + 1) / 2". Then I use the performance rating to calculate the "base time slice length" - e.g. "base_time_slice = performance_rating / K", where K is selected to give 1 ms time slices on an average CPU. This means that the faster the computer is, the smoother scheduling gets (and the slower the computer is, the less context switching and overhead there is).
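As a sketch (the value of K below is a placeholder, not a real calibration):

    /* K is a tuning constant chosen so that an "average" computer ends
       up with ~1 ms slices; 3000000 here just makes a 3-GIPS, single-CPU
       machine come out at 1000 us. */
    #define K 3000000ULL

    unsigned long long base_time_slice_us(unsigned long long instr_per_sec,
                                          unsigned int num_cpus) {
        /* Divide by (num_cpus + 1) / 2 without integer truncation. */
        unsigned long long rating = instr_per_sec * 2 / (num_cpus + 1);
        return rating / K;    /* base slice length in microseconds */
    }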
For Policy 0 and Policy 1, all threads get the base time slice length (e.g. 1 ms on an average computer). For Policy 2 a thread gets "base_time_slice * thread_priority" (e.g. 1 ms for a low priority thread and 256 ms for a high priority thread, on an average computer), and for Policy 3 a thread gets "base_time_slice * 4 * thread_priority" (e.g. 4 ms for a low priority thread and 1024 ms for a high priority thread, on an average computer).
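Putting the slice length calculation together (I'm treating priority as 1 to 256 here so a slice can't be zero length - an assumption, since an 8-bit priority would normally be 0 to 255):

    unsigned long long slice_length(int policy, int priority,
                                    unsigned long long base) {
        switch (policy) {
        case 0:
        case 1:  return base;                 /* always the base length    */
        case 2:  return base * priority;      /* e.g. 1 ms .. 256 ms       */
        default: return base * 4 * priority;  /* Policy 3: 4 ms .. 1024 ms */
        }
    }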
It all sounds complicated, but I've implemented it and found it to be extremely good. For example, you can have normal threads doing infinite loops without affecting how responsive the GUI is (or device driver threads), or idle threads doing infinite loops without affecting any threads in other policies.
There is one other "problem" with it (other than complexity) - software must use the policy/priority scheme, and must use it correctly. Crappy software (e.g. ported POSIX software that almost never supports the OS's thread priorities) will not work well, and programmers need to choose the policy and priority of their threads wisely.
For example, for a web browser you'd want the main "GUI handling" thread (that takes care of drawing the window, menus, etc) to be a high priority thread (e.g. Policy 1, priority 128) and the code that renders HTML pages (and supplies this data to the main thread) to be a normal priority thread (e.g. Policy 2, priority 32) so that the GUI remains responsive regardless of what the HTML renderer is doing. Then you might want a third thread for fetching data from the internet and web cache (e.g. Policy 2, priority 196), and perhaps a fourth thread that analyzes usage patterns and prefetches web pages in the background (e.g. Policy 3, priority 50) so that data is preloaded before the user wants it.
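As code, using a purely hypothetical "spawn_thread(entry, policy, priority)" function (not a real API, just to show the browser example concretely):

    extern int spawn_thread(void (*entry)(void), int policy, int priority);
    extern void gui_main(void), html_renderer(void),
                net_fetcher(void), prefetcher(void);

    void start_browser_threads(void) {
        spawn_thread(gui_main,      1, 128);  /* GUI handling: Policy 1   */
        spawn_thread(html_renderer, 2,  32);  /* HTML rendering: Policy 2 */
        spawn_thread(net_fetcher,   2, 196);  /* internet + web cache     */
        spawn_thread(prefetcher,    3,  50);  /* background prefetching   */
    }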
Cheers,
Brendan