Fair scheduling using virtual time

OSwhatever · Post by **OSwhatever** » Thu Aug 02, 2012 6:43 am

I'm trying to understand fair scheduling but it seems my brain is blocking me to understand this. What I've understand is that you have the running time for each thread/process and use that as a key in some sort of structure, list or tree or whatever. Then when you reschedule you pick the one with the lowest key number (virtual time). Now this might be good as new threads gets to run very quickly but it might also block older threads that has run for a week for example. This is the part I don't understand, it would take ages for a new thread to get the same amount of running time as a thread that is a week old so the new thread becomes first all the time blocking older threads. Now, you can of course always reset this counting now and then. This however requires that you build up your tree or list from the beginning or you reset when you insert the thread or structure again.

Also using the raw time as key, even if it is unlikely, you have to take care of wrap arounds. Preferably having some kind of filtered ratio would be nice here but possibly computationally expensive. Divisions should be avoided, time is often 64-bit even on a 32-bit machine. I've been also thinking about if this can be solved with some IIR filtering.

Brendan · Post by **Brendan** » Sun Aug 05, 2012 12:51 am

Hi,

OSwhatever wrote:I'm trying to understand fair scheduling but it seems my brain is blocking me to understand this. What I've understand is that you have the running time for each thread/process and use that as a key in some sort of structure, list or tree or whatever. Then when you reschedule you pick the one with the lowest key number (virtual time). Now this might be good as new threads gets to run very quickly but it might also block older threads that has run for a week for example. This is the part I don't understand, it would take ages for a new thread to get the same amount of running time as a thread that is a week old so the new thread becomes first all the time blocking older threads.

I'd assume that when a new task is started they do something like "task->virtual_time_key = virtual_now()".

OSwhatever wrote:Also using the raw time as key, even if it is unlikely, you have to take care of wrap arounds. Preferably having some kind of filtered ratio would be nice here but possibly computationally expensive. Divisions should be avoided, time is often 64-bit even on a 32-bit machine. I've been also thinking about if this can be solved with some IIR filtering.

If you use unsigned integers and only store the lowest n bits, then wrap-around should take care of itself as long as no task is more than "2 ** n ticks" behind any other. For example, if 1 tick is 1 virtual ms and you're storing the lowest 32 bits of the key, then everything would be fine as long as no task is more than 4294967.296 virtual seconds behind any other.

Also note that the entire idea of "fair" is completely stupid. It's like saying someone that got ran over by a train should have the same amount of medical attention as someone with a small splinter. Some tasks are more important than others; and because Linux never knows how important tasks actually are (excluding "real time" tasks that require root privileges and are therefore unusable) it's impossible to write a scheduler for Linux that isn't completely stupid.

Cheers,

Brendan

rdos · Post by **rdos** » Sun Aug 05, 2012 1:20 pm

Brendan wrote:Also note that the entire idea of "fair" is completely stupid. It's like saying someone that got ran over by a train should have the same amount of medical attention as someone with a small splinter.

Maybe a stupid example, but it is quite likely that somebody that got ran over by a train doesn't need any medical attention at all since they would most likely be dead.

It would be interesting to continue that analogy for fair scheduling, but how do you know if a task is utterly useless and thus need no "medical attention"?

OSwhatever · Post by **OSwhatever** » Sun Aug 05, 2012 2:31 pm

Brendan wrote:Also note that the entire idea of "fair" is completely stupid. It's like saying someone that got ran over by a train should have the same amount of medical attention as someone with a small splinter. Some tasks are more important than others; and because Linux never knows how important tasks actually are (excluding "real time" tasks that require root privileges and are therefore unusable) it's impossible to write a scheduler for Linux that isn't completely stupid.

Priorities are used in order solve which thread or process that should have higher priority. The "fair" part is how to solve threads or processes on the same priority.

An interesting example is process 1 needs 100% of the CPU time and another process 2 needs very little, frequent short burst of CPU time. With normal queuing, the process 1 would block process 2 until it is preempted as its time share is finished. With time based scheduling, it is possible to get process 2 to run more quickly as it doesn't gets last in the queue.

Antti · Post by **Antti** » Mon Aug 06, 2012 6:23 am

Brendan wrote:Also note that the entire idea of "fair" is completely stupid. [...] it's impossible to write a scheduler for Linux that isn't completely stupid.

I am interested in to hear more about this. Based on the knowledge I have, the Completely Fair Scheduler of Linux seems to be quite good. Perhaps the thing to be proud of. Saying that something is completely stupid sounds like an exaggeration.

As a completely offtopic and not meant to be taken seriously, I find some similarities between writings of Linus Torvalds himself and this one. Linus easily criticise something to be completely stupid, design disaster, brain damaged etc. There is nothing wrong with that because mostly the statements are true and people have different writing styles. I avoid using these expressions but that is because usually I do not know for sure whether something really is that bad.

Brendan · Post by **Brendan** » Mon Aug 06, 2012 11:18 am

Hi,

Antti wrote:
Brendan wrote:Also note that the entire idea of "fair" is completely stupid. [...] it's impossible to write a scheduler for Linux that isn't completely stupid.
I am interested in to hear more about this. Based on the knowledge I have, the Completely Fair Scheduler of Linux seems to be quite good. Perhaps the thing to be proud of. Saying that something is completely stupid sounds like an exaggeration.

As a completely offtopic and not meant to be taken seriously, I find some similarities between writings of Linus Torvalds himself and this one. Linus easily criticise something to be completely stupid, design disaster, brain damaged etc. There is nothing wrong with that because mostly the statements are true and people have different writing styles. I avoid using these expressions but that is because usually I do not know for sure whether something really is that bad.

For Linux, there are real time tasks and normal tasks. Real time tasks do have explicit priorities, but they need to run with root privileges which makes them mostly unusable (except for some daemons perhaps). Normal tasks don't have explicit priorities, and this is the problem.

Because normal tasks don't have explicit priorities, the scheduler can't make good decisions about which tasks should get CPU time when. The load caused by things that aren't important (e.g. something creating a backup in the background) effects the CPU time available to things that are important (e.g. a HTTP server that is having a hard time trying to keep up with requests). You can't tell the scheduler that the HTTP server is higher priority and the backup is lower priority to fix this problem.

Instead, Linux tries to hide the symptoms of the "no explicit priorities" problem. This used to mean that tasks that block are assumed to be "interactive" and are given a boost - for example, that backup that should have been happening in the background might be constantly blocking due to file IO and be given a boost; and the HTTP server that is having a hard time trying to keep up with requests might not block (due to caches, etc) and be penalised. You can see how this "tasks that block are interactive" assumption can be very wrong and make the problem worse.

The Completely Fair Scheduler essentially does something a little similar - it attempts to give tasks the same amount of CPU time, regardless of whether they block or not. In a 2 second time period, if that unimportant backup task spends half it's time blocked then it will get 1 second of time and the HTTP server that is having a hard time trying to keep up with requests gets 1 second of CPU time too. This is useless if you need the HTTP server to get 80% of CPU time to keep up with requests.

These aren't the only misguided and broken attempts they've made at hiding the symptoms of the "no explicit priorities" problem. Their "group scheduling"[url] is used to help hide the symptoms too - referring to the example in that LWN article; Alice's X server should be running at a higher priority while Karl's compiler tasks should be normal/lower priority, so that Alice's X server isn't crippled by Karl's compiler processes to begin with.

Of course it'd also be silly to expect users to mess about manually changing the group scheduling all the time just to compensate for the scheduler's stupidity. To fix that problem, now they've got "auto-group scheduling" that tries to do it for the users. This "hide the problems caused by a solution that exists to hide the symptoms with a broken solution" hack does seem to [url=http://www.phoronix.com/scan.php?item=linux_2637_video&num=2&page=article]hide the symptoms much better, but it's unlikely to be close to ideal for all cases.

Also note that when there's lots of CPUs/cores and most of them are idle (which is typical for modern desktop users) the scheduler's problems are much harder to notice. CPU manufacturers increasing the number of CPUs/cores is hiding the "dodgy scheduler" symptoms much better than anything that the Linux community have done.

However, despite all of this, when it comes to interactivity Linux still gets it's behind kicked badly by almost all other OSs. You only have to compare Linux desktop performance to something like Haiku or Windows (e.g. running similar software on the exact same hardware, with some background load) to see how severely crappy the Linux scheduler is for interactive/desktop users.

Now I'm not saying that Linux (the kernel) deserves 100% of the blame - it doesn't. Part of the problem is that the Linux kernel is bundled with a generic user-space that doesn't use task priorities properly, and the Linux kernel has to work well for this dodgy/generic user-space. If they fix the scheduler it won't work well with the existing user-space (and will probably make things seem worse), and if they fix the user-space it'd be mostly pointless. They'd have to fix both at the same time, which would be a huge nightmare (it'd be like trying to train hundreds of cats to do synchronised swimming).

Basically, given the existing user-land (which doesn't use explicit priorities) and inadequate organisation between parties (no central leadership to coordinate an "OS wide" change); it's impossible to write a scheduler for Linux that isn't completely stupid.

We're hobbyist OS developers. We don't have these problems. We do control our scheduler's design and our user-land. We can have explicit priorities if we want. We don't need to settle for "completely fair" (or "completely stupid"). We can do better than Linux.

I just don't want people to see the Linux scheduler and think it's "good" just because it's Linux, without actually thinking about what they want for their OS.

Cheers,

Brendan

Owen · Post by **Owen** » Mon Aug 06, 2012 11:27 am

$ man 2 setpriority

man 2 setrpiority wrote: Name
getpriority, setpriority - get/set program scheduling priority

(That man page fails to document it, but Linux also takes PRIO_THREAD as a which value)

(See also: nice(1), renice(1))

bewing · Post by **bewing** » Mon Aug 06, 2012 12:45 pm

Antti wrote: Linus easily criticise something to be completely stupid, design disaster, brain damaged etc. There is nothing wrong with that because mostly the statements are true and people have different writing styles. I avoid using these expressions but that is because usually I do not know for sure whether something really is that bad.

Two very important qualities for successful OS design are 1) having strong opinions, and 2) not being timid.

Brendan · Post by **Brendan** » Mon Aug 06, 2012 2:44 pm

Hi,

Owen wrote:$ man 2 setpriority
man 2 setrpiority wrote: Name
getpriority, setpriority - get/set program scheduling priority

The main problem with this is that it doesn't work. You can't set a task to "only run if no high priority tasks can run", and it doesn't effect preemption (e.g. high priority tasks that unblock do not preempt low priority tasks that happen to be running at the time).

Basically, even if a software developer wastes their time trying to find a way to get setpriority() to work in all the different kernel version and end-user configuration cases, it still won't help things like how quickly a process can respond to keyboard/mouse/network.

Cheers,

Brendan

Antti · Post by **Antti** » Tue Aug 07, 2012 4:24 am

First of all, it is nice to solve priority issues (pun intended).

Thank you Brendan for posting an extensive reply. I read the post carefully and I am now beginning to understand the issue. I found it interesting when you (almost) said that the increasing number of CPUs/cores is making the whole scheduler design less meaningful than it used to be with a single CPU computer. Unless the system has a very high load.

Is it just sad but true that the increase of hardware resources leads to inefficient algorithms? For example, a thread could always have a dedicated CPU? This does not sound plausible yet but at least the scheduler would be quite simple.

I will continue with my round-robin-like scheduler because it currently is good enough for my OS. After a year, it might not be enough...

rdos · Post by **rdos** » Tue Aug 07, 2012 6:10 am

Antti wrote:I will continue with my round-robin-like scheduler because it currently is good enough for my OS. After a year, it might not be enough...

I'll continue with my prioritized round-robin scheduler as well.

I think Brendan forgets an important point: Letting applications set their own priorities will lead to endless priority-wars, because every software developper think their application is the most important, and that it should have the highest priority possible (maybe except for backup-utilities, but it is valid for everything else). That's because giving your application lower priority makes it run badly, which means people think you wrote a bad app.

Therefore I think the ideal solution is to let the OS decide what is important, and not userspace. Therefore, Linux actually is on the right track, even if their kernel scheduler implementation is not optimal.

Even if I do have priorities in my OS, no userland applications really use them, rather they default to some standard-value that ensures fairness. It is only some kernel-threads that run with above nornal priority.

OSwhatever · Post by **OSwhatever** » Tue Aug 07, 2012 6:45 am

Time based schedulers have also an opportunity o give processes unequal "weights" which means that the user can manually assign one process more time share if necessary.

Having user processes play around with priorities is usually not used at all unless there is some kind of special demand. The point from the beginning was how create a time based scheduler which I'm still unsure how to implement it.

The classical trick is having two queues, and when the time share is done the process is put on the second queue. The second queue is not considered until the first queue is empty and the queues are swapped in order to make the second queue the active queue. In practice this method should work quite well, I'm not sure if there are any drawbacks with this method. I link Linux used this method at some point and then they moved on with their time based version just using a red black tree with time as key. Not sure why they did though.

bewing · Post by **bewing** » Tue Aug 07, 2012 12:38 pm

It's the user who knows which apps are the most/least important to the user. Some user applications do know that some of their threads are low priority -- but they should not really be allowed to set their own priorities above normal, as rdos said. Asking the OS to guess is ridiculous -- it has no information on what the user needs at any particular moment. The shell needs to give the user a good and exclusive mechanism to adjust priorities -- it's as simple as that.

Brendan · Post by **Brendan** » Tue Aug 07, 2012 2:44 pm

Hi,

rdos wrote:I think Brendan forgets an important point: Letting applications set their own priorities will lead to endless priority-wars, because every software developper think their application is the most important, and that it should have the highest priority possible (maybe except for backup-utilities, but it is valid for everything else). That's because giving your application lower priority makes it run badly, which means people think you wrote a bad app.

No. Most (non-Unix) OS's have been doing it for years without any problem.

Don't forget that we're actually talking about threads. For example, a normal application would/should have a high priority thread for dealing with the user (e.g. keyboard, mouse, video) and lower priority thread/s for doing the grunt work. If the software developer tries to make their "back end" high priority then they kill the responsiveness of their "front end"; and only a very stupid developer would do that.

bewing wrote:It's the user who knows which apps are the most/least important to the user. Some user applications do know that some of their threads are low priority -- but they should not really be allowed to set their own priorities above normal, as rdos said.

It's a very bad idea to assume that end-users are more knowledgeable than software developers and have a better idea of what priorities different things should run at. An OS that requires end-users to constantly mess about with task priorities (e.g. every time any process or thread is started) should have a big splash screen at boot that says "WARNING: We failed to do it right, therefore we expect you to sort out our mess".

bewing wrote:Asking the OS to guess is ridiculous -- it has no information on what the user needs at any particular moment. The shell needs to give the user a good and exclusive mechanism to adjust priorities -- it's as simple as that.

Yes - the OS/kernel has little or no idea what different threads are doing and can't make sane priority decisions (however; an OS/GUI can know which application has "focus" and can give that application a priority boost). The user doesn't know either (normally the user doesn't even know how many threads each application actually has, or why). Only the software developer that designed the process knows what that process' threads do, and they are the only ones able to make sane priority decisions.

OSwhatever wrote:Time based schedulers have also an opportunity o give processes unequal "weights" which means that the user can manually assign one process more time share if necessary.

Time based schedulers (which have historically been used in Unix-like OSs) suck. When the user presses a key they do not want to wait for thousands of low priority tasks to consume 1 ms each before the application they're using has a chance to respond to that key press.

I can have thousands of low priority threads running in the background all consuming as much CPU time as they can get their hands on, and the user won't be able to tell the difference (the GUI and applications will respond just as fast as they would if the computer was idle).

One of the most important parts of scheduler design is to ensure that lower priority threads won't get any CPU time at all if it's possible for the scheduler to run higher priority threads instead; including making sure that when a higher priority thread becomes "ready to run" it pre-empts any lower priority threads immediately (not when the currently running thread finishes using it's time slice).

Cheers,

Brendan

gerryg400 · Post by **gerryg400** » Tue Aug 07, 2012 2:50 pm

bewing wrote:It's the user who knows which apps are the most/least important to the user. Some user applications do know that some of their threads are low priority -- but they should not really be allowed to set their own priorities above normal, as rdos said. Asking the OS to guess is ridiculous -- it has no information on what the user needs at any particular moment. The shell needs to give the user a good and exclusive mechanism to adjust priorities -- it's as simple as that.

I'm pretty sure that most computer users do not know which apps are most important to them. Most users do not know what an 'app' or 'thread' is and certainly it is impossible for an average user to instruct a scheduler in real time.

OSDev.org

Fair scheduling using virtual time

Fair scheduling using virtual time

Re: Fair scheduling using virtual time

Re: Fair scheduling using virtual time

Re: Fair scheduling using virtual time

Re: Fair scheduling using virtual time

Re: Fair scheduling using virtual time

Re: Fair scheduling using virtual time

Re: Fair scheduling using virtual time

Re: Fair scheduling using virtual time

Re: Fair scheduling using virtual time

Re: Fair scheduling using virtual time

Re: Fair scheduling using virtual time

Re: Fair scheduling using virtual time

Re: Fair scheduling using virtual time

Re: Fair scheduling using virtual time