True scheduler

wolfram
Posts: 19
Joined: Wed Aug 19, 2009 11:37 am
Location: Saint-Petersburg, Russia

True scheduler

Post by wolfram »

A scheduler usually limits a thread's running time by giving it a timeslice. When a thread exhausts its timeslice, it is preempted. Some threads get longer timeslices, some shorter...
So a thread's running time is controlled by the timer. We can say that threads (tasks) share access to the timer.
Yes, we usually say that tasks share access to the CPU (or CPUs). But we don't control how many instructions a thread executes on the CPU, and we don't even control whether it uses the CPU at all; it may just be writing to the hard disk. That's why I prefer to say that threads share access to the timer.
Why not schedule threads by other limits as well? For example, by write speed to the hard disk, in MiB/s. If a thread writes to the disk and exceeds its write-speed limit, it is simply preempted, even if it still has time left in its quantum.
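Just to illustrate the idea (a very rough sketch only; all names are made up, and a real scheduler would also need locking and a periodic reset of the counter):

Code:
/* Rough sketch of the idea: the timer tick checks more than one limit.
 * All names are hypothetical; a real kernel would also reset bytes_written
 * at the start of each accounting period. */
void schedule(void);          /* hypothetical: picks another ready thread */

struct thread {
    unsigned ticks_left;      /* remaining time quantum, in timer ticks     */
    unsigned bytes_written;   /* bytes written to disk this period          */
    unsigned write_limit;     /* allowed bytes per period (the MiB/s limit) */
};

void on_timer_tick(struct thread *cur)
{
    if (cur->ticks_left > 0)
        cur->ticks_left--;

    /* preempt when EITHER limit is exhausted, not just the time quantum */
    if (cur->ticks_left == 0 || cur->bytes_written >= cur->write_limit)
        schedule();
}

void account_disk_write(struct thread *cur, unsigned nbytes)
{
    cur->bytes_written += nbytes;   /* charged from the disk write path */
}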

What do you think about it?

P.S. I have heard that some operating system uses this scheduling scheme, but I'm not sure whether that's true.
P.P.S. Sorry for my bad English; it's not my native language.
Love4Boobies
Member
Posts: 2111
Joined: Fri Mar 07, 2008 5:36 pm
Location: Bucharest, Romania

Re: True scheduler

Post by Love4Boobies »

This makes no sense. I'm not sure I've understood your post correctly, but I want to touch on a couple of things:
  • Scheduling by counting CPU instructions isn't fair, because one instruction may take more cycles than another. Also, if the hardware had to check the program counter at every step, you'd have unneeded overhead.
  • Why do you think the thread will even write anything to the disk? Perhaps it can control the write speed, perhaps it can't. Why is this even remotely related to scheduling?
"Computers in the future may weigh no more than 1.5 tons.", Popular Mechanics (1949)
[ Project UDI ]
wolfram
Posts: 19
Joined: Wed Aug 19, 2009 11:37 am
Location: Saint-Petersburg, Russia

Re: True scheduler

Post by wolfram »

Love4Boobies wrote:I'm not sure I've understood your post correctly, but I want to touch on a couple of things:
You haven't understood my post. Maybe it's my bad English?
Love4Boobies wrote:
  • Scheduling by counting CPU instructions isn't fair, because one instruction may take more cycles than another. Also, if the hardware had to check the program counter at every step, you'd have unneeded overhead.
I didn't mean that the CPU must count instructions; I was just explaining what I mean by "access to the timer". A thread can do anything, that isn't the point. The point is that it stays active until it exhausts its quantum, and then it is preempted. (Of course, a thread may also be preempted if a thread with higher priority becomes ready for execution, but that isn't important here.)
My idea is not to limit thread execution only by a time quantum. It could also be a disk I/O speed limit, a network I/O limit, etc.
Love4Boobies
Member
Posts: 2111
Joined: Fri Mar 07, 2008 5:36 pm
Location: Bucharest, Romania

Re: True scheduler

Post by Love4Boobies »

Oh, okay - but as far as disks are concerned, I'd go for "amount written" instead of speed. You'd probably end up with better disk response but worse overall performance and more fragmentation. I have no benchmarks to support this.
"Computers in the future may weigh no more than 1.5 tons.", Popular Mechanics (1949)
[ Project UDI ]
aeritharcanum
Posts: 13
Joined: Sun Mar 07, 2010 3:17 pm

Re: True scheduler

Post by aeritharcanum »

Hi:

Or you could let the scheduler schedule all threads fairly, and have the user decide which threads need high-priority scheduling. And @OP:

The most a kernel can do within reason is try to detect whether a thread is I/O-bound or CPU-bound. Anything beyond that is impossible, or approaching impracticality. Keeping track of the number of bytes or KB or whatever a thread reads or writes to disk/network/etc. adds pointless overhead to every I/O operation monitored in that manner. You'd have a list of threads and some form of per-process data where you store 'nBytesIoOperations' or what-have-you for each thread, and upon each read or write I suppose you then search the whole list, find the right thread, and increment its byte-usage counter?

Expensive...Uselessly so...

--All the best,
aeritharcanum
geomagas
Posts: 12
Joined: Wed Feb 27, 2008 6:01 am
Location: Iraklion,Crete,Greece
Contact:

Re: True scheduler

Post by geomagas »

Hi,

@OP:

Fairness is, IMHO, the main motivation for exploring the possibility of inventing a new scheduling algorithm. It has always been like that. Fairness guided the inventor of round-robin. It wasn't fair for a process (thread) to occupy the CPU (probably the most valuable resource of a computer) for as long as it took to finish, letting other processes starve. And the way to measure fairness was time. Circuits for measuring time not only existed but, most importantly, were (and are) external to the CPU, which implies the ability to interrupt the CPU whenever appropriate and let the kernel decide what to do, according to the scheduler's policy.

One thing to clarify here is that, in a time-sharing system, the timer is probably the main source of process preemption, but not the only one. It is more than obvious that, when a process has to wait for some event to happen, it should give up the CPU until the event happens (it executes a system call, e.g. semaphore_wait, or an I/O system call that implies semaphore_wait; the kernel takes over and decides what to do according to some blocking policy). That includes I/O operations as well as other stuff like message handling, IPC, etc. Again, when the event happens, there should be an external mechanism to signal the CPU (the kernel takes over and decides what needs to be done according to some unblocking policy).
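A rough sketch of that blocking path (hypothetical kernel names, just to show the shape of it):

Code:
/* Hypothetical sketch: a blocking I/O system call. The caller gives up the
 * CPU at semaphore_wait() and only runs again after the completion interrupt
 * signals the semaphore (the "unblocking policy" mentioned above). */
struct semaphore { int count; /* ... */ };

struct request {
    struct semaphore done;
    /* buffer, sector, length, ... */
};

void submit_to_driver(struct request *req);   /* hypothetical driver entry      */
void semaphore_wait(struct semaphore *s);     /* blocks the calling process     */
void semaphore_signal(struct semaphore *s);   /* called from the completion IRQ */

void sys_read_blocking(struct request *req)
{
    submit_to_driver(req);        /* start the I/O                            */
    semaphore_wait(&req->done);   /* block: leave the run queue, the          */
                                  /* scheduler picks another process          */
    /* reached only after the interrupt handler calls                         */
    /* semaphore_signal(&req->done)                                           */
}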

As you can see, if someone wants to apply fairness using criteria other than time, the mechanisms are already there (blocking/unblocking) and have nothing to do with the scheduler. The scheduler is about time.

@aeritharcanum: I generally agree with what you say; I only want to point out that, to implement a process-switching system that makes decisions with more than time in mind, you will eventually have to keep track of the information you need to make the decision. For example, if the criterion is "the use of resource X" (X can even be quanta), you have to store a measurement of how much P uses X and update it every time X is used. That doesn't necessarily make it expensive; it depends on the resource, the number of different resources you want to monitor, and the frequency with which those resources are granted to processes (among other things, I guess).
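For instance (purely illustrative names), the bookkeeping can be as small as one counter per monitored resource, bumped whenever the resource is granted:

Code:
/* Purely illustrative: per-process usage counters for a few monitored
 * resources, updated whenever the kernel grants the resource to process P. */
enum resource { RES_CPU_QUANTA, RES_DISK_BYTES, RES_NET_BYTES, RES_MAX };

struct proc_usage {
    unsigned long used[RES_MAX];
};

static inline void charge(struct proc_usage *u, enum resource r,
                          unsigned long amount)
{
    u->used[r] += amount;   /* one addition per grant */
}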

Regards
I was... delayed... - Gandalf
Gigasoft
Member
Posts: 856
Joined: Sat Nov 21, 2009 5:11 pm

Re: True scheduler

Post by Gigasoft »

aeritharcanum wrote:Keeping track of the number of bytes or KB or whatever a thread reads or writes to disk/network/etc. adds pointless overhead to every I/O operation monitored in that manner. You'd have a list of threads and some form of per-process data where you store 'nBytesIoOperations' or what-have-you for each thread, and upon each read or write I suppose you then search the whole list, find the right thread, and increment its byte-usage counter?

Expensive...Uselessly so...
Huh? An operating system always knows which thread it's currently executing. Why should it need to search for it? Information about the current thread is frequently accessed by an operating system, and this has to be fast (otherwise there's something very wrong with your design). Reading the variable which points to the current thread and adding a value to some field within that structure takes just a few nanoseconds, while the I/O operation itself typically takes much longer. Windows does keep I/O operation statistics, by the way.
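In other words, something like this (illustrative field names, assuming the kernel already maintains a current-thread pointer) is all the accounting costs:

Code:
/* Illustrative: the kernel already tracks the current thread, so per-thread
 * I/O accounting is one pointer load and one addition in the I/O path. */
struct thread {
    /* ... */
    unsigned long long bytes_read;
    unsigned long long bytes_written;
};

extern struct thread *current_thread;   /* maintained by the scheduler anyway */

void account_write(unsigned long nbytes)
{
    current_thread->bytes_written += nbytes;   /* a few nanoseconds at most */
}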
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany
Contact:

Re: True scheduler

Post by Solar »

The main argument against this is probably that different applications are more or less I/O centric. A database application will have lots of I/O, while a raytracer has very little - which doesn't mean the raytracer does less.

No matter whether it's CPU cycles, memory usage, I/O bandwidth - what all these have in common is that they take time.

@ wolfram:

If your goal is writing an OS, I would strongly suggest getting some literature on the subject of task scheduling and opting for one of the concepts described there. You are of course free to adapt, mix & mingle, and otherwise modify the algorithms you find in the books, but I wouldn't set out to create a whole new scheduler concept unless your goal is researching scheduler algorithms.
Every good solution is obvious once you've found it.
aeritharcanum
Posts: 13
Joined: Sun Mar 07, 2010 3:17 pm

Re: True scheduler

Post by aeritharcanum »

Hi:

@Gigasoft:

Sorry, but what...? In a uniprocessor system, where you know how many CPUs you're going to be dealing with (one), of course the kernel always "knows" which thread the CPU is executing instantly: it's only good design that in such a case you use a global variable to point to the CPU data for the only CPU on the board.

But when dealing with a variable number of CPUs, as on x86-pc-*, for example, this is obviously not feasible. A single global variable for one CPU will obviously not work there, unless you write a kernel with a BKL and allow only one CPU into the kernel at once. But then you already have an inefficient, hacked-up kludge, so what's one more ugly, messy thorn?

The normal thing to do on an MP system is to have a linked list of CPUs. Every CPU, upon entering the kernel, must have some way of "knowing" its ID within the kernel, and then looking itself up in this list of CPUs. Unless you make another bad design decision and keep a huge, static, speculative array of CPUs, this is called 'overhead'. From there, you access the pointer within the current CPU's per-CPU data structure to get the information on which task it's executing. You may then read, update, etc. that task's information.

If you do this lookup and access this information on every I/O access, this is called 'unnecessary overhead'. In a process such as GNU make, any database back-end, any standard archive manager, a compiler, or any other I/O-bound application, this will needlessly hamper performance.

EDIT: About your Windows assertion: I have never seen this information on processes' I/O usage while using Windows. It may or may not be there; regardless, since when has Windows been a paradigm of good design? Also, what good does this information do for the kernel? The user is usually fully aware of how much data process X will push to disk, or over the network, before he chooses to run process X. And if process X deems it fit to monitor its own I/O footprint, it will usually do so of its own accord. Take a download manager, for example.

--All the best
gravaera.
Last edited by aeritharcanum on Tue Mar 16, 2010 6:34 am, edited 1 time in total.
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance
Contact:

Re: True scheduler

Post by Combuster »

But the normal thing to do is to have a linked list of CPUs.
It's not normal. Linked lists have O(n) indexing and search time. Please use a (dynamically allocated) array.
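I.e. something like this (a sketch; use whatever allocator your kernel has in place of calloc):

Code:
/* Sketch: allocate the per-CPU array once the CPU count is known.
 * Indexing by CPU number is then O(1), unlike walking a linked list. */
#include <stdlib.h>

struct thread;

struct cpu_data {
    struct thread *current;   /* thread this CPU is running */
    /* ... */
};

static struct cpu_data *cpus;
static unsigned num_cpus;

int init_cpus(unsigned count)
{
    cpus = calloc(count, sizeof *cpus);   /* or your kernel's own allocator */
    if (!cpus)
        return -1;
    num_cpus = count;
    return 0;
}

static inline struct cpu_data *cpu(unsigned id)
{
    return &cpus[id];   /* O(1) */
}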
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
Gigasoft
Member
Posts: 856
Joined: Sat Nov 21, 2009 5:11 pm

Re: True scheduler

Post by Gigasoft »

Usually, on an x86 based system, one keeps a different GDT for each CPU, with an entry pointing to the information about that CPU. Another solution, involving more hassle, would be to have a global page mapped differently in every CPU. Or one could read the processor number from the local APIC and look it up in a table.

Either way, you don't need a "huge" array of CPUs. Memory only needs to be allocated for the CPUs that are present, and for the pointer table, one knows the maximum number of CPUs supported by the architecture the OS is designed for.

In Windows, selector 0x30 is used to point to CPU specific information, and this is loaded into the FS register at the start of every interrupt handler when control is transferred from user mode.
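For the first option, the lookup can boil down to a single segment-relative load. A 32-bit sketch (the self-pointer layout and names are assumptions, not any real kernel's structures):

Code:
/* Sketch of the per-CPU GDT entry approach (32-bit): each CPU's GDT has a
 * data segment whose base is that CPU's block, loaded into FS on kernel
 * entry. If the block's first field points back to the block itself, the
 * current CPU's data is one FS-relative load away. */
struct thread;

struct cpu_data {
    struct cpu_data *self;      /* at offset 0: points to this block  */
    struct thread   *current;   /* thread this CPU is executing       */
    /* ... */
};

static inline struct cpu_data *this_cpu(void)
{
    struct cpu_data *p;
    __asm__ volatile("movl %%fs:0, %0" : "=r"(p));
    return p;
}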
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany
Contact:

Re: True scheduler

Post by Solar »

Just as a toss-in, with the Linux kernel you get an option to state the maximum number of SMP CPUs the kernel should run on. I.e., Linux apparently uses a statically sized array for CPU data.
Every good solution is obvious once you've found it.
geomagas
Posts: 12
Joined: Wed Feb 27, 2008 6:01 am
Location: Iraklion,Crete,Greece
Contact:

Re: True scheduler

Post by geomagas »

I think this is getting off-topic. Single or multiple CPUs, separate GDTs or not, linked lists or static/dynamic arrays, the fact is that, no matter the implementation,
Gigasoft wrote:Information about the current thread is frequently accessed by an operating system, and this has to be fast (otherwise there's something very wrong with your design).
I think the OP's point is "what's fair and what's not", apparently motivated by <efficiency> = k * <fairness>, where k is a system-specific constant (covering every other aspect of the kernel as well as the hardware configuration), which may be arguable (anyone?).

I also think that fairness measures "the ability to share a finite number of resources among a finite number of processes". To enforce a policy that implements this, a kernel must be able to listen to requests for the allocation/release of resources and respond accordingly. It must also be able to actively preempt a process from a resource it is currently holding, whenever it sees fit.

What troubled me all along with the OP's theory (and I just realized it) is that the preemption of a process from the (a) CPU is judged by the amount of usage of another resource (i.e. the HD). Isn't this wrong? Moreover, isn't it unfair to begin with?

I can imagine the kernel saying "Ok, you want the CPU? Take it for 10ms, then get back in queue" or "You want to write to the disk? You can write 1M and then let others access it until it is your turn again (and give up the CPU while you are waiting, since meanwhile you have nothing else to do with it)".

But not "What? You want the CPU? Well, let me see... You have already read 100G from disk and written 20G, so... no CPU for you!". What if the process changed behaviour and now wants to do an intensive calculation on the 100G it fetched from disk? And what if... the process waiting list is empty? :shock:

To conclude: a process P's access to resource X should be judged according to 1) how much P uses X and 2) how many others request access to X in the same period.

Disclaimer: I am not an expert on the subject, and the above is more a question than a point; I find the subject sufficiently interesting, although not too important for my own implementation right now.

Regards
I was... delayed... - Gandalf
qw
Member
Posts: 792
Joined: Mon Jan 26, 2009 2:48 am

Re: True scheduler

Post by qw »

Geomagas,
You have a point. Why would the scheduler deny a thread CPU time because of its disk usage?

Moreover, IMHO it makes little sense to "fairly" share disk usage between threads. Some threads may not use the disk at all. Threads are all about CPU time, not about disk usage. (The same goes for other resources.)

Roel

P.S. I think it comes down to: if it's not broken, don't fix it.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: True scheduler

Post by Brendan »

Hi,

For "fair" scheduling, 2 tasks would get an equal share of CPU time, so an important task (e.g. a user interface that needs to respond quickly to keyboard and mouse but also needs to process lots of graphics data) gets the same share of CPU time as an unimportant task (e.g. something doing a spell check in the background).

What I'm suggesting here is that "fair" is stupid - it only makes sense if tasks are as important as each other (or, if the OS is so crappy that it has no idea how important each task is).

The unfortunate part is that there are quite a few of these crappy OSs, and even more crappy/portable programs that don't use task priorities.

For example, for something I'm writing (a utility in C that uses POSIX pthreads) I tried my hardest to give some of my more important threads higher priority than other, less important threads, and found that it's entirely impossible on Linux unless the threads use the "real time" scheduling policy (which is entirely inappropriate for my utility, and only works if the process has permission to use the "real time" scheduling policy), and even less possible to do it in a portable way that behaves correctly on different OSs that support pthreads. The end result is that, despite my best effort, my utility is yet another crappy program that doesn't use task priorities.
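For reference, this is roughly what I tried (standard pthreads calls; under the default SCHED_OTHER policy Linux only accepts priority 0, so a non-zero priority needs SCHED_FIFO/SCHED_RR and the privileges to use them):

Code:
#include <pthread.h>
#include <sched.h>

/* Roughly what I tried: create a thread with an explicitly higher priority.
 * Under SCHED_OTHER the priority must be 0 on Linux, so this only works with
 * the "real time" policies, and pthread_create() fails with EPERM if the
 * process lacks permission to use them. */
int create_important_thread(pthread_t *tid, void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    struct sched_param param;

    pthread_attr_init(&attr);
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);   /* "real time" policy */
    param.sched_priority = 10;
    pthread_attr_setschedparam(&attr, &param);

    return pthread_create(tid, &attr, fn, arg);
}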

It might be possible to trace this stupidity through the evolution of Unix, all the way back to ancient "time shared" systems (that used a round-robin scheduler and scheduled programs that were designed for single-tasking systems). I don't know. What I do know is that there have been several scheduler rewrites in Linux to attempt to "fix" the symptoms of the problem, and no serious attempt to fix the cause of the problem. The latest attempt is the "Completely Fair Scheduler", which is probably (at least partially) responsible for the idiotic idea of "fair" scheduling becoming popular recently.

To fix the cause of the problem they'd need to update the underlying SYSV/POSIX specifications, rewrite the scheduler in all "Unix style" OSs, and modify all the software that's designed for any/all of these OSs to take advantage of prioritised scheduling for normal (non-real-time, non-idle) tasks. It'd be a massive nightmare, especially because of the "inmates run the nut-house" (virtually zero leadership) development model used in most "Unix style" OS and open source projects.

Non-Unix OSs (VMS, the Windows NT kernel, BeOS, all real time kernels, etc) have done scheduling properly for a very long time. Of course this means there's no recent hype; which in turn makes it easy for deluded "Unix people" to confuse "better" with "good" (for example, if someone is stabbed in the eye less often they might think that is a "good thing", and forget that some people aren't stabbed in the eye at all).

@wolfram: The idea of scheduling can be applied to any shared resource. For example, an OS might have a "CPU scheduler" that determines how CPU/s are shared between tasks, and may also have several "I/O schedulers" that determine how disk access (or networking, etc.) is shared between different tasks. Typically a task's priority affects both its use of CPU time and its use of I/O. For example, when a higher priority task becomes ready to run it might preempt lower priority tasks that are running on the CPU at the time (so the higher priority task gets CPU time immediately and lower priority tasks have to wait); and when a higher priority task asks to transfer data to/from disk then its disk I/O might preempt any disk I/O for lower priority tasks (so the higher priority task gets disk access immediately and lower priority tasks' disk access has to wait).
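A rough sketch of the I/O-scheduler half of that (hypothetical structures; a real disk scheduler would also weigh seek order, starvation, etc.):

Code:
/* Rough sketch: a disk request queue kept ordered by the requesting task's
 * priority, so requests from important tasks are issued before queued
 * requests from less important tasks. Hypothetical structures only. */
struct disk_request {
    int                  priority;   /* copied from the requesting task */
    struct disk_request *next;
    /* sector, buffer, length, ... */
};

static struct disk_request *queue_head;

void submit_request(struct disk_request *req)
{
    struct disk_request **pp = &queue_head;

    /* insert before the first queued request with lower priority */
    while (*pp && (*pp)->priority >= req->priority)
        pp = &(*pp)->next;
    req->next = *pp;
    *pp = req;
}

struct disk_request *next_request(void)   /* called when the disk is idle */
{
    struct disk_request *req = queue_head;
    if (req)
        queue_head = req->next;
    return req;
}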

The main idea is that important things don't wait for unimportant things to complete; so that important things happen quickly (and unimportant things happen slower). Of course without any usable priorities for normal tasks you can't really have usable I/O scheduling for normal tasks either...


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.