Multitasking, SMP & Timers


Re: Multitasking, SMP & Timers

Post by Solar »

...aaaaaand now you've made "scheduling" a long and confusing section in the administrator's manual instead of somehow solving it gracefully in the algorithm. :twisted:

Re: Multitasking, SMP & Timers

Post by rdos »

Solar wrote:
rdos wrote:It is possible to let higher priority threads always interrupt lower priority threads. The scheduler will keep per-priority queues of ready threads (and a pointer to the current highest priority queue), and will only select threads from the highest priority queue.
If you have a task that is high-priority but doesn't make system calls that often, that thread will almost always be "ready", stopping your lower-priority household tasks cold and starving the system.
It really doesn't matter whether it makes system calls or not. If a thread consumes lots of cycles in user space, the preemption timer will expire and select a new thread in the same priority queue. Also, user threads are almost always of the same priority in my design (the C++ API doesn't have a priority parameter). It is commonly kernel threads that run with above- or below-normal (user) priority. So unless somebody explicitly uses the low-level API to create a thread with above-normal priority and then lets it calculate pi to five million digits, it should not be a problem.
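
For illustration, here is a minimal sketch of the per-priority ready queues described above. Everything in it is an assumption made for the example: the names, the number of levels, and the FIFO layout are invented, not taken from RDOS.

Code: Select all

#include <stddef.h>

#define NUM_PRIO 8                        /* assumed number of priority levels */

struct thread {
    struct thread *next;
    int prio;                             /* 0 = lowest .. NUM_PRIO-1 = highest */
};

/* One FIFO per priority, plus an index of the highest non-empty queue. */
static struct thread *ready_head[NUM_PRIO], *ready_tail[NUM_PRIO];
static int highest_ready = -1;            /* -1: nothing is ready */

void make_ready(struct thread *t)
{
    t->next = NULL;
    if (ready_tail[t->prio])
        ready_tail[t->prio]->next = t;    /* append at the tail: round-robin */
    else
        ready_head[t->prio] = t;
    ready_tail[t->prio] = t;
    if (t->prio > highest_ready)
        highest_ready = t->prio;
}

struct thread *pick_next(void)
{
    if (highest_ready < 0)
        return NULL;                      /* nothing ready: run the idle thread */
    struct thread *t = ready_head[highest_ready];
    if ((ready_head[highest_ready] = t->next) == NULL)
        ready_tail[highest_ready] = NULL;
    while (highest_ready >= 0 && ready_head[highest_ready] == NULL)
        --highest_ready;                  /* drop to the next non-empty queue */
    return t;
}

Selection is O(1) apart from the scan down to the next non-empty queue, which is bounded by the number of priority levels.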

Re: Multitasking, SMP & Timers

Post by rdos »

Owen wrote:Then such a task should not be allowed to run high priority. For example, you might cap user applications at "kPriorityNormal", except for those run by administrators, which are capped at "kPriorityAboveNormal", while the session manager might run at "kPriorityAboveNormal"/"kPriorityHigh" respectively.

Perhaps the thing we can take from this is that "Urgency" and "Priority" should be separate variables: urgency controls the processing order (i.e. more urgent tasks absolutely preempt less urgent ones), while priority controls the time-slice size.
I strongly feel that applications should not mess with priority or urgency, as it will lead to priority wars or urgency wars as everybody thinks their application is the most urgent and should have the highest priority. It is better to let the scheduler assign default priorities for user threads, and then possibly reward cooperative behavior (giving up control voluntarily) and punish uncooperative behavior (causing the preemption timer to expire). In that scenario, giving low-priority threads longer time-slices might make sense.
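
As a rough sketch of that reward/punish idea (the field names, the slice bounds, and the doubling/halving policy are all assumptions; the post doesn't specify a mechanism):

Code: Select all

enum { SLICE_MIN_MS = 1, SLICE_MAX_MS = 50 };   /* assumed bounds */

struct sched_info {
    int slice_ms;                 /* length of the thread's next time-slice */
};

/* Called when a thread leaves the CPU; 'voluntary' is nonzero if it
 * blocked or yielded, zero if the preemption timer expired. */
void account_slice(struct sched_info *s, int voluntary)
{
    if (voluntary && s->slice_ms * 2 <= SLICE_MAX_MS)
        s->slice_ms *= 2;         /* reward giving up control voluntarily */
    else if (!voluntary && s->slice_ms / 2 >= SLICE_MIN_MS)
        s->slice_ms /= 2;         /* punish using up the whole slice */
}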

Re: Multitasking, SMP & Timers

Post by rdos »

Brendan wrote:This also means that tasks sleep until "now() > task->wake_time"; and you don't have to decrease a "time remaining" value for every sleeping task every time the timer IRQ occurs.

This also makes the delta queue idea a bit redundant - you just use a list/queue that is sorted in order of wake time.

To reduce the "O(n)" insertion time, you can have buckets - one list for tasks that should wake up in the next 100 ms, one list for tasks that should wake up between 100 and 200 ms, one list for tasks that should wake up in 200 to 300 ms, etc. If you've got 128 lists/queues (representing 100 ms of time each) then it only covers 12.8 seconds. For delays larger than that you'd need a "longer than 12.8 seconds" list/queue too. Every 100 ms your timer IRQ handler would switch to the next list and discard the old (empty) list, then create a new list by splitting the "longer than 12.8 seconds" list.
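
For illustration, the bucket scheme described above is essentially a timing wheel. A minimal sketch, using the sizes from the post but otherwise invented names: insertion touches one short list, and once per 100 ms tick the overflow list is scanned for entries that have come within range.

Code: Select all

#include <stdint.h>
#include <stddef.h>

#define BUCKETS   128                    /* 128 buckets x 100 ms = 12.8 s */
#define BUCKET_MS 100

struct sleeper {
    struct sleeper *next;
    uint64_t wake_ms;                    /* absolute wake time in ms */
};

static struct sleeper *bucket[BUCKETS];  /* short, unsorted per-100ms lists */
static struct sleeper *overflow;         /* the "longer than 12.8 seconds" list */
static uint64_t base_ms;                 /* start of the time bucket[cur] covers */
static unsigned cur;                     /* bucket currently being drained */

void insert_sleeper(struct sleeper *s)
{
    uint64_t off = s->wake_ms > base_ms ? (s->wake_ms - base_ms) / BUCKET_MS : 0;
    struct sleeper **list = off < BUCKETS
        ? &bucket[(cur + (unsigned)off) % BUCKETS]
        : &overflow;                     /* too far out for any bucket */
    s->next = *list;                     /* O(1) push instead of O(n) insert */
    *list = s;
}

/* Timer IRQ, once per BUCKET_MS: step to the next bucket, then move any
 * overflow entries that are now within 12.8 s into their buckets. */
void advance_wheel(void)
{
    bucket[cur] = NULL;                  /* assumed drained as entries came due */
    cur = (cur + 1) % BUCKETS;
    base_ms += BUCKET_MS;

    for (struct sleeper **pp = &overflow; *pp; ) {
        struct sleeper *s = *pp;
        if (s->wake_ms < base_ms + (uint64_t)BUCKETS * BUCKET_MS) {
            *pp = s->next;               /* unlink from overflow */
            insert_sleeper(s);           /* now fits in a bucket */
        } else {
            pp = &s->next;
        }
    }
}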
That bucket scheme seems awfully complicated for little to no gain. If you choose a sufficiently short preemption timeout in the scheduler (in your case perhaps 50 ms, although I use 1 ms), it is easier to just update a list of low-precision timers every time the scheduler is invoked. This way there are no reentrancy issues, and you don't need any extra IRQs to update lists that might have no entries and thus nothing to do.
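
A sketch of that alternative, assuming the scheduler runs at least once per millisecond; wake_thread() is a stand-in for whatever moves a thread back onto its ready queue:

Code: Select all

#include <stdint.h>
#include <stddef.h>

struct timer_entry {
    struct timer_entry *next;
    uint64_t wake_time;                  /* absolute tick */
};

static struct timer_entry *sleep_list;   /* sorted ascending by wake_time */

void wake_thread(struct timer_entry *t); /* assumed: back to the ready queue */

/* Called from the scheduler on every invocation; no timer IRQ and no
 * extra locking needed, because only the scheduler touches the list. */
void expire_timers(uint64_t now)
{
    while (sleep_list && sleep_list->wake_time <= now) {
        struct timer_entry *t = sleep_list;
        sleep_list = t->next;
        wake_thread(t);
    }
}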
Brendan wrote:This means faster insertion, more queues/lists, more locks, less lock contention, etc. In theory, this can also be "per CPU" (e.g. using local APIC timer for the timer IRQ) to further reduce insertion times; with some other timer (e.g. HPET) used as a backup for when the local APIC is turned off to save power (e.g. where you merge a CPU's queues with the HPET's global queue before turning the local APIC timer off).
Less lock contention? Updating when the scheduler runs means no locks are needed. :mrgreen:
Brendan wrote:The scheduler decides which CPU a task should use when a task needs to be put back onto a "ready to run" queue. This is how load balancing works (e.g. one CPU might look at its own queue and say "I'm too busy!" and put the task on a different CPU's queue).
That means the local CPU's thread queues must use spinlocks, somewhat defeating the purpose of having per-CPU thread queues. A much better approach is to place excess load in a global queue, with a global spinlock. Then as long as the CPU has threads to run it never needs to access any spinlocks to schedule work.
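
A minimal sketch of that arrangement, using C11 atomics for the spinlock; the queue layout is an assumption made for illustration, not RDOS's actual code:

Code: Select all

#include <stdatomic.h>
#include <stddef.h>

struct thread { struct thread *next; };

static struct thread *global_queue;                /* excess load, shared */
static atomic_flag global_lock = ATOMIC_FLAG_INIT;

struct cpu { struct thread *local_queue; };        /* touched only by its owner */

struct thread *schedule_next(struct cpu *cpu)
{
    struct thread *t = cpu->local_queue;
    if (t) {                                       /* fast path: no lock at all */
        cpu->local_queue = t->next;
        return t;
    }
    while (atomic_flag_test_and_set(&global_lock))
        ;                                          /* spin: slow path only */
    t = global_queue;
    if (t)
        global_queue = t->next;
    atomic_flag_clear(&global_lock);
    return t;                                      /* NULL: run the idle thread */
}

The fast path never touches shared state, so as long as a CPU has local work there is no lock traffic at all.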
Brendan wrote:A CPU can only take tasks off of its own "ready to run" queue; but because any CPU can insert a task into any other CPU's run queue you'd still need a lock for each queue.
Right, which kind of defeats the purpose of local queues.

Another problem is that CPUs can start to crosspost threads into each other's queues all the time, eventually getting stuck on the spinlocks and getting no work done. This is also a positive feedback loop: increased load means more CPUs will claim "I'm overloaded" and thus crosspost more, leading to more lock contention, which leads to less work being done and even more overload. Moving threads also means more cache misses, which adds still more load. The proper thing to do in overload situations is to minimize overhead, and especially to minimize thread movement between cores.
Brendan wrote:Alternatively you could use IPIs. For example, if one CPU wants to put a task onto another CPU's queue, it could send an IPI to the other CPU to ask it to put the task on its own queue. However, assuming low lock contention, an IPI is more expensive than locks; and because you need to tell the other CPU which task to put on its queue you'd have to store a "thread ID" somewhere and you'd probably need a lock to protect that (so a third CPU doesn't trash your "thread ID" value while you're sending the IPI).
That's even worse, since IPIs consume a massive number of CPU cycles, primarily on the target CPU, and those cycles do nothing useful and thus contribute to more IPIs and more overload.

Re: Multitasking, SMP & Timers

Post by Solar »

I agree that by default all user-space applications should run with the same priority.

But the user (or the administrator) might want to take the priority of a specific application up or down a notch for performance fine-tuning. After all, that's what priorities are for.
rdos wrote:If a thread consumes lots of cycles in user space, the preemption timer will expire and select a new thread in the same priority queue
...and if the current thread is the only one of its (high) priority, it will be promptly scheduled again, because...
rdos wrote:the scheduler [...] will only select threads from the highest priority queue.
That concept is broken. So much so that it got its own Wikipedia entry.

Re: Multitasking, SMP & Timers

Post by rdos »

bluemoon wrote:
gerryg400 wrote:Just a clarification, by "You don't really need a sleep queue." I mean your scheduler doesn't need a sleep queue, just a bunch of ready queues.
For the very same reason you don't need a queue for blocked threads. Sleeping is just another word for "blocked on a timer".
True, but it is quite convenient to be able to see (with some kind of tool) why a thread is blocked. Just saying "waiting for timer" for every possible block type doesn't help much.

Re: Multitasking, SMP & Timers

Post by Solar »

rdos wrote:A much better approach is to place excess load in a global queue, with a global spinlock. Then as long as the CPU has threads to run it never needs to access any spinlocks to schedule work.
And as soon as the system gets under load, any threads put into the global queue never get rescheduled since each CPU is happily munching away on its own thread list...

Re: Multitasking, SMP & Timers

Post by rdos »

Solar wrote:
rdos wrote:A much better approach is to place excess load in a global queue, with a global spinlock. Then as long as the CPU has threads to run it never needs to access any spinlocks to schedule work.
And as soon as the system gets under load, any threads put into the global queue never get rescheduled since each CPU is happily munching away on its own thread list...
A CPU will check the global queue from time to time and schedule entries from there. The thing is that it will not do so every time if it has threads of its own ready to run, as is the case when CPUs crosspost directly between themselves. Also, the CPU needs to check the highest-priority entry in the global queue against its own highest-priority entry, but this can be done without locks.

In fact, in my algorithm, entries from the global queue are only taken when they are of higher priority than the current local queue. This will not lead to starvation because either the CPU has idle time (in which case it takes entries from the global queue), or it will crosspost new threads to the global queue (because they used up their time-slice). Since there is posting to the global queue, it is inevitable that some CPU will get idle time and thus remove entries from it. Threads in the global queue might be temporarily suspended, but they are never permanently starved. If they are high priority they are never starved.
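
The lock-free check could, for example, mirror the global queue's top priority in a single shared word; this is an assumption about one possible implementation, not a description of RDOS itself:

Code: Select all

#include <stdatomic.h>
#include <limits.h>

/* Written under the global spinlock whenever the queue changes; read
 * without any lock by CPUs deciding whether to pull from it. A stale
 * value is harmless: it only delays the pull to the next invocation. */
static atomic_int global_top_prio = INT_MIN;

int should_pull_global(int local_top_prio)
{
    return atomic_load_explicit(&global_top_prio,
                                memory_order_relaxed) > local_top_prio;
}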

Re: Multitasking, SMP & Timers

Post by rdos »

Solar wrote:
rdos wrote:If a thread consumes lots of cycles in user space, the preemption timer will expire and select a new thread in the same priority queue
...and if the current thread is the only one of its (high) priority, it will be promptly scheduled again, because...
rdos wrote:the scheduler [...] will only select threads from the highest priority queue.
That concept is broken. So much so that it got its own Wikipedia entry.
I don't think it is broken. It becomes broken if high-priority threads never give up their time slice voluntarily, but if the design wants to cope with that, it can punish such threads by lowering their priority until they are lowest priority. Personally, I don't see a need for this safeguard, as I write the threads myself, and assign reasonable priorities to kernel threads, which are the only threads that execute with above-normal priority. I also don't have the background/foreground junk of other designs, so priorities are not used as a means to boost certain threads at the expense of others in a badly designed GUI.
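
For completeness, the safeguard mentioned but not used here could look like this (the field and constant names are invented):

Code: Select all

#define LOWEST_PRIO 0                  /* assumed bottom of the range */

struct thread { int prio; };

/* Called each time a thread exhausts its slice without yielding:
 * demote it one level, so a CPU hog sinks until it only runs when
 * nothing of higher priority is ready. */
void demote_hog(struct thread *t)
{
    if (t->prio > LOWEST_PRIO)
        t->prio--;
}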

Re: Multitasking, SMP & Timers

Post by Combuster »

This will not lead to starvation
Wrong. Counterexample:

Code: Select all

int recv(void);   /* assumed blocking IPC primitives */
void send(int value);

void process_a(void)
{
    while (1)
    {
        int i = recv(); // blocking call
        send(i + 1);
    }
}

void process_b(void)
{
    int i = 0;
    while (1)
    {
        send(i);
        i = recv(); // blocking call
    }
}
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
rdos
Member
Member
Posts: 3286
Joined: Wed Oct 01, 2008 1:55 pm

Re: Multitasking, SMP & Timers

Post by rdos »

Combuster wrote:
This will not lead to starvation
Wrong. Counterexample:

Code: Select all

int recv(void);   /* assumed blocking IPC primitives */
void send(int value);

void process_a(void)
{
    while (1)
    {
        int i = recv(); // blocking call
        send(i + 1);
    }
}

void process_b(void)
{
    int i = 0;
    while (1)
    {
        send(i);
        i = recv(); // blocking call
    }
}
Bad counterexample, as this is exactly what one of my test apps does (with multiple senders if wanted), and it doesn't lead to starvation. It works just fine because:

1. These two processes tend to end up on the same core (because they signal each other).
2. Both will have the same priority, and the same priority as every other user thread in the system, which means they run in "round-robin" fashion with other threads on one core in the system.
3. When the sender thread signals the receiver thread, it wakes it up but doesn't schedule it. Rather, the thread is placed last in the "user priority" queue.

The only thread they will starve is the null thread. :mrgreen:

Besides, I can prove that it works with performance monitor logs. :wink:

Edit: I need to make a slight correction to the above algorithm. A certain percentage of the threads that are scheduled away from are put in the global queue, regardless of whether they used up their time-slice or not. The default value is 35%, which is probably a little higher than optimal.
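
That corrected rule could be sketched like this; only the 35% figure comes from the post, while the helper names and the pseudo-random choice are assumptions:

Code: Select all

#include <stdlib.h>

#define SPILL_PERCENT 35                /* default from the post above */

struct cpu;
struct thread;
void local_enqueue(struct cpu *c, struct thread *t);  /* assumed helpers */
void global_enqueue(struct thread *t);

/* Called when a thread is scheduled away from: most stay on their CPU,
 * but a fixed fraction spills to the global queue so other CPUs (idle
 * ones in particular) can pick them up. */
void requeue(struct cpu *c, struct thread *t)
{
    if (rand() % 100 < SPILL_PERCENT)
        global_enqueue(t);
    else
        local_enqueue(c, t);
}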

Re: Multitasking, SMP & Timers

Post by Solar »

rdos wrote:I don't think it is broken.
You make a hobby of disagreeing with software engineering lore in general?
rdos wrote:Personally, I don't see a need for this safeguard, as I write the threads myself...
And again, you mistake your special case in your special application running on your special self-made OS for the general case being discussed here.

Edit: As can be seen in your reply to Combuster above: we were talking about different priorities. You cannot magically swing the discussion your way by simply replying that they aren't different.

Re: Multitasking, SMP & Timers

Post by rdos »

Brendan wrote:Think about tasks that block.

You might do something like this:
  • Task is running, timer is counting down
  • Task decides to "sleep()" or block waiting for IO or something
  • Kernel puts task into some sort of "waiting" state
  • Kernel tries to find another task to switch to
  • Kernel disables IRQs
  • Kernel switches to the new task
  • Kernel sets new timer count
  • Kernel enables IRQs
  • New task is running
By turning off interrupts for this long you have effectively ruined every possibility of decent interrupt performance. I bet this is why you need minimum delays with timer IRQs. :lol:

A more proper way to do it:

1. Task decides to "sleep()" or block waiting for IO or something
2. Kernel puts task into some sort of "waiting" state
3. Kernel acquires the scheduler lock
4. Kernel processes delayed IRQs
5. Kernel tries to find another task to switch to
6. Kernel loads the new timeout (if it has already expired, go to 4)
7. Kernel sets up the LDT and CR3 environment
8. Kernel releases the scheduler lock and disables IRQs (if IRQ events are waiting, go to 3)
9. Kernel loads the new register context and switches to the new thread (iretd enables interrupts again)
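
A C-level sketch of that sequence, with every helper hypothetical; the point is that IRQs are only disabled for the final step, and the code loops back rather than losing timer or IRQ events:

Code: Select all

struct thread;

/* Hypothetical helpers mirroring steps 3-9 above. */
void acquire_scheduler_lock(void);             /* step 3 */
void release_scheduler_lock(void);
void process_delayed_irqs(void);               /* step 4 */
struct thread *select_next_thread(void);       /* step 5 */
int load_preemption_timeout(struct thread *t); /* step 6: 0 if already expired */
void setup_ldt_cr3(struct thread *t);          /* step 7 */
void disable_irqs(void);                       /* step 8: cli */
int irq_events_waiting(void);
void enable_irqs(void);
void load_context_and_iretd(struct thread *t); /* step 9: never returns */

void switch_to_next(void)
{
    for (;;) {
        acquire_scheduler_lock();              /* 3 */
        struct thread *next;
        do {
            process_delayed_irqs();            /* 4 */
            next = select_next_thread();       /* 5 */
        } while (!load_preemption_timeout(next)); /* 6: expired -> back to 4 */
        setup_ldt_cr3(next);                   /* 7 */
        release_scheduler_lock();
        disable_irqs();                        /* 8 */
        if (!irq_events_waiting())
            load_context_and_iretd(next);      /* 9: iretd re-enables IRQs */
        enable_irqs();                         /* events arrived: redo from 3 */
    }
}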

Re: Multitasking, SMP & Timers

Post by rdos »

Solar wrote:
rdos wrote:I don't think it is broken.
You make a hobby of disagreeing with software engineering lore in general?
rdos wrote:Personally, I don't see a need for this safeguard, as I write the threads myself...
And again, you mistake your special case in your special application running on your special self-made OS for the general case being discussed here.

Edit: As can be seen in your reply to Combuster above: we were talking about different priorities. You cannot magically swing the discussion your way by simply replying that they aren't different.
Many early multitasking systems used this kind of scheduling, and nobody (at that time) claimed the concept was broken. The concept becomes broken with GUIs that adjust priorities based on the foreground/background status of threads, or when priorities are used in other ways by the GUI (or another OS service) to accommodate the specific needs of that part of the system. IMHO, the thing that is broken is dynamic priority assignment. In that context it is true that the highest-priority thread cannot always be selected as the next thread to run.

So the answer depends on whether you use a (broken) dynamic priority scheme or not. If the original fixed-priority scheme is used, the concept is not broken, and never has been. In the original scheme, high-priority tasks are assumed to use little CPU time, and thus will never block lower-priority threads unless the system is severely overloaded.

Re: Multitasking, SMP & Timers

Post by Solar »

I disagree. Let's leave it at that.