Re: Multitasking, SMP & Timers
Posted: Mon May 21, 2012 4:27 am
...aaaaaand now you've made "scheduling" a long and confusing section in the administrator's manual instead of somehow solving it gracefully in the algorithm.
rdos wrote:It is possible to let higher priority threads always interrupt lower priority threads. The scheduler will keep per-priority queues of ready threads (and a pointer to the current highest priority queue), and will only select threads from the highest priority queue.
Solar wrote:If you have a task that is high-priority but doesn't make system calls that often, that thread will almost always be "ready", stopping your lower-priority household tasks cold and starving the system.
It really doesn't matter whether it makes system calls or not. If a thread consumes lots of cycles in user-space, the preemption timer will expire and select a new thread in the same priority queue. Also, user threads are almost always of the same priority in my design (the C++ API doesn't have a priority parameter). It is commonly kernel threads that run with above or below normal (user) priority. So unless somebody explicitly uses the low-level API to create a thread with above-normal priority, and then lets it calculate pi to five million digits, it should not be a problem.
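For concreteness, the per-priority ready queues described above might look roughly like this. It is only a sketch with invented names (ready_queue, make_ready, pick_next) and an arbitrary number of priority levels; a real scheduler would also need locking, FIFO order within a queue, and an idle thread.
Code: Select all
#define NUM_PRIORITIES 8                     /* assumed number of priority levels  */

struct thread {
    struct thread *next;                     /* link within its ready queue        */
    int            priority;                 /* 0 = lowest .. NUM_PRIORITIES-1     */
};

static struct thread *ready_queue[NUM_PRIORITIES];  /* one queue per priority     */
static int highest_ready = -1;               /* highest non-empty queue, -1 = none */

static void make_ready(struct thread *t)
{
    t->next = ready_queue[t->priority];      /* pushed LIFO here for brevity;      */
    ready_queue[t->priority] = t;            /* a real scheduler would keep FIFO   */
    if (t->priority > highest_ready)
        highest_ready = t->priority;
}

static struct thread *pick_next(void)
{
    /* only ever select from the highest non-empty priority queue */
    for (int p = highest_ready; p >= 0; p--) {
        struct thread *t = ready_queue[p];
        if (t) {
            ready_queue[p] = t->next;
            while (highest_ready >= 0 && !ready_queue[highest_ready])
                highest_ready--;             /* drop the hint if we emptied it     */
            return t;
        }
    }
    return NULL;                             /* nothing ready: run the idle thread */
}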
Owen wrote:Then such a task should not be allowed to run at high priority. For example, you might cap user applications at "kPriorityNormal", except for those run by administrators, which are capped at "kPriorityAboveNormal", while the session manager might run at "kPriorityAboveNormal"/"kPriorityHigh" respectively.
Perhaps the thing we can take from this is that "Urgency" and "Priority" should be separate variables - urgency controls the processing order (i.e. more urgent tasks absolutely preempt less urgent ones) while priority controls the time slice size.
I strongly feel that applications should not mess with priority or urgency, as it will lead to priority wars or urgency wars, with everybody thinking their application is the most urgent and should have the highest priority. It is better to let the scheduler assign default priorities to user threads, and then possibly award cooperative behavior (giving up control voluntarily) and punish uncooperative behavior (causing the preemption timer to expire). In that scenario, giving low-priority threads longer time slices might make sense.
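One way to picture Owen's urgency/priority split combined with the reward/punish idea, as a sketch (every name and constant here is invented for illustration): urgency alone decides which ready queue a thread goes into, priority only scales the time slice, and the slice is then nudged up or down depending on whether the thread yielded or was preempted.
Code: Select all
#define MAX_SLICE_MS 100                     /* cap on the "reward"                */

struct thread {
    int      urgency;    /* picks the ready queue: higher urgency always preempts  */
    int      priority;   /* never affects ordering, only the slice length          */
    unsigned slice_ms;   /* current base slice, adjusted by reward()/punish()      */
};

/* slice handed out when the thread is scheduled, scaled by its priority */
static unsigned slice_for(const struct thread *t)
{
    return t->slice_ms * (unsigned)(t->priority + 1);
}

/* the thread gave up the CPU voluntarily before its slice ran out */
static void reward(struct thread *t)
{
    if (t->slice_ms < MAX_SLICE_MS)
        t->slice_ms++;
}

/* the preemption timer expired before the thread yielded */
static void punish(struct thread *t)
{
    if (t->slice_ms > 1)
        t->slice_ms--;
}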
Brendan wrote:This also means that tasks sleep until "now() > task->wake_time"; and you don't have to decrease a "time remaining" value for every sleeping task every time the timer IRQ occurs.
This also makes the delta queue idea a bit redundant - you just use a list/queue that is sorted in order of wake time.
To reduce the "O(n)" insertion time, you can have buckets - one list for tasks that should wake up in the next 100 ms, one list for tasks that should wake up between 100 and 200 ms, one list for tasks that should wake up between 200 and 300 ms, etc. If you've got 128 lists/queues (representing 100 ms of time each) then it only covers 12.8 seconds. For delays larger than that you'd need a "longer than 12.8 seconds" list/queue too. Every 100 ms your timer IRQ handler would switch to the next list and discard the old (empty) list, then create a new list by splitting the "longer than 12.8 seconds" list.
That also seems awfully complicated for little to no gain. If you choose a sufficiently short preemption timeout in the scheduler (in your case perhaps 50 ms, although I use 1 ms), then it is easier to just update a list of low-precision timers every time the scheduler is invoked. This way, there are no reentrancy issues. Additionally, you don't need any extra IRQs to update lists that might have no entries and thus nothing to do.
Brendan wrote:This means faster insertion, more queues/lists, more locks, less lock contention, etc. In theory, this can also be "per CPU" (e.g. using local APIC timer for the timer IRQ) to further reduce insertion times; with some other timer (e.g. HPET) used as a backup for when the local APIC is turned off to save power (e.g. where you merge a CPU's queues with the HPET's global queue before turning the local APIC timer off).
Less lock contention? Updating when the scheduler runs means no locks are needed.
Brendan wrote:The scheduler decides which CPU a task should use when a task needs to be put back onto a "ready to run" queue. This is how load balancing works (e.g. one CPU might look at its own queue and say "I'm too busy!" and put the task on a different CPU's queue).
That means the local CPU's thread queues must use spinlocks, which somewhat defeats the purpose of having per-CPU thread queues. A much better approach is to place excess load in a global queue, with a global spinlock. Then as long as the CPU has threads to run it never needs to access any spinlocks to schedule work.
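A sketch of that arrangement (assuming spinlock_t, spin_lock and spin_unlock exist elsewhere in the kernel; the struct definitions are minimal and invented): the local queue is only ever touched by its own CPU, so it needs no lock, and the global spinlock is taken only when the local queue runs dry.
Code: Select all
struct thread { struct thread *next; };      /* minimal; the real TCB has more     */

struct cpu {
    struct thread *local_ready;              /* only touched by its own CPU        */
};

static struct thread *global_ready;          /* excess load from all CPUs          */
static spinlock_t     global_lock;           /* protects global_ready only         */

static struct thread *next_thread(struct cpu *cpu)
{
    struct thread *t = cpu->local_ready;
    if (t) {                                 /* common case: no spinlock at all    */
        cpu->local_ready = t->next;
        return t;
    }

    spin_lock(&global_lock);                 /* only when local work has run out   */
    t = global_ready;
    if (t)
        global_ready = t->next;
    spin_unlock(&global_lock);
    return t;                                /* may still be NULL: run idle        */
}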
Brendan wrote:A CPU can only take tasks off of its own "ready to run" queue; but because any CPU can insert a task into any other CPU's run queue you'd still need a lock for each queue.
Right, which kind of defeats the purpose of local queues.
Brendan wrote:Alternatively you could use IPIs. For example, if one CPU wants to put a task onto another CPU's queue, it could send an IPI to the other CPU to ask it to put the task on its own queue. However, assuming low lock contention, an IPI is more expensive than locks; and because you need to tell the other CPU which task to put on its queue you'd have to store a "thread ID" somewhere and you'd probably need a lock to protect that (so a third CPU doesn't trash your "thread ID" value while you're sending the IPI).
That's even worse, since IPIs consume a massive amount of CPU cycles, primarily on the target CPU, and these cycles do nothing useful and thus contribute to more IPIs and more overload.
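For contrast, the IPI hand-off Brendan describes might be sketched like this (send_ipi, enqueue_local, SCHED_VECTOR and the mailbox itself are all invented/assumed names): the sender parks the thread in a per-CPU mailbox slot, protected by a lock so a third CPU can't trash it, and the target adopts the thread from its IPI handler.
Code: Select all
struct mailbox {
    spinlock_t     lock;                     /* so a third CPU can't trash the slot */
    struct thread *pending;                  /* thread this CPU is asked to adopt   */
};

/* sender side: publish the thread, then poke the target CPU */
static void push_remote(struct mailbox *mb, struct thread *t, int target_cpu)
{
    for (;;) {
        spin_lock(&mb->lock);
        if (!mb->pending) {                  /* slot free: publish and notify      */
            mb->pending = t;
            spin_unlock(&mb->lock);
            send_ipi(target_cpu, SCHED_VECTOR);
            return;
        }
        spin_unlock(&mb->lock);              /* slot busy: retry until drained     */
    }
}

/* target side: runs in the scheduler IPI handler on the destination CPU */
static void sched_ipi_handler(struct cpu *cpu, struct mailbox *mb)
{
    spin_lock(&mb->lock);
    if (mb->pending) {
        enqueue_local(cpu, mb->pending);     /* now it is on this CPU's own queue  */
        mb->pending = NULL;
    }
    spin_unlock(&mb->lock);
}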
rdos wrote:If a thread consumes lots of cycles in user-space, the preemption timer will expire and select a new thread in the same priority queue
...and if the current thread is the only one of its (high) priority, it will be promptly scheduled again, because...
rdos wrote:the scheduler [...] will only select threads from the highest priority queue.
That concept is broken. So much so that it got its own Wikipedia entry.
gerryg400 wrote:Just a clarification, by "You don't really need a sleep queue." I mean your scheduler doesn't need a sleep queue, just a bunch of ready queues.
bluemoon wrote:For the very same reason you don't need a queue for blocked threads. Sleeping is just another word for blocked on a timer.
True, but it is quite convenient to be able to see (with some kind of tool) why a thread is blocked. Just saying "waiting for timer" for every possible block type is not helping much.
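One cheap way to get that visibility, as a sketch (the enum values are invented): keep the reason for the block next to the wake-up time in the thread structure, so a monitor tool can print something more useful than "waiting for timer".
Code: Select all
enum block_reason {
    BLOCK_NONE,                              /* not blocked                        */
    BLOCK_SLEEP,                             /* voluntary sleep()                  */
    BLOCK_IO,                                /* waiting for a device or file       */
    BLOCK_IPC_RECV,                          /* waiting in recv()                  */
    BLOCK_MUTEX                              /* waiting for a lock                 */
};

struct thread {
    enum block_reason why_blocked;           /* what the inspection tool prints    */
    unsigned long     wake_time;             /* the timer machinery stays the same */
};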
rdos wrote:A much better approach is to place excess load in a global queue, with a global spinlock. Then as long as the CPU has threads to run it never needs to access any spinlocks to schedule work.
And as soon as the system gets under load, any threads put into the global queue never get rescheduled since each CPU is happily munching away on its own thread list...
rdos wrote:A much better approach is to place excess load in a global queue, with a global spinlock. Then as long as the CPU has threads to run it never needs to access any spinlocks to schedule work.
Solar wrote:And as soon as the system gets under load, any threads put into the global queue never get rescheduled since each CPU is happily munching away on its own thread list...
A CPU will check the global queue from time to time and schedule entries from there. The difference is that it will not do so every time it has threads ready to run, which is what happens when CPUs cross-post directly between themselves. Also, the CPU needs to check the highest priority entry in the global queue against its own highest priority entry, but this can be done without locks.
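The lock-free comparison could be as simple as this sketch (names invented): whoever inserts into or removes from the global queue, under its spinlock, also updates a single "highest waiting priority" value, and other CPUs read that value without taking the lock. The read is deliberately racy; a stale value only means a CPU polls the global queue one pass too early or too late, it never corrupts the queue itself.
Code: Select all
/* Highest priority currently waiting in the global queue. Whoever inserts into
   or removes from the global queue (under its spinlock) also updates this.     */
static volatile int global_top_priority = -1;

/* Called by a CPU that already has local work; no lock is taken for the read. */
static int worth_checking_global(int local_top_priority)
{
    return global_top_priority > local_top_priority;
}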
rdos wrote:If a thread consumes lots of cycles in user-space, the preemption timer will expire and select a new thread in the same priority queue
Solar wrote:...and if the current thread is the only one of its (high) priority, it will be promptly scheduled again, because...
rdos wrote:the scheduler [...] will only select threads from the highest priority queue.
Solar wrote:That concept is broken. So much so that it got its own Wikipedia entry.
I don't think it is broken. It becomes broken if high priority threads never give up their time slice voluntarily, but if the design wants to cope with that it can punish such threads by lowering their priority until they have the lowest priority. Personally, I don't see a need for this safeguard, as I write the threads myself, and assign reasonable priorities to kernel threads, which are the only threads that execute with above normal priority. I also don't have the background/foreground junk of other designs, so priorities are not used as a means to boost certain threads at the expense of others in a badly designed GUI.
rdos wrote:This will not lead to starvation
Wrong. Counterexample:
Code: Select all
void process_a(void)
{
    while (1)
    {
        int i = recv();   // blocking call
        send(i + 1);
    }
}

void process_b(void)
{
    int i = 0;
    while (1)
    {
        send(i);
        i = recv();       // blocking call
    }
}
Combuster wrote:Wrong. Counterexample: [the process_a / process_b code above]
Bad counterexample, as this is exactly what one of my test apps does (with multiple senders if wanted), and it doesn't lead to starvation. It works just fine because:
rdos wrote:I don't think it is broken.
You make a hobby of disagreeing with software engineering lore in general?
rdos wrote:Personally, I don't see a need for this safeguard, as I write the threads myself...
And again, you mistake your special case in your special application running on your special self-made OS for the general case being discussed here.
Brendan wrote:Think about tasks that block. You might do something like this:
- Task is running, timer is counting down
- Task decides to "sleep()" or block waiting for IO or something
- Kernel puts task into some sort of "waiting" state
- Kernel tries to find another task to switch to
- Kernel disables IRQs
- Kernel switches to the new task
- Kernel sets new timer count
- Kernel enables IRQs
- New task is running
By turning off interrupts for this long you have effectively ruined each and every possibility of having decent interrupt performance. I bet this is why you need minimum delays with timer IRQs.
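The sequence in the quoted list, written out as a sketch (pick_next, switch_context, set_preemption_timer, disable_irqs/enable_irqs and the thread fields are all placeholders, not anyone's actual API):
Code: Select all
enum state { RUNNING, READY, WAITING };

struct thread {
    enum state state;
    unsigned   timeslice;                    /* count to program into the timer    */
};

/* placeholders for whatever the kernel actually provides */
extern struct thread *pick_next(void);
extern void switch_context(struct thread *from, struct thread *to);
extern void set_preemption_timer(unsigned count);
extern void disable_irqs(void);
extern void enable_irqs(void);

void block_current(struct thread *cur)
{
    cur->state = WAITING;                    /* task decided to sleep()/block      */

    struct thread *next = pick_next();       /* find another task to switch to     */

    disable_irqs();                          /* keep switch + timer reload atomic  */
    switch_context(cur, next);               /* switch to the new task             */
    set_preemption_timer(next->timeslice);   /* new timer count for the new task   */
    enable_irqs();                           /* new task is now running            */
}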
rdos wrote:I don't think it is broken.
Solar wrote:You make a hobby of disagreeing with software engineering lore in general?
rdos wrote:Personally, I don't see a need for this safeguard, as I write the threads myself...
Solar wrote:And again, you mistake your special case in your special application running on your special self-made OS for the general case being discussed here.
Many early multitasking systems used this kind of scheduling, and nobody (at that time) claimed the concept was broken. The concept becomes broken with GUIs that adjust priorities based on the foreground/background status of threads, or when priorities are used in other ways by the GUI (or another OS service) to accommodate specific needs of that part of the system. IMHO, the thing that is broken is dynamic priority assignment. In that context it is true that the highest priority thread cannot always be selected as the next thread to run.
Edit: As can be seen in your reply to Combuster above, we were talking about different priorities. You cannot magically swing the discussion your way by simply replying that they aren't different.