Multitasking, SMP & Timers

Brendan · Post by **Brendan** » Mon May 21, 2012 12:23 pm

Hi,

rdos wrote:
Brendan wrote:This is what I call "variable time slice scheduling". It's bad because high priority tasks have to wait for all medium/low priority tasks to have their turn before they get any CPU time.
It is possible to let higher priority threads always interrupt lower priority threads.

I know. My last scheduler used this for "policy 0" (device driver IRQ handlers, etc). It's bad for normal, medium and low priority threads where the intention is to create the illusion of tasks running at the same time. Consider something like a game, where more CPU time means smoother animation. Now imagine 2 of these "as much CPU time as I can get" tasks trying to share the same CPU.

rdos wrote:
Brendan wrote:You could create different groups for different purposes. This allows you to choose the best method of scheduling for the specific type of tasks being scheduled.
That seems awfully complex. Why not simply decide that higher priority threads are always run before lower priority threads, and that higher priority threads that become ready preempt lower priority threads?

Because that doesn't allow you to choose the best method of scheduling for the specific type of tasks being scheduled. If your OS is intentionally designed to be bad for everything except one specific use, then a scheduler that is bad for everything except one specific type of tasks might be fine; but most people aren't stupid enough to intentionally limit the potential uses of their OS in that way.

rdos wrote:
Brendan wrote:For SMP, you don't want a global "scheduler lock", as many CPUs might be fighting for this lock at the same time (and wasting CPU time). The more CPUs there are the worse it would get - lock contention would cause poor scalability. If your scheduler has a lock for each CPU then the chance of CPUs fighting for the same lock is a lot less, and performance/scalability would improves. If your scheduler has 6 different locks per CPU (one for each scheduling policy) then there'd be even less lock contention and you'd improve performance/scalability even more.
You still need global locks to be able to move threads between CPUs.

Not sure where you think I claimed otherwise - I was only suggesting "many global locks" is better than "one global lock". However, it is possible to use lock-free queues (even though I didn't say so; you may not need global locks).

rdos wrote:
Brendan wrote:If one CPU is running at 2 GHz and another CPU is running at 500 MHz (because it got too hot and needs to cool down, for example); then if 2 tasks have the same priority and one gets 10 ms on the first CPU, then the other should get 40 ms on the other (slower) CPU, so that they both get the same fair share of CPU time.

The best way to do this is probably to use a performance monitoring counter setup to count "instructions retired". The scheduler doesn't let a task run for a certain amount of time, but instead the scheduler lets a task run for a certain number of instructions. If one task runs for 1 million instructions on one CPU and another task runs for 1 million instructions on another CPU, then the tasks get exactly the same share of CPU time regardless of how much "wall clock" time the tasks actually consumed.

If performance monitoring counters can't be used, then you can try to keep track of how fast a CPU can currently execute instructions and use the local APIC timer to approximate "instructions retired". This isn't going to be as accurate (especially when things like hyper-threading and TurboBoost are involved), but it's still going to be better than scheduling " raw time".
I disagree, but it depends on if these threads need more than 10ms on the fast CPU (or 40ms on the slow one). If the threads need more than this amount, load balancing should kick in and move them both to the faster CPU as they will execute faster there even when they compete for the same CPU.

You're only thinking about your own "scheduler that is bad for everything except one specific type of tasks" system. You're not taking into account tasks that should never block. For example, imagine 2 different users who are both trying to run something like Bochs at the same time - should one user get worse performance than the other user simply because their task got the slow CPU and the scheduler is unfair?

rdos wrote:
Brendan wrote:This also means that tasks sleep until "now() > task->wake_time"; and you don't have to decrease a "time remaining" value for every sleeping task every time the timer IRQ occurs.

This also makes the delta queue idea a bit redundant - you just use a list/queue that is sorted in order of wake time.

To reduce the "O(n)" insertion time, you can have buckets - one list for tasks that should wake up in the next 100 ms, one list for tasks that should wake up between 100 and 200 ms, one list for tasks that should wake up in 200 to 300 ms, etc. If you've got 128 lists/queues (representing 100 ms of time each) then it only covers 12.8 seconds. For delays larger than that you'd need a "longer than 12.8 seconds" list/queue too. Every 100 ms your timer IRQ handler would switch to the next list and discard the old (empty) list, then create a new list by splitting the "longer than 12.8 seconds" list.
That also seems awfully complicated for little to no gain. If you chose a sufficiently short preemption timeout in the scheduler (in your case perhaps 50ms, although I use 1ms), then it is easier to just update a list of low precision timers every time the scheduler is invoked. This way, there would be no reentrancy issues. Additionally, you don't need any extra IRQs to update lists when there might be no entries and thus nothing to do.

I agree - if your scheduler isn't very good (e.g. uses a timer in periodic mode) and your "sleep()" has poor precision for no reason, then your crappy piece of crud will probably be simpler than something that's several orders of magnitude better.

rdos wrote:
Brendan wrote:This means faster insertion, more queues/lists, more locks, less lock contention, etc. In theory, this can also be "per CPU" (e.g. using local APIC timer for the timer IRQ) to further reduce insertion times; with some other timer (e.g. HPET) used as a backup for when the local APIC is turned off to save power (e.g. where you merge a CPU's queues with the HPET's global queue before turning the local APIC timer off).
Less lock contention? Updating when the scheduler runs means no locks are needed.

Why stop there? If you're going to have bad precision anyway (e.g. rather than "better than micro-second" precision for "nano_sleep()"), then you can make it a lot simpler if you only update when the OS is shut down.

I also wonder if you're doing a "stop the world and reschedule all CPUs at the same time" thing that completely sucks.

rdos wrote:
Brendan wrote:The scheduler decides which CPU a task should use when a task needs to be put back onto a "ready to run" queue. This is how load balancing works (e.g. one CPU might look at it's own queue and say "I'm too busy!" and put the task on a different CPU's queue).
That means the local CPU's thread queues must use spinlocks, somewhat defeating the purpose of having per-CPU thread queues. A much better approach is to place excess load in a global queue, with a global spinlock. Then as long as the CPU has threads to run it never needs to access any spinlocks to schedule work.

And under heavy load, high priority tasks that should get CPU time immediately just sit in the global queue forever because CPUs are too busy to bother checking the global queue? You can see how this could be funny if the task sitting in the queue forever is a shell that the administrator is trying to use to kill a fork bomb.

rdos wrote:
Brendan wrote:A CPU can only take tasks off of its own "ready to run" queue; but because any CPU can insert a task into any other CPU's run queue you'd still need a lock for each queue.
Right, which kind of defeats the purpose of local queues.

The purpose of the local queues was to drastically reduce the chance of lock contention. A very small chance of lock contention (even when there's hundreds of CPU under heavy load) does not defeat the purpose of local queues. Creating severe design flaws just to avoid an irrelevant "non-problem" isn't an improvement.

It is a fact that nobody has ever been bitten by a shark when a they've got a pineapple jammed up their rectum; therefore (even though sharks aren't a problem on dry land and my solution is far worse than the problem it tries to solve); you should insert a pineapple up your butt to avoid being bitten by sharks.

rdos wrote:Another problem is that CPUs can start to crosspost threads in each others queues all the time, eventually getting stuck on the spinlocks used, and getting no work done.

Only if the scheduler is incredibly stupid (e.g. shifts tasks from a CPU under load to a CPU that is under even more load).

rdos wrote:
Brendan wrote:Alternatively you could use IPIs. For example, if one CPU wants to put a task onto another CPU's queue, it could send an IPI to the other CPU to ask it to put the task on it's own queue. However, assuming low lock contention, an IPI is more expensive than locks; and because you need to tell the other CPU which task to put on its queue you'd have to store a "thread ID" somewhere and you'd probably need a lock to protect that (so a third CPU doesn't trash your "thread ID" value while you're sending the IPI).
That's even worse since IPIs consume a massive amount of CPU cycles, primarily on the target CPU, and these cycles does nothing useful and thus contribute to more IPIs and more overload.

Thanks for agreeing with me.

rdos wrote:
Brendan wrote:
AlfaOmega08 wrote:I don't understand how can the race condition happen.
Think about tasks that block.

You might do something like this:
Task is running, timer is counting down

Task decides to "sleep()" or block waiting for IO or something

Kernel puts task into some sort of "waiting" state

Kernel tries to find another task to switch to

Kernel disables IRQs

Kernel switches to the new task

Kernel sets new timer count

Kernel enables IRQs

New task is running
By turning of interrupts for this long you have effectively ruined each and every possibility of having decent interrupt performance. I bet this is why you need minimum delays with timer IRQs.

This was only to illustrate a race condition, wasn't intended for any other purpose, and wasn't a suggested method of doing task switches.

rdos wrote:A more proper way to do it:

1. Task decides to "sleep()" or block waiting for IO or something
2. Kernel puts task into some sort of "waiting" state
3. Kernel acquires scheduler lock
4. Kernel processes delayed IRQs
5. Kernel tries to find another task to switch to
6. Kernel loads new timeout (if expired go to 4)
7. Kernel setup up LDT and CR3 environment
8. Kernel releases scheduler lock and disables IRQs (if waiting IRQ events, goto 3)
9. Kernel loads new register context and switches to the new thread (iretd enables interrupts again)

By turning of interrupts for this long (during your step 9) you have "effectively ruined each and every possibility of having decent interrupt performance".

A more proper way to do it:

<a new list of steps that is essentially identical to rdos's list with a few different/irrelevant details, that was essentially identical to my original list with a few different/irrelevant details>

Cheers,

Brendan

rdos · Post by **rdos** » Mon May 21, 2012 2:39 pm

Brendan wrote:I know. My last scheduler used this for "policy 0" (device driver IRQ handlers, etc). It's bad for normal, medium and low priority threads where the intention is to create the illusion of tasks running at the same time. Consider something like a game, where more CPU time means smoother animation. Now imagine 2 of these "as much CPU time as I can get" tasks trying to share the same CPU.

In my experience, smoothness of animations (or anything else for that matter), is mostly related to the length of the time-slice used. If the scheduler uses 100s of milliseconds for the preemption timer, it cannot run animations smoothly under load. It doesn't matter how smart the scheduling algorithms are, the system will still suck with long time slices.

So, when I imagine these two "use as much CPU time as I can get" tasks, I see them being preemted 1,000 times per second, and being put in the global scheduler queue 350 times per second. And they run just fine both of them. On one core as well as on two or four.

Brendan wrote:Because that doesn't allow you to choose the best method of scheduling for the specific type of tasks being scheduled. If your OS is intentionally designed to be bad for everything except one specific use, then a scheduler that is bad for everything except one specific type of tasks might be fine; but most people aren't stupid enough to intentionally limit the potential uses of their OS in that way.

IMHO, parts of the kernel should not fiddle with priorities or scheduling algoritms. That is the domain of the scheduler. The GUI should not need to fiddle with scheduling algoritms or priorities for tasks in order to achieve smooth animations or fast updates of the screen. The need to do so only means either the scheduler or the GUI is badly written.

Brendan wrote:You're only thinking about your own "scheduler that is bad for everything except one specific type of tasks" system. You're not taking into account tasks that should never block. For example, imagine 2 different users who are both trying to run something like Bochs at the same time - should one user get worse performance than the other user simply because their task got the slow CPU and the scheduler is unfair?

In the example you gave one of the CPUs was 4 times faster than the other. That means that the faster CPU could run both tasks twice as fast as the slower one, and that an optimal scheduler thus would move both tasks to the faster CPU.

In your new example of two Bochs users, that is a typical example of a non-cooperative task that tries to use as much CPU time as possible. Those two tasks would still run best on the same CPU. When run on a system with multiple identical cores, it would run best on one core each.

Brendan wrote:I agree - if your scheduler isn't very good (e.g. uses a timer in periodic mode) and your "sleep()" has poor precision for no reason, then your crappy piece of crud will probably be simpler than something that's several orders of magnitude better.

It doesn't, as per the previous discussion. The scheduler uses high-precision timers (HPET or PIT) for the preemption timeout, except for the case when it must use the local APIC timer. However, low precision timeouts (typically in the range of several ms) could be implemented lock-free and IRQ-free by the scheduler.

Brendan wrote:I also wonder if you're doing a "stop the world and reschedule all CPUs at the same time" thing that completely sucks.

Not likely. I use a local per-CPU queue per priority + a global queue per priority, which doesn't require CPUs to synchronize their scheduling actions.

Brendan wrote:And under heavy load, high priority tasks that should get CPU time immediately just sit in the global queue forever because CPUs are too busy to bother checking the global queue? You can see how this could be funny if the task sitting in the queue forever is a shell that the administrator is trying to use to kill a fork bomb.

Not likely either. The highest priority thread in the global queue is checked minimum 1,000 times per second and CPU.

Brendan wrote:Only if the scheduler is incredibly stupid (e.g. shifts tasks from a CPU under load to a CPU that is under even more load).

That's another problem with the concept. The shifting CPU must know which CPU to switch to, and thus need (global) CPU load information, which creates even more lock contention. The global priority-queue design doesn't require the shifting CPU to know anything about the load of other CPUs. It will shuffle threads there regardless of the load of CPUs. The locked region only consist of inserting and removing an entry into/from a list.

Additionally, if you use historic load information, you risk ending up with oscillations where CPUs shift between being unloaded and overloaded because your load information is not the actual load. This is because when the CPUs discover that one CPU is virtually unloaded, the other CPUs will shift all their load to it, making it overloaded instead.

Edit: If you like a design that can handle 1,000s of cores, the approach can be updated to group CPUs together, and let them share a group queue instead. Then you could put a certain percentage of all scheduled-from tasks in the group queue (say 35%), and a much smaller number in a global queue that shifts tasks between groups (say 1%).

AndrewAPrice · Post by **AndrewAPrice** » Mon May 21, 2012 3:53 pm

How does a repeating time critical event work in non-real time OS's like Windows?

For example, the Nvidia 3D Vision glasses require a 110 or 120Hz monitor, and the left/right frame is alternated on the screen with each refresh. On a 120Hz monitor the driver must alternate the frame every 8.33 ms or risk the eyes going out of sync (which would be immediately obvious to the user). How does the driver guarantee that this will occur every 8.33 ms when Windows (on a single core machine) allocates 30ms timeslices to foreground processes and 10ms timeslices to background processes?

Brendan · Post by **Brendan** » Mon May 21, 2012 4:14 pm

Hi,

rdos wrote:
Brendan wrote:I know. My last scheduler used this for "policy 0" (device driver IRQ handlers, etc). It's bad for normal, medium and low priority threads where the intention is to create the illusion of tasks running at the same time. Consider something like a game, where more CPU time means smoother animation. Now imagine 2 of these "as much CPU time as I can get" tasks trying to share the same CPU.
In my experience, smoothness of animations (or anything else for that matter), is mostly related to the length of the time-slice used. If the scheduler uses 100s of milliseconds for the preemption timer, it cannot run animations smoothly under load. It doesn't matter how smart the scheduling algorithms are, the system will still suck with long time slices.

So, when I imagine these two "use as much CPU time as I can get" tasks, I see them being preemted 1,000 times per second, and being put in the global scheduler queue 350 times per second. And they run just fine both of them. On one core as well as on two or four.

I imagined one of the tasks (possibly a task with slightly higher priority) running on the CPU forever and the other task not running and not having any animation at all, because someone was silly enough to think that "highest priority task that can run" is a good idea for all types of tasks.

rdos wrote:
Brendan wrote:Because that doesn't allow you to choose the best method of scheduling for the specific type of tasks being scheduled. If your OS is intentionally designed to be bad for everything except one specific use, then a scheduler that is bad for everything except one specific type of tasks might be fine; but most people aren't stupid enough to intentionally limit the potential uses of their OS in that way.
IMHO, parts of the kernel should not fiddle with priorities or scheduling algoritms. That is the domain of the scheduler. The GUI should not need to fiddle with scheduling algoritms or priorities for tasks in order to achieve smooth animations or fast updates of the screen. The need to do so only means either the scheduler or the GUI is badly written.

The scheduler is a part of the kernel, so you're saying the scheduler (a part of the kernel) shouldn't fiddle with priorities or scheduling algorithms? I'd agree with that. For example, a GUI task might tell the scheduler "I want to use policy 1 because I need to respond to events quickly" and a heavy processing task might tell the scheduler "I want to use policy 3 because I don't block and don't want to screw things up for tasks that need to respond to events quickly", and in both cases the scheduler should do what it's told using the most appropriate scheduling algorithm for each type of task.

rdos wrote:
Brendan wrote:You're only thinking about your own "scheduler that is bad for everything except one specific type of tasks" system. You're not taking into account tasks that should never block. For example, imagine 2 different users who are both trying to run something like Bochs at the same time - should one user get worse performance than the other user simply because their task got the slow CPU and the scheduler is unfair?
In the example you gave one of the CPUs was 4 times faster than the other. That means that the faster CPU could run both tasks twice as fast as the slower one, and that an optimal scheduler thus would move both tasks to the faster CPU.

That's silly - now the faster CPU is overloaded and the slower CPU is completely wasted.

rdos wrote:
Brendan wrote:And under heavy load, high priority tasks that should get CPU time immediately just sit in the global queue forever because CPUs are too busy to bother checking the global queue? You can see how this could be funny if the task sitting in the queue forever is a shell that the administrator is trying to use to kill a fork bomb.
Not likely either. The highest priority thread in the global queue is checked minimum 1,000 times per second and CPU.

So, all your CPUs are trying to acquire a "global queue lock" a minimum of 1000 times per second; and in some way you think this improves lock contention?

rdos wrote:
Brendan wrote:Only if the scheduler is incredibly stupid (e.g. shifts tasks from a CPU under load to a CPU that is under even more load).
That's another problem with the concept. The shifting CPU must know which CPU to switch to, and thus need (global) CPU load information, which creates even more lock contention.

If you can't do global CPU load without locks (and without more lock contention), then you should probably just stick to single-CPU. No synchronisation is needed for this at all (beyond Intel's "aligned 32-bit reads and writes are atomic" guarantee).

Cheers,

Brendan

Combuster · Post by **Combuster** » Mon May 21, 2012 11:08 pm

MessiahAndrw wrote:For example, the Nvidia 3D Vision glasses require a 110 or 120Hz monitor, and the left/right frame is alternated on the screen with each refresh.

Stereoscopy's typically a video card's job. It's either done completely in hardware, or maybe in the VSync interrupt which chooses the next frame.

rdos · Post by **rdos** » Tue May 22, 2012 1:18 am

Brendan wrote:I imagined one of the tasks (possibly a task with slightly higher priority) running on the CPU forever and the other task not running and not having any animation at all, because someone was silly enough to think that "highest priority task that can run" is a good idea for all types of tasks.

As I wrote before, user tasks will typically not specify which priority they run on, and thus all user tasks typically run at the same priority. That's the optimal solution to the "I'm most important, so give me the highest possible priority" problem. Additionally, the create thread function in the pthread library doesn't have complex priority or scheduler parameters, which means that the use of odd scheduling algoritms is highly non-portable and not used widely in portable applications. Typically, this will be a waste of time both to implement and it will also slow down your scheduler.

Brendan wrote:The scheduler is a part of the kernel, so you're saying the scheduler (a part of the kernel) shouldn't fiddle with priorities or scheduling algorithms? I'd agree with that. For example, a GUI task might tell the scheduler "I want to use policy 1 because I need to respond to events quickly" and a heavy processing task might tell the scheduler "I want to use policy 3 because I don't block and don't want to screw things up for tasks that need to respond to events quickly", and in both cases the scheduler should do what it's told using the most appropriate scheduling algorithm for each type of task.

I disagree. The part of the scheduler that selects the next task to run should be simple and fast. It shouldn't be burdened with calculating load on own or other CPU, it shouldn't use complex algorithms for selecting next task, but rather should always use priority-queues and round robin. There could be a kernel thread part of the scheduler that rearranges tasks based on complex scheduling algoritms, that uses load and similar, and that possibly also changes time-slice lengths based on usage and task preferences, but this should absolutely not be done in real-time in the scheduler!

Brendan wrote:That's silly - now the faster CPU is overloaded and the slower CPU is completely wasted.

In your original post it was overheated, so that makes a lot of sense. It would cool down faster if it is unloaded.

Brendan wrote:So, all your CPUs are trying to acquire a "global queue lock" a minimum of 1000 times per second; and in some way you think this improves lock contention?

No, checking the highest priority in the global queue is a lockless operation. Only if the global queue contains a higher priority thread than the local will the lock be acquired.

bluemoon · Post by **bluemoon** » Tue May 22, 2012 1:22 am

As I wrote before, user tasks will typically not specify which priority they run on, and thus all user tasks typically run at the same priority. That's the optimal solution to the "I'm most important, so give me the highest possible priority" problem.

"There is no limitation in my design since I don't provide any feature"?

No, checking the highest priority in the global queue is a lockless operation. Only if the global queue contains a higher priority thread than the local will the lock be acquired.

There will be lock in the memory bus, on many cpu system this may be a problem.

Brendan · Post by **Brendan** » Tue May 22, 2012 2:47 am

Hi,

rdos wrote:
Brendan wrote:I imagined one of the tasks (possibly a task with slightly higher priority) running on the CPU forever and the other task not running and not having any animation at all, because someone was silly enough to think that "highest priority task that can run" is a good idea for all types of tasks.
As I wrote before, user tasks will typically not specify which priority they run on, and thus all user tasks typically run at the same priority.

That's a broken design. If all tasks have the same priority, then the scheduler has to try to make assumptions just to avoid doing round robin and nothing more (and round robin is the worst scheduler possible).

rdos wrote:Additionally, the create thread function in the pthread library doesn't have complex priority or scheduler parameters, which means that the use of odd scheduling algoritms is highly non-portable and not used widely in portable applications. Typically, this will be a waste of time both to implement and it will also slow down your scheduler.

Pthreads has scheduling policies and thread priorities within those scheduling policies. These are fairly "generic" - an OS or pthread library can map them to whichever native policies/priorities seem the suitable relatively easily.

rdos wrote:
Brendan wrote:The scheduler is a part of the kernel, so you're saying the scheduler (a part of the kernel) shouldn't fiddle with priorities or scheduling algorithms? I'd agree with that. For example, a GUI task might tell the scheduler "I want to use policy 1 because I need to respond to events quickly" and a heavy processing task might tell the scheduler "I want to use policy 3 because I don't block and don't want to screw things up for tasks that need to respond to events quickly", and in both cases the scheduler should do what it's told using the most appropriate scheduling algorithm for each type of task.
I disagree. The part of the scheduler that selects the next task to run should be simple and fast. It shouldn't be burdened with calculating load on own or other CPU, it shouldn't use complex algorithms for selecting next task, but rather should always use priority-queues and round robin. There could be a kernel thread part of the scheduler that rearranges tasks based on complex scheduling algoritms, that uses load and similar, and that possibly also changes time-slice lengths based on usage and task preferences, but this should absolutely not be done in real-time in the scheduler!

Just because something seems too complex for you, doesn't mean a 3 year old with a severe hang-over can't figure out a way.

You only need to use "lock add" (when a task is added to the queue) and "lock sub" (when task is removed from the queue/s) to keep track of how much work is on a CPU's queue/s. Any/all CPUs can read each other's "current work on queue" values with plain old "mov" (no spinlock, no lock prefix).

rdos wrote:
Brendan wrote:So, all your CPUs are trying to acquire a "global queue lock" a minimum of 1000 times per second; and in some way you think this improves lock contention?
No, checking the highest priority in the global queue is a lockless operation. Only if the global queue contains a higher priority thread than the local will the lock be acquired.

So you're trying to say that for per-CPU queues (where other CPUs might occasionally add a thread) the lock contention is a problem and it can't be done locklessly; but as soon as you've got many CPUs constantly checking, adding and removing threads from the same global queue/s there is no problems and it can be lockless?

Cheers,

Brendan

rdos · Post by **rdos** » Tue May 22, 2012 3:38 am

Brendan wrote:That's a broken design. If all tasks have the same priority, then the scheduler has to try to make assumptions just to avoid doing round robin and nothing more (and round robin is the worst scheduler possible).

No, all USER tasks have the same default priority.

Brendan wrote:Pthreads has scheduling policies and thread priorities within those scheduling policies. These are fairly "generic" - an OS or pthread library can map them to whichever native policies/priorities seem the suitable relatively easily.

Sure, the publicly available scheduler types are round robin and FIFO. Native policies == unportable and not used by standard applications.

Brendan wrote:You only need to use "lock add" (when a task is added to the queue) and "lock sub" (when task is removed from the queue/s) to keep track of how much work is on a CPU's queue/s. Any/all CPUs can read each other's "current work on queue" values with plain old "mov" (no spinlock, no lock prefix).

How scalable is it to let each CPU scheduler loop over n CPUs in order to gather load information? That's and O(n) algorithm. I can really imagine your 1,000 cores looping 999 times each time a new task should be scheduled in order to gather load information from other CPUs, and not getting anything useful done.

Brendan wrote:So you're trying to say that for per-CPU queues (where other CPUs might occasionally add a thread) the lock contention is a problem and it can't be done locklessly; but as soon as you've got many CPUs constantly checking, adding and removing threads from the same global queue/s there is no problems and it can be lockless?

I think I told you before that CPUs cannot directly add new tasks into another CPUs local queue, and thus the local queues need no locks. The way moving of tasks between CPUs work is by letting CPUs add entries to the global queue (or to an intermediate level if there are many CPUs).

Solar · Post by **Solar** » Tue May 22, 2012 4:28 am

rdos wrote:No, all USER tasks have the same default priority.

...and people challenge your claim that this is a good thing.

rdos wrote:Sure, the publicly available scheduler types are round robin and FIFO. Native policies == unportable and not used by standard applications.

pthreads API wrote: SCHED_OTHER
Scheduling behavior is determined by the operating system.

And why would a good scheduling policy have to be supported by pthreads? If you are going for "standard applications", you could just as well go for "standard operating systems" and leave. We aren't discussing only POSIX things, either.

rdos · Post by **rdos** » Tue May 22, 2012 4:57 am

Solar,

I just thought the thread was rather dull with people proposing and agreeing to the usual solutions. I just added some diversity to the discussion. People don't need to agree that my alternative solutions are any good, but I think some of the ideas have at least some merit.

Solar · Post by **Solar** » Tue May 22, 2012 5:04 am

...and people disagree, and you get defensive, and for the last two pages this thread has been about you, how your system works, and that everybody else is wrong.

Pattern...

rdos · Post by **rdos** » Tue May 22, 2012 5:15 am

bluemoon wrote:
As I wrote before, user tasks will typically not specify which priority they run on, and thus all user tasks typically run at the same priority. That's the optimal solution to the "I'm most important, so give me the highest possible priority" problem.
"There is no limitation in my design since I don't provide any feature"?

Deciding not to support certain features is also a design choice. You can decide not to support a feature for many potential reasons including:

1. The feature might be too complex and you don't find it worth while to use a lot of time to implement and debug it.
2. The feature might slow down some crucial function (like the scheduler).
3. You might suspect that nobody will use the feature.

bluemoon wrote:
No, checking the highest priority in the global queue is a lockless operation. Only if the global queue contains a higher priority thread than the local will the lock be acquired.
There will be lock in the memory bus, on many cpu system this may be a problem.

There is a solution to this. You group CPUs and provide an extra level for a group queue, and only let perhaps 1% go to the global queue.

BTW, yesterday I tested to change the percentage of posts to the global queue from 35% to 10%, and it seems to work just as well. When one "I want as much CPU time as possible" thread is running, all the 6 cores show similar load, which means the movement code works and evens-out load between available cores.

rdos · Post by **rdos** » Tue May 22, 2012 5:17 am

Solar wrote:...and people disagree, and you get defensive, and for the last two pages this thread has been about you, how your system works, and that everybody else is wrong.

Pattern...

No, the last two pages are about a different solution, while the first two pages basically is the standard solution. Anybody else of course is encouraged to present a 3:rd solution so we can discuss the pros and cons of that one.

Edit: Besides, Solar, why did you get defensive yourself when I brought up pthreads? Don't tell me it is because you are working on a competing solution?

Griwes · Post by **Griwes** » Tue May 22, 2012 5:21 am

rdos wrote:
Solar wrote:...and people disagree, and you get defensive, and for the last two pages this thread has been about you, how your system works, and that everybody else is wrong.

Pattern...
No, the last two pages are about a different solution, while the first two pages basically is the standard solution. Anybody else of course is encouraged to present a 3:rd solution so we can discuss the pros and cons of that one.

No, the last two pages are about how *your* different solution is better (t'was your point of view, of course) and how everyone disagree with you on that point. And you still claim you know better than everybody else.

OSDev.org

Multitasking, SMP & Timers

Re: Multitasking, SMP & Timers

Re: Multitasking, SMP & Timers

Re: Multitasking, SMP & Timers

Re: Multitasking, SMP & Timers

Re: Multitasking, SMP & Timers

Re: Multitasking, SMP & Timers

Re: Multitasking, SMP & Timers

Re: Multitasking, SMP & Timers

Re: Multitasking, SMP & Timers

Re: Multitasking, SMP & Timers

Re: Multitasking, SMP & Timers

Re: Multitasking, SMP & Timers

Re: Multitasking, SMP & Timers

Re: Multitasking, SMP & Timers

Re: Multitasking, SMP & Timers