Multitasking, SMP & Timers
- AlfaOmega08
- Member
- Posts: 226
- Joined: Wed Nov 07, 2007 12:15 pm
- Location: Italy
Multitasking, SMP & Timers
Hi, I'm looking for some suggestions on designing an SMP scheduler. I have HPET and LAPIC timer drivers ready, and the APs are booted so far. In the past I wrote a simple scheduler using the PIT, but that was for a single processor system. It was pretty good, I think. It had priorities and chose processes in a round-robin way, but let processes with higher priority run for a larger amount of time. E.g.: if the time slice is 5 ms, then a process with priority 10 would run for 50 ms, and a process with priority 3 for just 15 ms. It also kept a list of sleeping processes, with timeouts. Basically I subtracted the slices from the timeouts each time the scheduler was called, and when a timeout reached 0 the thread was woken up.
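(For illustration, the scheme above boils down to something like the rough sketch below; every name in it - thread_t, sleep_list, wake_up(), pick_next_round_robin(), set_timer_ms(), switch_to() - is hypothetical, not taken from any real kernel.)
Code: Select all
#define BASE_SLICE_MS 5

typedef struct thread {
    struct thread *next;
    int priority;        /* e.g. 3 or 10 */
    int sleep_ms;        /* remaining sleep time; <= 0 means "wake me" */
} thread_t;

static thread_t *sleep_list;     /* all sleeping threads */

/* Called from the PIT IRQ when the current thread's slice expires. */
void schedule_tick(int elapsed_ms)
{
    thread_t *t = sleep_list;
    while (t != NULL) {
        thread_t *next = t->next;    /* wake_up() may unlink t from sleep_list */
        t->sleep_ms -= elapsed_ms;
        if (t->sleep_ms <= 0)
            wake_up(t);              /* move it back onto the run queue */
        t = next;
    }

    thread_t *chosen = pick_next_round_robin();
    set_timer_ms(BASE_SLICE_MS * chosen->priority);  /* bigger slice for higher priority */
    switch_to(chosen);
}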
My problems here:
1) I don't believe this would work in an SMP system.
2) I don't think my implementation of sleep was particularly good. I read somewhere on the forum that some sort of one-shot timer might be used, interrupting one thread to wake up a sleeping one.
Things are even harder now that I have the PIT, the LAPIC timer, one periodic timer from the HPET and up to 31 other one-shot timers from the HPET. (Eventually the PM timer will be implemented too.)
How would you distribute the workload? Do you believe the HPET periodic timer is more appropriate than the LAPIC timer for scheduling?
Should sleeping threads be "stuck" to the current AP? What if, using my old method, two APs call "schedule()" and both subtract their time slice from the timeout, waking the thread up too early?
Waiting for your helpful considerations...
Please, correct my English...
Motherboard: ASUS Rampage II Extreme
CPU: Core i7 950 @ 3.06 GHz OC at 3.6 GHz
RAM: 4 GB 1600 MHz DDR3
Video: nVidia GeForce 210 GTS... it sucks...
Re: Multitasking, SMP & Timers
For the problem of two cores updating the same values: you really should be using atomic operations (locks, mutexes, semaphores, depending on what you're trying to do) for any variables/code that can be touched by more than one core at a time.
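As a minimal sketch of what such a lock might look like (assuming a C11 compiler and an x86 target for the PAUSE hint; this is illustrative only, not code from any particular kernel):
Code: Select all
#include <stdatomic.h>

typedef struct {
    atomic_flag locked;
} spinlock_t;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static inline void spin_lock(spinlock_t *l)
{
    /* atomically set the flag; spin while another core already holds it */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        __builtin_ia32_pause();    /* be polite to the other hardware thread */
}

static inline void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}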
As for idle processes, having a count-down timer that triggers every so often should be fine, as long as it can't be starved (you might also want to set a known maximum interval for idle processes, so you can 'reasonably' guarantee that an idle process will be called within a certain time frame; that way developers know whether a process can stay idle or should move to a higher priority).
For sleeping threads, they should only ever have to be woken up if they are requested by another process. If I put my HDD thread to sleep, it doesn't need a keep-alive; it just needs to be notified when someone needs disk access.
Of course, your ideas can easily differ (and it's your OS, so do what makes sense for your final product). There isn't a one-size-fits-all approach: a real-time OS has very different requirements from a non-RT OS, and an OS with MANY little threads vs. one with a few large threads will have different requirements. There are lots of variables that could make you take one path versus another. I have no idea what your design is or what your expected outcome is, so without further info it's hard to give anything specific (and even if I had a clue, it's still hard to give specifics for someone else's design).
It is considered good practice (well, depending on the platform of course) to try to keep a process on the same CPU that it ran on last time (although if one CPU is heavily loaded and another idle, you should move some processes around for load balancing). So, if it's a sleeping task, there is no reason to do anything with it, since it should sleep until needed. I would probably have a flag somewhere (some sort of 'hint' variable) to indicate how cheap it is to move a process from one CPU to another. On some platforms this is a big deal (think NUMA), while on others it isn't (think multi-core CPU, or a single CPU with hyperthreading). Again, your design considerations are going to dictate how you go about it.
Re: Multitasking, SMP & Timers
AlfaOmega08 wrote:It also kept a list of sleeping processes, with timeouts. Basically I subtracted the slices from the timeouts each time the scheduler was called, and when a timeout reached 0 the thread was woken up.
Check the wiki for the delta queue. (Timed) sleeping processes are inserted in sorted order, and only the first process in the queue needs to be checked. Insertion is slow, but it runs on the sleeping process's own time, and a sleeping process won't complain about being slow. You may also drive this with a one-shot timer.
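To make that concrete, a minimal delta-queue sketch might look like this (thread_t and wake_up() are placeholders):
Code: Select all
/* Each node stores the extra ticks to wait AFTER the node before it,
   so the timer IRQ only ever touches the head of the list. */
typedef struct dq_node {
    struct dq_node *next;
    unsigned delta;            /* ticks relative to the previous node */
    thread_t *thread;
} dq_node_t;

static dq_node_t *dq_head;

/* O(n) insert: walk the list consuming deltas until ours fits. */
void dq_insert(dq_node_t *node, unsigned ticks)
{
    dq_node_t **pp = &dq_head;
    while (*pp != NULL && (*pp)->delta <= ticks) {
        ticks -= (*pp)->delta;
        pp = &(*pp)->next;
    }
    node->delta = ticks;
    node->next = *pp;
    if (node->next != NULL)
        node->next->delta -= ticks;   /* keep the follower's total unchanged */
    *pp = node;
}

/* Called once per tick (or when a one-shot timer programmed for the
   head's remaining delta expires). */
void dq_tick(void)
{
    if (dq_head == NULL)
        return;
    if (dq_head->delta > 0)
        dq_head->delta--;
    while (dq_head != NULL && dq_head->delta == 0) {
        dq_node_t *expired = dq_head;
        dq_head = expired->next;
        wake_up(expired->thread);
    }
}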
Re: Multitasking, SMP & Timers
AlfaOmega08 wrote:Do you believe the HPET periodic timer is more appropriate than the LAPIC timer for scheduling?
Generally no. The LAPIC timer is perfect for local scheduling. It allows the cores to be scheduled separately, and a core with nothing to schedule can disable its LAPIC and save power until work arrives.
AlfaOmega08 wrote:Should sleeping threads be "stuck" to the current AP?
Not stuck, but there are advantages to keeping threads on the same core. Generally, when they are sleeping, threads are entirely (at least conceptually) removed from the scheduler and lose their association with a core. If a thread has slept for a long time and other threads have run on its core, there is probably little use in resuming it on the same core. If the thread has slept for only a short time, the cache may still contain its data, so it might be worth sending it back to the same core. Keep a per-thread variable holding the last_core just in case.
Oh, and have a look at the timer-wheels algorithm. It's a little old-fashioned now, but it's an excellent idea in a small kernel.
If a trainstation is where trains stop, what is a workstation ?
Re: Multitasking, SMP & Timers
Hi,
AlfaOmega08 wrote:It had priorities and chose processes in a round-robin way, but let processes with higher priority run for a larger amount of time. E.g.: if the time slice is 5 ms, then a process with priority 10 would run for 50 ms, and a process with priority 3 for just 15 ms.
This is what I call "variable time slice scheduling". It's bad because high priority tasks have to wait for all medium/low priority tasks to have their turn before they get any CPU time. Here's what happens when there are 10 low priority threads (A to J), 10 medium priority threads (K to T) and one extremely high priority thread (X), where each character represents 5 ms of time:
Code: Select all
AAABBBCCCDDDEEEFFFGGGHHHIIIJJJKLMNOPQRSTXXXXXXXXXX
You'll notice that the high priority task has to wait for 200 ms while other tasks are using the CPU. With 100 "priority 3 tasks" it'd have to wait for 1500 ms. You can see how this is likely to cause problems if the high priority task is trying to respond to key presses or do smooth animation.
There's a variation that I call "variable frequency scheduling", where each task runs for a fixed length of time, but high priority tasks run more often. For example, a priority 3 task might get 3 different 5 ms time slices and a priority 10 task might get 10 different 5 ms time slices. This might look like this:
Code: Select all
ABCDXEFGHXIJKLXMABCXDEFGXHIJNXOPABXCDEFXGHIJXQRSTX
The tasks are still getting the same amount of time as before, but now the high priority task only waits for 20 ms before it gets any CPU time (rather than 200 ms). With 100 "priority 3 tasks" it'd have to wait for 150 ms (rather than 1500 ms). This is better, but it increases overhead a little (time spent doing task switches) and probably still isn't good enough if the high priority task is trying to respond to key presses or do smooth animation when the system is under load.
The next step is to recognise that the difference between task priorities can be so large that lower priority tasks shouldn't get CPU time at all if higher priority tasks are able to run. For example, you could have a "priority infinity" task which preempts any lower priority task and runs immediately, where no lower priority task gets any CPU time when a "priority infinity" task is able to run. If there are several "priority infinity" tasks they could share the CPU. You could also have "priority infinity + n" tasks, where different "priority infinity + n" tasks share the CPU somehow (in the same way that "priority n" tasks do).
In a similar way, you could have "priority negative infinity" tasks which are preempted by anything and never run if anything else is able to run. You could also have "priority negative infinity - n" tasks, where different "priority negative infinity - n" tasks share the CPU somehow.
Effectively, what we've done is create 3 separate "priority groups". You could have any number of "priority groups". Within each priority group you could use a different policy for scheduling. One group might use "variable frequency scheduling", one group might use "variable time slice scheduling", and other groups might use something completely different.
You could create different groups for different purposes. This allows you to choose the best method of scheduling for the specific type of tasks being scheduled. You might have soft real time tasks that use "earliest deadline first" scheduling and call that "scheduling policy 1". The second group (scheduling policy 2) might use "variable frequency scheduling" and be intended for tasks that handle user interaction (key presses, video, etc). The third group (scheduling policy 3) might use "variable time slice scheduling" and be intended for normal tasks. A fourth group might be for things that should only be done if there's nothing else to do (idle tasks) and might use "variable time slice scheduling" too.
For SMP, you don't want a global "scheduler lock", as many CPUs might be fighting for this lock at the same time (and wasting CPU time). The more CPUs there are, the worse it gets - lock contention causes poor scalability. If your scheduler has a lock for each CPU then the chance of CPUs fighting for the same lock is a lot less, and performance/scalability improves. If your scheduler has 6 different locks per CPU (one for each scheduling policy) then there's even less lock contention and you improve performance/scalability even more.
For my last scheduler, there were 8 scheduling policies with 256 priorities within each scheduling policy. The first (highest) policy was cooperative scheduling and used 256 locks (and 256 FIFO queues), and the remaining 7 policies had one lock each. That's 263 locks per CPU. Even with 1000 CPUs the chance of lock contention was still very low.
Cheers,
Brendan
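To make the "per CPU, per policy" locking concrete, the data structures might be shaped something like this sketch (all names are invented; fifo_push()/fifo_pop(), idle_thread() and MAX_CPUS are assumed helpers/constants, and the number of policies is arbitrary):
Code: Select all
#define NUM_POLICIES 4          /* e.g. soft real time, interactive, normal, idle */

typedef struct {
    spinlock_t lock;            /* protects this one queue only */
    thread_t  *head, *tail;     /* FIFO of "ready to run" threads */
} run_queue_t;

typedef struct {
    run_queue_t queue[NUM_POLICIES];   /* one queue (and lock) per policy group */
} cpu_sched_t;

static cpu_sched_t cpu_sched[MAX_CPUS];

/* Any CPU may push a thread onto any CPU's queue, hence the per-queue lock. */
void make_ready(thread_t *t, int cpu, int policy)
{
    run_queue_t *q = &cpu_sched[cpu].queue[policy];
    spin_lock(&q->lock);
    fifo_push(q, t);
    spin_unlock(&q->lock);
}

/* A CPU only pops from its own queues: highest policy group first, so a
   lower group only runs when every higher group is empty. */
thread_t *pick_next(int cpu)
{
    for (int p = 0; p < NUM_POLICIES; p++) {
        run_queue_t *q = &cpu_sched[cpu].queue[p];
        spin_lock(&q->lock);
        thread_t *t = fifo_pop(q);     /* NULL if the queue is empty */
        spin_unlock(&q->lock);
        if (t != NULL)
            return t;
    }
    return idle_thread(cpu);
}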
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
- AlfaOmega08
- Member
- Posts: 226
- Joined: Wed Nov 07, 2007 12:15 pm
- Location: Italy
Re: Multitasking, SMP & Timers
Ready4Dis wrote:For the problem of two cores updating the same values: you really should be using atomic operations (locks, mutexes, semaphores, depending on what you're trying to do) for any variables/code that can be touched by more than one core at a time.
That won't solve my problem. If two cores stop executing a task and decide to subtract the time slice from the sleep time, they acquire the lock, both subtract the value (bad), and release the lock. What I meant by "stuck" sleeping threads is that only one AP should be able to decrement the sleep timeout.
Ready4Dis wrote:For sleeping threads, they should only ever have to be woken up if they are requested by another process. If I put my HDD thread to sleep, it doesn't need a keep-alive; it just needs to be notified when someone needs disk access.
What if a thread calls sleep(ms)? No other process is going to request a wake-up for it...
Ready4Dis wrote:Of course, your ideas can easily differ (and it's your OS, so do what makes sense for your final product). There isn't a one-size-fits-all approach: a real-time OS has very different requirements from a non-RT OS, and an OS with MANY little threads vs. one with a few large threads will have different requirements. There are lots of variables that could make you take one path versus another. I have no idea what your design is or what your expected outcome is, so without further info it's hard to give anything specific (and even if I had a clue, it's still hard to give specifics for someone else's design).
I would like to make my OS generic and scalable, with a lot of user interactivity (you know, play movies, surf the web). It will not be a real-time OS.
bluemoon wrote:Check the wiki for the delta queue. (Timed) sleeping processes are inserted in sorted order, and only the first process in the queue needs to be checked. Insertion is slow, but it runs on the sleeping process's own time, and a sleeping process won't complain about being slow. You may also drive this with a one-shot timer.
I just fell in love with this structure... But insertion is O(n) in any case...
gerryg400 wrote:Generally no. The LAPIC timer is perfect for local scheduling. It allows the cores to be scheduled separately, and a core with nothing to schedule can disable its LAPIC and save power until work arrives.
I didn't consider power management... good point. What about this: use the LAPIC timer for local scheduling, one of the HPET one-shot timers to wake up the next thread in the delta queue, and eventually one HPET periodic timer just to keep the date/time.
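If it helps, that split might look roughly like this sketch. Every function here (hpet_now(), hpet_set_oneshot(), the sleep-queue helpers, make_ready(), block_current_thread()) is a hypothetical wrapper, not a real driver call, and a real kernel would also have to close the race between releasing the lock and actually blocking:
Code: Select all
static spinlock_t    sleep_lock;     /* protects the (delta) sleep queue */
static sleep_queue_t sleep_queue;

void sleep_for(uint64_t ticks)
{
    uint64_t wake_time = hpet_now() + ticks;

    spin_lock(&sleep_lock);
    sleep_queue_insert(&sleep_queue, current_thread(), wake_time);
    /* If this thread is now the earliest deadline, reprogram the one-shot
       comparator so the HPET fires exactly when it must wake up. */
    if (sleep_queue_earliest(&sleep_queue) == wake_time)
        hpet_set_oneshot(wake_time);
    spin_unlock(&sleep_lock);

    block_current_thread();          /* the LAPIC-driven scheduler runs someone else */
}

/* HPET one-shot IRQ handler: wake everything that is due, then re-arm. */
void hpet_wakeup_irq(void)
{
    spin_lock(&sleep_lock);
    uint64_t now = hpet_now();
    thread_t *t;
    while ((t = sleep_queue_pop_expired(&sleep_queue, now)) != NULL)
        make_ready(t);               /* back onto some CPU's run queue */
    if (!sleep_queue_empty(&sleep_queue))
        hpet_set_oneshot(sleep_queue_earliest(&sleep_queue));
    spin_unlock(&sleep_lock);
}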
Brendan wrote:For SMP, you don't want a global "scheduler lock", as many CPUs might be fighting for this lock at the same time (and wasting CPU time). The more CPUs there are, the worse it gets - lock contention causes poor scalability. If your scheduler has a lock for each CPU then the chance of CPUs fighting for the same lock is a lot less, and performance/scalability improves. If your scheduler has 6 different locks per CPU (one for each scheduling policy) then there's even less lock contention and you improve performance/scalability even more.
For my last scheduler, there were 8 scheduling policies with 256 priorities within each scheduling policy. The first (highest) policy was cooperative scheduling and used 256 locks (and 256 FIFO queues), and the remaining 7 policies had one lock each. That's 263 locks per CPU. Even with 1000 CPUs the chance of lock contention was still very low.
Wouldn't that need separate thread queues for each CPU? Also, what's the point of a scheduler lock for a single CPU? The same CPU can't execute the critical section twice. I thought scheduler locks were used to prevent other CPUs from reading/altering queues while one of the CPUs is reading/modifying them. Doesn't this require a global lock?
Please, correct my English...
Motherboard: ASUS Rampage II Extreme
CPU: Core i7 950 @ 3.06 GHz OC at 3.6 GHz
RAM: 4 GB 1600 MHz DDR3
Video: nVidia GeForce 210 GTS... it sucks...
Re: Multitasking, SMP & Timers
Hi,
AlfaOmega08 wrote:Do you believe the HPET periodic timer is more appropriate than the LAPIC timer for scheduling?
gerryg400 wrote:Generally no. The LAPIC timer is perfect for local scheduling. It allows the cores to be scheduled separately, and a core with nothing to schedule can disable its LAPIC and save power until work arrives.
Using the local APIC timer for local scheduling is "almost perfect" (and is better than using any other timer).
If one CPU is running at 2 GHz and another CPU is running at 500 MHz (because it got too hot and needs to cool down, for example); then if 2 tasks have the same priority and one gets 10 ms on the first CPU, then the other should get 40 ms on the other (slower) CPU, so that they both get the same fair share of CPU time.
The best way to do this is probably to use a performance monitoring counter set up to count "instructions retired". The scheduler doesn't let a task run for a certain amount of time; instead, the scheduler lets a task run for a certain number of instructions. If one task runs for 1 million instructions on one CPU and another task runs for 1 million instructions on another CPU, then the tasks get exactly the same fair share, regardless of how much "wall clock" time the tasks actually consumed.
If performance monitoring counters can't be used, then you can try to keep track of how fast a CPU can currently execute instructions and use the local APIC timer to approximate "instructions retired". This isn't going to be as accurate (especially when things like hyper-threading and TurboBoost are involved), but it's still going to be better than scheduling "raw time".
Cheers,
Brendan
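As a very rough sketch of the "count instructions instead of time" idea: the architectural "instructions retired" event is event 0xC0 / umask 0x00, programmed through IA32_PERFEVTSEL0 (MSR 0x186) and read from IA32_PMC0 (MSR 0xC1), but double-check those numbers against the Intel manuals for your target. rdmsr()/wrmsr(), task_t and preempt() are assumed helpers:
Code: Select all
#include <stdint.h>

#define IA32_PERFEVTSEL0  0x186
#define IA32_PMC0         0x0C1

/* Count "instructions retired" in both user and kernel mode.
   Bits: umask at 8..15, USR = 16, OS = 17, EN = 22. */
static void pmc_start_inst_retired(void)
{
    wrmsr(IA32_PMC0, 0);
    wrmsr(IA32_PERFEVTSEL0, 0xC0 | (0x00 << 8) | (1 << 16) | (1 << 17) | (1 << 22));
}

static uint64_t pmc_read(void)
{
    return rdmsr(IA32_PMC0);
}

/* Give each task a budget of instructions instead of a time slice; the
   local APIC timer still ticks, but only to check whether the budget is spent. */
void lapic_timer_irq(void)
{
    uint64_t executed = pmc_read() - current_task->slice_start_count;
    if (executed >= current_task->instruction_budget)
        preempt();                            /* budget used up, pick another task */
}

void start_slice(task_t *t)
{
    current_task = t;
    t->slice_start_count = pmc_read();        /* only charge what runs from here on */
}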
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
- AlfaOmega08
- Member
- Posts: 226
- Joined: Wed Nov 07, 2007 12:15 pm
- Location: Italy
Re: Multitasking, SMP & Timers
Brendan wrote:If one CPU is running at 2 GHz and another CPU is running at 500 MHz (because it got too hot and needs to cool down, for example); then if 2 tasks have the same priority and one gets 10 ms on the first CPU, then the other should get 40 ms on the other (slower) CPU, so that they both get the same fair share of CPU time.
Damn. What a mess!
I think I'm missing something about scheduler locks, however... Aren't those supposed to protect queues from simultaneous access by multiple CPUs? Having a lock per CPU means having a queue per CPU, and thus threads are stuck to a CPU?
Thanks
Please, correct my English...
Motherboard: ASUS Rampage II Extreme
CPU: Core i7 950 @ 3.06 GHz OC at 3.6 GHz
RAM: 4 GB 1600 MHz DDR3
Video: nVidia GeForce 210 GTS... it sucks...
Re: Multitasking, SMP & Timers
Hi,
AlfaOmega08 wrote:That won't solve my problem. If two cores stop executing a task and decide to subtract the time slice from the sleep time, they acquire the lock, both subtract the value (bad), and release the lock. What I meant by "stuck" sleeping threads is that only one AP should be able to decrement the sleep timeout.
If both CPUs acquire the same lock at the same time, then your lock is hideously broken.
Now consider a loop that's used for animation or something:
Code: Select all
while(running) {
    do_something();
    sleep( 100 ms );
}
How often is "do_something()" called? If "do_something()" always takes 2 ms then it would be called every 102 ms. If "do_something()" takes anywhere from 1 ms to 20 ms then it might be called again after 101 ms or after 120 ms. If other tasks are running it might be called after 543 ms. It's a dodgy/inaccurate piece of crud (that would be continually "drifting"). How do you fix it?
Code: Select all
next_time = now() + 100 ms;
while(running) {
    do_something();
    sleep_until( next_time );
    next_time += 100 ms;
}
Jitter caused by the CPU running other threads might mean that sometimes "do_something()" might be called after 150 ms and sometimes it might be called after 50 ms to catch up; but (as long as the CPU is fast enough) you can guarantee that "do_something()" will be called every 100 ms on average.
For this reason, your kernel needs "sleep_until( when );". You could implement a "sleep( duration )" function like this:
Code: Select all
void sleep( duration ) {
    sleep_until( now() + duration );
}
This also means that tasks sleep until "now() > task->wake_time"; and you don't have to decrease a "time remaining" value for every sleeping task every time the timer IRQ occurs.
This also makes the delta queue idea a bit redundant - you just use a list/queue that is sorted in order of wake time.
To reduce the "O(n)" insertion time, you can have buckets - one list for tasks that should wake up in the next 100 ms, one list for tasks that should wake up between 100 and 200 ms, one list for tasks that should wake up in 200 to 300 ms, etc. If you've got 128 lists/queues (representing 100 ms of time each) then they only cover 12.8 seconds. For delays larger than that you'd need a "longer than 12.8 seconds" list/queue too. Every 100 ms your timer IRQ handler would switch to the next list and discard the old (empty) list, then create a new list by splitting the "longer than 12.8 seconds" list.
This means faster insertion, more queues/lists, more locks, less lock contention, etc. In theory, this can also be "per CPU" (e.g. using the local APIC timer for the timer IRQ) to further reduce insertion times; with some other timer (e.g. HPET) used as a backup for when the local APIC is turned off to save power (e.g. where you merge a CPU's queues with the HPET's global queue before turning the local APIC timer off).
AlfaOmega08 wrote:Wouldn't that need separate thread queues for each CPU?
Yes. For example, for my last scheduler there were 263 queues for each CPU.
AlfaOmega08 wrote:Also, what's the point of a scheduler lock for a single CPU?
These are all "ready to run" queues. When a task is actually given CPU time, it's removed from the "ready to run" queue (the task becomes "running" and therefore it's no longer "ready to run"). The scheduler decides which CPU a task should use when a task needs to be put back onto a "ready to run" queue. This is how load balancing works (e.g. one CPU might look at its own queue and say "I'm too busy!" and put the task on a different CPU's queue).
A CPU can only take tasks off of its own "ready to run" queue; but because any CPU can insert a task into any other CPU's run queue you'd still need a lock for each queue.
Alternatively you could use IPIs. For example, if one CPU wants to put a task onto another CPU's queue, it could send an IPI to the other CPU to ask it to put the task on its own queue. However, assuming low lock contention, an IPI is more expensive than locks; and because you need to tell the other CPU which task to put on its queue you'd have to store a "thread ID" somewhere and you'd probably need a lock to protect that (so a third CPU doesn't trash your "thread ID" value while you're sending the IPI).
AlfaOmega08 wrote:Doesn't this require a global lock?
It requires a global lock for each queue (but not one global lock for all queues).
Cheers,
Brendan
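A compressed sketch of those buckets, in case it helps (list_t, the list helpers and make_ready() are placeholders, locking is omitted, one of these structures could exist per CPU, and the finer-grained tick that checks the head of the current slot against now() is left out):
Code: Select all
#include <stdint.h>

#define BUCKET_MS    100
#define NUM_BUCKETS  128                 /* 128 * 100 ms = 12.8 seconds */

typedef struct {
    list_t   bucket[NUM_BUCKETS];        /* each slot covers 100 ms of wake times */
    list_t   far_future;                 /* the "longer than 12.8 seconds" list */
    unsigned current;                    /* slot that covers "now" */
    uint64_t epoch_ms;                   /* wall-clock start of bucket[current] */
} wakeup_wheel_t;

void wheel_insert(wakeup_wheel_t *w, thread_t *t)
{
    uint64_t delta = t->wake_ms - w->epoch_ms;
    if (delta >= (uint64_t)NUM_BUCKETS * BUCKET_MS)
        list_insert_sorted(&w->far_future, t);
    else                                 /* short per-slot list, so cheap to keep sorted */
        list_insert_sorted(&w->bucket[(w->current + delta / BUCKET_MS) % NUM_BUCKETS], t);
}

/* Run every BUCKET_MS from the timer IRQ. */
void wheel_advance(wakeup_wheel_t *w)
{
    thread_t *t;

    /* everything still left in the slot we are leaving is now due */
    while ((t = list_pop_front(&w->bucket[w->current])) != NULL)
        make_ready(t);

    w->current  = (w->current + 1) % NUM_BUCKETS;
    w->epoch_ms += BUCKET_MS;

    /* refill the slot that just became "furthest in the future" by splitting
       the far-future list, as described above */
    uint64_t horizon = w->epoch_ms + (uint64_t)NUM_BUCKETS * BUCKET_MS;
    while ((t = list_pop_if_before(&w->far_future, horizon)) != NULL)
        wheel_insert(w, t);
}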
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: Multitasking, SMP & Timers
Brendan has summed it up nicely. Yes, I did not mean a single lock for all threads; each CPU should have its own list of threads and a method of transferring threads from one CPU to another (or taking threads from another CPU).
- AlfaOmega08
- Member
- Posts: 226
- Joined: Wed Nov 07, 2007 12:15 pm
- Location: Italy
Re: Multitasking, SMP & Timers
Brendan wrote:If one CPU is running at 2 GHz and another CPU is running at 500 MHz (because it got too hot and needs to cool down, for example); then if 2 tasks have the same priority and one gets 10 ms on the first CPU, then the other should get 40 ms on the other (slower) CPU, so that they both get the same fair share of CPU time.
Just realized that, if my knowledge isn't totally wrong, the APIC timer runs at the same frequency as the CPU. So if I calculate the CPU frequency at boot, at full speed, and keep that as the frequency of the LAPIC timer, then if the CPU slows down the timer will follow it, and threads will have more time.
Brendan wrote:If both CPUs acquire the same lock at the same time, then your lock is hideously broken.
Well, I meant: CPU #0 acquires the lock, decrements, and releases; then CPU #1 acquires it, decrements, and releases. Resulting in a double decrement, where only one was needed.
Brendan wrote:For this reason, your kernel needs "sleep_until( when );". You could implement a "sleep( duration )" function like this:
Code: Select all
void sleep( duration ) {
    sleep_until( now() + duration );
}
This also means that tasks sleep until "now() > task->wake_time"; and you don't have to decrease a "time remaining" value for every sleeping task every time the timer IRQ occurs.
Using sleep_until, I don't even need a sleep queue, right? Just have all the threads in one queue, and if now() > wakeup_time then the thread can be run. What about using an HPET one-shot timer to move a thread from the sleep queue to the "ready to run" queue? I still don't get the point... sorry.
However, I'm reading Silberschatz's OS Concepts chapter on multiprocessing. It says that one of the most advanced scheduling algorithms is "multilevel feedback queue scheduling", where each process is classified as "system", "interactive", "background", etc. Each class is a queue with its own scheduling algorithm, and each queue runs its own processes only when the upper queues are out of processes; for example, an interactive process has to wait for all the system processes. Processes can even be moved between queues, changing their priority. The multiprocessor section also describes how an idle CPU can pull processes from other CPUs.
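A toy version of that idea, just to pin down the mechanics (the queue helpers, thread_t and idle_thread() are imaginary; a real implementation would add per-queue policies and locking as discussed earlier in the thread):
Code: Select all
#include <stdbool.h>

enum { CLASS_SYSTEM, CLASS_INTERACTIVE, CLASS_BACKGROUND, NUM_CLASSES };

static queue_t class_queue[NUM_CLASSES];   /* one ready queue per class */

/* A lower class only runs when every higher class is empty. */
thread_t *mlfq_pick(void)
{
    for (int c = 0; c < NUM_CLASSES; c++) {
        thread_t *t = queue_pop(&class_queue[c]);
        if (t != NULL)
            return t;
    }
    return idle_thread();
}

/* The "feedback" part: demote CPU hogs, promote threads that block early.
   Promotion stops at the interactive class; "system" stays reserved. */
void mlfq_requeue(thread_t *t, bool used_whole_slice)
{
    if (used_whole_slice && t->sched_class < CLASS_BACKGROUND)
        t->sched_class++;                  /* behaved like a batch job: go down */
    else if (!used_whole_slice && t->sched_class > CLASS_INTERACTIVE)
        t->sched_class--;                  /* behaved interactively: go up */
    queue_push(&class_queue[t->sched_class], t);
}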
This multilevel approach reminds me of your (Brendan's) scheduler. It has to be quite hellish to implement. Is this what you described?
But most important: what scheduling algorithms would you implement for each queue? I believe that FIFO, round robin, or my "variable time slice scheduling" would be fair enough for background processes.
For interactive processes I would make two subcategories: multimedia interaction (audio and/or video), which needs high responsiveness and would sit just under the system queue, and non-multimedia interactive processes (text editor, image editor, file explorer). I believe that multimedia processes should all have the same (high) priority, so they get equal time slices between them with round robin (in the rare case that the user watches two videos simultaneously).
Text editors and such will spend most of their time waiting for keyboard & mouse events, and there's no point in letting them busy-wait. So the simple-interactive queue will be mostly empty, I guess, and in the case where two processes are crunching some data, "variable frequency scheduling" will suffice, where the process or window that has focus gains higher priority.
System processes will not wait for user interaction, but will often spend time sleeping for disk access or who knows what. In this case "variable frequency scheduling" should be good too.
Well, since I can't believe that round robin is an optimal algorithm for anything but a toy OS, please dissuade me from using it for background and multimedia processes. Do you find flaws in my design?
Edit: also, do you believe that a high priority task should stop execution of lower priority tasks when waking up? E.g. task #0 (system) is sleeping, task #1 (normal, interactive) is running. If task #0 is woken up by the HPET on timeout, should it stop task #1 and take control of the CPU, or should it wait for the next timer IRQ? Excuse my excessive curiosity, but I'm trying to get a complete picture of scheduling algorithms before trying to write one for my latest OS.
Edit: phew, my 200th post was quite big...
Last edited by AlfaOmega08 on Wed May 16, 2012 2:50 pm, edited 1 time in total.
Please, correct my English...
Motherboard: ASUS Rampage II Extreme
CPU: Core i7 950 @ 3.06 GHz OC at 3.6 GHz
RAM: 4 GB 1600 MHz DDR3
Video: nVidia GeForce 210 GTS... it sucks...
- Griwes
- Member
- Posts: 374
- Joined: Sat Jul 30, 2011 10:07 am
- Libera.chat IRC: Griwes
- Location: Wrocław/Racibórz, Poland
- Contact:
Re: Multitasking, SMP & Timers
Brendan wrote:If both CPUs acquire the same lock at the same time, then your lock is hideously broken.
AlfaOmega08 wrote:Well, I meant: CPU #0 acquires the lock, decrements, and releases; then CPU #1 acquires it, decrements, and releases. Resulting in a double decrement, where only one was needed.
Then your algorithm, in general, is broken. Why would two CPUs try to decrement the same variable? Of course, when multiple cores try to do the same thing, the work gets done multiple times; that's why such a job should be assigned to a single CPU (i.e. each CPU decrements only the counters in its own queues).
Reaver Project :: Repository :: Ohloh project page
<klange> This is a horror story about what happens when you need a hammer and all you have is the skulls of the damned.
<drake1> as long as the lock is read and modified by atomic operations
- AlfaOmega08
- Member
- Posts: 226
- Joined: Wed Nov 07, 2007 12:15 pm
- Location: Italy
Re: Multitasking, SMP & Timers
Griwes wrote:Then your algorithm, in general, is broken. Why would two CPUs try to decrement the same variable? Of course, when multiple cores try to do the same thing, the work gets done multiple times; that's why such a job should be assigned to a single CPU (i.e. each CPU decrements only the counters in its own queues).
My algorithm was designed for single-CPU systems, where the CPU decremented the sleep timeout of each process and woke up those whose timeout reached 0. I was just pointing out how this method is not good for multiple CPUs...
Please, correct my English...
Motherboard: ASUS Rampage II Extreme
CPU: Core i7 950 @ 3.06 GHz OC at 3.6 GHz
RAM: 4 GB 1600 MHz DDR3
Video: nVidia GeForce 210 GTS... it sucks...
- Griwes
- Member
- Posts: 374
- Joined: Sat Jul 30, 2011 10:07 am
- Libera.chat IRC: Griwes
- Location: Wrocław/Racibórz, Poland
- Contact:
Re: Multitasking, SMP & Timers
Griwes wrote:Then your algorithm, in general, is broken. Why would two CPUs try to decrement the same variable? Of course, when multiple cores try to do the same thing, the work gets done multiple times; that's why such a job should be assigned to a single CPU (i.e. each CPU decrements only the counters in its own queues).
AlfaOmega08 wrote:My algorithm was designed for single-CPU systems, where the CPU decremented the sleep timeout of each process and woke up those whose timeout reached 0. I was just pointing out how this method is not good for multiple CPUs...
It's good, as long as you keep the list of sleeping processes per CPU*, not in one global queue - then each CPU decreases only its own processes' counters and everybody's happy. However, sleep_until() is definitely the more elegant solution.
* I mean, each process/thread is tied to a processor (with the possibility of being moved, of course) and each core keeps only the processes/threads tied to it in its list.
Reaver Project :: Repository :: Ohloh project page
<klange> This is a horror story about what happens when you need a hammer and all you have is the skulls of the damned.
<drake1> as long as the lock is read and modified by atomic operations
Re: Multitasking, SMP & Timers
AlfaOmega08 wrote:Do you believe that a high priority task should stop execution of lower priority tasks when waking up? E.g. task #0 (system) is sleeping, task #1 (normal, interactive) is running. If task #0 is woken up by the HPET on timeout, should it stop task #1 and take control of the CPU, or should it wait for the next timer IRQ?
Yes, task #0 should run immediately. On a multi-core system the scheduler could perhaps check whether another core could be used, although in an ideal world with an ideal scheduler, the HPET interrupt has already occurred on the core running the lowest priority task.
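One common way to get that immediate preemption on SMP is to compare priorities at wake-up time and poke the chosen CPU with a reschedule IPI. A sketch, where pick_cpu_for(), enqueue_ready(), current_priority_on(), this_cpu(), preempt() and send_reschedule_ipi() are all assumed helpers:
Code: Select all
/* Called when a sleeping task's timeout expires (e.g. from the HPET IRQ). */
void wake_task(task_t *t)
{
    int cpu = pick_cpu_for(t);          /* e.g. its last CPU, or the CPU currently
                                           running the lowest priority task */
    enqueue_ready(cpu, t);

    if (t->priority > current_priority_on(cpu)) {
        if (cpu == this_cpu())
            preempt();                  /* reschedule right now on this CPU */
        else
            send_reschedule_ipi(cpu);   /* make that CPU reschedule immediately,
                                           instead of at its next timer IRQ */
    }
}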
If a trainstation is where trains stop, what is a workstation ?