Brendan wrote: I thought I'd post an example scenario for people (rdos) to think about.
OK, thanks.
Brendan wrote: Imagine the CPU is a quad-core i7, with hyper-threading and TurboBoost (8 logical CPUs). CPU#0 is currently running a very high priority task and all the rest are idle. CPU#6 and CPU#7 have been idle for a long time and are currently in a deep sleep state. The temperatures of all CPUs/cores are roughly the same (as they're all in the same physical chip) but CPU#0 is a little bit warmer than the rest (as it's the only CPU that isn't idle). A very low priority task was recently running on CPU#1 and blocked waiting for data to be read from disk. That data has arrived from disk and unblocked the very low priority task.
If the scheduler is perfect, what would it do?
Shifting the currently running very high priority task to a different CPU makes no sense at all, so it remains running on CPU#0.
For the very low priority task, you don't want to wake up CPU#6 or CPU#7 because they're helping to keep the temperature of the physical chip (and the other cores) down; and you don't really want to use CPU#2, CPU#3, CPU#4, CPU#5, CPU#6 or CPU#7 because in that case TurboBoost will reduce the speed that CPU#0 is running at, and the performance of the very high priority task would be reduced.
If you put the very low priority task on CPU#1 then you avoid the TurboBoost problem, and it might run faster (as some of the task's data may still be in that CPU's caches); but hyper-threading would mean that the performance of the very high priority task would be badly affected, and you don't want that because the performance of the very low priority task isn't important at all and the performance of the very high priority task is very important.
The optimal solution is to schedule the very low priority task on the same CPU as the very high priority task; even though this CPU has the most load and is running at the warmest temperature, and even though all other CPUs are idle.
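To make the reasoning in the scenario above concrete, here is a rough sketch that scores each logical CPU with a penalty and picks the cheapest one. Everything in it is an assumption for illustration: the weight values, the `Cpu` and `placement_cost` names, and the pairing of logical CPUs into cores as (0,1), (2,3), (4,5), (6,7). It is not anyone's actual scheduler code.

```python
from dataclasses import dataclass
from typing import Optional

# All weights are made-up illustration values, not measured costs.
WAKE_PENALTY = 5    # waking a CPU from deep sleep
TURBO_PENALTY = 5   # activating another core reduces TurboBoost headroom
HT_PENALTY = 10     # sharing a core with a more important task
WAIT_PENALTY = 1    # queueing behind a more important task on the same CPU
CACHE_BONUS = 2     # the task's data may still be in this CPU's caches


@dataclass
class Cpu:
    cpu_id: int
    core: int                                # same core => hyper-threading siblings
    running_priority: Optional[int] = None   # None means idle
    deep_sleep: bool = False


def placement_cost(cpu, cpus, task_priority, last_cpu):
    """Penalty for placing a task of task_priority on cpu (lower is better)."""
    cost = 0
    if cpu.deep_sleep:
        cost += WAKE_PENALTY
    # Is any logical CPU on this core already running something?
    core_active = any(
        c.core == cpu.core and c.running_priority is not None for c in cpus
    )
    if not core_active:
        cost += TURBO_PENALTY
    if cpu.running_priority is not None and cpu.running_priority > task_priority:
        cost += WAIT_PENALTY
    # Would the task slow down a more important task on the sibling?
    sibling_important = any(
        c.core == cpu.core
        and c.cpu_id != cpu.cpu_id
        and c.running_priority is not None
        and c.running_priority > task_priority
        for c in cpus
    )
    if sibling_important:
        cost += HT_PENALTY
    if cpu.cpu_id == last_cpu:
        cost -= CACHE_BONUS
    return cost


def best_cpu(cpus, task_priority, last_cpu):
    """Return the id of the logical CPU with the lowest placement cost."""
    return min(
        cpus, key=lambda c: placement_cost(c, cpus, task_priority, last_cpu)
    ).cpu_id


# The scenario: CPU#0 busy with a very high priority task, CPU#6/#7 asleep,
# and the very low priority task last ran on CPU#1.
cpus = [Cpu(i, core=i // 2) for i in range(8)]
cpus[0].running_priority = 100
cpus[6].deep_sleep = True
cpus[7].deep_sleep = True

costs = [placement_cost(c, cpus, task_priority=1, last_cpu=1) for c in cpus]
chosen = best_cpu(cpus, task_priority=1, last_cpu=1)
print(costs, chosen)
```

With these particular weights the lowest cost lands on CPU#0, matching the conclusion of the scenario. Different weights would of course give a different answer, which is part of what is being debated here.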
Some reflections:
1. Waking up CPU-cores (or putting them to sleep) should not be done in the scheduler. This is the job of the power management driver. Therefore, we can exclude the option that the scheduler would wake up CPU #6 or CPU #7.
2. With proper load balancing, all cores that are not in deep sleep have equal load over time, and thus similar temperatures. That makes the temperature parameter irrelevant. Additionally, if load is high enough to cause a significant temperature rise, then all cores should be active. So temperature can simply be excluded from the discussion.
Next, you might want to look at the load distribution I posted for dual core Intel Atom (with hyperthreading) in the "what do your OS look like" thread. The load distribution is quite interesting in that CPU #0 and CPU #2 have the highest load, with CPU #1 and CPU #3 having lower load. CPU #0 and CPU #1 have common resources, as do CPU #2 and CPU #3. I think this can be explained by the "global queue" algorithm used. The algorithm ensures that load is evenly spread, and when CPU #0 and CPU #1 have common resources, a thread executing on CPU #0 makes it less likely that CPU #1 will grab a task from the global queue (especially a task posted by CPU #0), which is what the load diagrams I posted show. This is not an automatic benefit of your algorithm.
So, I conclude by proposing that it is far better to avoid these issues altogether than to feed them into complex scheduler calculations. For instance:
1. When load balancing works, and load over time is similar on the different cores, their temperatures become similar, and thus temperature can be excluded as an issue.
2. The shared resource problem can be avoided to some extent with global queue scheduling (at least the hyperthreading related problem), and thus is not an issue.
3. Locality can be handled by decreasing post frequency to the global queue, possibly using a kernel-thread in the scheduler to adapt to different conditions on a longer time-scale (seconds).
4. The power manager should ensure that all cores run at the same frequency, and that all cores are started when load is sufficiently high. This eliminates the differential frequency problem and the temperature problem due to deep sleep.
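As a minimal sketch of what "post frequency to the global queue" in point 3 could look like: each CPU keeps newly ready threads in its own local queue, and only every Nth one is posted to the shared global queue where any CPU can grab it. The class and method names and the "every Nth" rule are assumptions for illustration, and a real kernel would need locking around the global queue, which is omitted here.

```python
from collections import deque


class GlobalQueueScheduler:
    """Toy model: per-CPU local queues plus one shared global queue."""

    def __init__(self, n_cpus, post_interval):
        self.local = [deque() for _ in range(n_cpus)]
        self.global_queue = deque()
        # Hypothetical tuning knob: every post_interval-th ready thread
        # is posted globally. A larger value means a lower post frequency
        # and therefore more locality.
        self.post_interval = post_interval
        self._ready_count = [0] * n_cpus

    def make_ready(self, cpu, thread):
        """Queue a thread that became ready on the given CPU."""
        self._ready_count[cpu] += 1
        if self._ready_count[cpu] % self.post_interval == 0:
            self.global_queue.append(thread)   # any CPU may pick it up
        else:
            self.local[cpu].append(thread)     # stays on this CPU

    def pick_next(self, cpu):
        """Prefer local work, fall back to the global queue, else idle."""
        if self.local[cpu]:
            return self.local[cpu].popleft()
        if self.global_queue:
            return self.global_queue.popleft()
        return None


sched = GlobalQueueScheduler(n_cpus=2, post_interval=4)
for i in range(8):
    sched.make_ready(0, f"t{i}")

first_for_cpu1 = sched.pick_next(1)   # CPU 1 has no local work
first_for_cpu0 = sched.pick_next(0)   # CPU 0 takes from its own queue
print(first_for_cpu1, first_for_cpu0)
```

Raising `post_interval` keeps more threads on the CPU that last ran them, at the price of slower load spreading. A kernel thread could adjust the value on a time-scale of seconds, as suggested in point 3.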
There are still two issues that I have not solved properly. One has to do with the load on the Intel Core Duo, which is on CPU #0 all the time. It doesn't seem like the global queue algorithm works properly on that CPU. The other (probably related) issue is that load over time is not identical across CPUs on my dual core AMD (or probably on any other CPU). On that machine, the CPU #0 idle thread has currently accumulated 17 hours and the CPU #1 idle thread 11 hours. I need a way to even out long-term load in order to achieve even temperature. One solution might be to have a per-CPU post frequency to the global queue, and increase it for CPUs with more load and decrease it for CPUs with less load. That would probably solve the long-term load issue, but I'm unsure if it would solve the issue on Intel Core Duo.
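The per-CPU adjustment idea could be sketched roughly like this. The function name, the 10% step, the clamping limits, and the representation of post frequency as a fraction of ready threads posted globally are all assumptions. Load is inferred from idle time, so the CPU with less accumulated idle time is treated as the more loaded one.

```python
def adjust_post_fractions(fractions, idle_hours, step=0.1, lo=0.01, hi=1.0):
    """Nudge each CPU's global-queue post fraction toward balanced load.

    fractions:  current per-CPU fraction of ready threads posted globally
    idle_hours: accumulated idle-thread time per CPU (less idle = more load)
    """
    mean_idle = sum(idle_hours) / len(idle_hours)
    adjusted = []
    for fraction, idle in zip(fractions, idle_hours):
        if idle < mean_idle:
            fraction *= 1 + step   # more loaded: hand more work to others
        elif idle > mean_idle:
            fraction *= 1 - step   # less loaded: keep more work locally
        adjusted.append(min(hi, max(lo, fraction)))
    return adjusted


# Idle times from the dual core AMD above: CPU #0 17 hours, CPU #1 11 hours.
new_fractions = adjust_post_fractions([0.25, 0.25], [17, 11])
print(new_fractions)
```

Run periodically, this raises the post fraction of CPU #1 (the busier one here) and lowers that of CPU #0. Whether such a feedback loop actually converges, or helps on the Core Duo, is not something this sketch can show.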