Hi,
JulienDarc wrote:Now, about the scheduler. I plan to avoid those two cores and let the cpu transparently use it in case of a memory stall on an attached core.
Since they may be 20% of the speed of a real core (value gathered from multiple readings, not myself verified though), I am not sure if I should use them at all for my processes.
I need your opinion : should I assign tasks to hyper threads or not ? In case of a memory stall, isn't it better that they be free ?
First; understand that there are "physical resources" (execution units, caches, etc.) and "logical CPUs", where different logical CPUs may share some physical resources.
For Intel's chips without hyper-threading and older AMD chips; most physical resources aren't shared but typically some caches are shared. For example, you might have a quad-core chip where there's one L3 cache that's shared by all 4 cores/logical CPUs, and two L2 caches that are shared by 2 cores/logical CPUs.
For AMD's more recent chips; some caches are shared by different logical CPUs, and pairs of logical CPUs share some execution units (those involved with floating point). For example, a "quad core" chip with 4 logical CPUs would have 4 separate sets of integer execution units and 2 sets of floating point execution units (where a pair of logical CPUs share a set of floating point execution units).
For Intel's chips with hyper-threading; most physical resources are shared by 2 logical CPUs (and some caches may be shared by multiple pairs of logical CPUs). For example, a "quad core" chip with 8 logical CPUs where the resources of a core are shared by 2 logical CPUs.
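To make that concrete, here's a rough C sketch (my own illustration, not anything from the vendors' docs) of one way a scheduler could record which logical CPUs share which resources. The topology values below are a made-up hyper-threaded "quad core" with 8 logical CPUs; on x86 you'd build the real tables from CPUID at boot. It's only meant to show the bookkeeping:

Code:
/* Hypothetical sketch: recording which logical CPUs share which physical resources. */
#include <stdio.h>
#include <stdint.h>

#define MAX_LOGICAL_CPUS 8

struct logical_cpu {
    uint32_t core_id;     /* same core_id = shares execution units */
    uint32_t fpu_domain;  /* same fpu_domain = shares FP units (the AMD "module" case) */
    uint32_t l2_domain;   /* same l2_domain = shares an L2 cache */
    uint32_t l3_domain;   /* same l3_domain = shares an L3 cache */
};

/* Made-up example: "quad core" with hyper-threading, pairs of logical CPUs
 * share a core and its L2, and all 8 logical CPUs share one L3. */
static struct logical_cpu cpus[MAX_LOGICAL_CPUS] = {
    {0, 0, 0, 0}, {0, 0, 0, 0},
    {1, 1, 1, 0}, {1, 1, 1, 0},
    {2, 2, 2, 0}, {2, 2, 2, 0},
    {3, 3, 3, 0}, {3, 3, 3, 0},
};

static int shares_execution_units(int a, int b)
{
    return cpus[a].core_id == cpus[b].core_id;
}

static int shares_cache(int a, int b)
{
    return cpus[a].l2_domain == cpus[b].l2_domain ||
           cpus[a].l3_domain == cpus[b].l3_domain;
}

int main(void)
{
    printf("CPU 0 and CPU 1 share execution units: %d\n", shares_execution_units(0, 1));
    printf("CPU 0 and CPU 2 share execution units: %d\n", shares_execution_units(0, 2));
    printf("CPU 0 and CPU 2 share a cache:         %d\n", shares_cache(0, 2));
    return 0;
}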
For all of the above; you can improve performance (and increase power consumption) by reducing "resource sharing". Examples:
- If 2 tasks both use a lot of memory you can improve performance for those 2 tasks by putting them on logical CPUs that don't share caches
- If 2 tasks both use a lot of floating point operations you can improve performance for those 2 tasks by putting them on logical CPUs that don't share floating point execution units
- If 2 tasks both use a lot of integer operations you can improve performance for those 2 tasks by putting them on logical CPUs that don't share integer execution units
The opposite is also true: you can reduce power consumption (and reduce performance) by increasing "resource sharing". For example, if one task uses a lot of memory and another task uses a lot of floating point operations, then putting those 2 tasks on logical CPUs that share caches and execution units won't hurt performance much but will allow other cores to go into "low power" modes.
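Putting those two policies together, the "pick a logical CPU for this task" decision might look something like the following sketch. It's only an illustration (the sibling table is a hypothetical hyper-threaded quad core with 8 logical CPUs, and a real scheduler would consider caches, priorities, etc. too): for performance you prefer an idle logical CPU whose sibling is also idle, and for power saving you prefer an idle logical CPU whose sibling is already busy so other cores can stay in low power modes.

Code:
/* Hypothetical sketch: performance-oriented vs power-oriented CPU selection. */
#include <stdio.h>

#define NUM_CPUS 8

/* CPU i shares a core (execution units) with sibling[i]; made-up topology */
static const int sibling[NUM_CPUS] = {1, 0, 3, 2, 5, 4, 7, 6};
static int busy[NUM_CPUS];  /* 1 = already running a task */

enum policy { PREFER_PERFORMANCE, PREFER_POWER_SAVING };

static int pick_cpu(enum policy p)
{
    int fallback = -1;

    for (int i = 0; i < NUM_CPUS; i++) {
        if (busy[i])
            continue;
        if (fallback < 0)
            fallback = i;                       /* any idle CPU, if nothing better turns up */
        if (p == PREFER_PERFORMANCE && !busy[sibling[i]])
            return i;                           /* sibling idle too: no sharing */
        if (p == PREFER_POWER_SAVING && busy[sibling[i]])
            return i;                           /* sibling already busy: other cores stay asleep */
    }
    return fallback;                            /* -1 if every logical CPU is busy */
}

int main(void)
{
    busy[0] = 1;  /* pretend CPU 0 already has a task */

    printf("performance policy picks CPU %d\n", pick_cpu(PREFER_PERFORMANCE));   /* CPU 2 */
    printf("power saving policy picks CPU %d\n", pick_cpu(PREFER_POWER_SAVING)); /* CPU 1 */
    return 0;
}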
Now; imagine you've got 3 tasks that all do a lot of floating point operations and you've got a dual core Intel chip with hyper-threading and 4 logical CPUs (or a "quad core" AMD chip where floating point execution units are shared). In this case it comes down to the tasks' priorities (a rough sketch of the rotation case follows this list):
- One very high priority task and 2 low priority tasks: you want both of the low priority tasks running on logical CPUs that share execution units, so that the high priority task doesn't have to share
- Two extremely high priority tasks and one very low priority task: the very low priority task isn't given CPU time at all, so that the extremely high priority tasks never need to share.
- Two medium priority tasks and one low priority task: one medium priority task and the low priority task are run on logical CPUs that share execution units while the other medium priority task runs on a logical CPU without sharing; but you switch CPUs around occasionally so that both medium priority tasks get the same amount of sharing (and the low priority task is always sharing)
- Three medium priority tasks: you want to switch logical CPUs around occasionally to make it fair. E.g. tasks A and B share for a while, then tasks B and C share for a while, then tasks C and A share for a while.
- Three extremely low priority tasks: here power consumption is more important, so one pair of logical CPUs is put in a low power mode and you run 2 tasks at a time on logical CPUs that share execution units (e.g. tasks A and B run and share while task C isn't given CPU time, then tasks B and C run while task A isn't given CPU time, etc).
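For the "three medium priority tasks" case, the rotation can be as simple as this sketch (again just an illustration I made up: 3 tasks on a hypothetical dual core chip with hyper-threading, where logical CPUs 0 and 1 share a core and logical CPU 3 is left idle):

Code:
/* Hypothetical sketch: rotating which pair of tasks shares a core, for fairness. */
#include <stdio.h>

int main(void)
{
    const char *task[3] = {"A", "B", "C"};

    /* Each row is one rotation period: which task runs on logical CPU 0, 1 and 2.
     * CPUs 0 and 1 share execution units, CPU 2 has a core to itself. */
    const int schedule[3][3] = {
        {0, 1, 2},   /* A and B share, C alone */
        {1, 2, 0},   /* B and C share, A alone */
        {2, 0, 1},   /* C and A share, B alone */
    };

    for (int period = 0; period < 3; period++) {
        printf("period %d: %s and %s share a core, %s has a core to itself\n",
               period,
               task[schedule[period][0]],
               task[schedule[period][1]],
               task[schedule[period][2]]);
    }
    return 0;
}

Over the three periods every task spends the same fraction of its time sharing execution units, so no task is permanently penalised.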
Of course this is easy when there's always a fixed number of tasks. Reality is never that simple: tasks are frequently stopping/blocking and starting/unblocking, it's impossible to predict what might be ideal in advance, it's relatively expensive to shift tasks to other CPUs while they're actually executing, and finding out a task's characteristics (whether it does lots of memory accesses or lots of floating point) isn't easy either. Because of this, finding the "best" way is still an active research topic.
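If it helps, here's one very rough (and purely hypothetical) way a scheduler might accumulate a task's characteristics while it runs. A real kernel could, for example, set the "uses FPU" flag from the first device-not-available exception (lazy FPU state switching) and accumulate cache miss counts from performance monitoring counters at each task switch; the threshold and the numbers in main() below are made up purely for the demonstration:

Code:
/* Hypothetical sketch: classifying a task from counters gathered while it runs. */
#include <stdio.h>
#include <stdint.h>

struct task_stats {
    int      uses_fpu;       /* set once the task first touches FPU/SSE state */
    uint64_t cache_misses;   /* accumulated last-level cache misses */
    uint64_t instructions;   /* accumulated retired instructions */
};

enum task_kind { TASK_UNKNOWN, TASK_FPU_HEAVY, TASK_MEMORY_HEAVY };

static enum task_kind classify(const struct task_stats *s)
{
    if (s->instructions == 0)
        return TASK_UNKNOWN;

    /* Arbitrary threshold: more than ~1 miss per 100 instructions = memory heavy */
    if (s->cache_misses * 100 > s->instructions)
        return TASK_MEMORY_HEAVY;
    if (s->uses_fpu)
        return TASK_FPU_HEAVY;
    return TASK_UNKNOWN;
}

int main(void)
{
    struct task_stats a = { .uses_fpu = 1, .cache_misses = 1000,   .instructions = 10000000 };
    struct task_stats b = { .uses_fpu = 0, .cache_misses = 500000, .instructions = 10000000 };

    printf("task A: %d (1 = FPU heavy)\n", classify(&a));
    printf("task B: %d (2 = memory heavy)\n", classify(&b));
    return 0;
}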
Cheers,
Brendan