Hyper threading -> strategy ok ?

Posted: Mon Apr 13, 2015 11:46 pm
by JulienDarc
Hello,

I read about hyper threading.

I can read out the whole CPU topology, and I detected that two cores are for hyper-threading on my test machine.

Now, about the scheduler. I plan to avoid those two cores and let the CPU transparently use them in case of a memory stall on an attached core.
Since they may be 20% of the speed of a real core (a value gathered from multiple readings, though I have not verified it myself), I am not sure if I should use them at all for my processes.

I need your opinion: should I assign tasks to hyper-threads or not? In case of a memory stall, isn't it better that they be free?

I don't know.

Bye

Julien

Re: Hyper threading -> strategy ok ?

Posted: Tue Apr 14, 2015 2:14 am
by madanra
Treat them as normal cores. It's the hardware's job to switch between them on memory stalls etc.; a software context switch would be far too slow to take advantage of that.

Re: Hyper threading -> strategy ok ?

Posted: Tue Apr 14, 2015 2:44 am
by Combuster
Hyperthreading was designed to keep a second thread ready to take work from whenever anything in the core would otherwise lose individual cycles while the first thread is processing. Generally this means you get your (20%) speed improvement by actually using that other thread instead of ignoring it. Also, if you say you have two cores with hyperthreading, you actually have four virtual CPUs.
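
To make the "two cores, four virtual CPUs" counting concrete, here is a minimal Python sketch of grouping logical CPUs by physical core. It assumes the APIC ID of every logical CPU is already known and that the low `smt_bits` bits of the ID number the hardware thread inside a core (on Intel that width is normally taken from CPUID leaf 0Bh). The function name and the example IDs are made up for illustration.

```python
from collections import defaultdict

def group_by_core(apic_ids, smt_bits):
    """Group logical CPUs (identified by APIC ID) by the core they belong to.

    Assumption: the low `smt_bits` bits of an APIC ID number the hardware
    thread inside a core, and the remaining high bits identify the core.
    """
    cores = defaultdict(list)
    for apic_id in apic_ids:
        core_id = apic_id >> smt_bits
        cores[core_id].append(apic_id)
    return {core: sorted(threads) for core, threads in cores.items()}

# Hypothetical dual-core chip with hyper-threading: 4 logical CPUs.
topology = group_by_core([0, 1, 2, 3], smt_bits=1)
print(topology)  # {0: [0, 1], 1: [2, 3]}
```

With `smt_bits=0` (no hyper-threading) every logical CPU ends up in its own group, so the scheduler can use the same table either way.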

There are some things to consider: If you have a more recent processor it'll have turbo-boost, which means it'll increase the clock speed when only half a core is running, so your 20% improvement comes with an overall lower performance per core half. The entire mechanism basically means that you get lower single-thread performance, and if you happen to have something that's CPU-heavy and not multi-threaded, you might notice that as a performance drop in that specific case.

Another thing to note is that with hyperthreading the two core halves share most, if not all, of the caches on that core. It's much more efficient to have as much content as possible shared between the two tasks if you want to do anything better than effectively halving your cache size. This means you'll want to run the same application/address space in both core halves for best performance.
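
A minimal sketch of that preference in Python, under stated assumptions: `running` and `siblings` are hypothetical tables (which address space each logical CPU is running, and which logical CPU is the other half of the same core), and the fallback to a completely idle core is my own guess at a sensible second choice, not something stated above.

```python
def pick_cpu_for(address_space, running, siblings):
    """Pick an idle logical CPU, preferring one whose sibling already
    runs a thread from the same address space (so the shared caches
    hold mostly common data).

    running:  dict cpu -> address space id, or None if the CPU is idle
    siblings: dict cpu -> the other logical CPU on the same core
    Both tables are hypothetical; a real kernel would keep them per CPU.
    """
    idle = [cpu for cpu, space in sorted(running.items()) if space is None]
    if not idle:
        return None
    for cpu in idle:
        if running[siblings[cpu]] == address_space:
            return cpu
    # Assumed fallback: prefer a core that is completely idle, to avoid
    # sharing caches with an unrelated task.
    for cpu in idle:
        if running[siblings[cpu]] is None:
            return cpu
    return idle[0]

siblings = {0: 1, 1: 0, 2: 3, 3: 2}
running = {0: "A", 1: None, 2: None, 3: None}
print(pick_cpu_for("A", running, siblings))  # 1
print(pick_cpu_for("B", running, siblings))  # 2
```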

Re: Hyper threading -> strategy ok ?

Posted: Tue Apr 14, 2015 2:59 am
by JulienDarc
Ok I get it.

I will make my scheduler a bit smarter then :)

Thanks a lot!

Re: Hyper threading -> strategy ok ?

Posted: Tue Apr 14, 2015 3:06 am
by Brendan
Hi,
JulienDarc wrote:Now, about the scheduler. I plan to avoid those two cores and let the CPU transparently use them in case of a memory stall on an attached core.
Since they may be 20% of the speed of a real core (a value gathered from multiple readings, though I have not verified it myself), I am not sure if I should use them at all for my processes.

I need your opinion: should I assign tasks to hyper-threads or not? In case of a memory stall, isn't it better that they be free?
First, understand that there are "physical resources" (execution units, caches, etc) and "logical CPUs", where different logical CPUs may share some physical resources.

For Intel's chips without hyper-threading and older AMD chips, most physical resources aren't shared but typically some caches are shared. For example, you might have a quad-core chip where there's one L3 cache that's shared by all 4 cores/logical CPUs, and two L2 caches that are each shared by 2 cores/logical CPUs.

For AMD's more recent chips, some caches are shared by different logical CPUs, and pairs of logical CPUs share some execution units (those involved with floating point). For example, a "quad core" chip with 4 logical CPUs would have 4 separate sets of integer execution units and 2 sets of floating point execution units (where a pair of logical CPUs share a set of floating point execution units).

For Intel's chips with hyper-threading, most physical resources are shared by 2 logical CPUs (and some caches may be shared by multiple pairs of logical CPUs). For example, a "quad core" chip has 8 logical CPUs, where the resources of each core are shared by 2 logical CPUs.

For all of the above, you can improve performance (and increase power consumption) by reducing "resource sharing". Examples:
  • If 2 tasks both use a lot of memory you can improve performance for those 2 tasks by putting them on logical CPUs that don't share caches
  • If 2 tasks both use a lot of floating point operations you can improve performance for those 2 tasks by putting them on logical CPUs that don't share floating point execution units
  • If 2 tasks both use a lot of integer operations you can improve performance for those 2 tasks by putting them on logical CPUs that don't share integer execution units
The opposite is also true: you can reduce power consumption (and reduce performance) by increasing "resource sharing". For example, if one task uses a lot of memory and another task uses a lot of floating point operations, then putting those 2 tasks on logical CPUs that share caches and execution units won't hurt performance much but will allow other cores to go into "low power" modes.
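
One way to put numbers on "these two tasks would fight over the same resource" is a simple contention estimate. The Python sketch below assumes each task comes with a usage profile per shared resource (values between 0 and 1); those profiles, the resource names and both function names are hypothetical, and obtaining real profiles is the hard part.

```python
def contention(a, b):
    """Estimated contention if tasks a and b run on logical CPUs that
    share resources. Each task is a dict resource -> usage in [0, 1];
    a product is large only when both tasks lean on the same resource.
    """
    return sum(a[r] * b.get(r, 0.0) for r in a)

def best_partner(task, candidates):
    """Name of the candidate that would hurt `task` least as a sibling."""
    return min(candidates, key=lambda name: contention(task, candidates[name]))

mem_heavy_1 = {"cache": 0.9, "fpu": 0.1}
mem_heavy_2 = {"cache": 0.8, "fpu": 0.1}
fpu_heavy = {"cache": 0.1, "fpu": 0.9}

others = {"mem_heavy_2": mem_heavy_2, "fpu_heavy": fpu_heavy}
print(best_partner(mem_heavy_1, others))  # fpu_heavy
```

Choosing the partner with the lowest estimate is the "increase sharing to save power" case; choosing a logical CPU with no busy sibling at all is the "reduce sharing for performance" case.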

Now, imagine you've got 3 tasks that all do a lot of floating point operations and you've got a dual core Intel chip with hyper-threading and 4 logical CPUs (or a "quad core" AMD chip where floating point execution units are shared). In this case it comes down to the tasks' priorities:
  • One very high priority task and 2 low priority tasks: you want both of the low priority tasks running on logical CPUs that share execution units, so that the high priority task doesn't have to share
  • Two extremely high priority tasks and one very low priority task: so that the extremely high priority tasks don't need to share, the very low priority task isn't given CPU time at all.
  • Two medium priority tasks and one low priority task: one medium priority task and the low priority task are run on logical CPUs that share execution units while the other medium priority task runs on a logical CPU without sharing; but you switch CPUs around occasionally so that both medium priority tasks get the same amount of sharing (and the low priority task is always sharing)
  • Three medium priority tasks: you want to switch logical CPUs around occasionally to make it fair. E.g. tasks A and B share for a while, then tasks B and C share for a while, then tasks C and A share for a while.
  • Three extremely low priority tasks: here power consumption is more important, so one pair of logical CPUs is put in a low power mode and you run 2 tasks at a time on logical CPUs that share execution units (e.g. tasks A and B run and share while task C isn't given CPU time, then tasks B and C run while task A isn't given CPU time, etc).
Of course this is easy when there's always a fixed number of tasks. Reality is never that simple - tasks are frequently stopping/blocking and starting/unblocking, and it's impossible to predict what might be ideal in advance and relatively expensive to shift tasks to other CPUs while they're actually executing, and finding out a task's characteristics (if it uses lots of memory accesses or does lots of floating point) isn't easy either. Because of this, finding the "best" way is still an active research topic.
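
For the fixed "three medium priority tasks" case, the rotation itself is tiny. This is a Python sketch only: the function name is made up, and it deliberately ignores the blocking/unblocking and migration costs mentioned above.

```python
from itertools import cycle, islice

def sharing_rotation(tasks):
    """Return an endless iterator of (sharing_pair, solo_task) for three
    equal-priority tasks, rotating so that every task spends the same
    share of time on a core of its own."""
    a, b, c = tasks
    return cycle([((a, b), c), ((b, c), a), ((c, a), b)])

# Show the first four scheduling periods.
for pair, solo in islice(sharing_rotation(["A", "B", "C"]), 4):
    print(pair[0], "and", pair[1], "share a core;", solo, "runs alone")
```

After three periods the pattern repeats, so over any multiple of three periods each task has shared for two periods and run alone for one.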


Cheers,

Brendan

Re: Hyper threading -> strategy ok ?

Posted: Tue Apr 14, 2015 3:18 am
by Brendan
Hi again,
Brendan wrote:Reality is never that simple - tasks are frequently stopping/blocking and starting/unblocking, and it's impossible to predict what might be ideal in advance and relatively expensive to shift tasks to other CPUs while they're actually executing, and finding out a task's characteristics (if it uses lots of memory accesses or does lots of floating point) isn't easy either. Because of this, finding the "best" way is still an active research topic.
Just a quick note here...

For cases where code needs to decide between many different options (e.g. which logical CPU a thread should be run on, which video mode should be selected during boot, etc), I like to calculate a score for each option and choose the option with the best score. This allows you to dramatically change behaviour (or fine tune the scheduler) just by changing a relatively simple calculation.
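
As an illustration of the idea (a sketch, not an actual scheduler), here is what "score every option, take the best" could look like in Python for choosing a logical CPU. The fields `queued_tasks`, `sibling_busy` and `last_cpu` and all of the weights are placeholders invented for the example.

```python
def score_cpu(cpu, task):
    """Score one logical CPU for one task; higher is better.
    The weights are arbitrary placeholders meant to be tuned."""
    score = 0
    score -= 10 * cpu["queued_tasks"]   # prefer less loaded CPUs
    if cpu["sibling_busy"]:
        score -= 5                      # sharing a core costs throughput
    if cpu["id"] == task["last_cpu"]:
        score += 3                      # caches may still be warm
    return score

def choose_cpu(cpus, task):
    """Return the candidate CPU with the best score."""
    return max(cpus, key=lambda cpu: score_cpu(cpu, task))

cpus = [
    {"id": 0, "queued_tasks": 1, "sibling_busy": False},
    {"id": 1, "queued_tasks": 0, "sibling_busy": True},
    {"id": 2, "queued_tasks": 0, "sibling_busy": False},
]
task = {"last_cpu": 1}
print(choose_cpu(cpus, task)["id"])  # 2
```

Changing policy (for example favouring power saving by rewarding a busy sibling instead of penalising it) only means editing `score_cpu`; the selection code stays the same.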


Cheers,

Brendan