Hyper threading -> strategy ok ?

JulienDarc
Member
Posts: 97
Joined: Tue Mar 10, 2015 10:08 am

Hyper threading -> strategy ok ?

Post by JulienDarc »

Hello,

I read about hyper threading.

I read the whole CPU topology and detected that, on my test machine, two of the logical CPUs are hyperthreads.

Now, about the scheduler. I plan to avoid those two logical CPUs and let the CPU use them transparently whenever a memory stall occurs on the attached core.
Since they may run at only about 20% of the speed of a real core (a figure gathered from multiple readings, though I haven't verified it myself), I am not sure whether I should use them for my processes at all.

I need your opinion: should I assign tasks to hyperthreads or not? In case of a memory stall, isn't it better that they be free?

I don't know.

Bye

Julien
madanra
Member
Posts: 149
Joined: Mon Sep 07, 2009 12:01 pm

Re: Hyper threading -> strategy ok ?

Post by madanra »

Treat them as normal cores. It's the hardware's job to switch between them on memory stalls and the like; a software context switch would be far too slow to take advantage of that.
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance
Contact:

Re: Hyper threading -> strategy ok ?

Post by Combuster »

Hyperthreading was designed so that the core has a second thread ready to take work from whenever anything in the core loses individual cycles while the first thread is processing. Generally, this means you get your (~20%) speed improvement by actually using that other thread instead of ignoring it. Also, if you say you have two cores with hyperthreading, you actually have four virtual CPUs.
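As a side note, the sibling relationship between virtual CPUs can be computed from their APIC IDs once you know how many low bits select the SMT level (on Intel chips, CPUID leaf 0Bh reports this). A minimal sketch, where `same_core` is a made-up helper and `smt_bits` is assumed to come from that CPUID leaf:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: two logical CPUs are hyperthread siblings when
 * their APIC IDs differ only in the low "SMT" bits. smt_bits would
 * normally come from CPUID leaf 0Bh, subleaf 0 (the SMT level). */
static int same_core(uint32_t apic_a, uint32_t apic_b, unsigned smt_bits)
{
    return (apic_a >> smt_bits) == (apic_b >> smt_bits);
}
```

For example, on a dual-core chip with hyperthreading and one SMT bit, APIC IDs {0,1} share one core and {2,3} share the other, giving four virtual CPUs in total.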

There are some things to consider: a more recent processor will have Turbo Boost, which raises the clock speed when only half a core is in use, so your ~20% improvement comes with a lower overall per-core-half performance. In effect the mechanism trades away some single-thread performance, and if you happen to run something CPU-heavy and single-threaded, you might notice that as a performance drop in that specific case.

Another thing to note is that with hyperthreading, the two core-halves share most, if not all, of the caches on that core. Sharing as much content as possible between the two tasks is the only way to do better than effectively halving your cache size. This means you'll want to run the same application/address space on both core-halves for best performance.
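A scheduler can act on that with a simple placement preference. This is a sketch only; `struct cpu_state`, `pick_cpu`, and the fields are invented for illustration. It prefers an idle logical CPU whose core-sibling already runs the same address space, and falls back to any idle CPU otherwise:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU state: which address space each logical CPU is
 * currently running (-1 if idle), and which physical core it sits on.
 * Logical CPUs with the same core_id share that core's caches. */
struct cpu_state {
    int core_id;
    int address_space;
};

/* Prefer an idle logical CPU whose hyperthread sibling already runs the
 * same address space, so the two threads share cache content instead of
 * competing for it; otherwise return the first idle CPU, or -1 if none. */
static int pick_cpu(const struct cpu_state cpus[], size_t n, int aspace)
{
    int fallback = -1;
    for (size_t i = 0; i < n; i++) {
        if (cpus[i].address_space != -1)
            continue;                          /* busy */
        for (size_t j = 0; j < n; j++) {
            if (j != i && cpus[j].core_id == cpus[i].core_id &&
                cpus[j].address_space == aspace)
                return (int)i;                 /* idle sibling, same aspace */
        }
        if (fallback == -1)
            fallback = (int)i;
    }
    return fallback;
}
```

With four logical CPUs on two cores, a thread from address space 5 would land next to the CPU already running address space 5 when its sibling is free.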
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
JulienDarc
Member
Posts: 97
Joined: Tue Mar 10, 2015 10:08 am

Re: Hyper threading -> strategy ok ?

Post by JulienDarc »

Ok I get it.

I will make my scheduler a bit smarter then :)

Thanks a lot!
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: Hyper threading -> strategy ok ?

Post by Brendan »

Hi,
JulienDarc wrote:Now, about the scheduler. I plan to avoid those two logical CPUs and let the CPU use them transparently whenever a memory stall occurs on the attached core.
Since they may run at only about 20% of the speed of a real core (a figure gathered from multiple readings, though I haven't verified it myself), I am not sure whether I should use them for my processes at all.

I need your opinion: should I assign tasks to hyperthreads or not? In case of a memory stall, isn't it better that they be free?
First; understand that there's "physical resources" (execution units, caches, etc) and "logical CPUs", where different logical CPUs may share some physical resources.

For Intel's chips without hyper-threading and older AMD chips; most physical resources aren't shared but typically some caches are shared. For example, you might have a quad-core chip where there's one L3 cache that's shared by all 4 cores/logical CPUs, and two L2 caches that are shared by 2 cores/logical CPUs.

For AMD's more recent chips; some caches are shared by different logical CPUs, and pairs of logical CPUs share some execution units (those involved with floating point). For example, a "quad core" chip with 4 logical CPUs would have 4 separate sets of integer execution units and 2 sets of floating point execution units (where a pair of logical CPUs share a set of floating point execution units).

For Intel's chips with hyper-threading; most physical resources are shared by 2 logical CPUs (and some caches may be shared by multiple pairs of logical CPUs). For example, a "quad core" chip with 8 logical CPUs where the resources of a core are shared by 2 logical CPUs.

For all of the above; you can improve performance (and increase power consumption) by reducing "resource sharing". Examples:
  • If 2 tasks both use a lot of memory you can improve performance for those 2 tasks by putting them on logical CPUs that don't share caches
  • If 2 tasks both use a lot of floating point operations you can improve performance for those 2 tasks by putting them on logical CPUs that don't share floating point execution units
  • If 2 tasks both use a lot of integer operations you can improve performance for those 2 tasks by putting them on logical CPUs that don't share integer execution units
The opposite is also true: you can reduce power consumption (and reduce performance) by increasing "resource sharing". For example, if one task uses a lot of memory and another task uses a lot of floating point operations, then putting those 2 tasks on logical CPUs that share caches and execution units won't hurt performance much but will allow other cores to go into "low power" modes.

Now; imagine you've got 3 tasks that all do a lot of floating point operations, and you've got a dual-core Intel chip with hyper-threading and 4 logical CPUs (or a "quad core" AMD chip where floating point execution units are shared). In this case it comes down to the tasks' priorities:
  • One very high priority task and 2 low priority tasks: you want both of the low priority tasks running on logical CPUs that share execution units, so that the high priority task doesn't have to share
  • Two extremely high priority tasks and one very low priority task: so that the extremely high priority tasks don't need to share, the very low priority task isn't given CPU time at all.
  • Two medium priority tasks and one low priority task: one medium priority task and the low priority task are run on logical CPUs that share execution units while the other medium priority task runs on a logical CPU without sharing; but you switch CPUs around occasionally so that both medium priority tasks get the same amount of sharing (and the low priority task is always sharing)
  • Three medium priority tasks: you want to switch logical CPUs around occasionally to make it fair. E.g. tasks A and B share for a while, then tasks B and C share for a while, then tasks C and A share for a while.
  • Three extremely low priority tasks: here power consumption is more important, so one pair of logical CPUs is put in a low power mode and you run 2 tasks at a time on logical CPUs that share execution units (e.g. tasks A and B run and share while task C isn't given CPU time, then tasks B and C run while task A isn't given CPU time, etc).
Of course this is easy when there's always a fixed number of tasks. Reality is never that simple: tasks are frequently stopping/blocking and starting/unblocking; it's impossible to predict what might be ideal in advance; it's relatively expensive to shift tasks to other CPUs while they're actually executing; and finding out a task's characteristics (whether it does lots of memory accesses or lots of floating point) isn't easy either. Because of this, finding the "best" way is still an active research topic.
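For the fixed-count cases above, several of the rules collapse into "the lowest-priority runnable tasks are the ones that share execution units". A sketch for exactly three tasks (the function name and shape are invented here; a real scheduler would generalise this):

```c
#include <assert.h>

/* Given three task priorities (higher number = higher priority), write
 * the indices of the two tasks that should share execution units into
 * out[0..1]. The highest-priority task keeps the unshared logical CPU. */
static void pick_sharers(const int prio[3], int out[2])
{
    int hi = 0;                        /* index of highest-priority task */
    if (prio[1] > prio[hi]) hi = 1;
    if (prio[2] > prio[hi]) hi = 2;

    int k = 0;
    for (int i = 0; i < 3; i++)        /* the other two tasks share */
        if (i != hi)
            out[k++] = i;
}
```

This only captures the static part of the decision; the fairness rules (rotating which medium-priority task shares) would sit on top of it.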


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: Hyper threading -> strategy ok ?

Post by Brendan »

Hi again,
Brendan wrote:Reality is never that simple: tasks are frequently stopping/blocking and starting/unblocking; it's impossible to predict what might be ideal in advance; it's relatively expensive to shift tasks to other CPUs while they're actually executing; and finding out a task's characteristics (whether it does lots of memory accesses or lots of floating point) isn't easy either. Because of this, finding the "best" way is still an active research topic.
Just a quick note here...

For cases where code needs to decide between many different options (e.g. which logical CPU a thread should be run on; which video mode should be selected during boot, etc); I like to calculate a score for each option and choose the option with the best score. This allows you to dramatically change behaviour (or fine tune the scheduler) just by changing a relatively simple calculation.
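A minimal sketch of that idea for CPU selection. The inputs and weights here are placeholders invented for illustration; the real set of factors (cache sharing, execution-unit sharing, power state, NUMA distance, ...) and their weights are exactly where the tuning happens:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical per-CPU facts that feed the score. */
struct cpu_info {
    int idle;          /* 1 if the logical CPU is free */
    int sibling_busy;  /* 1 if its hyperthread sibling is running */
    int cache_warm;    /* 1 if this task ran here recently */
};

/* The whole policy lives in this one small calculation. */
static int score(const struct cpu_info *c)
{
    if (!c->idle)
        return INT_MIN;               /* not a candidate at all */
    int s = 0;
    if (!c->sibling_busy) s += 100;   /* whole core to ourselves */
    if (c->cache_warm)    s += 50;    /* keep the task's cache content */
    return s;
}

/* Score every option and pick the best; -1 if no CPU is idle. */
static int best_cpu(const struct cpu_info cpus[], size_t n)
{
    int best = -1, best_score = INT_MIN;
    for (size_t i = 0; i < n; i++) {
        int s = score(&cpus[i]);
        if (s > best_score) {
            best_score = s;
            best = (int)i;
        }
    }
    return best;
}
```

Changing the behaviour (favour power saving over performance, say) then means flipping a sign or a weight in `score()` rather than restructuring the scheduler.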


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.