OSDev.org

Posted: **Sat Sep 13, 2008 4:47 pm**

As far as I know, an HT processor can run two threads from the same process, while the Dual Core can run two different processes.
In a Dual Core there are two processors in one chip, while in the HT there is only one processor. Can anyone explain to me how an HT processor runs two different pieces of code?
In the MP tables, is an HT processor shown as two different processors?
Can I assign a new process, and not just a thread, to the "second" HT processor?

Thanks

Posted: **Sat Sep 13, 2008 6:38 pm**

To simplify: there are many occasions when a CPU core is "stalled" -- that is, it can't do anything, because it's waiting for something (usually for cache to get filled from main memory, which can take 300 to 500 CPU cycles). During those times, the CPU can theoretically do something completely different. This is what hyperthreading does. When the CPU gets a cache miss, it switches to the other hyperthread (if you have hyperthreading enabled). When the new thread gets a cache miss, it switches back to the previous hyperthread -- and hopefully the memory request got loaded into cache in the meantime.

Now, in this context, the word "thread" does not mean what you think it does. The CPU can be running ANYTHING in that second hyperthread. It has a complete set of registers, stack ... everything. You can run completely separate processes. It acts like another core or an additional CPU.

HOWEVER (this is the catch), the two hyperthreads share the TLB. And the TLB is the smallest cache on your machine.
So each hyperthreaded process now only gets half of that. Any TLB invalidation by one invalidates the TLB of the other. All the other caches are also shared by the 2 hyperthreads -- L1, L2, code and data, etc.

There was a series of articles posted here that I could find again, that showed a clever way of using the hyperthread exclusively for a memory prefetch, without wasting any TLB space -- for applications that were compiled to make use of it.

Posted: **Sun Sep 14, 2008 12:43 am**

Hi,

AlfaOmega08 wrote: As far as I know, an HT processor can run two threads from the same process, while the Dual Core can run two different processes.

No - hyper-threading behaves almost the same as multi-core (any logical CPU can do anything), except that for hyper-threading the logical CPUs share the physical CPU/core's resources, and can affect each other's performance. For example, if one CPU is going flat out and the other CPU is idle, then the first CPU gets 100% of the resources and runs fast, but if the second CPU starts trying to go flat out too then the first CPU will get less of the CPU's resources and start running slower.

Think of it like this...

Inside a modern CPU there are several different execution units - for example, a Pentium 4 (Northwood) has 2 execution units for address generation, 2 double speed execution units used for simple instructions, one execution unit for complex instructions, one execution unit for FPU/SSE/MMX, and one execution unit for FPU/SSE/MMX moves. With all these execution units it's extremely hard to keep them all busy at the same time. For example, if you're not using any complex instructions or FPU/SSE/MMX you'd have 3 execution units doing nothing. There are also things that create small patches of idleness - if one instruction depends on the results of another instruction then the CPU may need to wait for the first instruction to complete, which leaves execution units idle. Then there are things like cache misses, TLB misses, the HLT instruction, etc.

For hyper-threading, the CPU is designed to reduce the chance of execution units being idle by executing 2 completely separate instruction streams at the same time. For example, if one logical CPU is doing integer operations and the other logical CPU is doing floating point operations, then they'd both be using different execution units and you might be able to get twice as much work done. However, if both logical CPUs are trying to use the same execution units then they need to take turns, and you won't get twice as much work done.

The important thing to remember here is that the performance you get from one logical CPU depends on what the other logical CPU is doing. A logical CPU might get 100% of the CPU's resources, or it might get 50% of the CPU's resources, or it might get anything in between.

Because of this there's some important optimizations. The first one is using the "pause" instruction in tight spin-loops, which reduces the resources used by the spinning logical CPU and therefore improves the performance of the other logical CPU.
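As a rough illustration of the "pause" optimization, here is a minimal test-and-test-and-set spinlock sketch. The names (`cpu_relax`, `spinlock`, `spin_lock`, ...) are made up for this example, and the inline assembly assumes a GCC or Clang style compiler; on non-x86 targets the hint simply compiles to nothing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: emit the x86 "pause" hint where available.
 * Assumes GCC/Clang inline assembly syntax; a no-op elsewhere. */
static inline void cpu_relax(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("pause" ::: "memory");
#endif
}

/* Illustrative lock type, not taken from any particular kernel. */
typedef struct {
    atomic_bool locked;
} spinlock;

static inline void spin_init(spinlock *l)
{
    atomic_init(&l->locked, false);
}

/* Try once; returns true if this caller now owns the lock. */
static inline bool spin_try_lock(spinlock *l)
{
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

static inline void spin_lock(spinlock *l)
{
    while (!spin_try_lock(l)) {
        /* Wait with plain loads plus "pause" until the lock looks free,
         * so the spinning logical CPU uses fewer shared resources. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            cpu_relax();
    }
}

static inline void spin_unlock(spinlock *l)
{
    /* Releasing a lock that is not held would be a caller bug. */
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed));
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

The inner loop only reads the flag, and the atomic exchange is retried only once the lock appears to be free.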

The other common optimization is scheduling - if there are 2 physical CPUs with hyper-threading (a total of 4 logical CPUs) and the OS only has 2 running tasks, then it's best for performance to schedule one task on each physical CPU (so each task gets 100% of a physical CPU's resources) instead of having both tasks on the same physical CPU (so both tasks need to share a physical CPU's resources while a separate physical CPU does nothing). There's also the reverse optimization (intended to reduce energy consumption, reduce heat and extend battery life) - schedule both tasks on different logical CPUs in the same physical CPU, so that the other physical CPU is idle and can be put into a low power idle mode.
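The two placement policies described above could be sketched roughly like this. Everything here (`struct logical_cpu`, `pick_cpu`, the policy names) is hypothetical and only meant to show the idea; a real scheduler would also weigh caches, NUMA, load history and so on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one logical CPU as seen by the scheduler. */
struct logical_cpu {
    int  core_id;  /* which physical CPU/core this logical CPU belongs to */
    bool busy;     /* true if a task is already running on it */
};

enum placement_policy { PREFER_PERFORMANCE, PREFER_POWER_SAVING };

/* Count busy logical CPUs that share a physical CPU with cpus[i]. */
static int busy_siblings(const struct logical_cpu *cpus, size_t n, size_t i)
{
    int count = 0;
    for (size_t j = 0; j < n; j++)
        if (j != i && cpus[j].core_id == cpus[i].core_id && cpus[j].busy)
            count++;
    return count;
}

/* Pick an idle logical CPU for a new task.
 * PREFER_PERFORMANCE: fewest busy siblings (spread across physical CPUs).
 * PREFER_POWER_SAVING: most busy siblings (pack onto active physical CPUs).
 * Returns the index into cpus[], or -1 if every logical CPU is busy. */
static int pick_cpu(const struct logical_cpu *cpus, size_t n,
                    enum placement_policy policy)
{
    assert(cpus != NULL || n == 0);
    int best = -1;
    int best_score = 0;
    for (size_t i = 0; i < n; i++) {
        if (cpus[i].busy)
            continue;
        int score = busy_siblings(cpus, n, i);
        bool better = best < 0 ||
            (policy == PREFER_PERFORMANCE ? score < best_score
                                          : score > best_score);
        if (better) {
            best = (int)i;
            best_score = score;
        }
    }
    return best;
}
```

With 2 physical CPUs (4 logical CPUs) and one task already running on logical CPU 0, the performance policy picks a logical CPU on the other physical CPU, while the power-saving policy picks the idle sibling of CPU 0.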

However, when you start looking at scheduler optimizations there are a lot of other things a modern OS needs to consider - both performance optimizations and heat/energy optimizations (e.g. if a CPU starts to get too hot, schedule work so that the CPU gets a chance to cool down before "thermal throttling" starts); for NUMA, "separate chip" SMP, multi-core SMP, and hyper-threading; with different caches being shared in different ways.

This is about to get more complicated too, because Intel's newest CPU will be released very soon. It's called "Core i7", and it combines NUMA, multi-core and hyper-threading in the same system. It's going to take some clever engineering to get maximum performance from a quad socket NUMA system with eight cores per CPU and hyper-threading (a total of 64 logical CPUs), but systems like this will start appearing in server rooms in the next 18 months ...

AlfaOmega08 wrote: In the MP tables an HT processor is shown as two different processors?

Because hyper-threading has different performance characteristics (and because there's some optimizations you should do for hyper-threading) the MP tables won't mention logical CPUs at all. Instead you need to use the ACPI tables.
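For illustration, here is a minimal sketch of pulling the logical CPUs out of the ACPI MADT ("APIC" table). It assumes the caller has already located the table and passes its bytes and its length (the header's Length field); the function and constant names are invented for this example. The layout used is the standard one: entries start at byte 44, and a type 0 "Processor Local APIC" entry is 8 bytes with the APIC ID at offset 3 and an enabled bit in bit 0 of its flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    MADT_ENTRIES_OFFSET  = 44, /* 36-byte header + 4-byte LAPIC address + 4-byte flags */
    MADT_TYPE_LOCAL_APIC = 0,  /* "Processor Local APIC" entry */
    LOCAL_APIC_ENABLED   = 1   /* bit 0 of the entry's flags */
};

/* Read a little-endian 32-bit value without assuming alignment. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Walk a MADT image and collect the APIC IDs of enabled processors.
 * Stores at most max_ids IDs in ids[] and returns how many enabled
 * Local APIC entries were seen. Stops early on a malformed entry. */
static size_t madt_enabled_apic_ids(const uint8_t *madt, size_t len,
                                    uint8_t *ids, size_t max_ids)
{
    assert(madt != NULL);
    size_t found = 0;
    size_t off = MADT_ENTRIES_OFFSET;

    while (off + 2 <= len) {
        uint8_t type = madt[off];
        uint8_t entry_len = madt[off + 1];

        if (entry_len < 2 || off + entry_len > len)
            break; /* malformed or truncated entry */

        if (type == MADT_TYPE_LOCAL_APIC && entry_len >= 8 &&
            (read_le32(madt + off + 4) & LOCAL_APIC_ENABLED)) {
            if (found < max_ids)
                ids[found] = madt[off + 3]; /* APIC ID field */
            found++;
        }
        off += entry_len; /* other entry types are skipped by length */
    }
    return found;
}
```

Checksum validation and the newer x2APIC entry type are left out to keep the sketch short.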

AFAIK the idea here is that old OSs that use the MP table aren't optimized for hyper-threading and won't get much benefit from hyper-threading, and therefore these OSs don't find out about hyper-threading from the MP table; while newer OSs that use the ACPI tables are more likely to be optimized for hyper-threading and more likely to benefit from hyper-threading, and therefore do find out about hyper-threading from the ACPI tables.
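The ACPI tables give an APIC ID for each logical CPU, but not which logical CPUs share a physical CPU. One way to work that out on CPUs of this generation (a sketch under assumptions, not the only method) is CPUID leaf 1: EDX bit 28 is the HTT flag and EBX bits 23:16 give the number of logical processor IDs per physical package. The helpers below are hypothetical and take the register values as plain arguments, so executing CPUID itself is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1, EDX bit 28: the HTT flag. */
static bool cpuid1_has_htt(uint32_t edx)
{
    return ((edx >> 28) & 1u) != 0;
}

/* CPUID leaf 1, EBX bits 23:16: number of logical processor IDs per
 * physical package. Only meaningful when the HTT flag is set. */
static unsigned cpuid1_logical_per_package(uint32_t ebx, uint32_t edx)
{
    if (!cpuid1_has_htt(edx))
        return 1;
    unsigned n = (ebx >> 16) & 0xFFu;
    return n ? n : 1;
}

/* Number of low APIC ID bits needed to number `count` logical CPUs. */
static unsigned id_bits(unsigned count)
{
    assert(count >= 1);
    unsigned bits = 0;
    while ((1u << bits) < count)
        bits++;
    return bits;
}

/* Two logical CPUs sit in the same physical package if their APIC IDs
 * are equal once the per-package low bits are shifted out. */
static bool same_package(uint8_t apic_a, uint8_t apic_b,
                         unsigned logical_per_package)
{
    unsigned shift = id_bits(logical_per_package);
    return (apic_a >> shift) == (apic_b >> shift);
}
```

On a single-core hyper-threaded CPU "same package" means "same physical CPU". On chips that combine multi-core and hyper-threading, this only identifies the package, and splitting cores from logical CPUs inside it needs additional CPUID leaves.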

Cheers,

Brendan


Hyper Threading vs Dual Core
