Hi,
AlfaOmega08 wrote:As far as i know, an HT processor can run two threads from the same process, while the Dual Core can run two different processes.
No - hyper-threading behaves almost the same as multi-core (any logical CPU can do anything), except that for hyper-threading the logical CPUs share the physical CPU/core's resources, and can affect each other's performance. For example, if one CPU is going flat out and the other CPU is idle, then the first CPU gets 100% of the resources and runs fast, but if the second CPU starts trying to go flat out too then the first CPU will get less of the CPU's resources and start running slower.
Think of it like this...
Inside a modern CPU there are several different execution units - for example, a Pentium 4 (Northwood) has 2 execution units for address generation, 2 double speed execution units used for simple instructions, one execution unit for complex instructions, one execution unit for FPU/SSE/MMX, and one execution unit for FPU/SSE/MMX moves. With all these execution units it's extremely hard to keep them all busy at the same time. For example, if you're not using any complex instructions or FPU/SSE/MMX you'd have 3 execution units doing nothing. There are also things that create small patches of idleness - if one instruction depends on the results of another instruction then the CPU may need to wait for the first instruction to complete, which leaves execution units idle. Then there are things like cache misses, TLB misses, the HLT instruction, etc.
For hyper-threading, the CPU is designed to reduce the chance of execution units being idle by executing 2 completely separate instruction streams at the same time. For example, if one logical CPU is doing integer operations and the other logical CPU is doing floating point operations, then they'd both be using different execution units and you might be able to get twice as much work done. However, if both logical CPUs are trying to use the same execution units then they need to take turns, and you won't get twice as much work done.
The important thing to remember here is that the performance you get from one logical CPU depends on what the other logical CPU is doing. A logical CPU might get 100% of the CPU's resources, or it might get 50% of the CPU's resources, or it might get anything in between.
Because of this there are some important optimizations. The first one is using the "pause" instruction in tight spin-loops, which reduces the resources used by the spinning logical CPU and therefore improves the performance of the other logical CPU.
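To make the spin-loop case concrete, here's a minimal sketch of a test-and-test-and-set spinlock that uses "pause" while it waits. The names (cpu_relax, struct spinlock, spin_lock, spin_trylock, spin_unlock) are made up for illustration, and it assumes a C11 compiler with GCC/Clang style inline assembly; on non-x86 targets the hint compiles to nothing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Tell the CPU we are busy-waiting. On x86 this emits "pause";
   on other architectures this sketch does nothing. */
static inline void cpu_relax(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("pause" ::: "memory");
#endif
}

/* Hypothetical lock type for illustration only. */
struct spinlock {
    atomic_bool locked;
};

/* Try once; returns true if the lock was acquired. */
static inline bool spin_trylock(struct spinlock *l)
{
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

static inline void spin_lock(struct spinlock *l)
{
    while (!spin_trylock(l)) {
        /* Lock is held: wait on plain reads, with "pause" on each pass,
           so the spinning logical CPU uses fewer shared resources. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            cpu_relax();
    }
}

static inline void spin_unlock(struct spinlock *l)
{
    /* Releasing a lock that is not held is a bug in the caller. */
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed));
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

A caller would declare something like "struct spinlock l = { false };" and wrap the critical section in spin_lock(&l) / spin_unlock(&l).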
The other common optimization is scheduling - if there are 2 physical CPUs with hyper-threading (a total of 4 logical CPUs) and the OS only has 2 running tasks, then it's best for performance to schedule one task on each physical CPU (so each task gets 100% of a physical CPU's resources) instead of having both tasks on the same physical CPU (so both tasks need to share a physical CPU's resources while a separate physical CPU does nothing). There's also the reverse optimization (intended to reduce energy consumption, reduce heat and increase battery life) - schedule both tasks on different logical CPUs in the same physical CPU, so that the other physical CPU is idle and can be put into a low power idle mode.
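As a rough sketch of those two policies (not how any real scheduler does it), the choice of logical CPU for a new task could look like the code below. Everything here is hypothetical: struct logical_cpu, enum policy, sibling_busy and pick_cpu are names invented for the example, and a real OS would also weigh caches, NUMA, load and so on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum policy { POLICY_PERFORMANCE, POLICY_POWER_SAVE };

/* Hypothetical per-logical-CPU record. */
struct logical_cpu {
    int  physical_id;  /* which physical CPU this logical CPU belongs to */
    bool busy;         /* currently running a task? */
};

/* True if another logical CPU in the same physical CPU as cpus[i] is busy. */
static bool sibling_busy(const struct logical_cpu *cpus, size_t n, size_t i)
{
    for (size_t j = 0; j < n; j++) {
        if (j != i && cpus[j].physical_id == cpus[i].physical_id && cpus[j].busy)
            return true;
    }
    return false;
}

/* Pick an idle logical CPU for a new task; returns its index, or -1 if
   every logical CPU is busy.
   POLICY_PERFORMANCE prefers a physical CPU with no busy sibling.
   POLICY_POWER_SAVE prefers a physical CPU that is already awake. */
static int pick_cpu(const struct logical_cpu *cpus, size_t n, enum policy p)
{
    assert(cpus != NULL);
    int fallback = -1;

    for (size_t i = 0; i < n; i++) {
        if (cpus[i].busy)
            continue;
        bool sib = sibling_busy(cpus, n, i);
        bool preferred = (p == POLICY_PERFORMANCE) ? !sib : sib;
        if (preferred)
            return (int)i;
        if (fallback < 0)
            fallback = (int)i;  /* idle, but not the ideal choice */
    }
    return fallback;
}
```

With 2 physical CPUs (4 logical CPUs) and one task already running on logical CPU 0, the performance policy picks a logical CPU on the other physical CPU, while the power-saving policy picks the idle sibling of logical CPU 0.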
However, when you start looking at scheduler optimizations there's a lot of other things a modern OS needs to consider - both performance optimizations and heat/energy optimizations (e.g. if a CPU starts to get too hot, schedule work so that the CPU gets a chance to cool down before "thermal throttling" starts); for NUMA, "separate chip" SMP, multi-core SMP, and hyper-threading; with different caches being shared in different ways.
This is about to get more complicated too, because Intel's newest CPU will be released very soon. It's called "Core i7", and it combines NUMA, multi-core and hyper-threading in the same system. It's going to take some clever engineering to get maximum performance from a quad socket NUMA system with eight cores per CPU and hyper-threading (a total of 64 logical CPUs), but systems like this will start appearing in server rooms in the next 18 months ...
AlfaOmega08 wrote:In the MP tables an HT processor is shown as two different processors?
Because hyper-threading has different performance characteristics (and because there are some optimizations you should do for hyper-threading) the MP tables won't mention logical CPUs at all. Instead you need to use the ACPI tables.
AFAIK the idea here is that old OSs that use the MP table aren't optimized for hyper-threading and won't get much benefit from hyper-threading, and therefore these OSs don't find out about hyper-threading from the MP table; while newer OSs that use the ACPI tables are more likely to be optimized for hyper-threading and more likely to benefit from hyper-threading, and therefore do find out about hyper-threading from the ACPI tables.
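For the ACPI side, the table that lists the CPUs is the MADT (signature "APIC"), where each "Processor Local APIC" entry (type 0) describes one logical CPU. Below is a minimal sketch of walking that table, assuming you've already found it and mapped it into memory. The function name madt_enabled_apic_ids and its parameters are invented for this example, and it skips things a real OS should do (checksum validation, x2APIC entries, etc). Note that the MADT only tells you which logical CPUs exist; working out which of them share a physical CPU takes extra work (e.g. CPUID).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MADT_ENTRIES_OFFSET  44  /* 36 byte header + 4 byte LAPIC address + 4 byte flags */
#define MADT_TYPE_LOCAL_APIC 0   /* "Processor Local APIC" entry */

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Walk the MADT and collect the APIC IDs of enabled logical CPUs.
   Stores at most "max" IDs in "ids" but returns the total number found,
   or -1 if the buffer doesn't look like a MADT. */
static int madt_enabled_apic_ids(const uint8_t *madt, size_t avail,
                                 uint8_t *ids, int max)
{
    assert(madt != NULL && ids != NULL);

    if (avail < MADT_ENTRIES_OFFSET || memcmp(madt, "APIC", 4) != 0)
        return -1;

    uint32_t len = read_le32(madt + 4);      /* total table length */
    if (len < MADT_ENTRIES_OFFSET || len > avail)
        return -1;

    int count = 0;
    size_t off = MADT_ENTRIES_OFFSET;

    while (off + 2 <= len) {
        uint8_t type = madt[off];
        uint8_t elen = madt[off + 1];
        if (elen < 2 || off + elen > len)
            break;                           /* malformed entry, stop */

        if (type == MADT_TYPE_LOCAL_APIC && elen >= 8) {
            /* Layout: type, length, ACPI processor ID, APIC ID, flags (4 bytes) */
            uint32_t flags = read_le32(madt + off + 4);
            if (flags & 1) {                 /* bit 0 = enabled */
                if (count < max)
                    ids[count] = madt[off + 3];
                count++;
            }
        }
        off += elen;
    }
    return count;
}
```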
Cheers,
Brendan