Hi,
nooooooooos wrote:Do I have to optimize my OS especially for HT or is it the same as for Dual-Core?
All optimizations are always optional.
For hyperthreading the same physical CPU is pretending to be 2 logical CPUs. This doesn't mean you end up with 2 logical CPUs that have 50% of the performance of the physical CPU. If CPU #1 is doing HLT waiting for an IRQ, then CPU #2 will be able to use 100% of the physical CPU's resources, while if CPU #1 is doing something then CPU #2 will only get 50% of the physical CPU's resources.
Even if both CPUs are doing work, hyperthreading still improves performance by hiding latencies, improving pipeline usage and reducing instruction dependencies. For example, without hyperthreading if a CPU needs to wait for data to come from RAM then the physical CPU does nothing until the data arrives. With hyperthreading, if CPU #1 is waiting for data to come from RAM then CPU #2 can still execute instructions and the physical CPU does useful work. If one logical CPU is doing integer operations and another logical CPU is doing floating point operations, then both logical CPUs are using different pipelines and won't be competing for pipelines. If one logical CPU needs the result of one instruction before it can do the next instruction, then the CPU can execute instructions from another logical CPU instead of waiting for the first instruction to complete.
There are also costs - mostly related to scalability and re-entrancy locking. More CPUs means more CPUs trying to acquire the same re-entrancy locks and higher lock contention (e.g. more chance of CPUs doing nothing until they can acquire a lock). If the OS has poor scalability, then hyper-threading might make performance worse.
Also, logical CPUs compete equally for the physical CPU's resources. For example, if CPU #1 is running a high priority thread and CPU #2 is running a low priority thread, then you might want to run HLT on CPU #2 instead of the low priority thread to improve the performance of the high priority thread.
For optimization, always use the PAUSE instruction in spinloops - it reduces the CPU resources used by the spinning CPU and increases the CPU resources that can be used by other logical CPUs, and does no harm on CPUs that don't have hyperthreading, including CPUs that don't have the PAUSE instruction (it's the same as a NOP in that case).
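For example, a spinlock's acquire loop might look something like this (a minimal sketch using GCC-style atomics and inline asm - the names are illustrative):

/* Minimal spinlock sketch (GCC/Clang, x86). The "pause" instruction tells the
 * CPU we're in a spin-wait loop, so the other logical CPU in the same core
 * gets more of the shared execution resources. On CPUs without hyperthreading
 * (or without PAUSE) it behaves like a NOP. */
typedef volatile int spinlock_t;

static inline void spin_lock(spinlock_t *lock)
{
    while (__atomic_exchange_n(lock, 1, __ATOMIC_ACQUIRE)) {
        /* Spin read-only until the lock looks free, pausing each iteration */
        while (*lock) {
            __asm__ __volatile__("pause");
        }
    }
}

static inline void spin_unlock(spinlock_t *lock)
{
    __atomic_store_n(lock, 0, __ATOMIC_RELEASE);
}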
For the scheduler, if there's 2 physical CPUs with 2 logical CPUs each (4 logical CPUs total), then it's better for performance to have one logical CPU in each physical CPU idle (with no logical CPUs competing for resources) than to have both logical CPUs in the same physical CPU idle (with logical CPUs in the other physical CPU competing for resources). For power management the reverse can be true - e.g. it might be better to have both logical CPUs in the same physical CPU idle so that the physical CPU isn't doing anything and can be put into a power saving state to save battery power and reduce heat.
There's also cache sharing. If both logical CPUs are using the same address space, then they can share the cache instead of competing for cache (e.g. getting 50% of the cache each). This means that if 2 threads belong to the same process, then running those threads on the same physical CPU can improve cache efficiency.
The scheduler also needs to decide which CPU a thread should run on. This is where things start to get complicated...
For plain SMP, a CPU might still have some data in its cache from last time a thread was run, and you can improve performance by running the thread on the same CPU the next time it gets a time slice. For hyperthreading, you get the same benefit from running the thread on any logical CPU within the physical CPU it ran on last time. For multi-core some caches can also be shared - for example (depending on which CPU), the L2 cache might be shared by all cores in the chip, while the L1 caches are shared by all logical CPUs in the same core. This gives an order of preference - it would be better to run the thread on the same core as last time, but if you can't then it'd be better to run the thread on the same chip as last time.
However, you also need to think about load balancing - it's bad for performance if one CPU has heaps of work to do (because that's where the threads ran last time) while 3 other CPUs are doing nothing.
What I do when I'm deciding which CPU a thread should use for its next time slice is calculate a score for each CPU and select the CPU with the best score. The code that calculates the score could consider things like the chance of the thread's data still being in a CPU's cache (how long ago the thread got CPU time), cache sharing, CPU load, NUMA domain, the priority of the thread, power management policy, CPU temperature, etc.
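As a rough sketch of that idea (all the field names and weights below are made up for illustration - a real scheduler would gather this data itself and tune the weights):

/* Sketch of "calculate a score for each CPU, pick the best". The fields are
 * assumed to already be filled in relative to the thread being scheduled. */
struct cpu_score_info {
    int load;             /* 0..100, current load estimate for this logical CPU */
    int cache_bonus;      /* thread (or a thread of the same process) ran here recently */
    int numa_penalty;     /* thread's memory lives in a different NUMA domain */
    int temperature;      /* degrees, for power/thermal policy */
};

static int score_cpu(const struct cpu_score_info *cpu)
{
    int score = 0;

    score += cpu->cache_bonus;      /* data may still be in this CPU's caches */
    score -= cpu->load;             /* prefer lightly loaded CPUs (load balancing) */
    score -= cpu->numa_penalty;     /* prefer the NUMA domain holding the thread's memory */
    score -= cpu->temperature / 10; /* crude power management / thermal policy */

    return score;
}

/* The scheduler loops over all logical CPUs, calls score_cpu() for each, and
 * gives the thread's next time slice to the CPU with the highest score. */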
nooooooooos wrote:When I don't want to use the shutdown code, is it possible to skip the sending of the Init-IPI?
You always need the Init-IPI (IIRC, if you don't send the Init-IPI the CPU will ignore the Startup-IPI).
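For reference, the usual sequence looks something like this (a rough sketch - the trampoline address, delays and helper functions are assumptions, and the "wait for delivery status" polling and the second Startup-IPI that the MP spec recommends are omitted):

/* Rough sketch of starting an AP via the local APIC's Interrupt Command
 * Register (ICR). Assumes the local APIC is mapped at lapic_base, the AP
 * trampoline is at physical address 0x8000, and delay_us() exists elsewhere. */
#include <stdint.h>

extern volatile uint32_t *lapic_base;   /* memory-mapped local APIC */
extern void delay_us(unsigned us);      /* some timing source (e.g. PIT) */

#define LAPIC_ICR_LOW   (0x300 / 4)
#define LAPIC_ICR_HIGH  (0x310 / 4)

static void start_ap(uint8_t apic_id)
{
    /* Init-IPI - the AP won't accept a Startup-IPI without it */
    lapic_base[LAPIC_ICR_HIGH] = (uint32_t)apic_id << 24;
    lapic_base[LAPIC_ICR_LOW]  = 0x00004500;   /* INIT, level assert */
    delay_us(10000);                           /* ~10 ms */

    /* Startup-IPI (vector 0x08 => AP starts executing at 0x8000) */
    lapic_base[LAPIC_ICR_HIGH] = (uint32_t)apic_id << 24;
    lapic_base[LAPIC_ICR_LOW]  = 0x00004608;   /* STARTUP, vector 0x08 */
    delay_us(200);
}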
nooooooooos wrote:My last question....: Does it make sense to support APIC even when there aren't any SMP or ACPI tables?
For I/O APICs, you can't configure them without MPS or ACPI tables.
For local APICs you can try to enable them and/or manually probe to see if it's present, so it can make sense (if you're careful with your enabling and probing).
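A sketch of that sort of probing (assuming the CPU is new enough to have CPUID and MSRs - older CPUs need extra checks first):

/* CPUID leaf 1, EDX bit 9 indicates an on-chip local APIC, and the
 * IA32_APIC_BASE MSR (0x1B) gives its base address and global enable bit. */
#include <stdint.h>

static int has_local_apic(void)
{
    uint32_t eax, ebx, ecx, edx;

    __asm__ __volatile__("cpuid"
                         : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
                         : "a"(1));
    return (edx >> 9) & 1;
}

static uint64_t read_apic_base_msr(void)
{
    uint32_t lo, hi;

    __asm__ __volatile__("rdmsr" : "=a"(lo), "=d"(hi) : "c"(0x1B));
    return ((uint64_t)hi << 32) | lo;   /* bit 11 = APIC enabled, bits 12+ = base address */
}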
Unless you have several separate kernels or separate HALs (e.g. one for SMP, one for single CPU without APICs, one for single CPU with APICs, etc), the OS would support APICs regardless of whether the APICs are used or not (and regardless of whether there's ACPI and/or MPS tables)...
Cheers,
Brendan