Hi,
Hyperdrive wrote:Brendan wrote:This does mean an OS needs different code for different situations, but you need different code to support x2APIC anyway. Also note that (AFAIK) all previous Intel based NUMA systems have used the "hierarchical cluster model" (with "cluster managers" between NUMA nodes). Of course there weren't very many Intel based NUMA systems either (I can only think of a few very rare systems based on Pentium III).
What do you mean by "an OS needs different code for different situations"?
Um?
If your OS supports any new feature (but doesn't require that feature), then usually it needs some code to set up and use the new feature *plus* some other code that doesn't use the new feature.
For an example, consider TLB invalidation. On 80386 you have to reload CR3 because the INVLPG instruction isn't supported. This isn't a bad thing for 80386 because RAM was about the same speed as the CPU; but it is a bad thing for more recent (less ancient?) CPUs because you completely trash the entire TLB. On 80486 you could reload CR3, but most of the time you want to use INVLPG. On modern CPUs you want to use "global" pages for efficiency, but if you do use global pages then reloading CR3 won't work in some situations (e.g. flushing a TLB entry for a kernel-space page that's marked as "global").
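For example, the two basic single-CPU approaches look something like this (just a rough sketch using GCC inline assembly - the function names are mine, not from any real kernel):
Code:
/* 80386: no INVLPG, so reload CR3. This flushes the entire TLB, and on
   modern CPUs it leaves "global" entries behind (unless CR4.PGE is
   toggled as well). */
static inline void tlb_flush_all(void)
{
    unsigned long cr3;

    __asm__ __volatile__("mov %%cr3, %0" : "=r"(cr3));
    __asm__ __volatile__("mov %0, %%cr3" : : "r"(cr3) : "memory");
}

/* 80486 and later: invalidate a single entry (this also works for
   entries marked as "global") */
static inline void tlb_invlpg(void *addr)
{
    __asm__ __volatile__("invlpg (%0)" : : "r"(addr) : "memory");
}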
Of course for "multi-CPU" situations you also need to support "TLB shootdown", because sometimes you need to flush TLB entries on the local CPU and on other CPUs (but it'd be a silly idea to attempt that when there's only one CPU, especially if there isn't any local APIC).
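A very simplified sketch of the shootdown part might look like this, reusing tlb_invlpg() from the sketch above and sending the IPI via the x2APIC ICR. The vector number, the wrmsr() helper and the shared variable are all assumptions made up for the example:
Code:
#define IA32_X2APIC_ICR   0x830
#define TLB_SHOOTDOWN_VEC 0xFD   /* arbitrary vector chosen for this example */

static void *pending_shootdown_addr;   /* assumed shared variable that the
                                          IPI handler on other CPUs reads */

static inline void wrmsr(unsigned int msr, unsigned long long val)
{
    __asm__ __volatile__("wrmsr" : : "c"(msr),
                         "a"((unsigned int)val),
                         "d"((unsigned int)(val >> 32)));
}

static void tlb_shootdown(void *addr)
{
    pending_shootdown_addr = addr;

    /* destination shorthand 11b = "all excluding self", fixed delivery */
    wrmsr(IA32_X2APIC_ICR, (3ULL << 18) | TLB_SHOOTDOWN_VEC);

    tlb_invlpg(addr);   /* invalidate on the local CPU too */

    /* ...wait until every other CPU acknowledges the shootdown... */
}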
So, if you combine all of this you end up with at least 4 different pieces of code to invalidate a TLB entry, so that you can get the best performance in each situation (and to avoid crashing by using something that isn't supported).
You could use run-time branches, but that idea sucks. Imagine a loop that invalidates lots of pages - do you want a conditional branch in the middle of the loop? A better idea is to have separate versions of the loop and do the conditional branch once to select which version of the loop to use. An alternative is to use an indirect call (or a function pointer, if you prefer the C terminology) and have several different versions of the complete routine/function.
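For example, selecting the routine once during boot might look something like this (the cpu_has_* flags and the variant names are hypothetical, and all variants share the same signature - tlb_reload_cr3 just ignores its argument):
Code:
extern int cpu_has_invlpg, num_cpus;
extern void tlb_reload_cr3(void *addr);
extern void tlb_invlpg_local(void *addr);
extern void tlb_invlpg_shootdown(void *addr);

/* the rest of the kernel always calls through this pointer */
void (*invalidate_page)(void *addr);

void tlb_select_method(void)
{
    if (!cpu_has_invlpg)                         /* 80386: no INVLPG */
        invalidate_page = tlb_reload_cr3;
    else if (num_cpus > 1)
        invalidate_page = tlb_invlpg_shootdown;  /* INVLPG plus IPIs */
    else
        invalidate_page = tlb_invlpg_local;      /* plain INVLPG */
}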
Another way would be to use conditional code (e.g. "#if foo", "#else", "#endif") and expect end users (grandparents and secretaries) to know enough to be able to configure the OS properly before they compile it. IMHO this method is a good way to do things in some situations (e.g. a NAS manufacturer that gets your OS, precompiles it and preinstalls the OS on their products) but for a general purpose desktop/workstation/server OS it's grossly inadequate (most end users have enough trouble finding the power button without figuring out a maze of terminology and acronyms).
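For completeness, the compile-time equivalent might look something like this (the CONFIG_* names are invented for the example):
Code:
/* whoever builds the kernel picks exactly one variant */
#if CONFIG_386
#define invalidate_page(addr) tlb_reload_cr3(addr)
#elif CONFIG_SMP
#define invalidate_page(addr) tlb_invlpg_shootdown(addr)
#else
#define invalidate_page(addr) tlb_invlpg_local(addr)
#endif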
An alternative solution is to reduce the range of computers you support. For example, if you don't bother supporting 80386 then half of the "invalidate a TLB entry" routines won't be needed.
Hyperdrive wrote:Brendan wrote:
Lastly, if there's only one cluster (e.g. a computer with one multi-core chip, with 16 or fewer logical CPUs) you don't need any "cluster manager" - I'd assume in this case the "cluster ID" must be zero (or perhaps the cluster ID is ignored). I'd also assume that the "cluster manager" will be implemented in the memory controller (that's integrated into the CPU) in a transparent manner (rather than being something an OS needs to configure, etc).
In the situation you just described, only one cluster is needed and there would be no real use for more than one cluster. As far as I know, the roadmap only has an 8-core/16-thread chip on it.
The current roadmap has an 8-core/16-thread chip in it, but you can have 8 of these chips on a motherboard without anything too special (64 cores/128 threads). Sooner or later Intel will release a new roadmap, and sooner or later it'll include 16-core and 32-core CPUs. Also, some companies design special logic into the chipset to go past the "8 chip" limit - for AMD, the Horus chipset is (was?) one example of this.
Hyperdrive wrote:Yes, you can assume ID 0 for such situations. But you can't be sure. Maybe they use the ID 0x4711 because the developers like the fragrance. Either a specification/manual tells me to assume some ID or I have to get the information from the BIOS (via ACPI tables?). Okay, maybe it will become a de facto standard to use ID 0. (Besides that, it would be sort of freaky to choose an arbitrary number out of nothing, so 0 is a good guess.)
I am not aware of anything like that. Do you know of a spec or anything else?
From (my copy of) the "Intel® 64 Architecture x2APIC Specification":
Intel wrote:A.1 ACPI SPECIFICATION CHANGES TO SUPPORT THE X2APIC ARCHITECTURE
The APIC configuration interfaces described in the Advanced Configuration and Power Interface (ACPI) Specification must be augmented to enable operating system support for platforms employing x2APIC architecture-based components. This appendix describes the required changes to sections of the ACPI 3.0b specification that have been approved for incorporation in the next release of the ACPI specification (ACPI 4.0) to be published on the ACPI web site at:
http://www.acpi.info
The scope of ACPI interfaces that are covered in this appendix include:
- ACPI’s system description tables: The system tables relevant to x2APIC are:
- Multiple APIC description table (MADT)
- System Resource Affinity Table (SRAT)
- ACPI namespace support for x2APIC
This appendix will be removed from this specification when ACPI 4.0 is published.
There's more in the rest of Appendix A...
Note: The x2APIC ID you'd get from ACPI isn't the logical APIC ID. For the logical APIC ID, you'd read the local APIC's Logical Destination Register (LDR).
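For example, in x2APIC mode the LDR is IA32_X2APIC_LDR (MSR 0x80D), is read-only, and contains the cluster ID in bits 31:16 and a 16-bit "one bit per CPU" logical ID in bits 15:0. A rough sketch (the rdmsr() helper is an assumption):
Code:
#define IA32_X2APIC_LDR 0x80D

static inline unsigned long long rdmsr(unsigned int msr)
{
    unsigned int lo, hi;

    __asm__ __volatile__("rdmsr" : "=a"(lo), "=d"(hi) : "c"(msr));
    return ((unsigned long long)hi << 32) | lo;
}

void get_logical_id(unsigned int *cluster_id, unsigned int *logical_id)
{
    unsigned int ldr = (unsigned int)rdmsr(IA32_X2APIC_LDR);

    *cluster_id = ldr >> 16;       /* which cluster this CPU belongs to */
    *logical_id = ldr & 0xFFFF;    /* one bit set, identifying the CPU
                                      within its cluster */
}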
Cheers,
Brendan