x2APIC

Hyperdrive
Member
Posts: 93
Joined: Mon Nov 24, 2008 9:13 am

x2APIC

Post by Hyperdrive »

Hi all,

Intel recently released the first Core i7 processors, which have x2APICs. The updated Intel manuals confuse me a little... :? My confusion is about the logical addressing modes.

The 82489DX APICs had only the flat logical destination model. The local APICs in the Pentium and P6 added the flat cluster and the hierarchical cluster models. Pentium 4/Xeon and newer introduced the xAPIC, which doesn't support the flat cluster model. So we are left with the flat model and the hierarchical cluster model in current Intel processors.

Because the hierarchical cluster model needs hardware support in the form of "cluster managers", as Intel calls them, and most systems out there don't have them, we are effectively left with the flat model.

Now Intel introduces the x2APIC with the Core i7 processors. The manual says in section 9.7.2.3: "[...] Flat logical mode is not supported in the x2APIC mode. [...]" Uhm, wow! If one wants to use the cool new x2APIC mode, the only logical destination model left is the unusable "hierarchical cluster model". But hey, we can use 32 bits instead of 8... #-o

Am I right that there is effectively no working logical destination model when working in x2APIC mode? That can't be true?! Please help me see where I'm getting this wrong...

Regards,
Thilo
cyr1x
Member
Posts: 207
Joined: Tue Aug 21, 2007 1:41 am
Location: Germany

Re: x2APIC

Post by cyr1x »

Having a (very!) quick look at the specification, it turns out that you can switch the x2APIC back to xAPIC mode, but that also means you can't use any of the features the x2APIC provides.
Hyperdrive
Member
Posts: 93
Joined: Mon Nov 24, 2008 9:13 am

Re: x2APIC

Post by Hyperdrive »

cyr1x wrote:Having a (very!) quick look at the specification, it turns out that you can switch the x2APIC back to xAPIC mode, but that also means you can't use any of the features the x2APIC provides.
Sure, you are right. In xAPIC mode you can use the flat logical model, and no one forces you to use x2APIC mode - at least not at the moment, or for the next 2-5 years. But when someday there are systems with more than 255 CPUs, the BIOS has to place the APICs in x2APIC mode. I think xAPIC mode is still supported just for compatibility. Besides, x2APIC mode has some general benefits over xAPIC mode that make it worth working with.
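For what it's worth, finding out whether the CPU offers x2APIC mode at all is a single CPUID check - leaf 1, ECX bit 21. A minimal sketch, assuming GCC-style inline assembly in a freestanding kernel:

Code: Select all

#include <stdint.h>

/* CPUID leaf 1 reports x2APIC support in ECX bit 21. */
static int has_x2apic(void)
{
    uint32_t eax, ebx, ecx, edx;
    __asm__ volatile("cpuid"
                     : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
                     : "a"(1), "c"(0));
    return (ecx >> 21) & 1;
}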

But if I remember correctly, you can't switch from x2APIC mode back to xAPIC mode - at least not directly. Maybe it is possible by first disabling the APIC and then going to xAPIC mode. But I have a healthy mistrust of such things.
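That detour does seem to be what the spec's state diagram allows: clearing the EXTD bit (bit 10 of IA32_APIC_BASE) while the APIC stays globally enabled (bit 11) is an illegal transition, so you have to pass through the fully disabled state. A hedged sketch - note that going through the disabled state resets the APIC registers, so the local APIC needs reinitialising afterwards:

Code: Select all

#include <stdint.h>

#define IA32_APIC_BASE  0x1B
#define APIC_BASE_EN    (1ULL << 11)  /* APIC global enable */
#define APIC_BASE_EXTD  (1ULL << 10)  /* x2APIC mode enable */

static uint64_t rdmsr(uint32_t msr)
{
    uint32_t lo, hi;
    __asm__ volatile("rdmsr" : "=a"(lo), "=d"(hi) : "c"(msr));
    return ((uint64_t)hi << 32) | lo;
}

static void wrmsr(uint32_t msr, uint64_t val)
{
    __asm__ volatile("wrmsr" : : "c"(msr),
                     "a"((uint32_t)val), "d"((uint32_t)(val >> 32)));
}

/* Leave x2APIC mode the only legal way: clearing EXTD while EN stays
 * set raises #GP, so disable the APIC entirely first, then re-enable
 * it in xAPIC mode. APIC state is lost across the disabled state. */
static void x2apic_to_xapic(void)
{
    uint64_t base = rdmsr(IA32_APIC_BASE);
    wrmsr(IA32_APIC_BASE, base & ~(APIC_BASE_EN | APIC_BASE_EXTD));
    wrmsr(IA32_APIC_BASE, base & ~APIC_BASE_EXTD);  /* EN=1, EXTD=0 */
}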

Regards,
Thilo
cyr1x
Member
Posts: 207
Joined: Tue Aug 21, 2007 1:41 am
Location: Germany

Re: x2APIC

Post by cyr1x »

If it comes to that, there must be "cluster managers" then. And I even think :!: that every x2APIC-compatible system MUST have at least one, so this should be no problem.
Hyperdrive
Member
Posts: 93
Joined: Mon Nov 24, 2008 9:13 am

Re: x2APIC

Post by Hyperdrive »

That implies that no one can use this new feature at the moment. They release a new processor and say "This new feature is unusable. Perhaps some day. Please hold the line..." I don't think that makes sense. But okay, maybe it only sounds strange to me. :)
01000101
Member
Posts: 1599
Joined: Fri Jun 22, 2007 12:47 pm

Re: x2APIC

Post by 01000101 »

It has its uses, but:
A: it is not a necessity at the moment.
B: existing code works fine using the xAPIC.
C: it's just too new for code writers to have implemented it yet.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: x2APIC

Post by Brendan »

Hi,
Hyperdrive wrote:Now Intel introduces the x2APIC with the Core i7 processors. The manual says in section 9.7.2.3: "[...] Flat logical mode is not supported in the x2APIC mode. [...]" Uhm, wow! If one wants to use the cool new x2APIC mode, the only logical destination model left is the unusable "hierarchical cluster model". But hey, we can use 32 bits instead of 8... #-o

Am I right that there is effectively no working logical destination model when working in x2APIC mode? That can't be true?! Please help me see where I'm getting this wrong...
It does make sense...

For "single front-side bus" systems, broadcasting a message to every agent on the bus isn't a problem - they all receive everything anyway. For QuickPath systems, broadcasting something to every agent becomes a problem - you'd need to send the data across all of the separate QuickPath interconnects, which would affect latency and use bandwidth.

Consider the recent addition of the "Directed EOI" feature - so that the CPU doesn't need to broadcast the EOI, they provided a way to send the EOI directly to the correct I/O APIC, avoiding broadcasting EOIs to everything.
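For reference, a hedged sketch of how that feature is enabled in xAPIC mode: the capability is advertised in bit 24 of the local APIC version register, and bit 12 of the spurious-interrupt vector register suppresses the EOI broadcast (the MMIO base below is the usual default mapping; your kernel's may differ):

Code: Select all

#include <stdint.h>

#define LAPIC_BASE      0xFEE00000UL  /* default physical mapping */
#define LAPIC_VERSION   0x030         /* version register */
#define LAPIC_SPURIOUS  0x0F0         /* spurious-interrupt vector register */

static volatile uint32_t *lapic_reg(uint32_t off)
{
    return (volatile uint32_t *)(LAPIC_BASE + off);
}

/* Enable "suppress EOI broadcast" if the local APIC advertises it.
 * With this set, EOIs for level-triggered interrupts must be written
 * directly to the I/O APIC instead of being broadcast. */
static void enable_directed_eoi(void)
{
    if (*lapic_reg(LAPIC_VERSION) & (1u << 24))
        *lapic_reg(LAPIC_SPURIOUS) |= (1u << 12);
}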

To support the flat logical model in QuickPath systems, all APIC interrupts that use logical destination mode would need to be broadcast to everything, and so would most of the EOIs (you can't use directed EOIs if you don't know who sent the interrupt). Therefore, I'd assume x2APIC forces you to use the "hierarchical cluster model" to avoid scalability problems.
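To illustrate how CPUs are addressed under that model, here's a hedged sketch of sending a logical-mode IPI to some CPUs in one cluster in x2APIC mode, where the ICR is a single 64-bit MSR (0x830) with the 32-bit logical destination in its high half:

Code: Select all

#include <stdint.h>

#define MSR_X2APIC_ICR  0x830  /* interrupt command register (x2APIC mode) */

static void wrmsr(uint32_t msr, uint64_t val)
{
    __asm__ volatile("wrmsr" : : "c"(msr),
                     "a"((uint32_t)val), "d"((uint32_t)(val >> 32)));
}

/* Send a fixed-delivery IPI with logical destination mode (bit 11) to
 * the CPUs whose bits are set in 'mask' within cluster 'cluster'. */
static void send_cluster_ipi(uint16_t cluster, uint16_t mask, uint8_t vector)
{
    uint32_t dest = ((uint32_t)cluster << 16) | mask;
    uint32_t cmd  = (1u << 11) | vector;
    wrmsr(MSR_X2APIC_ICR, ((uint64_t)dest << 32) | cmd);
}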

This does mean an OS needs different code for different situations, but you need different code to support x2APIC anyway. Also note that (AFAIK) all previous Intel-based NUMA systems have used the "hierarchical cluster model" (with "cluster managers" between NUMA nodes). Of course there weren't very many Intel-based NUMA systems either (I can only think of a few very rare systems based on the Pentium III).

Lastly, if there's only one cluster (e.g. a computer with one multi-core chip, with 16 or fewer logical CPUs) you don't need any "cluster manager" - I'd assume in this case the "cluster ID" must be zero (or perhaps the cluster ID is ignored). I'd also assume that the "cluster manager" will be implemented in the memory controller (that's integrated into the CPU) in a transparent manner (rather than being something an OS needs to configure, etc).


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Hyperdrive
Member
Posts: 93
Joined: Mon Nov 24, 2008 9:13 am

Re: x2APIC

Post by Hyperdrive »

Brendan wrote: Consider the recent addition of the "Directed EOI" feature - so that the CPU doesn't need to broadcast the EOI, they provided a way to send the EOI directly to the correct I/O APIC, avoiding broadcasting EOIs to everything.

To support the flat logical model in QuickPath systems, all APIC interrupts that use logical destination mode would need to be broadcast to everything, and so would most of the EOIs (you can't use directed EOIs if you don't know who sent the interrupt). Therefore, I'd assume x2APIC forces you to use the "hierarchical cluster model" to avoid scalability problems.
Good point.
Brendan wrote: This does mean an OS needs different code for different situations, but you need different code to support x2APIC anyway. Also note that (AFAIK) all previous Intel-based NUMA systems have used the "hierarchical cluster model" (with "cluster managers" between NUMA nodes). Of course there weren't very many Intel-based NUMA systems either (I can only think of a few very rare systems based on the Pentium III).
What do you mean by "an OS needs different code for different situations"?
Brendan wrote: Lastly, if there's only one cluster (e.g. a computer with one multi-core chip, with 16 or fewer logical CPUs) you don't need any "cluster manager" - I'd assume in this case the "cluster ID" must be zero (or perhaps the cluster ID is ignored). I'd also assume that the "cluster manager" will be implemented in the memory controller (that's integrated into the CPU) in a transparent manner (rather than being something an OS needs to configure, etc).
In the situation you just described, only one cluster is needed and there would be no real use for more than one. As far as I know, the roadmap only has an 8-core/16-thread chip on it.

Yes, you can assume ID 0 for such situations. But you can't be sure. Maybe they use the ID 0x4711 because the developers like the fragrance. Either a specification/manual tells me to assume some ID, or I have to get the information from the BIOS (via ACPI tables?). Okay, maybe it will become a de facto standard to use ID 0. (Besides that, it would be sort of freaky to choose an arbitrary number out of nothing, so 0 is a good guess.)

I am not aware of anything like that. Do you know about some spec or anything else?

Regards,
Thilo
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: x2APIC

Post by Brendan »

Hi,
Hyperdrive wrote:
Brendan wrote:This does mean an OS needs different code for different situations, but you need different code to support x2APIC anyway. Also note that (AFAIK) all previous Intel-based NUMA systems have used the "hierarchical cluster model" (with "cluster managers" between NUMA nodes). Of course there weren't very many Intel-based NUMA systems either (I can only think of a few very rare systems based on the Pentium III).
What do you mean by "an OS needs different code for different situations"?
Um?

If your OS supports any new feature (but doesn't require that feature), then usually it needs some code to set up and use the new feature *plus* some other code that doesn't use the new feature.

For an example, consider TLB invalidation. On the 80386 you have to reload CR3 because the INVLPG instruction isn't supported. This isn't a bad thing for the 80386 because the RAM was the same speed as the CPU; but it is a bad thing for more recent (less ancient?) CPUs because you completely trash the entire TLB. On the 80486 you could reload CR3, but most of the time you want to use INVLPG. On modern CPUs you want to use "global" pages for efficiency, but if you do use global pages then reloading CR3 won't work in some situations (e.g. flushing a TLB entry in kernel space that's marked as "global").

Of course for "multi-CPU" situations you also need to support "TLB shootdown", because sometimes you need to flush the TLBs on the local CPU and on other CPUs (but it'd be a silly idea to attempt that when there's only one CPU, especially if there isn't any local APIC).

So, if you combine all of this you end up with at least 4 different pieces of code to invalidate a TLB entry, so that you can get the best performance in each situation (and to avoid crashing by using something that isn't supported).

You could use run-time branches, but that idea sucks. Imagine a loop that invalidates lots of pages - do you want a conditional branch in the middle of the loop? A better idea is to have separate versions of the loop and do the conditional branch once to select which version of the loop to use. An alternative is to use an indirect call (or a function pointer, if you prefer the C terminology) and have several different versions of the complete routine/function.
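A minimal sketch of that indirect-call idea, assuming the kernel's feature detection provides something like cpu_family (a hypothetical value here):

Code: Select all

#include <stdint.h>

/* 80386: no INVLPG, so reload CR3 (flushes the whole TLB). */
static void invlpg_386(void *addr)
{
    uintptr_t cr3;
    (void)addr;
    __asm__ volatile("mov %%cr3, %0" : "=r"(cr3));
    __asm__ volatile("mov %0, %%cr3" : : "r"(cr3) : "memory");
}

/* 80486 and later: invalidate just the one TLB entry. */
static void invlpg_486(void *addr)
{
    __asm__ volatile("invlpg (%0)" : : "r"(addr) : "memory");
}

/* One entry point, several implementations: pick the routine once at
 * boot, then call through the pointer with no branch in hot loops. */
static void (*invlpg)(void *addr) = invlpg_386;

void tlb_select_impl(int cpu_family)
{
    if (cpu_family >= 4)
        invlpg = invlpg_486;
}

The global-page and TLB-shootdown variants would slot in the same way, as extra routines behind the same pointer.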

Another way would be to use conditional code (e.g. "#if foo", "#else", "#endif") and expect end users (grandparents and secretaries) to know enough to configure the OS properly before they compile it. IMHO this method is a good way to do things in some situations (e.g. a NAS manufacturer that takes your OS, precompiles it and preinstalls it on their products), but for a general purpose desktop/workstation/server OS it's grossly inadequate (most end users have enough trouble finding the power button, let alone figuring out a maze of terminology and acronyms).

An alternative solution is to reduce the range of computers you support. For example, if you don't bother supporting the 80386, then half of the "invalidate a TLB entry" routines won't be needed.
Hyperdrive wrote:
Brendan wrote: Lastly, if there's only one cluster (e.g. a computer with one multi-core chip, with 16 or fewer logical CPUs) you don't need any "cluster manager" - I'd assume in this case the "cluster ID" must be zero (or perhaps the cluster ID is ignored). I'd also assume that the "cluster manager" will be implemented in the memory controller (that's integrated into the CPU) in a transparent manner (rather than being something an OS needs to configure, etc).
In the situation you just described, only one cluster is needed and there would be no real use for more than one. As far as I know, the roadmap only has an 8-core/16-thread chip on it.
The current roadmap has an 8-core/16-thread chip on it, but you can have 8 of these chips on a motherboard without anything too special (64 cores/128 threads). Sooner or later Intel will release a new roadmap, and sooner or later it'll include 16-core and 32-core CPUs. Also, some companies design special logic into the chipset to go past the "8 chip" limit - for AMD, the Horus chipset is (was?) one example of this.
Hyperdrive wrote:Yes, you can assume ID 0 for such situations. But you can't be sure. Maybe they use the ID 0x4711 because the developers like the fragrance. Either a specification/manual tells me to assume some ID, or I have to get the information from the BIOS (via ACPI tables?). Okay, maybe it will become a de facto standard to use ID 0. (Besides that, it would be sort of freaky to choose an arbitrary number out of nothing, so 0 is a good guess.)

I am not aware of anything like that. Do you know about some spec or anything else?
From (my copy of) the "Intel® 64 Architecture x2APIC Specification":
Intel wrote:A.1 ACPI SPECIFICATION CHANGES TO SUPPORT THE X2APIC ARCHITECTURE

The APIC configuration interfaces described in the Advanced Configuration and Power Interface (ACPI) Specification must be augmented to enable operating system support for platforms employing x2APIC architecture-based components. This appendix describes the required changes to sections of the ACPI 3.0b specification that have been approved for incorporation in the next release of the ACPI specification (ACPI 4.0) to be published on the ACPI web site at: http://www.acpi.info

The scope of ACPI interfaces that are covered in this appendix include:
  • ACPI’s system description tables: The system tables relevant to x2APIC are:
    • Multiple APIC description table (MADT)
    • System Resource Affinity Table (SRAT)
    • ACPI namespace support for x2APIC
This appendix will be removed from this specification when ACPI 4.0 is published.
There's more in the rest of Appendix A... ;)

Note: The x2APIC ID you'd get from ACPI isn't the logical APIC ID. For the logical APIC ID, you'd read the local APIC's "logical APIC ID" (LDR) register.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Hyperdrive
Member
Posts: 93
Joined: Mon Nov 24, 2008 9:13 am

Re: x2APIC

Post by Hyperdrive »

Brendan wrote:
Hyperdrive wrote:
Brendan wrote:This does mean an OS needs different code for different situations, but you need different code to support x2APIC anyway. Also note that (AFAIK) all previous Intel-based NUMA systems have used the "hierarchical cluster model" (with "cluster managers" between NUMA nodes). Of course there weren't very many Intel-based NUMA systems either (I can only think of a few very rare systems based on the Pentium III).
What do you mean by "an OS needs different code for different situations"?
Um?

If your OS supports any new feature (but doesn't require that feature), then usually it needs some code to set up and use the new feature *plus* some other code that doesn't use the new feature.

For an example, consider TLB invalidation. [...]
Okay, I agree on all points. We were somehow talking past each other; probably I wasn't very clear about my point. But I think I got what you meant.
Brendan wrote:
Hyperdrive wrote:
Brendan wrote: Lastly, if there's only one cluster (e.g. a computer with one multi-core chip, with 16 or fewer logical CPUs) you don't need any "cluster manager" - I'd assume in this case the "cluster ID" must be zero (or perhaps the cluster ID is ignored). I'd also assume that the "cluster manager" will be implemented in the memory controller (that's integrated into the CPU) in a transparent manner (rather than being something an OS needs to configure, etc).
In the situation you just described, only one cluster is needed and there would be no real use for more than one. As far as I know, the roadmap only has an 8-core/16-thread chip on it.
The current roadmap has an 8-core/16-thread chip on it, but you can have 8 of these chips on a motherboard without anything too special (64 cores/128 threads). Sooner or later Intel will release a new roadmap, and sooner or later it'll include 16-core and 32-core CPUs. Also, some companies design special logic into the chipset to go past the "8 chip" limit - for AMD, the Horus chipset is (was?) one example of this.
Sure. It's just a roadmap, and it's just a current snapshot. They will definitely have more cores and more threads. But by then they will have the needed "cluster managers". And OS developers may or may not have to take them into account, depending on how transparent the solution will be. For now, "cluster managers" aren't strictly needed, so there aren't any (I guess).
Brendan wrote:
Hyperdrive wrote:Yes, you can assume ID 0 for such situations. But you can't be sure. Maybe they use the ID 0x4711 because the developers like the fragrance. Either a specification/manual tells me to assume some ID, or I have to get the information from the BIOS (via ACPI tables?). Okay, maybe it will become a de facto standard to use ID 0. (Besides that, it would be sort of freaky to choose an arbitrary number out of nothing, so 0 is a good guess.)

I am not aware of anything like that. Do you know about some spec or anything else?
From (my copy of) the "Intel® 64 Architecture x2APIC Specification":
Intel wrote:A.1 ACPI SPECIFICATION CHANGES TO SUPPORT THE X2APIC ARCHITECTURE
[...]
There's more in the rest of Appendix A... ;)
I know Appendix A, but it gave no answer as to which cluster ID has to be assumed. It just extends the APIC-related structures in the ACPI tables to support x2APIC IDs. I first thought maybe the proximity domain number relates to the cluster ID (or vice versa), but this wasn't mentioned anywhere, so I dropped the idea.

Actually, the second sentence of your note is exactly what I needed...
Brendan wrote: Note: The x2APIC ID you'd get from ACPI isn't the logical APIC ID. For the logical APIC ID, you'd read the local APIC's "logical APIC ID" (LDR) register.
So I was going back to the x2APIC spec and found this:
Intel x2APIC spec wrote: 2.4.2 Logical Destination Register
[...]
To enable cluster ID assignment in a fashion that matches the system topology characteristics and to enable efficient routing of logical mode lowest priority device interrupts in link based platform interconnects, the LDR are initialized by hardware based on the value of x2APIC ID upon x2APIC state transitions. Details of this initialization are provided in Section 2.4.4.
Thus, the answer is: You can get the cluster ID by reading the LDR and taking the cluster ID portion of it.
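A minimal sketch of that, assuming x2APIC mode is already enabled: there the LDR is the read-only MSR 0x80D, derived per section 2.4.4 as (x2APIC_ID[19:4] << 16) | (1 << x2APIC_ID[3:0]), so the cluster ID sits in bits 31:16:

Code: Select all

#include <stdint.h>

#define MSR_X2APIC_LDR  0x80D  /* logical destination register (x2APIC) */

static uint64_t rdmsr(uint32_t msr)
{
    uint32_t lo, hi;
    __asm__ volatile("rdmsr" : "=a"(lo), "=d"(hi) : "c"(msr));
    return ((uint64_t)hi << 32) | lo;
}

/* Cluster ID is LDR[31:16]; LDR[15:0] holds this CPU's logical bit. */
static uint32_t x2apic_cluster_id(void)
{
    uint32_t ldr = (uint32_t)rdmsr(MSR_X2APIC_LDR);
    return ldr >> 16;
}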

Sorry, this was so obvious and I missed it. I thought there had to be some sort of BIOS data block to get this information, but as it turns out, it is much simpler. I just overlooked it.

Thanks for your time and the nice discussion!

Regards,
Thilo