I was wondering how an OS identifies each core and how the task scheduler chooses a core to run a certain task. Is there a group of x86 instructions dedicated to identifying a core and choosing which core the next task gets scheduled on?
I'm only talking about the x86-based architecture. Thanks
Mechanism in Choosing a Core in a Multicore Processor
-
- Posts: 6
- Joined: Thu Sep 06, 2007 8:44 am
Re: Mechanism in Choosing a Core in a Multicore Processor
Hi,
Start by defining a generic model for the 80x86 architecture. This model has 4 basic levels:
- Logical CPUs (I call them "CPUs")
- Cores
- Physical CPUs (I call them "chips")
- NUMA domains
For example, with Intel's terminology a "Core i7" is a physical CPU that contains 4 cores and 8 logical CPUs, and an 80486 is a physical CPU that contains 1 core and 1 logical CPU.
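To make the model concrete, here's a minimal sketch in C of how the 4 levels could be recorded for each logical CPU. The names (`cpu_info`, `classify_relation`) are hypothetical and not taken from any particular OS, and it assumes chip numbers are unique across the whole system:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-logical-CPU record for the 4-level model:
 * NUMA domain > chip (physical CPU) > core > logical CPU. */
struct cpu_info {
    uint32_t numa_domain; /* NUMA domain the chip belongs to */
    uint32_t chip;        /* physical CPU, assumed unique system-wide */
    uint32_t core;        /* core number within the chip */
    uint32_t thread;      /* logical CPU number within the core */
};

/* How closely two logical CPUs are related, closest first. */
enum cpu_relation {
    REL_SAME_CPU,    /* the same logical CPU */
    REL_SAME_CORE,   /* different logical CPUs sharing a core */
    REL_SAME_CHIP,   /* different cores on the same chip */
    REL_SAME_DOMAIN, /* different chips in the same NUMA domain */
    REL_REMOTE       /* different NUMA domains */
};

/* Compare from the outermost level inwards and report the first
 * level at which the two logical CPUs differ. */
enum cpu_relation classify_relation(const struct cpu_info *a,
                                    const struct cpu_info *b)
{
    assert(a != NULL && b != NULL);

    if (a->numa_domain != b->numa_domain) return REL_REMOTE;
    if (a->chip != b->chip)               return REL_SAME_DOMAIN;
    if (a->core != b->core)               return REL_SAME_CHIP;
    if (a->thread != b->thread)           return REL_SAME_CORE;
    return REL_SAME_CPU;
}
```

A scheduler could use something like `classify_relation()` to estimate how expensive it would be to move a task from one logical CPU to another.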
NUMA is the only thing that's outside the chip. For example, you could have 4 physical CPUs that are all in the same NUMA domain, or 4 physical CPUs that are all in different NUMA domains, or 4 physical CPUs where 2 are in one NUMA domain and the other 2 are in another NUMA domain. By definition a CPU can access things that are within its own NUMA domain faster than it can access things that are in other NUMA domains.
In general, for "multi-chip" computers (computers with 2 or more physical CPUs), all systems based on Intel's "Core i7" (and later?) CPUs are NUMA, and all systems based on AMD's Opteron CPUs are also NUMA. Computers based on older CPUs normally aren't NUMA, because all physical CPUs share the same front-side bus; but there are some (large/expensive) servers based on these older CPUs that are NUMA.
For example, IBM is currently selling servers that can handle sixteen 6-core CPUs, where the CPUs are based on Intel's "Core" microarchitecture. To do this they basically have local front-side buses (e.g. with 4 physical CPUs per front-side bus), and these front-side buses use a special interconnect to talk to each other. That makes the system NUMA, because there's extra latency when a CPU on one front-side bus tries to talk to a CPU or RAM on a different front-side bus.
To detect how many logical CPUs are present, you need to parse the MP specification tables and/or the ACPI tables. After you've done this you still don't know the relationships between the different logical CPUs. Detecting those relationships needs to be done in 2 steps.
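As a rough sketch of the ACPI route: the MADT (the ACPI table that describes interrupt controllers) is a header followed by variable-length entries, each starting with a type byte and a length byte. Type 0 is a "Processor Local APIC" entry, and bit 0 of its flags field says whether that logical CPU is enabled. The function below (`count_local_apics` is a hypothetical name) assumes you've already found the MADT and skipped its header, and for brevity it ignores x2APIC entries:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MADT_TYPE_LOCAL_APIC 0   /* "Processor Local APIC" entry type */
#define MADT_LAPIC_ENABLED   0x1 /* bit 0 of the entry's flags field */

/* Walk the entries that follow the MADT header and count the enabled
 * Processor Local APIC entries. Up to max_ids APIC IDs are stored in
 * apic_ids. Returns the number of enabled entries found. */
size_t count_local_apics(const uint8_t *entries, size_t len,
                         uint8_t *apic_ids, size_t max_ids)
{
    size_t count = 0;
    size_t off = 0;

    assert(entries != NULL && (apic_ids != NULL || max_ids == 0));

    while (off + 2 <= len) {
        uint8_t type = entries[off];
        uint8_t elen = entries[off + 1];

        /* A too-short or overlong entry means a corrupt table: stop. */
        if (elen < 2 || off + elen > len)
            break;

        if (type == MADT_TYPE_LOCAL_APIC && elen >= 8) {
            /* Layout: type, length, ACPI processor ID, APIC ID,
             * then a 32-bit little-endian flags field. */
            uint32_t flags = (uint32_t)entries[off + 4]
                           | (uint32_t)entries[off + 5] << 8
                           | (uint32_t)entries[off + 6] << 16
                           | (uint32_t)entries[off + 7] << 24;

            if (flags & MADT_LAPIC_ENABLED) {
                if (count < max_ids)
                    apic_ids[count] = entries[off + 3];
                count++;
            }
        }
        off += elen; /* entries are variable length; skip by length */
    }
    return count;
}
```

The APIC IDs collected here are what you'd feed into the later steps, since everything else (cores, chips, NUMA domains) is keyed on them.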
The first step is to use CPUID to detect the relationship between logical CPUs, cores and physical CPUs. Unfortunately getting all the information can be painful, because you need to use different methods for different CPUs (even different CPUs from the same manufacturer), especially when you want to know which logical CPUs are sharing which caches.
For example, if there are 8 logical CPUs you might find that CPU0 and CPU1 share core0, CPU2 and CPU3 share core1, CPU4 and CPU5 share core2, and CPU6 and CPU7 share core3; and that core0 and core1 (CPUs 0, 1, 2 and 3) share the same physical chip and L3 cache, and core2 and core3 (CPUs 4, 5, 6 and 7) share the same physical chip and L3 cache. After this is done you still don't know the relationships between different physical CPUs/chips.
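A minimal sketch of the decoding part of the CPUID step, assuming the CPU supports CPUID leaf 0x0B (older CPUs need other methods). On those CPUs, EAX[4:0] of sub-leaf 0 gives the number of APIC ID bits to shift out to get past the logical CPU number, and sub-leaf 1 gives the shift needed to reach the chip/package number. To keep the example independent of the machine it runs on, the two shift counts are passed in as parameters; `decode_apic_id` and `apic_topology` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

struct apic_topology {
    uint32_t thread; /* logical CPU number within its core */
    uint32_t core;   /* core number within its chip */
    uint32_t chip;   /* physical CPU (package) number */
};

/* smt_shift:  count of low APIC ID bits that select the logical CPU
 *             within a core
 * core_shift: count of low APIC ID bits below the chip number
 * Both are assumed to have been read from CPUID on the real machine. */
struct apic_topology decode_apic_id(uint32_t apic_id,
                                    unsigned smt_shift,
                                    unsigned core_shift)
{
    struct apic_topology t;

    assert(smt_shift <= core_shift && core_shift < 32);

    t.thread = apic_id & ((1u << smt_shift) - 1);
    t.core   = (apic_id >> smt_shift)
             & ((1u << (core_shift - smt_shift)) - 1);
    t.chip   = apic_id >> core_shift;
    return t;
}
```

With `smt_shift = 1` and `core_shift = 2` (2 logical CPUs per core, 2 cores per chip), APIC IDs 6 and 7 both decode to core 1 of chip 1, which is the kind of "these two CPUs share a core" relationship described above. Note the core number here is relative to the chip, not system-wide.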
To detect the relationships between different physical CPUs and each other (and between different physical CPUs and different areas of RAM), you need to look at ACPI tables - the SRAT (System Resource Affinity Table) and the SLIT (System Locality Distance Information Table).
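To give an idea of how the SLIT could be used: its body is an N x N matrix of bytes, where each entry is the relative distance from one NUMA domain to another. A domain's distance to itself is normalized to 10, and 0xFF means "unreachable". The sketch below assumes the matrix has already been extracted from the table; `nearest_domain_with_ram` and the `has_free_ram` array are hypothetical, just one way an OS might pick where to allocate memory from:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLIT_UNREACHABLE 0xFF /* domain can't be reached at all */

/* slit is an n*n row-major byte matrix: slit[from * n + to] is the
 * relative distance from domain "from" to domain "to" (10 = local).
 * has_free_ram[d] is nonzero if domain d has free memory.
 * Returns the closest reachable domain with free RAM (possibly "from"
 * itself), or -1 if there isn't one. Ties go to the lowest number. */
int nearest_domain_with_ram(const uint8_t *slit, size_t n, size_t from,
                            const uint8_t *has_free_ram)
{
    int best = -1;
    uint8_t best_dist = SLIT_UNREACHABLE;

    assert(slit != NULL && has_free_ram != NULL);
    assert(n > 0 && from < n);

    for (size_t to = 0; to < n; to++) {
        uint8_t dist = slit[from * n + to];

        if (!has_free_ram[to] || dist == SLIT_UNREACHABLE)
            continue;
        if (dist < best_dist) {
            best_dist = dist;
            best = (int)to;
        }
    }
    return best;
}
```

The SRAT is what tells you which APIC IDs and which RAM ranges belong to which domain number in the first place; the SLIT only tells you how far apart those domains are.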
How the task scheduler chooses to use all of this information depends on how the scheduler is designed - that part is mostly up to the OS designer (you).
The first thing to consider is what you're optimizing the scheduler for. Do you want maximum performance, or do you want to limit heat and/or CPU fan noise, or maybe maximize laptop battery life, or something else, or a combination of all of these things?
IMHO an extremely good scheduler would take into account the load on each logical CPU; the relationships between logical CPUs (to each other, to caches, to RAM, etc.); physical CPU fan speeds and temperatures; and things like the "Turbo Mode" in Core i7. Of course, I don't think any scheduler that's been written so far is capable of doing all of this...
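Just to illustrate the load and topology part (nothing about temperatures or "Turbo Mode"), here's a very simplified sketch of one possible policy: score every logical CPU by its load plus a penalty for how far the task would have to move from the CPU it last ran on, and pick the lowest score. The struct, the function names and the penalty values are all made up for the example; a real scheduler would need to tune or replace them:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sched_cpu {
    uint32_t numa_domain;
    uint32_t chip;
    uint32_t core;
    uint32_t load; /* hypothetical metric, e.g. number of runnable tasks */
};

/* Arbitrary example penalties: the further away the new CPU is from
 * the old one, the more cache/RAM locality the task loses. */
static uint32_t migration_penalty(const struct sched_cpu *from,
                                  const struct sched_cpu *to)
{
    if (from == to)                             return 0;
    if (from->numa_domain != to->numa_domain)   return 8;
    if (from->chip != to->chip)                 return 4;
    if (from->core != to->core)                 return 2;
    return 1; /* sibling logical CPU in the same core */
}

/* Returns the index of the logical CPU with the lowest
 * load + migration penalty. Ties go to the lowest index. */
size_t pick_cpu(const struct sched_cpu *cpus, size_t n, size_t last_cpu)
{
    size_t best = last_cpu;
    uint32_t best_score = UINT32_MAX;

    assert(cpus != NULL && n > 0 && last_cpu < n);

    for (size_t i = 0; i < n; i++) {
        uint32_t score = cpus[i].load
                       + migration_penalty(&cpus[last_cpu], &cpus[i]);
        if (score < best_score) {
            best_score = score;
            best = i;
        }
    }
    return best;
}
```

Optimizing for something other than performance (heat, fan noise, battery life) would mostly mean changing how the score is calculated, for example preferring to pack tasks onto already-busy chips so that idle chips can stay in a low-power state.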
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: Mechanism in Choosing a Core in a Multicore Processor
Thanks a lot for such an insightful answer, Brendan.