[SOLVED] How IDs are assigned to cpu cores?
Hi everyone!!!
I'll get straight to the point: are there any assumptions I can make about the way IDs are assigned by the BIOS to CPU cores?
Are they always incremented by one, or do I have to expect a different assignment scheme on certain systems? For example, will a blade server with four multicore CPUs have IDs ranging over 0, 1, 2, ..., n, or might I find something different?
This apparently innocent question can have a deep impact on code design, because I feel "unsafe" using the CPU ID as an index into some per-CPU data array...
As always thank you in advance!!!
Regards, Teo
P.S.: have nice holidays!!!
Last edited by nop on Wed Dec 26, 2012 11:42 am, edited 1 time in total.
OS development is the intelligent alternative to drugs
- gravaera
- Member
- Posts: 737
- Joined: Tue Jun 02, 2009 4:35 pm
- Location: Supporting the cause: Use \tabs to indent code. NOT \x20 spaces.
Re: How IDs are assigned to cpu cores?
Yo:
Logical CPUs do not need to be numbered sequentially. The only assumption you can make is that they will be unique (and you can panic if they are not).
--Peace out,
gravaera.
17:56 < sortie> Paging is called paging because you need to draw it on pages in your notebook to succeed at it.
Re: How IDs are assigned to cpu cores?
gravaera wrote: Logical CPUs do not need to be numbered sequentially
This is what I expected, because I haven't read anything different (but I hoped for a different answer)... Now I'll have a nice time trying to figure out how to map a hardware CPU ID to a logical CPU number with a constant-time algorithm...
Thank you!!!
Teo
OS development is the intelligent alternative to drugs
Re: How IDs are assigned to cpu cores?
Logical CPU IDs can be anything, so watch out for:
- Do not assume every logical CPU ID is used
- Do not assume a certain logical CPU ID is always used. Logical CPU 0 for example is not used for some AMD Trinity chips.
- From the type of Local APIC you can determine the maximum logical CPU ID - for instance, 4-bit CPU IDs have a maximum of 15; 8-bit CPU IDs have a maximum of 255, etc.
- Logical CPU IDs tend to be in groups, though there is no guarantee. Try to write code that performs well on CPU IDs where the IDs are grouped together (e.g. 193-196 for an example 4-core system), then don't worry about performance or memory wasted for completely random CPU IDs. But do take care to write code that operates correctly on any random CPU IDs.
Re: How IDs are assigned to cpu cores?
sounds wrote: But do take care to write code that operates correctly on any random CPU IDs.
This is exactly what I've done so far (I wake up all APs returned by ACPI without distinction).
The problem arises, for example, when core #123 wants to access a per-CPU structure and there are 4 cores in the system: I need a conversion from the hardware ID to a sort of logical ID which can be used, for example, as an array index.
And I need this to be fast and constant in time for all the cores: I cannot allow one CPU to access cached data faster than another (by cached data I mean, for example, an array of free pages, etc.).
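One way to get the O(1) conversion asked for here, as a minimal sketch in C: build a small table once at boot that maps each hardware ID to a dense index, offset by the lowest ID seen so that grouped IDs (for example 120-123) cost only a few slots. The names `cpu_map`, `cpu_map_build`, `cpu_map_lookup` and the cap `MAX_ID_SPAN` are assumptions made for illustration, not taken from anyone's kernel. IDs spread wider than the cap are simply rejected in this sketch; a real kernel would need a fallback for that case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CPU_INVALID  UINT16_MAX   /* marks "no CPU with this hardware ID" */
#define MAX_ID_SPAN  4096u        /* hypothetical cap on (max_id - min_id + 1) */

/* Hypothetical lookup state, filled once at boot. */
struct cpu_map {
    uint32_t base;                 /* lowest hardware ID seen */
    uint32_t span;                 /* number of table slots in use */
    uint16_t index[MAX_ID_SPAN];   /* (hardware ID - base) -> dense index */
};

/* Build the map from the list of discovered hardware IDs.
 * Returns 0 on success, -1 if the IDs are too spread out or not unique. */
static int cpu_map_build(struct cpu_map *m, const uint32_t *ids, size_t n)
{
    assert(m != NULL && ids != NULL);
    if (n == 0 || n >= CPU_INVALID)
        return -1;

    uint32_t lo = ids[0], hi = ids[0];
    for (size_t i = 1; i < n; i++) {
        if (ids[i] < lo) lo = ids[i];
        if (ids[i] > hi) hi = ids[i];
    }
    if (hi - lo >= MAX_ID_SPAN)
        return -1;

    m->base = lo;
    m->span = hi - lo + 1;
    for (uint32_t s = 0; s < m->span; s++)
        m->index[s] = CPU_INVALID;

    for (size_t i = 0; i < n; i++) {
        uint32_t slot = ids[i] - lo;
        if (m->index[slot] != CPU_INVALID)
            return -1;             /* duplicate hardware ID */
        m->index[slot] = (uint16_t)i;
    }
    return 0;
}

/* O(1) lookup: one subtraction, one bounds check, one load. */
static uint16_t cpu_map_lookup(const struct cpu_map *m, uint32_t hw_id)
{
    uint32_t slot = hw_id - m->base;   /* wraps to a huge value if hw_id < base */
    return slot < m->span ? m->index[slot] : CPU_INVALID;
}
```

With hardware IDs 120, 121, 122 and 123, `cpu_map_lookup(&m, 123)` yields 3, and any ID outside the table yields `CPU_INVALID`. Every CPU performs the same number of operations per lookup, although cache effects can still make individual lookups differ in wall-clock time.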
OS development is the intelligent alternative to drugs
Re: How IDs are assigned to cpu cores?
nop wrote: The problem arises, for example, when the core #123 wants to access a per-cpu structure and there are 4 cores in the system: I need a conversion from the hardware id to a sort of logical-id which can be used for example as an index of an array.
For a small number of cores, you may just assume there are 256 cores and only 4 are usable (the 252 non-existent cores are "faulty / not usable").
For more cores, the whole kernel should be redesigned, and this lookup is the least of your concerns.
nop wrote: And I need this to be fast and constant in time for all the cores: I cannot permit a cpu to access cached data faster than another (by cached data I mean, for example, array of free pages etc etc)
There are many factors affecting the actual timing, including temperature, hyper-threading, etc.; you cannot guarantee that cores run at the same speed even when they are executing the same instructions.
Re: How IDs are assigned to cpu cores?
Hi,
nop wrote: And I need this to be fast and constant in time for all the cores: I cannot permit a cpu to access cached data faster than another (by cached data I mean, for example, array of free pages etc etc)
bluemoon wrote: There are many factors affecting the actual timing, including temperature, hyper-threading, etc; you can not assure they are in same speed even they are executing the same instructions.
I'm wondering if the original poster meant "constant time" in the "O(1)" sense (e.g. a constant number of operations, where an operation may take a variable length of time) rather than "constant time" in the literal sense (e.g. every case as slow as the worst case).
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: How IDs are assigned to cpu cores?
bluemoon wrote: For small quantity of cores, you may just assume there are 256 cores and only 4 are usable (252 non-existent cores are "faulty / not-usable")
For more core, the whole kernel should be redesigned and this lookup is the least thing you want to care.
Since I make no assumptions and my LAPIC ID is 32 bits wide (x2APIC mode enabled), I'm taking care of the worst case. Luckily I don't have to redesign the whole kernel (in part because I don't have a whole kernel), but I have two options:
- coding a hash table (directly addressed, memory consuming, but I don't think I'll have 65k processors in the system)
- using some trick
bluemoon wrote: There are many factors affecting the actual timing, including temperature, hyper-threading, etc; you can not assure they are in same speed even they are executing the same instructions.
You are perfectly right: what I meant was that a routine which wants to access a per-CPU cached data array shouldn't be affected by differing CPU ID lookup times.
Brendan wrote: I'm wondering if the original poster meant "constant time" in the "O(1)" sense (e.g. a constant number of operations, where an operation may take a variable length of time) rather than "constant time" in the literal sense (e.g. every case as slow as the worst case).
For the algorithm I'm thinking of, I mean "constant time" in both senses: O(1) and constant execution time.
OS development is the intelligent alternative to drugs
Re: [SOLVED] How IDs are assigned to cpu cores?
In my kernel, CPU uses its LAPIC ID for self-identification only on startup, when it does not really matter if it's O(1), O(n) or O(whatever). It uses other means later on.
I used to store a pointer to the CPU-specific data structure at the bottom of the kernel stack, and point the kernel stack entry point (the value of the stack pointer in the TSS) just above it. Since my stack is fixed to a single 4 KiB page, it takes simple math to round the stack pointer up to the next page boundary and then use the preceding __SIZEOF_POINTER__ bytes as a pointer to the CPU-specific data. It works nicely, but requires updating every time you switch stacks. That is not a big deal, since you need to update a few values (like setting the kernel stack in the TSS) before returning to usermode anyway. However, the thing I didn't like was that I had to build "one single struct" that holds all the necessary values. I do not like to cast types much, so this struct required a lot of #includes, sometimes leading to circular references. But if you're not concerned about that, it's the simplest way to go.
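The stack-based scheme described above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: `struct cpu_data`, `kstack_init` and `cpu_from_sp` are hypothetical names, the stack is assumed to be one page-aligned 4 KiB page, and in a real kernel the stack pointer would be read from the SP register rather than passed in as an argument.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define KSTACK_SIZE 4096u   /* one page, as in the post; must be a power of two */

struct cpu_data {           /* hypothetical per-CPU structure */
    uint32_t logical_id;
};

/* Store the per-CPU pointer in the last pointer-sized slot of the stack page
 * and return the initial stack pointer, which sits at that slot so that the
 * first push lands just below it. */
static uintptr_t kstack_init(void *stack_page, struct cpu_data *cpu)
{
    uintptr_t base = (uintptr_t)stack_page;
    assert((base & (KSTACK_SIZE - 1)) == 0);   /* page must be aligned */

    uintptr_t slot = base + KSTACK_SIZE - sizeof cpu;
    memcpy((void *)slot, &cpu, sizeof cpu);    /* write the pointer value */
    return slot;   /* candidate value for the kernel stack pointer in the TSS */
}

/* Recover the per-CPU pointer from any stack pointer inside that page. */
static struct cpu_data *cpu_from_sp(uintptr_t sp)
{
    /* Clear the low bits to get the page start, then step to the page end. */
    uintptr_t page_end = (sp & ~(uintptr_t)(KSTACK_SIZE - 1)) + KSTACK_SIZE;

    struct cpu_data *cpu;
    memcpy(&cpu, (const void *)(page_end - sizeof cpu), sizeof cpu);
    return cpu;
}
```

Masking and then adding the page size, rather than the usual round-up formula, keeps the result correct even when the stack pointer happens to sit exactly on the page's start address.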
Since then, however, I have moved to ABI-specific TLS (Thread Local Storage): the one provided by GCC's __thread keyword. If a (global or class’s static) variable is marked with it, the compiler generates ABI-specific code in which the value is independent for each thread. On the i386 and amd64 architectures this means the variable is accessed through the GS or FS segment register. For example, accessing the first variable boils down to mov %gs:0x0,%eax (for i386) or mov %fs:0x0,%rax (for x86_64). The ARM architecture uses a function call to __aeabi_read_tp and then resolves the pointer from there (I will skip the details for now).
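As a minimal sketch of what such a declaration looks like in C (the structure and function names here are hypothetical, chosen only for illustration): a variable marked `__thread` gets one copy per thread, and in a kernel that points the FS or GS base at a different block on each core, that means one copy per CPU.

```c
#include <assert.h>
#include <stdint.h>

struct cpu_local {          /* hypothetical per-CPU fields */
    uint32_t logical_id;
    int      online;
};

/* One instance per thread; per CPU if each core has its own FS/GS base. */
static __thread struct cpu_local this_cpu;

/* Called once by each CPU (or thread) on its own copy. */
static void cpu_local_init(uint32_t logical_id)
{
    assert(!this_cpu.online);        /* guard against double initialization */
    this_cpu.logical_id = logical_id;
    this_cpu.online = 1;
}

/* On x86 this typically compiles to a single segment-relative load. */
static uint32_t current_cpu_id(void)
{
    return this_cpu.logical_id;
}
```

In user space the C runtime sets up the thread pointer for you; in a kernel, setting the segment base for each core is your own job.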
Now, it's your call to manage the GS and FS offsets. For i386 I modified the GDT and added a base GS entry for each core right after the TSS entry. This way GS can be loaded with a few instructions:
Code:
str %eax          // EAX = TSS selector of this core
add $8, %eax      // the next GDT entry (8 bytes on) is this core's GS descriptor
mov %ax, %gs      // load it into GS
On x86_64 the ABI uses the FS segment in a similar way. The idea, however, is to store the relevant value in the KernelGS MSR and then restore it into the necessary registers when needed.
Code:
mov $0xc0000102, %ecx // Read KernelGS (RDMSR takes the MSR index in ECX)
rdmsr
mov $0xc0000100, %ecx // Set FS.base from the value left in EDX:EAX
wrmsr
Managing extra entries in the GDT or the CPU's MSRs may look like extra hassle, but in the end it works out pretty well. And if you're going to support TLS in your userspace programs, it is well worth checking out.
If something looks overcomplicated, most likely it is.
Re: [SOLVED] How IDs are assigned to cpu cores?
Velko wrote: Since then, however, I have moved to use ABI-specific TLS (Thread Local Storage). [...]
I had very similar ideas!!!
I'm going to use the SWAPGS instruction on kernel entry, which was introduced for this purpose.
OS development is the intelligent alternative to drugs