OSDev.org

Posted: **Wed Apr 13, 2011 9:29 pm**

The title pretty much says it.

Posted: **Wed Apr 13, 2011 9:49 pm**

Hi,

TylerH wrote:How do you tell what processor/core your code is running on?

On boot, the BIOS designates one logical processor as the BSP (Bootstrap Processor). The BIOS runs on the BSP, and so does your bootloader. The other processors, termed Application Processors (APs), are left in a halted state.

If you want to know which core you are running on, you can parse the MPS or ACPI (MADT) tables and then compare each listed APIC ID with the APIC ID of the processor you are running on. The matching entry tells you which core your code is running on.
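The comparison described above could be sketched roughly as follows. This is only an illustration: `struct cpu_table` and `cpu_index_from_apic_id` are hypothetical names, the table is assumed to have been filled in while parsing the MADT or MP table, and the current APIC ID is assumed to have been read already (for example from the local APIC ID register, or from CPUID leaf 1, EBX bits 31:24).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table of APIC IDs collected while parsing the MADT/MP table. */
struct cpu_table {
    const uint8_t *apic_ids;  /* one entry per processor listed in the tables */
    size_t count;
};

/* Returns the index of the entry whose APIC ID matches the current
 * processor's APIC ID, or -1 if the tables do not list it. */
int cpu_index_from_apic_id(const struct cpu_table *table, uint8_t current_apic_id)
{
    assert(table != NULL);
    for (size_t i = 0; i < table->count; i++) {
        if (table->apic_ids[i] == current_apic_id)
            return (int)i;
    }
    return -1;
}
```

The returned index can serve as a dense 0 to N-1 processor number even when the APIC IDs themselves are sparse.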

I didn't know exactly what you wanted to know with that question, so if you need more detailed explanation, please say so.

Regards,
Shikhin

Posted: **Wed Apr 13, 2011 9:56 pm**

The point is to tell processors from each other in the scheduler. Your answer is the exact amount of detail I needed. Thanks.

Posted: **Wed Apr 13, 2011 10:21 pm**

Hi,

TylerH wrote:The point is to tell processors from each other in the scheduler. Your answer is the exact amount of detail I needed. Thanks.

Heh, forgot one thing. If you are using the MPS tables (ACPI isn't present), each processor entry even has a BSP bit to specify which one is the BSP. Moreover, the IA32_APIC_BASE MSR also has a BSP flag (bit 8), whether or not x2APIC is in use.
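As a rough sketch of those two checks: the BSP flag is bit 8 of IA32_APIC_BASE (MSR 0x1B), and in an MP table processor entry the CPU flags byte uses bit 0 for "enabled" and bit 1 for "bootstrap processor". The helper names below are made up for illustration, and the raw MSR value or flags byte is assumed to have been read elsewhere (RDMSR needs ring 0).

```c
#include <stdbool.h>
#include <stdint.h>

#define IA32_APIC_BASE_MSR  0x1Bu      /* MSR address, for reference */
#define APIC_BASE_BSP       (1u << 8)  /* set only on the bootstrap processor */
#define MP_CPU_FLAG_EN      (1u << 0)  /* MP table: processor is usable */
#define MP_CPU_FLAG_BP      (1u << 1)  /* MP table: processor is the BSP */

/* apic_base_msr is the 64-bit value returned by RDMSR(IA32_APIC_BASE_MSR). */
bool apic_base_says_bsp(uint64_t apic_base_msr)
{
    return (apic_base_msr & APIC_BASE_BSP) != 0;
}

/* cpu_flags is the CPU flags byte of an MP table processor entry. */
bool mp_entry_says_bsp(uint8_t cpu_flags)
{
    return (cpu_flags & MP_CPU_FLAG_EN) != 0 && (cpu_flags & MP_CPU_FLAG_BP) != 0;
}
```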

EDIT: It seems I keep forgetting things while typing this post.

You should also build topology information using CPUID to differentiate logical processors (HT), cores, and physical processors (packages).
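One way to picture the topology step: on CPUs that support CPUID leaf 0x0B, each sub-leaf reports in EAX bits 4:0 how far to shift the x2APIC ID right to reach the next level (level type 1 is SMT, 2 is core). Assuming those two shift counts have already been read, the ID can be split as below. The struct and function names are illustrative only, and older CPUs need a different method based on leaves 1 and 4.

```c
#include <assert.h>
#include <stdint.h>

struct cpu_topology {
    uint32_t package_id;  /* physical processor */
    uint32_t core_id;     /* core within the package */
    uint32_t thread_id;   /* logical processor (HT) within the core */
};

/* smt_shift:  shift reported at the SMT level of CPUID leaf 0x0B
 * core_shift: shift reported at the core level (cumulative, >= smt_shift) */
struct cpu_topology split_apic_id(uint32_t apic_id, unsigned smt_shift,
                                  unsigned core_shift)
{
    assert(smt_shift <= core_shift && core_shift < 32);
    struct cpu_topology t;
    t.thread_id  = apic_id & ((1u << smt_shift) - 1u);
    t.core_id    = (apic_id >> smt_shift) & ((1u << (core_shift - smt_shift)) - 1u);
    t.package_id = apic_id >> core_shift;
    return t;
}
```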

Regards,
Shikhin

Posted: **Thu Apr 14, 2011 1:58 am**

You read its APIC ID. This is how I do it in my scheduler. The APIC ID can then be used to look up a core structure that keeps core-specific parameters. Those core structures are themselves built when starting up the AP cores, using the ACPI/MP tables to find out which cores are present.
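A minimal sketch of such a lookup, assuming 8-bit xAPIC IDs so that a 256-entry array can be indexed directly. `struct core`, `core_register` and `core_lookup` are invented for this example and are not taken from any particular kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_APIC_IDS 256  /* xAPIC IDs are 8 bits wide */

/* Hypothetical per-core structure holding core-specific parameters. */
struct core {
    uint8_t  apic_id;
    unsigned index;           /* dense 0..N-1 number chosen by the kernel */
    void    *current_thread;
};

static struct core *core_by_apic_id[MAX_APIC_IDS];

/* Called once per core while the APs are being started. */
void core_register(struct core *c)
{
    assert(c != NULL);
    core_by_apic_id[c->apic_id] = c;
}

/* Called by the scheduler with the APIC ID it has just read. */
struct core *core_lookup(uint8_t apic_id)
{
    return core_by_apic_id[apic_id];
}
```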

Posted: **Thu Apr 14, 2011 3:26 pm**

You may be able to 'hide' the ID in the Task Register of each core. It's nice to number the cores 0 to N-1 and not all machines number their cores that way. It's convenient to number the cores yourself.

At boot, each core must write a value to its TR to point to its TSS, like this:

Code: Select all

      wr_tr(0x70 + id * 0x10);

So each core, when entering the kernel, calculates its ID by reading its TR:

Code: Select all

      id = (rd_tr() - 0x70) >> 4;
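To make the arithmetic explicit, here is a small model of the encoding in plain C, without the actual LTR/STR instructions. The constants 0x70 and 0x10 come from the snippets above; the function names are made up. The 0x10 stride would fit 16-byte long-mode TSS descriptors, though that is an assumption about the layout being used.

```c
#include <assert.h>
#include <stdint.h>

#define TSS_SELECTOR_BASE   0x70u  /* selector of core 0's TSS */
#define TSS_SELECTOR_STRIDE 0x10u  /* distance between consecutive TSS selectors */

/* Value each core would load into TR at boot. */
uint16_t tss_selector_for_core(unsigned id)
{
    return (uint16_t)(TSS_SELECTOR_BASE + id * TSS_SELECTOR_STRIDE);
}

/* Inverse mapping: recover the core number from the value read back from TR. */
unsigned core_id_from_tr(uint16_t tr)
{
    assert(tr >= TSS_SELECTOR_BASE);
    return (tr - TSS_SELECTOR_BASE) >> 4;
}
```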

Posted: **Thu Apr 14, 2011 9:14 pm**

There's always the option of just caching it in a static data segment as well. For instance, you could program the GDT of every processor so that a given segment index is essentially a pointer to a processor-specific structure. In long mode the same thing can be accomplished with the IA32_KERNEL_GS_BASE MSR; personally, this is what I use for caching information about a specific processor. That way you can store a fairly large amount of processor-specific data at a known location and access it as a series of offsets from a segment register. The TR trick is probably quicker if all you need is the processor ID, but with the segment approach you avoid traversing a lot of data structures. I know, for example, that a processor's heap manager will always lie at [gs:16], its scheduler at [gs:40], its APIC ID at [gs:32], etc.
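A structure with that layout might look like the sketch below. Only the offsets 16, 32 and 40 come from the post; the fields at 0, 8 and 24 are placeholders invented to fill the gaps, and 64-bit integers stand in for pointers so the layout does not depend on the compiler's pointer size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-processor block that the GS base would point at. */
struct percpu {
    uint64_t self;          /* gs:0  (hypothetical) */
    uint64_t current_task;  /* gs:8  (hypothetical) */
    uint64_t heap_manager;  /* gs:16 */
    uint64_t reserved;      /* gs:24 (hypothetical) */
    uint32_t apic_id;       /* gs:32 */
    uint32_t pad;
    uint64_t scheduler;     /* gs:40 */
};

/* Fail the build if the layout drifts from the offsets the assembly uses. */
static_assert(offsetof(struct percpu, heap_manager) == 16, "heap manager at gs:16");
static_assert(offsetof(struct percpu, apic_id) == 32, "APIC ID at gs:32");
static_assert(offsetof(struct percpu, scheduler) == 40, "scheduler at gs:40");
```

Pinning the offsets with compile-time checks keeps hand-written assembly such as `mov eax, [gs:32]` in step with the C definition.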

Posted: **Thu Apr 14, 2011 11:01 pm**

gerryg400 wrote:You may be able to 'hide' the ID in the Task Register of each core. It's nice to number the cores 0 to N-1 and not all machines number their cores that way. It's convenient to number the cores yourself.

At boot each core must write a value to its TR to point to the TSS - like this
Code: Select all
      wr_tr(0x70 + id * 0x10);
So each core when entering the kernel calculates its ID by reading its TR.
Code: Select all
      id = rd_tr() - 0x70 >> 4;

The use of the TR register is interesting. I would do it like this though:

reading processor block:

Code: Select all

    str ax
    add ax,8
    mov fs,ax

Initialization code would first allocate two consecutive selectors, and initialize the first as a TSS descriptor and the second as the processor data descriptor. It would then load TR with the TSS selector.

Posted: **Fri Apr 15, 2011 12:24 am**

There are a few issues that argue against the use of TR to identify the processor/core ID:

* TR points to a TSS that contains LDT, CR3, SS0/ESP0 and IO permission bitmap.
* Certain faults must be handled with a TSS (stack fault and/or double fault)

The LDT and CR3 contents must be loaded manually with each task-switch when using software task-switching, but they are still accessed in the TSS under certain circumstances. One such case is the return from a TSS-based fault handler. It is possible to patch the processor TSS before doing the return from the fault handler, but it is a complication. Alternatively, each task-switch could update these fields in the processor TSS from the new thread control block.

The issue for SS0/ESP0 is worse. These fields must be loaded on each task-switch if TR does not contain the TSS of the current thread.
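To illustrate what "update these fields in the processor TSS" means, here is a sketch of the 32-bit TSS layout together with a helper that copies the relevant values from a thread control block. The TSS layout follows the architectural 104-byte format; `struct thread` and `tss_switch_to` are hypothetical and only model the bookkeeping, not a complete task switch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural 32-bit TSS (104 bytes, without the I/O bitmap itself). */
struct tss32 {
    uint16_t prev_task_link, reserved0;
    uint32_t esp0;  uint16_t ss0, reserved1;
    uint32_t esp1;  uint16_t ss1, reserved2;
    uint32_t esp2;  uint16_t ss2, reserved3;
    uint32_t cr3, eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint16_t es, res_es, cs, res_cs, ss, res_ss;
    uint16_t ds, res_ds, fs, res_fs, gs, res_gs;
    uint16_t ldt_selector, reserved4;
    uint16_t debug_trap;
    uint16_t io_map_base;
};
static_assert(sizeof(struct tss32) == 104, "32-bit TSS is 104 bytes");

/* Hypothetical thread control block fields needed by the TSS. */
struct thread {
    uint32_t kernel_esp;  /* top of this thread's ring 0 stack */
    uint16_t kernel_ss;
    uint32_t cr3;
    uint16_t ldt;
};

/* Refresh the per-processor TSS when software switches to `next`. */
void tss_switch_to(struct tss32 *cpu_tss, const struct thread *next)
{
    cpu_tss->esp0 = next->kernel_esp;  /* used on ring 3 -> ring 0 transitions */
    cpu_tss->ss0 = next->kernel_ss;
    cpu_tss->cr3 = next->cr3;          /* only read on hardware task switches */
    cpu_tss->ldt_selector = next->ldt;
}
```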

The IO permission bitmap might be a minor issue, but I use it actively to protect certain IO ports from application accesses (and sometimes to virtualize them). However, the BIOS mode-set (which executes at ring 3) has a different IO permission bitmap so that it can access any port without virtualization. If TR does not contain the current thread's TSS, this logic will need some changes. Perhaps only certain threads (like the mode-set thread) could load TR while most threads would not.

The issue with TSS-based fault handlers also means that TR will sometimes not contain the current processor's TSS. This too might be possible to solve (for example by making sure the processor ID is never read while these handlers are running).

If I were to use this, I'd revise the logic like this:

Code: Select all

    str ax
    cmp ax,CORE_TSS_MIN
    jb read_real
    cmp ax,CORE_TSS_MAX
    jae read_real
    add ax,8
    mov fs,ax
    jmp read_done

read_real:
    GetApicId
    FindProcessor

read_done:

This code is probably not much faster than reading the APIC ID directly.
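The same decision can be modelled in C to make the range check easier to follow. The names CORE_TSS_MIN and CORE_TSS_MAX are taken from the assembly above, but the values here are made up, and a return value of 0 stands in for the `read_real` path, where the caller would fall back to reading the APIC ID.

```c
#include <stdint.h>

#define CORE_TSS_MIN 0x70u  /* hypothetical: first per-core TSS selector */
#define CORE_TSS_MAX 0xF0u  /* hypothetical: one past the last per-core TSS selector */

/* Returns the per-core data selector (TSS selector + 8) when TR holds a
 * per-core TSS, or 0 when TR holds something else (for example a
 * fault-handler TSS) and the slow APIC ID lookup is needed instead. */
uint16_t core_data_selector_from_tr(uint16_t tr)
{
    if (tr < CORE_TSS_MIN || tr >= CORE_TSS_MAX)
        return 0;               /* corresponds to read_real */
    return (uint16_t)(tr + 8u); /* corresponds to add ax,8 */
}
```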

Posted: **Fri Apr 15, 2011 12:58 am**

rdos wrote:* Certain faults must be handled with a TSS (stack fault and/or double fault)

The same faults usually demand a panic and system halt. Even then, the old TR can be found in the new TSS' link field.

Posted: **Fri Apr 15, 2011 2:08 am**

Combuster wrote:
rdos wrote:* Certain faults must be handled with a TSS (stack fault and/or double fault)
The same faults usually demand a panic and system halt. Even then, the old TR can be found in the new TSS' link field.

Not so. Both the stack fault and the double fault are localized to a single thread, and thus need not be panics or system halts. They could occur because of stack overruns in the kernel on behalf of application APIs. It is only when stack faults happen in the scheduler that they must be considered panics or system halts.

One of the drawbacks of software task-switching is the handling of stack faults in the kernel. The ideal method of handling stack faults in an application is with a trap gate. This is especially so for SMP systems, which can only have a single TSS per fault handler: if more than one core hits a TSS-based stack fault handler at the same time, a double fault (and possibly a triple fault) will happen. The problem with using a trap gate for stack faults is that if a stack fault occurs in the kernel, and the stack fault handler is a trap gate, the processor might issue a double fault instead. If the double-fault handler is also a trap gate, the result is a triple fault and an uncontrolled system RESET (if triple-fault detect logic is present). Therefore, in order to get useful information about stack faults in the kernel, the double-fault handler must be a TSS.

With hardware task-switching, it is more convenient to handle stack faults with a TSS, but the whole concept of hardware task-switching has severe problems with SMP.

Posted: **Fri Apr 15, 2011 2:25 am**

I have another idea for solving this that does not have any (?) complications at all. The idea is simple: each core will have its own GDTR setting. Most of the GDT will be shared through paging, but the first few entries will be unique. Selector 8 (the second GDT entry) could then be linked to the private storage of a core. The GDT will start 16 bytes before a page boundary, and each core will have a private page that covers the first two GDT entries. The first is the null selector, and the second is selector 8 for core-private storage. The rest of this page could be reused as (and mapped to) the segment behind selector 8, which would then be able to contain up to 4080 bytes of private storage.
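The address arithmetic behind this layout can be sketched as follows, assuming 4 KiB pages and 8-byte descriptors. The struct and function are illustrative only; `private_page` stands for the linear address of the page that is mapped differently on each core.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE       4096u
#define GDT_ENTRY_SIZE  8u
#define PRIVATE_ENTRIES 2u  /* null descriptor + per-core data descriptor */

struct core_gdt_layout {
    uint32_t gdt_base;       /* base to load into GDTR */
    uint32_t private_base;   /* base of the per-core data segment */
    uint32_t private_limit;  /* segment limit (size in bytes - 1) */
    uint32_t shared_start;   /* first page of the shared part of the GDT */
};

struct core_gdt_layout compute_core_gdt_layout(uint32_t private_page)
{
    assert(private_page % PAGE_SIZE == 0);
    const uint32_t private_bytes = PRIVATE_ENTRIES * GDT_ENTRY_SIZE;  /* 16 */
    struct core_gdt_layout l;
    /* The two private descriptors occupy the last 16 bytes of the private page. */
    l.gdt_base      = private_page + PAGE_SIZE - private_bytes;
    /* The remaining 4080 bytes of that page become the per-core storage. */
    l.private_base  = private_page;
    l.private_limit = PAGE_SIZE - private_bytes - 1u;
    /* Everything from the next page boundary on is shared by all cores. */
    l.shared_start  = private_page + PAGE_SIZE;
    return l;
}
```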

This does not interfere with segmentation, the TSS, or any other issue that I know of. There is usually no reason to change the base of the GDTR. The only drawback I can see is that it will use up to 64K of linear address space per core in the kernel address space, as the GDT needs to be mapped in its full length into the linear address space of each core. I can also see some issues with setting this up, as the GDTR needs to be defined very early in the boot process (I think I use fixed linear addresses for it), but if the initial setup is aligned correctly (it starts 16 bytes before the page boundary) it should be straightforward to just copy the pages in the GDT at a later stage, when processor ID support is initialized.

The logic to get to the core private data is simple and works on both SMP and non-SMP systems equally well:

Code: Select all

   mov ax,8
   mov ds,ax

Having checked this a little more in RDOS, I can see some other complications, but also that it didn't take more than an hour to implement the basics. In my design, 8 GDT entries are used by the boot-loader (which means these cannot be re-arranged). Thus, the core selector needs to be 40h. OTOH, it was no problem to re-align the GDT, as its linear address in kernel space is defined in an include file. For the bootstrap core's GDT, it is thus easy to set things up. It can be kept in the old place, and the only thing that is required is to create selector 40h by mapping it to a fixed linear address. Initializing AP cores should be relatively easy as well. The GDTR base and size are already written out to fixed locations when the AP boots. All that is required here is to provide an aliased GDT with the 40h selector set to the correct address.

EDIT: I've finalized this now. The code to get the current processor's private data and the currently executing thread is now super-simple (and fast). In fact, it is just as fast as it was before SMP support was added. I've tested it on single-core systems, but not on multi-core systems yet.

Posted: **Fri Apr 15, 2011 11:29 am**

For a 64-bit OS (which allows interrupts inside the kernel):

Code: Select all

    // Check if we are coming from the kernel
    testw $CS_RPL_MASK, INTERRUPT_FRAME_CS_OFFSET(%rsp)
    jz 1f

    // Not from the kernel, so execute SWAPGS in order to gain access to the kernel's thread control block
    swapgs
1: // <and so on>

then place your per-CPU information (or information leading towards it - my experience is that per-task information is accessed more often than per-cpu!) at (%gs:0).

(Obviously similar checks will be needed before performing the interrupt return, and you'll need to take care to always swapgs when entering the kernel via other routes.)
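The test in that entry stub boils down to looking at the requested privilege level in the saved CS selector. A tiny C model of the decision is shown below; CS_RPL_MASK is the name used in the assembly, the value 3 covers the two RPL bits, and `entered_from_user` is a made-up helper name.

```c
#include <stdbool.h>
#include <stdint.h>

#define CS_RPL_MASK 0x3u  /* low two bits of a selector hold the privilege level */

/* saved_cs is the CS value the CPU pushed in the interrupt frame.
 * A non-zero RPL means the interrupt arrived from outside the kernel,
 * so the handler has to execute SWAPGS before touching %gs. */
bool entered_from_user(uint16_t saved_cs)
{
    return (saved_cs & CS_RPL_MASK) != 0;
}
```

The same predicate would guard the matching SWAPGS on the way back out.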

Posted: **Fri Apr 15, 2011 2:23 pm**

Owen wrote:then place your per-CPU information (or information leading towards it - my experience is that per-task information is accessed more often than per-cpu!) at (%gs:0).

Yes, but wasting one or two segment registers in the kernel in a segmented design is totally out of the question. In a 64-bit kernel it might be OK, but not in a 32-bit segmented one.

Posted: **Fri Apr 15, 2011 2:45 pm**

...swapgs is only available on K8 and later CPUs anyway, so somewhat irrelevant.

(In any case, segmented user space does not require a segmented kernel space, though they do often go hand in hand).

How do you tell what processor/core your code is running on?
