OSDev.org

Posted: **Sat May 07, 2011 10:02 am**

OK, so now I have both versions of the code debugged and tested.

Using the private GDTR per core gives the following code to lock the scheduler:

    push 40h                ; 40h: core data selector in this core's private GDT
    pop fs
    add fs:ps_nesting,1     ; bump the scheduler lock nesting count

Using TR gives the following code to lock the scheduler:

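    push ax
    cli                     ; interrupts off while the TR-derived selector is in use
    str ax                  ; ax = this core's TSS selector
    add ax,10h              ; core data selector sits 10h above the TSS selector
    mov fs,ax
    add fs:ps_nesting,1     ; bump the scheduler lock nesting count
    pop ax
    sti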

To get the current thread looks like this:

Using private GDTR:

    push ds
    mov ax,40h              ; 40h: core data selector in this core's private GDT
    mov ds,ax
    mov ax,ds:ps_curr_thread ; ax = current thread
    pop ds

Using TR:

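    push ds
    cli                     ; interrupts off while the TR-derived selector is in use
    str ax                  ; ax = this core's TSS selector
    add ax,10h              ; core data selector sits 10h above the TSS selector
    mov ds,ax
    mov ax,ds:ps_curr_thread ; ax = current thread
    sti
    pop ds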

I think the GDTR method outcompetes the TR method easily, and it additionally does not require disabling interrupts.
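To make the difference between the two lookups concrete, here is a rough userspace model in C. It is only a sketch under assumptions: a descriptor is reduced to a base pointer, and the names `core_blocks`, `private_gdt`, `shared_gdt`, `task_register` and the TSS selector values 80h and 98h are made up for illustration. Only the selector 40h, the 10h offset and the two field names come from the code above.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CORES      2
#define GDT_SLOTS      32
#define CORE_DATA_SEL  0x40u  /* fixed selector used by the private-GDT variant */
#define TR_DATA_OFFSET 0x10u  /* "add ax,10h": two 8-byte descriptors past the TSS */

/* Hypothetical per-core block; the field names follow the thread. */
struct core_data {
    int ps_nesting;
    int ps_curr_thread;
};

/* A descriptor is reduced to the one thing this model needs: a base pointer. */
struct descriptor {
    struct core_data *base;
};

struct core_data  core_blocks[NUM_CORES];
struct descriptor private_gdt[NUM_CORES][GDT_SLOTS]; /* one GDT per core */
struct descriptor shared_gdt[GDT_SLOTS];             /* one GDT for all cores */
uint16_t          task_register[NUM_CORES];          /* per-core TR value */

void setup(void)
{
    for (int core = 0; core < NUM_CORES; core++) {
        core_blocks[core].ps_nesting = 0;
        core_blocks[core].ps_curr_thread = 100 + core;

        /* Private GDT: the same selector resolves to a different block per core. */
        private_gdt[core][CORE_DATA_SEL >> 3].base = &core_blocks[core];

        /* Shared GDT (assumed layout): the descriptor for a core's block
           sits 10h above that core's TSS selector. */
        task_register[core] = (uint16_t)(0x80u + (unsigned)core * 0x18u);
        shared_gdt[(task_register[core] + TR_DATA_OFFSET) >> 3].base =
            &core_blocks[core];
    }
}

/* Models "push 40h / pop fs": a constant selector, no per-core arithmetic. */
struct core_data *lookup_private_gdt(int core)
{
    assert(core >= 0 && core < NUM_CORES);
    return private_gdt[core][CORE_DATA_SEL >> 3].base;
}

/* Models "str ax / add ax,10h / mov fs,ax": the selector is derived from TR. */
struct core_data *lookup_via_tr(int core)
{
    assert(core >= 0 && core < NUM_CORES);
    uint16_t sel = (uint16_t)(task_register[core] + TR_DATA_OFFSET);
    return shared_gdt[sel >> 3].base;
}

/* Both variants end in the same operation on the block they found. */
void lock_scheduler(struct core_data *cd)
{
    cd->ps_nesting += 1;
}
```

After `setup()`, both lookups return the same block for a given core. The model does not show why the TR variant brackets the sequence with `cli`/`sti`; presumably the concern is being interrupted between `str` and the memory access.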

Posted: **Sat May 07, 2011 10:56 am**

Hi,

The most efficient method would be:

    add ps_nesting,1

Unfortunately that requires you to patch the paging structures during task switches, such that each CPU uses a (slightly) different version of "kernel space". This only works efficiently in some cases (PAE or long mode, or if you're using one virtual address space per thread/CPU anyway).
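As an illustration of the paging approach, the following C sketch models each CPU having its own view of one kernel page. It is a toy model, not real paging code: `page_table`, `frames`, `translate` and the page number `PERCPU_VPAGE` are hypothetical names, and the "MMU" is just an array lookup.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CPUS     2
#define NUM_VPAGES   8
#define PERCPU_VPAGE 5  /* hypothetical virtual page that holds ps_nesting */

struct percpu_page {
    int ps_nesting;
};

struct percpu_page  frames[NUM_CPUS];                 /* one physical frame per CPU */
struct percpu_page *page_table[NUM_CPUS][NUM_VPAGES]; /* per-CPU view of kernel space */

/* Give every CPU a mapping of the same virtual page to its own frame. */
void map_percpu_pages(void)
{
    for (int cpu = 0; cpu < NUM_CPUS; cpu++) {
        for (int vp = 0; vp < NUM_VPAGES; vp++)
            page_table[cpu][vp] = NULL;
        frames[cpu].ps_nesting = 0;
        page_table[cpu][PERCPU_VPAGE] = &frames[cpu];
    }
}

/* Stand-in for the MMU: resolve a virtual page through this CPU's tables. */
struct percpu_page *translate(int cpu, int vpage)
{
    assert(cpu >= 0 && cpu < NUM_CPUS);
    assert(vpage >= 0 && vpage < NUM_VPAGES);
    return page_table[cpu][vpage];
}

/* Equivalent of "add ps_nesting,1": the same address on every CPU,
   but each CPU's tables route it to a different frame. */
void lock_scheduler(int cpu)
{
    translate(cpu, PERCPU_VPAGE)->ps_nesting += 1;
}
```

The cost that the model hides is in keeping `page_table` different per CPU across task switches, which is the patching step described above.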

The second most efficient method would be:

    add gs:ps_nesting,1

In this case, GS is treated as a constant. If malicious code could set GS to something else, then you have to make sure that "add gs:ps_nesting,1" causes some sort of exception, and that the exception handler corrects GS and returns. This makes the "common case" fast (at the expense of making the "worst case that should never happen" slower).

The third most efficient method would be:

    add gs:ps_nesting,1

In this case you use the "swapgs" instruction every time the CPU switches from CPL=3 to CPL=0, and then restore the user-space value of GS when the CPU returns to CPL=3 from CPL=0. Unfortunately it only works in long mode.
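The entry/exit pattern can be modelled in a few lines of C. SWAPGS exchanges the current GS base with the value held in the IA32_KERNEL_GS_BASE MSR; everything else here (`cpu_state`, `kernel_entry`, `kernel_exit`) is a hypothetical wrapper for illustration, not code from any kernel.

```c
#include <assert.h>
#include <stdint.h>

struct core_data {
    int ps_nesting;
};

struct cpu_state {
    uintptr_t gs_base;        /* base currently used for gs: accesses */
    uintptr_t kernel_gs_base; /* models the IA32_KERNEL_GS_BASE MSR */
};

/* Models the swapgs instruction: exchange the two base values. */
void swapgs(struct cpu_state *cpu)
{
    uintptr_t tmp = cpu->gs_base;
    cpu->gs_base = cpu->kernel_gs_base;
    cpu->kernel_gs_base = tmp;
}

/* CPL=3 -> CPL=0: make GS point at the core private data. */
void kernel_entry(struct cpu_state *cpu)
{
    swapgs(cpu);
}

/* CPL=0 -> CPL=3: restore the user-space value of GS. */
void kernel_exit(struct cpu_state *cpu)
{
    swapgs(cpu);
}

/* Models "add gs:ps_nesting,1" executed while in the kernel. */
void lock_scheduler(struct cpu_state *cpu)
{
    assert(cpu->gs_base != 0);
    ((struct core_data *)cpu->gs_base)->ps_nesting += 1;
}
```

The swaps have to stay balanced: one on each entry from CPL=3 and one on each return to it, otherwise the kernel ends up running on the user-space base.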

The fourth most efficient method would be to use one of the debugging registers. For example:

    pushfd
    cli
    mov eax,dr3
    add dword [eax+ps_nesting],1
    popfd

This means you're only able to use 3 of the 4 debugging registers for debugging.
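A small C model of that sequence, mainly to show what `pushfd`/`popfd` buys compared with a plain `cli`/`sti` pair: the previous interrupt state is restored rather than interrupts being switched on unconditionally. The `cpu_state` struct and its fields are assumptions for illustration; `dr3` simply holds the address of the core private data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct core_data {
    int ps_nesting;
};

struct cpu_state {
    bool interrupts_enabled; /* models EFLAGS.IF */
    uintptr_t dr3;           /* models DR3, repurposed as a core data pointer */
};

void lock_scheduler(struct cpu_state *cpu)
{
    assert(cpu->dr3 != 0);

    bool saved_if = cpu->interrupts_enabled;             /* pushfd */
    cpu->interrupts_enabled = false;                     /* cli */
    struct core_data *cd = (struct core_data *)cpu->dr3; /* mov eax,dr3 */
    cd->ps_nesting += 1;                  /* add dword [eax+ps_nesting],1 */
    cpu->interrupts_enabled = saved_if;                  /* popfd */
}
```

If the caller already had interrupts disabled, they stay disabled afterwards, which a trailing `sti` would not guarantee.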

rdos wrote: I think the GDTR method outcompetes the TR method easily, and it additionally does not require disabling interrupts.

The cost of loading a segment register is too high - the GDTR method is probably tenth on the list of "most efficient ways".

Cheers,

Brendan

Posted: **Sat May 07, 2011 11:30 am**

Yes, but I was primarily comparing using TR with using GDTR. The other methods (always having some segment register point to the core private data) are not applicable in my case (but might be in other designs), and additionally are dangerous, as malicious or erroneous code could accidentally modify the segment register. Using a hardware debug register is not an option for most designs either, as these are scarce resources. It wouldn't help in a segmented design either, as a segment register always has to be reloaded no matter what (even if the address is flat, a flat selector has to be loaded into some segment register).

Optimal method to identify core private data