Spinlocks that disable interrupts...

Colonel Kernel · Post by **Colonel Kernel** » Thu Mar 31, 2005 12:57 am

Another strange question.

I've noticed two different approaches to implementing spinlock primitives that disable interrupts while the lock is held.

NT and Linux return the old interrupt state to the caller on acquire (KeAcquireSpinLock and spin_lock_irqsave, respectively), and the caller must pass that state back in on release. In NT, the state is the IRQL and in Linux x86 the state is the old EFLAGS contents, but they basically achieve the same thing.
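To make the first approach concrete, here is a minimal user-space sketch of an irqsave-style interface in C11. The old state is handed back to the caller and lives in the caller's own local variables. Real kernels use cli/pushf/popf; here a thread-local `sim_if` flag simulates EFLAGS.IF, and all names (`irq_spinlock`, `spin_lock_save`, `spin_unlock_restore`, `sim_*`) are made up for illustration. They are not the NT or Linux APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simulated per-CPU interrupt-enable flag, standing in for EFLAGS.IF.
   Each thread plays one CPU; this is not real cli/popf. */
static _Thread_local bool sim_if = true;

static bool sim_save_and_cli(void)   /* like pushf; pop; cli */
{
    bool old = sim_if;
    sim_if = false;
    return old;
}

static void sim_restore(bool old)    /* like push; popf */
{
    sim_if = old;
}

typedef struct {
    atomic_flag held;                /* set while some CPU owns the lock */
} irq_spinlock;

/* Acquire: disable interrupts, spin, and hand the old state to the caller. */
static bool spin_lock_save(irq_spinlock *l)
{
    bool flags = sim_save_and_cli();
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;                            /* busy-wait */
    return flags;
}

/* Release: the caller passes back whatever spin_lock_save returned. */
static void spin_unlock_restore(irq_spinlock *l, bool flags)
{
    assert(!sim_if);                 /* interrupts must still be off here */
    atomic_flag_clear_explicit(&l->held, memory_order_release);
    sim_restore(flags);
}

/* Nested locks: each saved state lives in its own caller's local variable. */
static bool nested_demo(void)
{
    static irq_spinlock a = { ATOMIC_FLAG_INIT };
    static irq_spinlock b = { ATOMIC_FLAG_INIT };
    bool fa = spin_lock_save(&a);    /* IF was on:  fa == true  */
    bool fb = spin_lock_save(&b);    /* IF was off: fb == false */
    spin_unlock_restore(&b, fb);     /* IF stays off: still inside a */
    spin_unlock_restore(&a, fa);     /* IF back on */
    return sim_if;
}
```

The cost of this style is that every caller has to carry `flags` from the acquire to the matching release.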

QNX, on the other hand, doesn't return anything from InterruptLock and doesn't require that anything be passed to InterruptUnlock, except for the spinlock itself of course.

I'm designing my Lock right now and I'm curious about the advantages/disadvantages of the two approaches. I don't see why I wouldn't just store the old EFLAGS in the lock structure itself while the lock is being held, rather than return it to the caller and rely on the caller to pass it back in properly later. The only reasons I can think of for doing this are performance-related (trying to avoid unnecessary bus traffic if the lock exists in several CPU caches, maybe?), unless I'm missing something.

Thoughts/opinions?

distantvoices · Post by **distantvoices** » Thu Mar 31, 2005 2:05 am

I suppose it's good for the multiprocessor case, and maybe it has to do with both of them being monolithic?

Other than that I can't imagine why one should save off eflags upon acquiring a spinlock.

Stay safe & sorry if I don't sound too elucidating ];->

Pype.Clicker · Post by **Pype.Clicker** » Thu Mar 31, 2005 3:27 am

Let's say you have 2 processors and each of them wants to access a critical region to do some work. Processor A is doing that in response to an IRQ (thus IF=0) and processor B is doing that on behalf of a system call (thus IF=1).

Both processors should clear IF so that nothing can preempt them while they're in the critical region. That done, we only have to worry about preventing one CPU from entering the critical region while the other is inside it. A memory location (used properly) can tell whether someone already "owns" the critical region, and the other CPU will busy-wait for the region to be exited (assuming that the owner will be done quickly).

Once done, the CPU should restore the "previous state" of IF. Since every CPU clears IF *before* trying to enter the critical region, the "lock" is not a safe place to store this value, so the easiest place to store it is on the caller's stack.
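To see the hazard concretely, here is a deterministic replay of the two-CPU scenario above against a naive lock that saves IF into the lock *before* owning it. It is a single-threaded simulation, and the `cpu_if` array, `naive_lock`, and helper names are all made up for illustration: B takes the lock with IF=1 saved, A then overwrites the saved value while it is still spinning, and B's release restores A's IF=0.

```c
#include <assert.h>
#include <stdbool.h>

/* Two simulated CPUs. A: in an IRQ handler (IF=0), B: in a system call (IF=1). */
enum { CPU_A, CPU_B };
static bool cpu_if[2];

typedef struct {
    bool held;
    bool saved_if;      /* meant to hold the *owner's* old IF */
} naive_lock;

/* Naive acquire, part 1: save IF into the lock and cli, BEFORE owning it. */
static void naive_save_cli(naive_lock *l, int cpu)
{
    l->saved_if = cpu_if[cpu];
    cpu_if[cpu] = false;
}

/* Naive acquire, part 2: one attempt to take the lock (a real CPU would loop). */
static bool naive_try_take(naive_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

static void naive_release(naive_lock *l, int cpu)
{
    assert(l->held);
    bool old = l->saved_if;
    l->held = false;
    cpu_if[cpu] = old;
}

/* Replays one interleaving by hand; returns B's IF after it releases. */
static bool replay(void)
{
    naive_lock l = { false, false };
    cpu_if[CPU_A] = false;
    cpu_if[CPU_B] = true;

    naive_save_cli(&l, CPU_B);           /* B saves IF=1 into the lock ... */
    bool b_owns = naive_try_take(&l);    /* ... and gets it */
    naive_save_cli(&l, CPU_A);           /* A overwrites saved_if with 0 ... */
    bool a_owns = naive_try_take(&l);    /* ... and has to spin */
    assert(b_owns && !a_owns);
    naive_release(&l, CPU_B);            /* B "restores" A's state */
    return cpu_if[CPU_B];                /* false: B is left with interrupts off */
}
```

Keeping the saved value somewhere no other CPU can write before you own the lock, such as the caller's stack, avoids this.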

Colonel Kernel · Post by **Colonel Kernel** » Thu Mar 31, 2005 8:20 am

Pype.Clicker wrote: Once done, the CPU should restore the "previous state" of IF. Since every CPU clears IF *before* trying to enter the critical region, the "lock" is not a safe place to store this value, so the easiest place to store it is on the caller's stack.

It can be safe IMO... there's no reason it needs to be in the stack frame of whoever called acquire(). Here's some very C-ish pseudocode:


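#include <stdint.h>

typedef struct Lock {
    volatile uint32_t flag;    // 0 = free, 1 = held
    uint32_t oldEFLAGS;        // Owner's saved EFLAGS; only valid while held
} Lock;

// Assumed helpers (not shown):
uint32_t disableInterrupts( void );    // pushf; pop; cli, returns the old EFLAGS
void spin( volatile uint32_t* flag );  // Atomic test-and-set loop (e.g. xchg)
void unlock( volatile uint32_t* flag );
void restoreFlags( uint32_t eflags );  // push; popf

void acquire( Lock* lock )
{
    uint32_t oldEFLAGS = disableInterrupts();
    spin( &lock->flag ); // Spin until you've got the lock...
    lock->oldEFLAGS = oldEFLAGS; // This is fine, you've got the lock.
}

void release( Lock* lock )
{
    uint32_t oldEFLAGS = lock->oldEFLAGS; // Read it while still holding the lock
    unlock( &lock->flag ); // oldEFLAGS is saved, so this is ok.
    restoreFlags( oldEFLAGS ); // Does popf and all that...
}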
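Here is that pseudocode as a runnable user-space sketch, under some assumptions: threads play the CPUs, a thread-local `sim_if` flag stands in for EFLAGS.IF (so `disableInterrupts()` and `restoreFlags()` become plain assignments), and a C11 `atomic_flag` is the spin flag. Threads that start with different IF values hammer one lock, and each checks after every release that it got its own IF back, not another thread's.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Thread-local stand-in for EFLAGS.IF; each thread plays one CPU. */
static _Thread_local bool sim_if;

typedef struct {
    atomic_flag flag;        /* the spin flag */
    bool old_if;             /* owner's saved IF; only touched while held */
} Lock;

static void acquire(Lock *lock)
{
    bool old_if = sim_if;    /* "disableInterrupts()": save, then cli */
    sim_if = false;
    while (atomic_flag_test_and_set_explicit(&lock->flag, memory_order_acquire))
        ;                    /* spin until we own the lock */
    lock->old_if = old_if;   /* safe: no other CPU can write it now */
}

static void release(Lock *lock)
{
    assert(!sim_if);                /* interrupts must be off while held */
    bool old_if = lock->old_if;     /* read it while we still own the lock */
    atomic_flag_clear_explicit(&lock->flag, memory_order_release);
    sim_if = old_if;                /* "restoreFlags()" */
}

enum { THREADS = 4, ITERS = 100000 };

static Lock g_lock = { ATOMIC_FLAG_INIT, false };
static long g_counter;              /* protected by g_lock */
static atomic_int g_bad_restores;   /* times a thread got the wrong IF back */

static void *worker(void *arg)
{
    bool start_if = ((intptr_t)arg % 2) == 0;  /* mix IF=1 and IF=0 "CPUs" */
    sim_if = start_if;
    for (int i = 0; i < ITERS; i++) {
        acquire(&g_lock);
        g_counter++;
        release(&g_lock);
        if (sim_if != start_if)
            atomic_fetch_add(&g_bad_restores, 1);
    }
    return NULL;
}

/* Runs all workers and returns the final counter value. */
static long run_demo(void)
{
    pthread_t t[THREADS];
    for (intptr_t i = 0; i < THREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < THREADS; i++)
        pthread_join(t[i], NULL);
    return g_counter;
}
```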

Colonel Kernel · Post by **Colonel Kernel** » Sat Apr 02, 2005 1:25 am

Here's another spinlock question -- has anyone heard of queued spinlocks? This is the first I've heard of them.

Does anyone have any experience with them? Do they help a lot with scalability, or are they more trouble than they're worth...?

Pype.Clicker · Post by **Pype.Clicker** » Sat Apr 02, 2005 3:39 am

Hmm ... so they keep a queue of CPU identifiers along with the spinlock ... That looks more and more like a semaphore to me. Honestly, I don't know whether signalling the spinlock release through a per-CPU variable really helps.
Since you now have a 'read-only' waiting loop, that may indeed reduce the number of LOCKed and write cycles on the system bus ... My guess is that the real performance gain will depend on how often contention actually occurs, weighed against the overhead of acquiring/releasing the lock when no one else wants it.
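A simpler relative of that 'read-only waiting loop', though not the queued scheme itself, is the test-and-test-and-set lock: waiters spin on plain loads, which can be served from their own cache, and only attempt the atomic exchange (a LOCKed operation on x86) once the lock looks free. A minimal C11 sketch with hypothetical names (`ttas_lock`, `ttas_acquire`, `ttas_release`); whether it measurably helps still depends on how much contention there is.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_bool locked;     /* false = free */
} ttas_lock;

static void ttas_acquire(ttas_lock *l)
{
    for (;;) {
        /* Read-only wait: plain loads, no locked read-modify-write. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;
        /* Looks free: only now try the atomic exchange. */
        if (!atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
            return;
    }
}

static void ttas_release(ttas_lock *l)
{
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed));
    atomic_store_explicit(&l->locked, false, memory_order_release);
}

enum { NTHREADS = 4, NITERS = 100000 };
static ttas_lock g_ttas;    /* static storage: zero-initialized, i.e. free */
static long g_shared;       /* protected by g_ttas */

static void *bump(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITERS; i++) {
        ttas_acquire(&g_ttas);
        g_shared++;
        ttas_release(&g_ttas);
    }
    return NULL;
}

static long run_ttas(void)
{
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, bump, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return g_shared;
}
```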

Their claims lack clear performance tests (or references to such tests) to carry any scientific weight. And claiming that "we now scale better to multiple processors with Win2K" while the NT 4.0 license prevents you from running it on more than 2 processors in an SMP system sounds like an April Fools' joke to me.


    mov ecx, [processor_id]      ; this CPU's index
    mov eax, 1
    cli
    xchg [spinlock], eax         ; atomically swap 1 into the lock word
    cmp eax, 1
    jne got_it                   ; old value wasn't 1: the lock was free
    ;; here, we have to enqueue. but how will we be sure that
    ;; we enqueue *safely*? we need a spinlock to protect the queue length or something ...
try_again:
    pause                        ; spin-wait hint
    cmp byte [processors_signal+ecx], 1
    jne try_again
    mov byte [processors_signal+ecx], 0
got_it:

If your 'critical region' is long enough to have a significant probability of concurrent access, then using hardware-assisted IPIs may pay off more. You then have a 'real' semaphore and send a message directly to the next requester. Waiting CPUs would sit in a 'halt' state, somehow, that only an IPI could wake them from.

(Yes, I'm working on the local APIC and MSRs atm.)

mystran · Post by **mystran** » Sat Apr 02, 2005 8:22 am

It might not be a bad idea to reduce the amount of spinlock-protected code as much as possible. Most of the time you can just as well use a short spinlock to implement a real mutex, and use the real mutex to protect the actual code. Yes, this might require some other changes to the design, but...

Optimizing the spinlock implementation is the wrong approach.
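A minimal user-space sketch of that suggestion: a short spinlock (`spin_t`) protects only the mutex's own bookkeeping, while the "actual code" runs under the mutex and waiters sleep instead of spinning. In a kernel the sleep/wake would go through the scheduler; here a per-waiter POSIX semaphore stands in for it, and all names (`sleep_mutex`, `mutex_lock`, `mutex_unlock`, `waiter`) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* The short spinlock: guards only the mutex's own bookkeeping. */
typedef struct { atomic_flag f; } spin_t;

static void spin_lock(spin_t *s)
{
    while (atomic_flag_test_and_set_explicit(&s->f, memory_order_acquire))
        ;
}

static void spin_unlock(spin_t *s)
{
    atomic_flag_clear_explicit(&s->f, memory_order_release);
}

/* A blocked thread. The semaphore stands in for "sleep in the scheduler". */
typedef struct waiter {
    struct waiter *next;
    sem_t wake;
} waiter;

typedef struct {
    spin_t guard;            /* held for a few instructions at a time */
    bool owned;
    waiter *head, *tail;     /* FIFO of sleeping threads */
} sleep_mutex;

static void mutex_lock(sleep_mutex *m)
{
    spin_lock(&m->guard);
    if (!m->owned) {
        m->owned = true;
        spin_unlock(&m->guard);
        return;
    }
    waiter w = { .next = NULL };
    sem_init(&w.wake, 0, 0);
    if (m->tail)
        m->tail->next = &w;
    else
        m->head = &w;
    m->tail = &w;
    spin_unlock(&m->guard);
    sem_wait(&w.wake);       /* sleep; unlock hands ownership straight to us */
    sem_destroy(&w.wake);
}

static void mutex_unlock(sleep_mutex *m)
{
    spin_lock(&m->guard);
    assert(m->owned);
    waiter *w = m->head;
    if (w == NULL) {
        m->owned = false;
        spin_unlock(&m->guard);
        return;
    }
    m->head = w->next;
    if (m->head == NULL)
        m->tail = NULL;
    spin_unlock(&m->guard);
    sem_post(&w->wake);      /* 'owned' stays true: w is the new owner */
}

enum { MTHREADS = 4, MITERS = 20000 };
static sleep_mutex g_mutex = { { ATOMIC_FLAG_INIT }, false, NULL, NULL };
static long g_total;         /* the "actual code" protected by the mutex */

static void *mworker(void *arg)
{
    (void)arg;
    for (int i = 0; i < MITERS; i++) {
        mutex_lock(&g_mutex);
        g_total++;
        mutex_unlock(&g_mutex);
    }
    return NULL;
}

static long run_mutex(void)
{
    pthread_t t[MTHREADS];
    for (int i = 0; i < MTHREADS; i++)
        pthread_create(&t[i], NULL, mworker, NULL);
    for (int i = 0; i < MTHREADS; i++)
        pthread_join(t[i], NULL);
    return g_total;
}
```

Unlock hands ownership directly to the oldest waiter, so the spinlock is held only for a few pointer updates no matter how long the protected code runs.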

Pype.Clicker · Post by **Pype.Clicker** » Sat Apr 02, 2005 9:26 am

Yes. Using semaphores instead of spinlocks for the larger stuff would be my preferred approach too.

Colonel Kernel · Post by **Colonel Kernel** » Sat Apr 02, 2005 9:55 am

According to that article, they actually use these queued spinlocks to protect the scheduler database and important MM structures... things where the lock wouldn't be held for very long, I would think.

In terms of adding to the queue atomically, my guess is they're using some kind of lock-free algorithm to do that... The workings of lock-free queues are still beyond my understanding right now, though.
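For reference, one well-known queued spinlock, the MCS lock (Mellor-Crummey and Scott), does the enqueue with no extra lock: a single atomic exchange on the queue's tail pointer appends the new waiter, and each waiter then spins only on a flag in its own node. Whether the Windows implementation works exactly this way is an assumption. The sketch uses C11 atomics and hypothetical names (`mcs_lock`, `mcs_node`, `wait_hint`); the `sched_yield()` in the wait loops is only there because the demo's user threads can be preempted, whereas a kernel would spin with interrupts off.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;   /* who is queued behind us */
    atomic_bool locked;                 /* true while we must keep waiting */
} mcs_node;

typedef struct {
    _Atomic(mcs_node *) tail;          /* last node in the queue; NULL = free */
} mcs_lock;

/* User-space demo only: the thread we wait for may be preempted, so yield. */
static void wait_hint(void) { sched_yield(); }

static void mcs_acquire(mcs_lock *l, mcs_node *me)
{
    /* The node isn't visible to anyone yet, so plain initialization is fine. */
    atomic_init(&me->next, NULL);
    atomic_init(&me->locked, true);
    /* Enqueue with one atomic exchange: no second lock protects the queue. */
    mcs_node *prev = atomic_exchange_explicit(&l->tail, me, memory_order_acq_rel);
    if (prev == NULL)
        return;                         /* queue was empty: lock is ours */
    atomic_store_explicit(&prev->next, me, memory_order_release);
    /* Read-only wait on our own node; the previous holder clears the flag. */
    while (atomic_load_explicit(&me->locked, memory_order_acquire))
        wait_hint();
}

static void mcs_release(mcs_lock *l, mcs_node *me)
{
    assert(atomic_load_explicit(&l->tail, memory_order_relaxed) != NULL);
    mcs_node *succ = atomic_load_explicit(&me->next, memory_order_acquire);
    if (succ == NULL) {
        /* Nobody visible behind us: try to swing tail back to NULL. */
        mcs_node *expected = me;
        if (atomic_compare_exchange_strong_explicit(&l->tail, &expected, NULL,
                memory_order_release, memory_order_relaxed))
            return;
        /* A new waiter swapped itself in but hasn't linked yet: wait for it. */
        while ((succ = atomic_load_explicit(&me->next, memory_order_acquire)) == NULL)
            wait_hint();
    }
    atomic_store_explicit(&succ->locked, false, memory_order_release); /* hand off */
}

enum { QTHREADS = 4, QITERS = 20000 };
static mcs_lock g_mcs;                  /* zero-initialized: tail == NULL, free */
static long g_count;                    /* protected by g_mcs */

static void *qworker(void *arg)
{
    (void)arg;
    for (int i = 0; i < QITERS; i++) {
        mcs_node node;                  /* each acquisition brings its own node */
        mcs_acquire(&g_mcs, &node);
        g_count++;
        mcs_release(&g_mcs, &node);
    }
    return NULL;
}

static long run_mcs(void)
{
    pthread_t t[QTHREADS];
    for (int i = 0; i < QTHREADS; i++)
        pthread_create(&t[i], NULL, qworker, NULL);
    for (int i = 0; i < QTHREADS; i++)
        pthread_join(t[i], NULL);
    return g_count;
}
```

Because each node can live on the acquiring caller's stack, this also fits the "store it on the caller's stack" idea from earlier in the thread.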

OSDev.org
