Octocontrabass wrote: Spin locks can be implemented using atomic_flag from <stdatomic.h>, which has been part of C since C11.
Well yes, but also no. For spinlocks to be useful in a kernel, they really need to disable interrupts before taking the lock (and restore the previous interrupt state after release). Otherwise it is possible to take a spinlock, be interrupted, and have the interrupt handler try to take the same spinlock, deadlocking the kernel. And it is not possible to disable interrupts in standard C.
A variant that does not disable interrupts is possible if the spinlock is never shared with interrupt handlers, but that is an optimization and not the rule.
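For reference, the non-interrupt-safe variant really can be written in plain C11, as Octocontrabass says. The following is only a minimal sketch: the names plain_lock_t, plain_lock, plain_trylock and plain_unlock are made up for illustration, and it uses the default (sequentially consistent) atomic_flag operations rather than anything tuned.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical names, for illustration only. This lock must never be
 * taken from an interrupt handler, because it leaves interrupts alone. */
typedef struct { atomic_flag held; } plain_lock_t;

#define PLAIN_LOCK_INIT { ATOMIC_FLAG_INIT }

static void plain_lock(plain_lock_t *l)
{
    /* test_and_set returns the previous value: true means already held. */
    while (atomic_flag_test_and_set(&l->held))
        ; /* spin; a kernel would put a pause/priority hint here */
}

static int plain_trylock(plain_lock_t *l)
{
    /* Returns 1 if we acquired the lock, 0 if it was already held. */
    return !atomic_flag_test_and_set(&l->held);
}

static void plain_unlock(plain_lock_t *l)
{
    atomic_flag_clear(&l->held);
}

/* Single-threaded smoke check of the state transitions. */
static void plain_lock_selfcheck(void)
{
    plain_lock_t l = PLAIN_LOCK_INIT;
    plain_lock(&l);
    assert(plain_trylock(&l) == 0); /* held, so trylock fails */
    plain_unlock(&l);
    assert(plain_trylock(&l) == 1); /* free again, so trylock succeeds */
}
```

Nothing in there touches the interrupt flag, which is exactly why it is only safe for locks that interrupt handlers never take.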
I prefer making small building blocks in assembler, with a portable interface. E.g. interface:
unsigned long a_irqdisable(void);
void a_irqrestore(unsigned long);
int a_swap(volatile int *, int);
void a_store(volatile int *, int);
void a_spin(void);
void a_exit_spin(void);
AMD64:
a_irqdisable:
pushfq
popq %rax # return the old RFLAGS so the caller can restore it later
cli # actually disable interrupts
retq
a_irqrestore:
pushq %rdi
popfq
retq
a_swap:
lock xchgl %esi, (%rdi)
movl %esi, %eax
retq
a_store:
xorl %eax, %eax
movl %esi, (%rdi)
lock cmpxchgl %eax, (%rsp) # prevent processor-side load-reorders across the store instruction above
retq
a_spin:
pause
retq
a_exit_spin:
retq
PowerPC:
a_irqdisable:
mfmsr 3
rlwinm 4,3,0,~MSR_EE
# skip mtmsr instruction if possible. It is slow.
cmplw 4, 3
beq 1f
mtmsr 4
1: blr
a_irqrestore:
andi. 0, 3, MSR_EE
beq 1f
mtmsr 3
1: blr
a_swap:
sync
lwarx 5, 0, 3
stwcx. 4, 0, 3
bne- a_swap
isync
mr 3, 5
blr
a_store:
sync
stw 4, 0(3)
sync
blr
a_spin:
or 4, 4, 4
blr
a_exit_spin:
or 2, 2, 2
blr
And then it is possible to use those again to build the spinlock in C. Using external functions rather than inline asm has the benefit of creating a well-defined ABI boundary, rather than whatever inline assembler is doing. Yes, it is nice that the compiler can inline and reorder this stuff, but getting the constraints (especially the clobbers) just right is not a thing I want to waste time on. Anyway, the functions are small and easy to verify.
The spinlock code could then be something like
typedef volatile int spinlock_t;

unsigned long spinlock_irqsave(spinlock_t *lock)
{
    unsigned long flg = a_irqdisable(); /* returns the previous interrupt state */
    while (a_swap(lock, 1)) a_spin();
    a_exit_spin();
    return flg;
}

void spinunlock_irqrestore(spinlock_t *lock, unsigned long flg)
{
    a_store(lock, 0);
    a_irqrestore(flg);
}
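To show how the pieces fit together, here is a host-side sketch that can be compiled as an ordinary program. Everything marked as a stand-in is an assumption of mine for testing: the a_* bodies are fakes built on the GCC/Clang __atomic builtins, fake_irq_enabled merely models the CPU's interrupt-enable bit, and counter_increment and nested_demo are hypothetical callers.

```c
#include <assert.h>

/* Stand-ins for the assembler primitives (test-only assumptions). */
static unsigned long fake_irq_enabled = 1; /* models the interrupt-enable bit */

static unsigned long a_irqdisable(void)
{
    unsigned long old = fake_irq_enabled;
    fake_irq_enabled = 0;
    return old;
}
static void a_irqrestore(unsigned long flg) { fake_irq_enabled = flg; }
static int a_swap(volatile int *p, int v)
{
    return __atomic_exchange_n(p, v, __ATOMIC_SEQ_CST); /* GCC/Clang builtin */
}
static void a_store(volatile int *p, int v)
{
    __atomic_store_n(p, v, __ATOMIC_SEQ_CST);
}
static void a_spin(void) { }
static void a_exit_spin(void) { }

typedef volatile int spinlock_t;

static unsigned long spinlock_irqsave(spinlock_t *lock)
{
    unsigned long flg = a_irqdisable();
    while (a_swap(lock, 1)) a_spin();
    a_exit_spin();
    return flg;
}

static void spinunlock_irqrestore(spinlock_t *lock, unsigned long flg)
{
    a_store(lock, 0);
    a_irqrestore(flg);
}

/* Hypothetical caller: a counter shared with an interrupt handler. */
static spinlock_t counter_lock; /* 0 = unlocked */
static unsigned long counter;

static void counter_increment(void)
{
    unsigned long flg = spinlock_irqsave(&counter_lock);
    counter++; /* critical section: interrupts off, lock held */
    spinunlock_irqrestore(&counter_lock, flg);
}

/* Calling with interrupts already off: unlocking must not re-enable them. */
static void nested_demo(void)
{
    unsigned long outer = a_irqdisable();
    counter_increment();
    assert(fake_irq_enabled == 0);
    a_irqrestore(outer);
}
```

The nested case is the reason the lock function hands back the old state instead of the unlock blindly enabling interrupts: a caller that already had interrupts off gets them back off.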
C11 atomics also have the significant drawback of dragging in the C memory model. That is fine if you want to tune everything for the best performance, but it is also easy to get wrong. I tend to write my atomics with a plain full memory barrier, as that is far easier to understand. It might not perform as well, but as I have said before, I will take readable, understandable code that works over fast code that sometimes fails any day of the week, and twice on Sundays.
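If you do end up on C11 atomics anyway, the full-barrier style can be approximated by bracketing every operation with sequentially consistent fences, much like the sync; op; sync pattern in the PowerPC code. This is a rough sketch, not a tuned implementation, and the fb_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helpers: each one puts a seq_cst fence before and after
 * a relaxed operation, trading speed for being easy to reason about. */
static int fb_swap(atomic_int *p, int v)
{
    atomic_thread_fence(memory_order_seq_cst);
    int old = atomic_exchange_explicit(p, v, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return old;
}

static void fb_store(atomic_int *p, int v)
{
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store_explicit(p, v, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
}

static int fb_load(atomic_int *p)
{
    atomic_thread_fence(memory_order_seq_cst);
    int v = atomic_load_explicit(p, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return v;
}

/* Single-threaded check of the values only; it says nothing about ordering. */
static void fb_selfcheck(void)
{
    atomic_int x;
    atomic_init(&x, 0);
    assert(fb_swap(&x, 1) == 0); /* lock was free */
    assert(fb_swap(&x, 1) == 1); /* lock is now held */
    fb_store(&x, 0);
    assert(fb_load(&x) == 0);
}
```

Every call pays for two fences, so this is about as slow as it gets, but there is only one ordering rule to remember.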