I'm pretty sure this is my bug or a misunderstanding on my part, but it looks as if "lock cmpxchg" didn't execute atomically across 2 CPUs.
Context:
It's a simple reader/writer lock (the lock for my inode cache) implemented with a single DWORD; the different states are represented by its value:
// 0xFFFFFFFF: write locked
// 1 to 0xFFFFFFFE: read locked, the number is ref count
// 0: not locked
Atomic instructions are used to modify the values.
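To make the encoding concrete, here it is as C constants (a sketch; the names are mine, only the values come from the comments above):

Code:

#include <stdint.h>

#define R_UNLOCKED   0x00000000u  /* 0: not locked */
#define R_WRITE_LOCK 0xFFFFFFFFu  /* all ones: write locked */
#define R_READ_MAX   0xFFFFFFFEu  /* highest legal reader refcount */

/* Under this encoding, taking the write lock would be a CAS of
 * 0 -> 0xFFFFFFFF, taking a read lock a CAS of n -> n + 1 for
 * n < R_READ_MAX, and a read unlock an atomic decrement. */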
Yet I was able to get this sequence to happen on 2 CPUs:
Code:
CPU 1: before lock 2, val 0
CPU 1: after lock 2, val 1
CPU 0: before lock 2, val 0
CPU 0: after lock 2, val 1
The code doing the increment (the read-lock path, which is what both CPUs ran between their pairs of prints) is below. If the cmpxchg were atomic, CPU 0 should never have been able to read 0 and succeed after CPU 1 had already swapped the value to 1, so something has to be off, but I can't see what.
It is also quite ugly (especially the part where it reads the value from *ecx and then has to do both a copy and an increment), so I'd welcome suggestions for improving it as well.
r is a pointer to the lock.
_R_LOCK_MAX is 0xFFFFFFFE, but both CPUs should have blasted straight past the initial spin check anyway, since the value is nowhere near that limit.
Code:
asm volatile ("_r_keep_spinning: \n\
        movl (%%ecx), %%ebx           # ebx = current lock value \n\
        movl %%ebx, %%eax             # eax = expected value for cmpxchg \n\
        addl $1, %%ebx                # ebx = desired value (one more reader) \n\
        cmp $" EXSTR(_R_LOCK_MAX) ", %%eax \n\
        jb _try_get_r_lock            # refcount has room: attempt the swap \n\
        pause \n\
        jmp _r_keep_spinning \n\
        _try_get_r_lock: \n\
        lock cmpxchg %%ebx, (%%ecx)   # if (*ecx == eax) *ecx = ebx and ZF=1 \n\
        jne _r_keep_spinning          # another CPU changed it first: retry "
        :
        : "c"(r)
        : "eax", "ebx", "memory", "cc");