[solved]lock cmpxchg didn't happen atomically?
Posted: Sun Jan 17, 2021 10:18 pm
solved, bug is elsewhere (described in this thread, but probably not of much interest as it is a "logic error" in the upper level), lock cmpxchg is doing what it should do, original post below:
I'm pretty sure it's my bug or misunderstanding of something, yet it looks like "lock cmpxchg" didn't happen atomically from 2 CPUs.
Context:
It's a simple reader/writer lock (lock for my inode cache) implemented using 1 DWORD, different states represented by values:
// 0xFFFFFFFF: write locked
// 1 to 0xFFFFFFFE: read locked, the number is ref count
// 0: not locked
Atomic instructions are used to modify the values.
Yet I was able to get this sequence to happen on 2 CPUs :
As you can see both CPU 0 and CPU 1 think it has successfully incremented the value at index 2 from 0 to 1
Code doing the increment (read lock, which is what both the CPU ran between the pair of prints) is below. I can't seem to see what's wrong with it but something has to be off.
It is also quite ugly (especially the part that it reads in the value from *ecx and then having to do both copy and increment) so would also like some suggestions on improving.
r is pointer to the lock.
_R_LOCK_MAX is 0xFFFFFFFE, but both CPUs should have blasted through the initial spin part as the value is so far from reaching it anyway.
I'm pretty sure it's my bug or misunderstanding of something, yet it looks like "lock cmpxchg" didn't happen atomically from 2 CPUs.
Context:
It's a simple reader/writer lock (lock for my inode cache) implemented using 1 DWORD, different states represented by values:
// 0xFFFFFFFF: write locked
// 1 to 0xFFFFFFFE: read locked, the number is ref count
// 0: not locked
Atomic instructions are used to modify the values.
Yet I was able to get this sequence to happen on 2 CPUs :
Code: Select all
CPU 1: before lock 2, val 0
CPU 1: after lock 2, val 1
CPU 0: before lock 2, val 0
CPU 0: after lock 2, val 1
Code doing the increment (read lock, which is what both the CPU ran between the pair of prints) is below. I can't seem to see what's wrong with it but something has to be off.
It is also quite ugly (especially the part that it reads in the value from *ecx and then having to do both copy and increment) so would also like some suggestions on improving.
r is pointer to the lock.
_R_LOCK_MAX is 0xFFFFFFFE, but both CPUs should have blasted through the initial spin part as the value is so far from reaching it anyway.
Code: Select all
asm volatile ("_r_keep_spinning: \n\
movl (%%ecx), %%ebx \n\
movl %%ebx, %%eax \n\
addl $1, %%ebx \n\
cmp $" EXSTR(_R_LOCK_MAX) ", %%eax \n\
jb _try_get_r_lock \n\
pause \n\
jmp _r_keep_spinning \n\
_try_get_r_lock: \n\
lock cmpxchg %%ebx, (%%ecx) \n\
jne _r_keep_spinning "
:
: "c"(r)
: "eax", "ebx", "memory", "cc");