Hi,
Continued from:
Brendan wrote:Please note that there is more involved in designing efficient spinlocks than I've described above - I still haven't discussed lock fairness or cache thrashing prevention (or instrumenting locks to detect bugs and/or heavy contention). These issues are better left until after you've got "slightly less simple" spinlocks working.
But first; there's one thing I didn't mention with the last spinlock (the one that disables IRQs). If you acquire one lock it'll work fine, but if you then need to acquire a second lock then the second lock will enable IRQs while it's spinning. This isn't so good, because often you do need to acquire 2 or more locks. Also, it's possible that IRQs were disabled for some other reason before you acquire the first lock.
To fix that you need to store the previous state of EFLAGS somewhere, and then restore the previous EFLAGS when you release the lock:
Code: Select all
pushfd ;Put previous EFLAGS onto the stack
cli ;Disable IRQs (in the hope that we do acquire the lock)
lock bts [myLock],0 ;Optimistically set bit 0 of "myLock", and return the previous value in the carry flag
jnc .acquired ;Lock was acquired is bit 0 was previously clear
.retry:
popfd ;Restore previous EFLAGS (enable IRQs if they were enabled before)
pushfd ;Put previous EFLAGS onto the stack again
pause ;Don't waste CPU resources
bt [myLock],0 ;Test if bit 0 of "myLock" is already set
jc .retry ;Retry (without doing the more expensive "lock bts [myLock],0") if bit 0 was set
cli ;Disable IRQs (in the hope that we do acquire the lock)
lock bts [myLock],0 ;Set bit 0 of "myLock", and return the previous value in the carry flag
jc .retry ;Retry if bit 0 was already set
.acquired:
mov eax,[esp] ;eax = previous EFLAGS
To release the lock, you'd do something like this:
Code: Select all
mov dword [myLock],0 ;Clear the lock
push eax ;Put previous EFLAGS (still in EAX from acquiring the lock) onto stack
popfd ;Restore previous EFLAGS (enable IRQs if they were enabled before)
Now, this is quite good if there's a only a very small chance of lock contention. However, it'd be nice if you can find out if there's contention or not (especially if you're testing on a system with only 8 CPUs and you're hoping it'll be OK on systems with a lot more CPUs where the chance of contention is a lot higher).
Fortunately, we're only actually using one bit for the lock. This means that we can use the other 31 bits to keep track of the number of times you had to spin. By reading the "contention counter", putting the OS under load and then reading the counter again after a while you can determine how much contention there was. Of course this will add a little overhead, and usually you're not testing lock contention and don't need to overhead, so it's probably a good idea to use conditional code.
For this, you don't want to wipe out the counter when the lock is released, so you'd want to release the lock like this:
Code: Select all
%ifdef CONTENTION_TEST
and dword [myLock],0xFFFFFFFE ;Clear the lock bit (without effecting the counter)
%else
mov dword [myLock],0 ;Clear the lock
%endif
push eax ;Put previous EFLAGS (still in EAX from acquiring the lock) onto stack
popfd ;Restore previous EFLAGS (enable IRQs if they were enabled before)
The spinlock becomes:
Code: Select all
pushfd ;Put previous EFLAGS onto the stack
cli ;Disable IRQs (in the hope that we do acquire the lock)
lock bts [myLock],0 ;Optimistically set bit 0 of "myLock", and return the previous value in the carry flag
jnc .acquired ;Lock was acquired is bit 0 was previously clear
%ifdef CONTENTION_TEST
lock add [myLock],2 ;Add 1 to the counter (where bit 0 of the counter starts at bit 1 of "myLock")
%endif
.retry:
popfd ;Restore previous EFLAGS (enable IRQs if they were enabled before)
pushfd ;Put previous EFLAGS onto the stack again
pause ;Don't waste CPU resources
bt [myLock],0 ;Test if bit 0 of "myLock" is already set
jc .retry ;Retry (without doing the more expensive "lock bts [myLock],0") if bit 0 was set
cli ;Disable IRQs (in the hope that we do acquire the lock)
lock bts [myLock],0 ;Set bit 0 of "myLock", and return the previous value in the carry flag
jc .retry ;Retry if bit 0 was already set
.acquired:
mov eax,[esp] ;eax = previous EFLAGS
To read the counter you'd do something like:
Code: Select all
mov eax,[myLock] ;eax = (contention_counter << 1) | lock_bit
shr eax,1 ;eax = contention_counter
One of the annoying things about writing code that needs locks is that it's easy to make mistakes, and when you do make a mistake it can be virtually impossible to find the problem. For a simple example, if you release a lock by accident then everything may seem to work fine, but occasionally the data that's protected by the lock might be mysteriously corrupted.
Fortunately, it's relatively easy to detect some of the common bugs and avoid excruciating pain. For example, when you release the lock it's trivial to test if the lock was acquired first, and do some sort of kernel panic if your code ever tries to release a lock that wasn't acquired:
Code: Select all
lock btr [myLock],0 ;Clear the lock bit (without effecting the contention counter), and get the previous lock bit
jc .wasLocked ;Don't do kernel panic if the lock was acquired
;** Insert code to do kernel panic here! **
.wasLocked:
push eax ;Put previous EFLAGS (still in EAX from acquiring the lock) onto stack
popfd ;Restore previous EFLAGS (enable IRQs if they were enabled before)
This is better than nothing; but we can do better. If you get a kernel panic you may not know if the CPU that detected the problem actually caused the problem or if some other CPU caused the problem, you'll just know something is wrong. The other problem is that it's possible to not detect the problem at all. For example, if a piece of code releases the lock and then acquires it (instead of acquiring it and then releasing it).
For something better, you can store a 31-bit "CPU number" telling you which CPU is using the lock; and check that the CPU that releases the lock is the same CPU that acquired the lock. In this case, the code to release the lock might look like this:
Code: Select all
mov ebx,0
xchg [myLock],ebx ;"myLock = 0, ebx = (actual_CPU_number << 1) | lock_bit
lea ecx,[ecx*2+1] ;ecx = (expected_CPU_number << 1) | lock_bit
cmp ebx,ecx ;Was everything correct?
je .wasCorrect ; yes, don't do the kernel panic
;** Insert code to do kernel panic here! **
.wasCorrect:
push eax ;Put previous EFLAGS (still in EAX from acquiring the lock) onto stack
popfd ;Restore previous EFLAGS (enable IRQs if they were enabled before)
And the code to acquire the lock:
Code: Select all
lea ecx,[ecx*2+1] ;ecx = (CPU_number << 1) | lock_bit
pushfd ;Put previous EFLAGS onto the stack
cli ;Disable IRQs (in the hope that we do acquire the lock)
mov eax,0
lock cmpxchg [myLock],ecx ;If myLock = eax (zero), acquire the lock and set CPU number
je .acquired ;Lock was acquired
.retry:
popfd ;Restore previous EFLAGS (enable IRQs if they were enabled before)
pushfd ;Put previous EFLAGS onto the stack again
pause ;Don't waste CPU resources
cmp dword [myLock],0 ;Test if myLock = 0
jne .retry ;Retry (without doing the more expensive "lock cmpxchg") if myLock wasn't 0
cli ;Disable IRQs (in the hope that we do acquire the lock)
mov eax,0
lock cmpxchg [myLock],ecx ;If myLock = eax (zero), acquire the lock and set CPU number
jne .retry ;Retry if myLock wasn't 0
.acquired:
mov eax,[esp] ;eax = previous EFLAGS
Note: If you want to do this and have a contention counter at the same time, you can use a pair of 32-bit values for the lock, or use (e.g.) 16 bits for a CPU number and 15 bits for the contention counter.
Another common bug with locking is forgetting to release the lock. This isn't really a bad bug (the OS will lock up, making it easy to notice there's a problem); but it'd be much nicer to get better details if/when it happens (to make finding/fixing the bug easier). There's also another problem that is much harder to detect - code that holds the lock for far too long (even without contention it can be a problem, especially if IRQs are disabled while a lock is held). For this, when acquiring the lock you'd only need to get the current time from somewhere (e.g. TSC) and have a time-out (by checking the time while you're spinning to see if the time out has expired). If you're storing a "CPU number" in the lock then (if the time out expires) you can report which CPU hasn't released the lock quickly enough. You could also store the address of the code that acquired the lock and report that too.
If you're getting the time anyway, you could also use it to improve the contention counter. For example, you could keep track of the "worst case so far" amount of time spent spinning, and/or the total amount of time all CPUs have spent waiting for the lock (and combined with the contention counter, calculate the average amount of time spent spinning when there's contention).
It's important not to forget that this type of spinlock is only really good if there's a very small chance of lock contention. Adding "conveniences" to detect bugs and performance problems does add a little overhead when you're spinning, but that doesn't matter because you shouldn't be spinning much to start with.
Please note that there is still more involved in designing efficient spinlocks than I've described above - I still haven't discussed lock fairness or cache thrashing prevention. These things become important when you expect some lock contention (and are the reason why the spinlocks I've shown so far are only good if there's a very small chance of lock contention).
Cheers,
Brendan