Hi,
Continued from:
Brendan wrote:Also; there is more involved in designing efficient spinlocks than I've described above (the PAUSE instruction, the "test, then atomically test and set" pattern, lock fairness and cache thrashing prevention). These issues are better left until after you've got simpler spinlocks working.
The simplest spinlock might look like this:
Code: Select all
.retry:
lock bts [myLock],0 ;Set bit 0 of "myLock", and return the previous value in the carry flag
jc .retry ;Retry if bit 0 was already set
This is simple, but bad.
Modern CPUs try to do as many instructions as they can in parallel. For small/tight loops this means that the CPU can have many iterations of the loop in progress at the same time, which is a good thing (usually).
However, for tight polling loops (e.g. "while(lock->lock);") it's a bad thing - "multiple iterations of the loop in progress at the same time" wastes CPU resources (which is especially bad when hyper-threading is involved), and also means that when the condition changes there's multiple "future iterations" in progress that need to be discarded (which slows down exiting the loop). To solve this you want a way to tell the CPU not to do multiple iterations of the loop in parallel. This is what the "PAUSE" instruction does.
Note: The "PAUSE" instruction was introduced about 10 years ago, and isn't supported on older CPUs. However, Intel were smart and recycled an encoding that was previously a "NOP" (no operation). This means that if a "PAUSE" instruction is executed on a CPU that doesn't support it, nothing goes wrong, and you can simply assume that all CPUs support the "PAUSE" instruction.
With "PAUSE" the simple spinlock might become:
Code: Select all
.retry:
pause ;Don't waste CPU resources
lock bts [myLock],0 ;Set bit 0 of "myLock", and return the previous value in the carry flag
jc .retry ;Retry if bit 0 was already set
This is still simple, but still not good.
The next problem is that the "lock bts [lock],0" is a bit expensive because it ties up the bus (or cache line); and when CPU/s are constantly retrying it's bad. Fortunately you can check if there's any hope of acquiring the lock (without the expense of doing the "test and set") and then only do the more expensive "test and set" when there is some hope of acquiring the lock. This is the "test, then atomically test and set" pattern.
With "test, then atomically test and set" pattern the spinlock might become:
Code: Select all
.retry:
pause ;Don't waste CPU resources
bt [myLock],0 ;Test if bit 0 of "myLock" is already set
jc .retry ;Retry (without doing the more expensive "lock bts [myLock],0") if bit 0 was set
lock bts [myLock],0 ;Set bit 0 of "myLock", and return the previous value in the carry flag
jc .retry ;Retry if bit 0 was already set
Now this is reasonable for when the you have to wait to acquire the lock. However, if the spinlock can be acquired immediately (no contention) then it's a bit slower that it could be. Often a spinlock can be acquired without waiting, so we should improve the lock to be more optimistic:
Code: Select all
lock bts [myLock],0 ;Optimistically set bit 0 of "myLock", and return the previous value in the carry flag
jnc .acquired ;Lock was acquired is bit 0 was previously clear
.retry:
pause ;Don't waste CPU resources
bt [myLock],0 ;Test if bit 0 of "myLock" is already set
jc .retry ;Retry (without doing the more expensive "lock bts [myLock],0") if bit 0 was set
lock bts [myLock],0 ;Set bit 0 of "myLock", and return the previous value in the carry flag
jc .retry ;Retry if bit 0 was already set
.acquired:
As far as your basic spinlock goes, this isn't that bad. However, what if you want to ensure that interrupts are disabled when the lock is acquired? This sounds easy (or at least it should), but it's a little trickier than it sounds - if you need to spin for ages, then you do not want interrupts to be disabled while you're spinning.
Here's an example of disabling interrupts properly:
Code: Select all
cli ;Disable IRQs (in the hope that we do acquire the lock)
lock bts [myLock],0 ;Optimistically set bit 0 of "myLock", and return the previous value in the carry flag
jnc .acquired ;Lock was acquired is bit 0 was previously clear
.retry:
sti ;Enable IRQs (because we failed to acquire the lock)
pause ;Don't waste CPU resources
bt [myLock],0 ;Test if bit 0 of "myLock" is already set
jc .retry ;Retry (without doing the more expensive "lock bts [myLock],0") if bit 0 was set
cli ;Disable IRQs (in the hope that we do acquire the lock)
lock bts [myLock],0 ;Set bit 0 of "myLock", and return the previous value in the carry flag
jc .retry ;Retry if bit 0 was already set
.acquired:
That's probably enough to think about for now!
Please note that there is more involved in designing efficient spinlocks than I've described above - I still haven't discussed lock fairness or cache thrashing prevention (or instrumenting locks to detect bugs and/or heavy contention). These issues are better left until after you've got "slightly less simple" spinlocks working.
Cheers,
Brendan