Just thought I'd mention that because of this bus locking, using the LOCK prefix can affect the performance of other CPUs, even if those other CPUs aren't doing anything related to the lock. For example, if it takes a CPU 60 ns to read, 100 ns to do the operation and then another 60 ns to write, other CPUs won't be able to access the bus for any reason for 60 + 100 + 60 = 220 ns. Consider a simple spinlock:
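get_lock:
    lock bts dword [the_lock],1    ; atomically test and set bit 1; CF = old value of the bit
    jc get_lock                    ; bit was already set (lock held), so try again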
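Every failed attempt in that loop is another locked read-modify-write, so a CPU spinning on a held lock ties up the bus over and over. For this reason it's recommended to use "test, test & modify" locks (often called "test and test-and-set"). The idea is to test the lock first without locking the bus, using an instruction that only reads and therefore doesn't need to be atomic, and to attempt the locked instruction only once the lock looks free. For example: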
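get_lock:
    test dword [the_lock],2        ; plain read, no bus lock; mask 2 = bit 1, the bit BTS sets
    jne get_lock                   ; still held, keep waiting
    lock bts dword [the_lock],1    ; looks free, so try to take it atomically
    jc get_lock                    ; someone else got it first, go back to waiting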
Then you've got the PAUSE instruction, which benefits hyper-threading CPUs. The lock above would keep one logical CPU busy spinning, which affects the speed of the other logical CPU on the same core. To improve this, the PAUSE instruction reduces the amount of CPU resources used by the spinlock, which increases the amount of CPU resources available to the other logical CPU (i.e. the waiting CPU waits slower while the working CPU works faster). This improves performance for hyper-threading:
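get_lock:
    pause                          ; spin-wait hint, frees resources for the other logical CPU
    test dword [the_lock],2        ; plain read, no bus lock; mask 2 = bit 1, the bit BTS sets
    jne get_lock                   ; still held, keep waiting
    lock bts dword [the_lock],1    ; looks free, so try to take it atomically
    jc get_lock                    ; someone else got it first, go back to waiting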
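The version below tries to take the lock straight away, so the uncontended case skips the PAUSE and the test, and only falls into the waiting loop if that first attempt fails: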
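get_lock:
    lock bts dword [the_lock],1    ; fast path: try to take the lock immediately
    jnc .acquired                  ; bit was clear, so the lock is ours
.retry:
    pause                          ; spin-wait hint, frees resources for the other logical CPU
    test dword [the_lock],2        ; plain read, no bus lock; mask 2 = bit 1, the bit BTS sets
    jne .retry                     ; still held, keep waiting
    lock bts dword [the_lock],1    ; looks free, so try to take it atomically
    jc .retry                      ; someone else got it first, go back to waiting
.acquired: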
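For anyone doing this from C rather than assembly, here's a rough sketch of the same pattern using C11 atomics. Treat it as an illustration only: `struct ttas_lock`, `ttas_acquire`, `ttas_release` and `cpu_relax` are names I made up for the example, the lock is a whole boolean rather than a single bit, and `__builtin_ia32_pause()` assumes GCC or Clang on x86 (on anything else the hint just compiles to nothing).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for this sketch: false = free, true = held. */
struct ttas_lock {
    atomic_bool held;
};

/* Spin-wait hint: PAUSE on x86 with GCC/Clang, nothing elsewhere. */
static inline void cpu_relax(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_ia32_pause();
#endif
}

static inline void ttas_acquire(struct ttas_lock *l)
{
    /* Atomic attempt (the "lock bts" step): returns the previous value,
     * so false means the lock was free and is now ours. */
    while (atomic_exchange_explicit(&l->held, true, memory_order_acquire)) {
        /* Contended: wait using plain loads only, with a PAUSE per
         * iteration, and retry the atomic attempt once it looks free. */
        do {
            cpu_relax();
        } while (atomic_load_explicit(&l->held, memory_order_relaxed));
    }
}

static inline void ttas_release(struct ttas_lock *l)
{
    /* A release store is enough to hand the lock back. */
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

Usage would be something like `static struct ttas_lock l = { false };`, then `ttas_acquire(&l);` before the critical section and `ttas_release(&l);` after it. As in the assembly, the first attempt is the atomic one, so an uncontended acquire never executes the PAUSE or the read-only test.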
[continued in next post]