Atomicity

Questions about which tools to use, bugs, the best way to implement a function, etc. should go here. Don't forget to see if your question is answered in the wiki first! When in doubt, post here.
Icee
Member
Posts: 100
Joined: Wed Jan 08, 2014 8:41 am
Location: Moscow, Russia

Re: Atomicity

Post by Icee »

Structurally the code is correct now, except for the fact that you're working on a register (EAX) instead of a memory location. If this is what you meant by "pseudo-code" then it's fine. But just in case:

(1) moving to a register doesn't establish any "magic" link between the register and the memory location, it's only the value that is copied;
(2) a LOCK prefix on a BT/BTS (or any other instruction) without a memory operand doesn't make sense, is illegal, and will cause an #UD (see 8.2.7 in AMD Manual Vol. 2).
KemyLand
Member
Posts: 213
Joined: Mon Jun 16, 2014 5:33 pm
Location: Costa Rica

Re: Atomicity

Post by KemyLand »

Yes, that's why I call it pseudo-code. I already made a final version (and edited) to avoid confusion.
Happy New Code!
Hello World in Brainfuck :D:

Code: Select all

++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.
Icee
Member
Posts: 100
Joined: Wed Jan 08, 2014 8:41 am
Location: Moscow, Russia

Re: Atomicity

Post by Icee »

KemyLand wrote:Yes, that's why I call it pseudo-code. I already made a final version (and edited) to avoid confusion.
Now your address calculations are wrong. Both here:

Code: Select all

lea 8(%ebp), %eax # We use a Size here, not a struct Spinlock
And here:

Code: Select all

lock bts $0, -4(%ebp)
KemyLand
Member
Posts: 213
Joined: Mon Jun 16, 2014 5:33 pm
Location: Costa Rica

Re: Atomicity

Post by KemyLand »

Sorry, my mistake (again...)

I've now repaired the static variables thing.
But I'll still use 8(%ebp) to refer to the first argument. (Why not?)
Happy New Code!
Hello World in Brainfuck :D:

Code: Select all

++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Atomicity

Post by Brendan »

Hi,

Continued from:
Brendan wrote:Please note that there is more involved in designing efficient spinlocks than I've described above - I still haven't discussed lock fairness or cache thrashing prevention (or instrumenting locks to detect bugs and/or heavy contention). These issues are better left until after you've got "slightly less simple" spinlocks working. ;)
But first; there's one thing I didn't mention with the last spinlock (the one that disables IRQs). If you acquire one lock it'll work fine, but if you then need to acquire a second lock then the second lock will enable IRQs while it's spinning. This isn't so good, because often you do need to acquire 2 or more locks. Also, it's possible that IRQs were disabled for some other reason before you acquire the first lock.

To fix that you need to store the previous state of EFLAGS somewhere, and then restore the previous EFLAGS when you release the lock:

Code: Select all

     pushfd                  ;Put previous EFLAGS onto the stack
     cli                     ;Disable IRQs (in the hope that we do acquire the lock)
     lock bts [myLock],0     ;Optimistically set bit 0 of "myLock", and return the previous value in the carry flag
     jnc .acquired           ;Lock was acquired if bit 0 was previously clear

.retry:
     popfd                   ;Restore previous EFLAGS (enable IRQs if they were enabled before)
     pushfd                  ;Put previous EFLAGS onto the stack again
    
     pause                   ;Don't waste CPU resources
     bt [myLock],0           ;Test if bit 0 of "myLock" is already set
     jc .retry               ;Retry (without doing the more expensive "lock bts [myLock],0") if bit 0 was set

     cli                     ;Disable IRQs (in the hope that we do acquire the lock)
     lock bts [myLock],0     ;Set bit 0 of "myLock", and return the previous value in the carry flag
     jc .retry               ;Retry if bit 0 was already set

.acquired:
     pop eax                 ;eax = previous EFLAGS (popped so the stack stays balanced)
To release the lock, you'd do something like this:

Code: Select all

     mov dword [myLock],0    ;Clear the lock
     push eax                ;Put previous EFLAGS (still in EAX from acquiring the lock) onto stack
     popfd                   ;Restore previous EFLAGS (enable IRQs if they were enabled before)
Now, this is quite good if there's only a very small chance of lock contention. However, it'd be nice if you could find out whether there's contention or not (especially if you're testing on a system with only 8 CPUs and you're hoping it'll be OK on systems with a lot more CPUs, where the chance of contention is a lot higher).

Fortunately, we're only actually using one bit for the lock. This means that we can use the other 31 bits to keep track of the number of times you had to spin. By reading the "contention counter", putting the OS under load and then reading the counter again after a while, you can determine how much contention there was. Of course this will add a little overhead, and usually you're not testing lock contention and don't need the overhead, so it's probably a good idea to use conditional code.

For this, you don't want to wipe out the counter when the lock is released, so you'd want to release the lock like this:

Code: Select all

%ifdef CONTENTION_TEST
     and dword [myLock],0xFFFFFFFE  ;Clear the lock bit (without affecting the counter)
%else
     mov dword [myLock],0           ;Clear the lock
%endif
     push eax                       ;Put previous EFLAGS (still in EAX from acquiring the lock) onto stack
     popfd                          ;Restore previous EFLAGS (enable IRQs if they were enabled before)
The spinlock becomes:

Code: Select all

     pushfd                  ;Put previous EFLAGS onto the stack
     cli                     ;Disable IRQs (in the hope that we do acquire the lock)
     lock bts [myLock],0     ;Optimistically set bit 0 of "myLock", and return the previous value in the carry flag
     jnc .acquired           ;Lock was acquired if bit 0 was previously clear

%ifdef CONTENTION_TEST
     lock add [myLock],2     ;Add 1 to the counter (where bit 0 of the counter starts at bit 1 of "myLock")
%endif

.retry:
     popfd                   ;Restore previous EFLAGS (enable IRQs if they were enabled before)
     pushfd                  ;Put previous EFLAGS onto the stack again
    
     pause                   ;Don't waste CPU resources
     bt [myLock],0           ;Test if bit 0 of "myLock" is already set
     jc .retry               ;Retry (without doing the more expensive "lock bts [myLock],0") if bit 0 was set

     cli                     ;Disable IRQs (in the hope that we do acquire the lock)
     lock bts [myLock],0     ;Set bit 0 of "myLock", and return the previous value in the carry flag
     jc .retry               ;Retry if bit 0 was already set

.acquired:
     pop eax                 ;eax = previous EFLAGS (popped so the stack stays balanced)
To read the counter you'd do something like:

Code: Select all

     mov eax,[myLock]      ;eax = (contention_counter << 1) | lock_bit
     shr eax,1             ;eax = contention_counter
One of the annoying things about writing code that needs locks is that it's easy to make mistakes, and when you do make a mistake it can be virtually impossible to find the problem. For a simple example, if you release a lock by accident then everything may seem to work fine, but occasionally the data that's protected by the lock might be mysteriously corrupted.

Fortunately, it's relatively easy to detect some of the common bugs and avoid excruciating pain. For example, when you release the lock it's trivial to test if the lock was acquired first, and do some sort of kernel panic if your code ever tries to release a lock that wasn't acquired:

Code: Select all

     lock btr [myLock],0     ;Clear the lock bit (without affecting the contention counter), and get the previous lock bit
     jc .wasLocked           ;Skip the kernel panic if the lock was acquired

     ;** Insert code to do kernel panic here! **

.wasLocked:
     push eax                ;Put previous EFLAGS (still in EAX from acquiring the lock) onto stack
     popfd                   ;Restore previous EFLAGS (enable IRQs if they were enabled before)
This is better than nothing; but we can do better. If you get a kernel panic you may not know if the CPU that detected the problem actually caused the problem or if some other CPU caused the problem; you'll just know something is wrong. The other problem is that it's possible not to detect the problem at all; for example, a piece of code might release the lock and then acquire it (instead of acquiring it and then releasing it).

For something better, you can store a 31-bit "CPU number" telling you which CPU is using the lock; and check that the CPU that releases the lock is the same CPU that acquired the lock. In this case, the code to release the lock might look like this:

Code: Select all

     mov ebx,0
     xchg [myLock],ebx       ;myLock = 0, ebx = (actual_CPU_number << 1) | lock_bit
     lea ecx,[ecx*2+1]       ;ecx = (expected_CPU_number << 1) | lock_bit
     cmp ebx,ecx             ;Was everything correct?
     je .wasCorrect          ; yes, don't do the kernel panic

     ;** Insert code to do kernel panic here! **

.wasCorrect:
     push eax                ;Put previous EFLAGS (still in EAX from acquiring the lock) onto stack
     popfd                   ;Restore previous EFLAGS (enable IRQs if they were enabled before)
And the code to acquire the lock:

Code: Select all

     lea ecx,[ecx*2+1]         ;ecx = (CPU_number << 1) | lock_bit
     pushfd                    ;Put previous EFLAGS onto the stack
     cli                       ;Disable IRQs (in the hope that we do acquire the lock)
     mov eax,0
     lock cmpxchg [myLock],ecx ;If myLock = eax (zero), acquire the lock and set CPU number
     je .acquired              ;Lock was acquired

.retry:
     popfd                     ;Restore previous EFLAGS (enable IRQs if they were enabled before)
     pushfd                    ;Put previous EFLAGS onto the stack again
    
     pause                     ;Don't waste CPU resources
     cmp dword [myLock],0      ;Test if myLock = 0
     jne .retry                ;Retry (without doing the more expensive "lock cmpxchg") if myLock wasn't 0

     cli                       ;Disable IRQs (in the hope that we do acquire the lock)
     mov eax,0
     lock cmpxchg [myLock],ecx ;If myLock = eax (zero), acquire the lock and set CPU number
     jne .retry                ;Retry if myLock wasn't 0

.acquired:
     pop eax                   ;eax = previous EFLAGS (popped so the stack stays balanced)
Note: If you want to do this and have a contention counter at the same time, you can use a pair of 32-bit values for the lock, or use (e.g.) 16 bits for a CPU number and 15 bits for the contention counter.

Another common bug with locking is forgetting to release the lock. This isn't really a bad bug (the OS will lock up, making it easy to notice there's a problem); but it'd be much nicer to get better details if/when it happens (to make finding/fixing the bug easier). There's also another problem that is much harder to detect - code that holds the lock for far too long (even without contention it can be a problem, especially if IRQs are disabled while a lock is held). For this, when acquiring the lock you'd only need to get the current time from somewhere (e.g. TSC) and have a time-out (by checking the time while you're spinning to see if the time out has expired). If you're storing a "CPU number" in the lock then (if the time out expires) you can report which CPU hasn't released the lock quickly enough. You could also store the address of the code that acquired the lock and report that too.

If you're getting the time anyway, you could also use it to improve the contention counter. For example, you could keep track of the "worst case so far" amount of time spent spinning, and/or the total amount of time all CPUs have spent waiting for the lock (and combined with the contention counter, calculate the average amount of time spent spinning when there's contention).

It's important not to forget that this type of spinlock is only really good if there's a very small chance of lock contention. Adding "conveniences" to detect bugs and performance problems does add a little overhead when you're spinning, but that doesn't matter because you shouldn't be spinning much to start with.

Please note that there is still more involved in designing efficient spinlocks than I've described above - I still haven't discussed lock fairness or cache thrashing prevention. These things become important when you expect some lock contention (and are the reason why the spinlocks I've shown so far are only good if there's a very small chance of lock contention). ;)


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.