max wrote:So I've implemented a reentrant mutex implementation and wanted to hear some opinions whether if I have covered all the edge-cases.
It's multiprocessor safe and modification is protected from preemption (hence the interrupt disabling meanwhile).
taskingGetLocal() returns a processor-local structure - as long as locksHeld variable is > 0, no tasks are scheduled to avoid kernel deadlocking.
Well, that won't help with the stated problem. A deadlock can happen as soon as two processes can hold two locks each, and you have SMP support, so that is sort of a given. So the only thing you managed to do is increase latency.
Code: Select all
void mutexAcquire(g_mutex* mutex, bool smp)
{
if(mutex->initialized != G_MUTEX_INITIALIZED)
mutexErrorUninitialized(mutex);
while(!mutexTryAcquire(mutex, smp)) {
asm("pause");
}
}
Well, that's going to burn CPU. As long as one thread is holding the lock, all the other ones are left, uselessly spinning in vain, hoping for a free lock each time, but never finding it.
The rest looks solid enough. So it is a recursive mutex implementation. I have never actually seen a use for recursive mutexes that wasn't a design error. But to each their own, I suppose.
max wrote:What are your best practices for mutual exclusion?
For short-time locks, I have spinlocks. Similar to your version, that you buried in the mutex implementation. Spinlocks disable interrupts while held. You are never allowed to hold more than one spinlock, and you are not allowed to go to userspace or to schedule another task while holding a spinlock. This is typically accomplished by returning the lock in the function that takes it, but there is an exception: Since signal handlers are declared on a process level, I have to guard against setting them from multiple threads, which I do with a spinlock. However, if a CPU exception makes my kernel send a signal to the current thread, I have to prevent the thread from just ignoring the signal. So I have a function force_signal(), which resets the signal handler for the signal that is about to be forced on the thread, if the signal was blocked or ignored. But the function that actually handles signals is called from the code that returns to userspace. I can't return the lock in force_signal(), since then another thread might change signal handlers while the current thread transitions to handle_sigpending(). So in this one instance, force_signal() is allowed to take a spinlock and not return it, because handle_sigpending() will return it instead.
For long-time locks, I have mutexes. Non-recursive, mind. These mutexes use futexes internally. Which means that only system calls and kernel tasks can use it, but not interrupts. But that's OK: Interrupts aren't supposed to do much, anyway. Mostly just to schedule in a kernel task.
Futexes are basically pointers to int, with two major operations to them: Wait and wake. When waiting on a futex, the kernel ensures the futex still has a certain value (given as argument), and if so, puts the calling process to sleep, in such a way, that if another process changes the value and calls wake(), the current process is not put to sleep.
Wake then obviously wakes a process up.
For the locks themselves, I stole the design from musl. That is, both the lock and the waiter count are kept in a single int. If that int is zero, the lock is free and there are no waiters. If it is a positive number x, the lock is free, but there are x waiters. If it is a negative number INT_MIN + x, the lock is held and there are x - 1 waiters.
Code: Select all
void lll_lock(volatile int* lock, int priv) {
int l;
/* 1st: Try locking an uncontended lock */
l = a_cas(lock, 0, INT_MIN + 1);
if (!l) return;
/* 2nd: Spinlock */
for (int try = 10; try; try--) {
if (l < 0) /* lock is held: Hope the other thread gave it back when we didn't look. */
l -= INT_MIN + 1;
int t = a_cas(lock, l, l + INT_MIN + 1);
if (t == l)
return;
l = t;
}
/* 3rd: Bite the bullet and just wait already. */
l = a_fetch_add(lock, 1) + 1;
for (;;) {
if (l < 0) {
futex_wait(lock, l, 0, priv);
l = (l - INT_MIN) - 1;
}
int t = a_cas(lock, l, l + INT_MIN);
if (t == l)
return;
l = t;
}
}
lll_unlock(volatile int *lock, int priv) {
if (a_fetch_add(lock, -(INT_MIN + 1)) != INT_MIN + 1)
futex_wake(lock, 1, priv);
}
The futex functions themselves then first translate the address into a physical address (possibly triggering a page load in the process, and locking the page into memory, so it can't be swapped out). This way, multiple processes can talk about the same futex in shared memory. Then locate the correct queue (there's a global futex queue and a queue in each process, and the priv flag tells the functions which to use), take a spinlock on it, ensure the value is still correct, then put an object in the queue, before releasing the spinlock and going to the scheduler.
The wake function takes the spinlock and proceeds to wake the thread in there. If the wake happens to get between the waiter releasing the spinlock and going to the scheduler, then the waiter looses a single round through the scheduler. Not the end of the world. And with a little modification, I can make this interruptible, so a signal can be delivered while waiting for a lock.
And yes, the new atomics are just the old sync functions. But I just keep my own assembly atomics around. That way, I at least know the memory order semantics.