SMP and the lost + wake-up problem
Posted: Wed Jan 27, 2021 10:27 pm
I have decided that now is the time to get rid of my "big kernel lock" and start addressing a number of things I have been postponing. One of them is the infamous "last wake-up" problem where you basically check for a condition and put the thread to sleep. The problem is that the condition might change between the time you check for it and the time you put your thread to sleep.
Consider this mutex implementation for example:
Inside the "Mutex::lock()" method, there is a comment showing where the problem is:
Q: what happens if the owner of the mutex unlocks it between the while() condition and the call to suspend the thread?
A: a lost wake-up. The thread is put to sleep waiting on the mutex to be released, but that will never happen because the mutex is in fact not locked anymore.
Now the solution to this without SMP is pretty straightforward: you disable interrupts between checking the condition and suspending the thread.
How do you handle this with SMP though? Well use a spinlock I suppose... So basically you need to lock/unlock the spinlock like you would disable/enable interrupts in the non-SMP case.
So how can I accomplish that? What I think needs to happen is this: you lock the spinlock, you call Suspend(reason, spinlock) and after the thread switch (i.e. sched_schedule()/sched_switch()), you unlock that spinlock. This way no other thread running on some other CPU can release the mutex before your thread is probably put to sleep.
How do you pass the spinlock to the other thread? I am thinking that it will need to be stored in a per-cpu data structure (I have one that I access with the fs or gs register). So basically:
1) Add a spinlock inside the mutex class
2) When unlocking the mutex: lock the spinlock, wakeup a waiting thread if any, unlock the spinlock
3) When locking the mutex: lock the spinlock and try to get the mutex
3a) if that succeeds, great, unlock the spinlock and we are done
3b) if that fails, call suspend with a reference/pointer to the spinlock. Save the spinlock pointer in the per-cpu data structure, sched_schedule()/sched_switch(), and one of the first thing to do when resuming the new thread is check if there was a spinlock held by the previous thread, and if so unlock it.
Sounds like this would work... But I was wondering if people had some feedback / other ideas.
Consider this mutex implementation for example:
Code: Select all
class Mutex
{
public:
Mutex();
void lock();
bool try_lock();
void unlock();
private:
std::atomic_bool m_lock;
int m_owner;
WaitQueue m_waiters;
};
Code: Select all
Mutex::Mutex()
: m_owner(-1)
{
}
void Mutex::lock()
{
while (!try_lock())
{
// TODO: we are suffering from the lost wake-up problem here!
m_waiters.Suspend(TaskState::Mutex);
}
}
bool Mutex::try_lock()
{
if (!m_lock.exchange(true, std::memory_order_acquire))
{
m_owner = cpu_get_data(task)->m_id;
return true;
}
else
{
return false;
}
}
void Mutex::unlock()
{
assert(m_owner == cpu_get_data(task)->m_id);
m_owner = -1;
m_lock.store(false, std::memory_order_release);
m_waiters.WakeupOne();
}
Q: what happens if the owner of the mutex unlocks it between the while() condition and the call to suspend the thread?
A: a lost wake-up. The thread is put to sleep waiting on the mutex to be released, but that will never happen because the mutex is in fact not locked anymore.
Now the solution to this without SMP is pretty straightforward: you disable interrupts between checking the condition and suspending the thread.
How do you handle this with SMP though? Well use a spinlock I suppose... So basically you need to lock/unlock the spinlock like you would disable/enable interrupts in the non-SMP case.
So how can I accomplish that? What I think needs to happen is this: you lock the spinlock, you call Suspend(reason, spinlock) and after the thread switch (i.e. sched_schedule()/sched_switch()), you unlock that spinlock. This way no other thread running on some other CPU can release the mutex before your thread is probably put to sleep.
How do you pass the spinlock to the other thread? I am thinking that it will need to be stored in a per-cpu data structure (I have one that I access with the fs or gs register). So basically:
1) Add a spinlock inside the mutex class
2) When unlocking the mutex: lock the spinlock, wakeup a waiting thread if any, unlock the spinlock
3) When locking the mutex: lock the spinlock and try to get the mutex
3a) if that succeeds, great, unlock the spinlock and we are done
3b) if that fails, call suspend with a reference/pointer to the spinlock. Save the spinlock pointer in the per-cpu data structure, sched_schedule()/sched_switch(), and one of the first thing to do when resuming the new thread is check if there was a spinlock held by the previous thread, and if so unlock it.
Sounds like this would work... But I was wondering if people had some feedback / other ideas.