
Re: SMP and the lost + wake-up problem

Posted: Fri Jan 29, 2021 12:34 pm
by kzinti
Sounds reasonable. Essentially you are adding a spinlock in the TCB to say that the thread is actually running.

Re: SMP and the lost + wake-up problem

Posted: Fri Jan 29, 2021 1:15 pm
by thewrongchristian
kzinti wrote: I might have found another problem... You could end up with a CPU picking up a "suspended" thread that is not yet really suspended:

Code:

CPU 1 thread A                  CPU 2 thread B

A: enter monitor
A: try to get mutex (fail)
A: queue in wait list
A: leave monitor
                                 B: enter monitor
                                 B: release mutex
                                 B: wakeup thread A
                                 B: leave monitor
                                 B: schedule()
                                 A: thread A starts to run
A: schedule()
C: thread C starts to run
How would your code prevent the above from happening? This is basically revisiting something I previously mentioned: when you add the current thread to the monitor's wait list, it is still running. The CPU is still using its stack and address space at a minimum. That stack should not be used by another CPU until your thread switch completes on the current CPU.

PS: I am not trying to point out flaws in your implementation, I am just trying to fix mine and understand how to do it :).
I'm just glad I've got other eyes on my code. Much of this is virgin code that has not seen the light of day in the real world, since it is still unreleased.

I think you're right, and your later suggestion of a spin lock in the TCB itself would be a perfect solution to this.
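For illustration, here is a minimal sketch of what such a flag in the TCB could look like, using C11 atomics. The names `struct tcb`, `on_cpu`, `tcb_try_claim`, `tcb_claim` and `tcb_release` are hypothetical and not taken from either kernel discussed here. The idea is that a CPU may only start running a thread once the CPU that was previously using its stack has released it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical TCB, reduced to the one field this sketch needs. */
struct tcb {
    atomic_bool on_cpu; /* true while some CPU is still using this thread's stack */
};

/* Try to take ownership of t for the calling CPU.
 * Returns false if another CPU has not finished switching away from it. */
static bool tcb_try_claim(struct tcb *t)
{
    bool expected = false;
    return atomic_compare_exchange_strong_explicit(
        &t->on_cpu, &expected, true,
        memory_order_acquire, memory_order_relaxed);
}

/* Spin until t is really off its previous CPU, then own it. */
static void tcb_claim(struct tcb *t)
{
    while (!tcb_try_claim(t)) {
        /* A real kernel would put a pause/yield hint here. */
    }
}

/* Called once the switch away from t has completed on the old CPU. */
static void tcb_release(struct tcb *t)
{
    atomic_store_explicit(&t->on_cpu, false, memory_order_release);
}
```

In the timeline above, CPU 2 would call `tcb_claim(A)` before running thread A and would spin until CPU 1 calls `tcb_release(A)` after its own `schedule()` has switched to thread C. The release/acquire pairing is meant to make A's saved state visible to CPU 2 before it resumes the thread.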

Thanks!

Re: SMP and the lost + wake-up problem

Posted: Fri Jan 29, 2021 6:59 pm
by kzinti
Of course, the problem with this spinlock or flag inside the TCB is that after the switch, you need to know which task was running before the switch. I am not sure I like this.
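One possible way to carry that information across the switch is to stash the outgoing thread in per-CPU state and clear its flag from the incoming thread's side. The sketch below only models the bookkeeping: `struct cpu`, `switch_begin` and `switch_finish` are hypothetical names, and the actual stack/address-space swap is left as a comment. Linux does something similar in spirit with its `on_cpu` field, but this is not meant to reproduce its code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical TCB, reduced to the one field this sketch needs. */
struct tcb {
    atomic_bool on_cpu; /* true while some CPU is still using this thread's stack */
};

/* Hypothetical per-CPU scheduler state. */
struct cpu {
    struct tcb *current; /* thread this CPU is running */
    struct tcb *prev;    /* thread being switched away from, if any */
};

/* First half of a switch: still runs on the outgoing thread's stack.
 * Assumes next->on_cpu was already set by whoever picked next. */
static void switch_begin(struct cpu *cpu, struct tcb *next)
{
    cpu->prev = cpu->current;
    cpu->current = next;
    /* A real kernel would swap stacks and address space here. */
}

/* Second half: runs on the incoming thread's stack.
 * Only now is the outgoing thread's stack safe for another CPU. */
static void switch_finish(struct cpu *cpu)
{
    struct tcb *prev = cpu->prev;
    cpu->prev = NULL;
    if (prev != NULL)
        atomic_store_explicit(&prev->on_cpu, false, memory_order_release);
}
```

The cost is exactly the one mentioned above: every path that resumes a thread, including a thread's very first entry point, has to run `switch_finish` before doing anything else.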

I've also looked into using futexes, and I am not sure they solve the problem. But in light of the above, I might implement futexes anyway, since I want them eventually for my user-space locking needs.