Re: Constructing a multi-core monitor / debugger
Posted: Fri Apr 15, 2011 4:22 pm
by Combuster
*cough* Although I know in advance that this remark is futile where you are concerned, I want to point out to all the other readers that hiding symptoms should never be treated as equivalent to a fix.
Re: Constructing a multi-core monitor / debugger
Posted: Sat Apr 16, 2011 12:19 am
by rdos
Combuster wrote: *cough* Although I know in advance that this remark is futile where you are concerned, I want to point out to all the other readers that hiding symptoms should never be treated as equivalent to a fix.
Of course not. I was only remarking that the changed logic for getting the current core/thread is much faster (and causes less contention) now that the scheduler gets the current core from a fixed selector. I will certainly remove the cause of the deadlock once I've figured out a better method for handling IPI wakeups of blocked cores.
BTW, the problem still occurs often enough with 3 cores to be reproducible. If that weren't the case, I'd have a problem.
Re: Constructing a multi-core monitor / debugger
Posted: Sat Apr 16, 2011 10:53 am
by rdos
The primary problem is solved (after a complex analysis of the stack frames of the 3 cores). It turns out that the unblock function enabled interrupts while still holding the spinlock. As a result, the core could accept an unblock interrupt, and the system deadlocked when the handler tried to acquire the same lock again. With this fixed, some of my more complex test programs run quite well (like the GUI demo with two threads). I could even stress it further by starting 3 instances of this program.
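For other readers, the deadlock described above can be modeled in a few lines of user-space C. This is only a sketch under stated assumptions: every name in it (`sched_lock`, `unblock_buggy`, `unblock_fixed`, `unblock_int_handler`, and so on) is hypothetical and not taken from the actual kernel, interrupts are simulated with plain flags, and the handler uses a try-lock so that the model can count the deadlock instead of hanging.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* All names are hypothetical; this is a user-space model, not kernel code. */
static atomic_flag sched_lock = ATOMIC_FLAG_INIT;
static bool irq_enabled = false;          /* models the core's interrupt flag */
static bool unblock_int_pending = false;  /* models a pending unblock interrupt */
static int would_deadlock = 0;            /* times the handler found the lock held */
static int handled = 0;                   /* times the handler ran to completion */

static void spin_lock(void) {
    assert(!irq_enabled);  /* model invariant: lock is taken with interrupts off */
    while (atomic_flag_test_and_set_explicit(&sched_lock, memory_order_acquire)) {
        /* spin until the lock is free */
    }
}

static bool spin_trylock(void) {
    return !atomic_flag_test_and_set_explicit(&sched_lock, memory_order_acquire);
}

static void spin_unlock(void) {
    atomic_flag_clear_explicit(&sched_lock, memory_order_release);
}

/* The handler needs the same lock. A real handler would spin forever on the
 * core that already holds it; the model records that case instead. */
static void unblock_int_handler(void) {
    if (!spin_trylock()) {
        would_deadlock++;
        return;
    }
    handled++;
    spin_unlock();
}

/* Enabling interrupts delivers anything that is pending, as on real hardware. */
static void enable_irq(void) {
    irq_enabled = true;
    if (unblock_int_pending) {
        unblock_int_pending = false;
        unblock_int_handler();
    }
}

static void disable_irq(void) {
    irq_enabled = false;
}

/* Buggy ordering: interrupts come back on while the lock is still held. */
static void unblock_buggy(void) {
    disable_irq();
    spin_lock();
    enable_irq();   /* BUG: the handler may run here and want sched_lock */
    spin_unlock();
}

/* Fixed ordering: release the lock first, then enable interrupts. */
static void unblock_fixed(void) {
    disable_irq();
    spin_lock();
    spin_unlock();
    enable_irq();
}
```

Setting `unblock_int_pending` and calling `unblock_buggy()` makes the handler find the lock already held on its own core, which is the deadlock case. The same sequence with `unblock_fixed()` lets the handler take and release the lock normally.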
But there are still bugs. The system still locks up in some situations, and there are faults in the GUI demo app that I've not seen on single-core systems. There is also still a lock-up in the initialization code when 4 cores are started, but the monitor now seems to catch this situation (it is possible to break into it).
Update: The primary problem was rather that spinlocks were protecting too much of the code. The IPIs should run outside of the spinlock rather than inside it. With this done, I get more "friendly" errors in the kernel debugger or monitor, and no more system hangs. The major problem now is that the scheduler lock sometimes seems not to work. I know it works in the single-core version, but apparently the multi-core locks do not always work, which leads to synchronization primitives malfunctioning.
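One way to read "IPIs outside of the spinlock" is: decide which core to wake while holding the lock, but send the IPI only after releasing it. The sketch below illustrates that pattern in user-space C. It is an assumption about the general technique, not the actual scheduler code; `wake_idle_core`, `send_wakeup_ipi`, `core_idle`, and `MAX_CORES` are hypothetical names, and the IPI is replaced by a counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_CORES 4   /* hypothetical core count for the model */

static atomic_flag sched_lock = ATOMIC_FLAG_INIT;
static bool lock_held = false;        /* single-threaded mirror, for checking only */
static bool core_idle[MAX_CORES];     /* which cores are blocked waiting for work */
static int ipis_sent = 0;
static int ipis_sent_while_locked = 0;

static void lock_sched(void) {
    while (atomic_flag_test_and_set_explicit(&sched_lock, memory_order_acquire)) {
        /* spin until the lock is free */
    }
    lock_held = true;
}

static void unlock_sched(void) {
    lock_held = false;
    atomic_flag_clear_explicit(&sched_lock, memory_order_release);
}

/* Stand-in for programming the interrupt controller to signal another core. */
static void send_wakeup_ipi(int core) {
    (void)core;
    if (lock_held) {
        ipis_sent_while_locked++;   /* the situation the update wants to avoid */
    }
    ipis_sent++;
}

/* Pick an idle core under the lock, then send the IPI after unlocking.
 * Returns the core that was signalled, or -1 if no core was idle. */
static int wake_idle_core(void) {
    int target = -1;

    lock_sched();
    for (int i = 0; i < MAX_CORES; i++) {
        if (core_idle[i]) {
            core_idle[i] = false;   /* claim it so nobody else wakes it too */
            target = i;
            break;
        }
    }
    unlock_sched();

    if (target >= 0) {
        send_wakeup_ipi(target);    /* lock is no longer held here */
    }
    return target;
}
```

The critical section now only touches the shared table, so the core receiving the IPI can take the scheduler lock in its handler without waiting on the sender.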