Hi,
rdos wrote:OK, so the load balancer seems to work OK, but some synchronization primitives seem to fail. Now that the scheduler-locks are per core (and not per system), some things that could previously be handled with locking the scheduler, no longer can be handled like this.
That is good - some things that should never have been handled by taking a global scheduler lock need to be found and fixed so they don't rely on the existence of any single global lock.
rdos wrote:Mostly it seems to be critical sections that are broken. These need to use the global list spinlock. However, the problem is when a thread is blocked on a critical section. It needs to keep the spinlock until it has been queued in the wait-list, but with the current scheduler-design this is not possible. IOW, I need a new blocking function that presumes the global list spinlock is already taken, and that puts the thread in the target list, and releases the spinlock. Ideally, it should first put the thread in the section blocked list, release the spinlock, save the thread context (registers) and reschedule. Alternatively, save thread context, put thread in section blocked list, release the spinlock, and reschedule. The latter scenario would create more lock-contention, but it would be faster when there is no lock-contention. Maybe critical sections should even contain a spinlock field so each section could have a private spinlock instead. This would increase the size of critical section data for non-SMP systems, but it would scale better on SMP-systems.
Not sure I understand this properly. For example:
"
However, the problem is when a thread is <blocked or spinning?> on a critical section. It needs to keep the spinlock until it has been queued in <the global wait list or one of the many different wait-lists?> , but with the current scheduler-design this is not possible."
For threads that are blocked on a critical section; "blocked" implies some sort of mutex where the thread is given no CPU time at all until the mutex is released. In this case you have (for example) one queue for each mutex, where all threads that are blocked/waiting to acquire a mutex are put onto the queue for that mutex. When a thread releases the mutex it checks the mutex's queue and unblocks the first thread that is waiting for the mutex (if necessary). The mutex's queue needs to be protected by a spinlock; so for a system with 123 mutexes you have 123 queues and 123 spinlocks.
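To make that concrete, each mutex could carry its own queue and its own spinlock, something like this (rough sketch only - "spinlock_t", "struct thread" and the field names are just placeholders):

/* One queue and one spinlock per mutex - a sketch, not real kernel code */
struct mutex {
    spinlock_t     queue_lock;   /* protects 'owner' and 'waiters'           */
    struct thread *owner;        /* NULL when the mutex is free              */
    struct thread *waiters;      /* threads blocked waiting for this mutex;  */
                                 /* the releaser unblocks the first of these */
};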
If a thread is spinning on a critical section; then the scheduler doesn't necessarily need to know or care. The main problem is avoiding deadlocks where 2 (or more) things on the same CPU try to use the same spinlock at the same time. A simple example is a spinlock that is acquired by an IRQ handler - if the lock is held by something else on the same CPU when the IRQ occurs, then the lock can't be released until the IRQ handler returns, and the IRQ handler can't return until it acquires the lock, so the IRQ handler spins forever. The same deadlock problem can occur for locks that the scheduler requires to perform task switches - something else running on the CPU acquires the lock, then the scheduler tries to do a task switch but can't because something on that CPU (which can't run until the task switch completes) still holds a lock it requires, so the scheduler spins forever.
There is another problem though. If a spinlock is never used by an IRQ handler or by the scheduler; one thread can acquire the spinlock, then the scheduler can switch to another thread, then all other threads will be unable to acquire that lock until the first thread gets CPU time and releases it. This will work correctly, but it will also cause major performance problems (threads could waste lots of CPU time spinning). There's 2 ways to avoid this problem - use mutexes instead (which are relatively expensive and should only be used where the critical sections are relatively long), or disable/postpone task switches (for that CPU) while a thread has acquired a spinlock (on that CPU).
This means that to avoid deadlocks and performance problems, the kernel needs 2 different types of spinlocks:
- Spinlocks that aren't acquired by IRQ handlers, where task switches (for that CPU) are postponed while the spinlock is held
- Spinlocks that are acquired by IRQ handlers, where task switches (for that CPU) are postponed and IRQs are disabled (for that CPU) while the spinlock is held
For the first type of spinlock, I have a per-CPU "number of spinlocks held" counter and a per-CPU "task switch was postponed" flag. The code to acquire a spinlock goes like this:
- disable IRQs for the CPU
- attempt to acquire the spinlock
- if the lock was not acquired, re-enable IRQs and do a NOP or PAUSE (so it doesn't mess up IRQ latency), then disable IRQs and try to acquire the lock again
- increase the "number of spinlocks held" counter for the CPU
- re-enable IRQs
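In C that might look something like this (rough sketch, not my actual code - "this_cpu()", the per-CPU structure and the GCC "__sync" builtins are just placeholders for whatever your kernel uses):

typedef volatile int spinlock_t;        /* 0 = free, 1 = held */

struct percpu_state {
    int locks_held;                     /* "number of spinlocks held" counter */
    int switch_postponed;               /* "task switch was postponed" flag   */
};

extern struct percpu_state *this_cpu(void);   /* however you find per-CPU data */

static inline void irq_disable(void) { __asm__ volatile ("cli" ::: "memory"); }
static inline void irq_enable(void)  { __asm__ volatile ("sti" ::: "memory"); }
static inline void cpu_pause(void)   { __asm__ volatile ("pause"); }

void spinlock_acquire(spinlock_t *lock)
{
    for (;;) {
        irq_disable();
        if (__sync_lock_test_and_set(lock, 1) == 0)
            break;                      /* got the lock, IRQs still disabled  */
        irq_enable();                   /* don't hurt IRQ latency while...    */
        cpu_pause();                    /* ...spinning                        */
    }
    this_cpu()->locks_held++;           /* postpone task switches on this CPU */
    irq_enable();
}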
The code in the scheduler to perform a task switch goes like this (interrupts are assumed to be disabled by caller):
- if the "number of spinlocks held" counter" is non-zero then:
- set the "task switch was postponed" flag for this CPU
- return without doing the task switch
- else:
- do the task switch as normal
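Something like this (same placeholder names as above, with the real task switch hidden behind a made-up "do_task_switch()"):

extern void do_task_switch(void);       /* normal task switch path (not shown) */

/* Interrupts are assumed to be disabled by the caller */
void scheduler_task_switch(void)
{
    struct percpu_state *cpu = this_cpu();

    if (cpu->locks_held != 0) {
        cpu->switch_postponed = 1;      /* remember that a switch was wanted     */
        return;                         /* don't switch while spinlocks are held */
    }
    do_task_switch();
}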
The code to release a spinlock does this:
- disable IRQs for the CPU
- release the actual spinlock
- decrease the "number of spinlocks held" counter for the CPU
- if the "number of spinlocks held" was decreased to zero; and if the "task switch was postponed" flag is set:
- clear the "task switch was postponed" flag
- tell the scheduler to do a task switch
- re-enable IRQs
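The matching release, in the same sketch style:

void spinlock_release(spinlock_t *lock)
{
    struct percpu_state *cpu = this_cpu();

    irq_disable();
    __sync_lock_release(lock);          /* release the actual spinlock        */
    cpu->locks_held--;
    if (cpu->locks_held == 0 && cpu->switch_postponed) {
        cpu->switch_postponed = 0;
        scheduler_task_switch();        /* do the postponed task switch now   */
    }
    irq_enable();
}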
For the second type of spinlock (used for all spinlocks that are acquired by IRQ handlers), the code is almost exactly the same as the first type of spinlock; except that the code to acquire the spinlock leaves IRQs disabled, and the code to release the spinlock restores IRQs. Note that "restores IRQs" isn't the same as "enables IRQs" - e.g. "popfd" to restore the previous interrupt flag's state and not "sti".
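Roughly (again just a sketch with placeholder names):

static inline unsigned long irq_save(void)     /* save flags, then disable IRQs */
{
    unsigned long flags;
    __asm__ volatile ("pushf; pop %0; cli" : "=r"(flags) : : "memory");
    return flags;
}

static inline void irq_restore(unsigned long flags)   /* "popfd", not "sti" */
{
    __asm__ volatile ("push %0; popf" : : "r"(flags) : "memory", "cc");
}

unsigned long spinlock_acquire_irq(spinlock_t *lock)
{
    unsigned long flags = irq_save();
    while (__sync_lock_test_and_set(lock, 1) != 0) {
        irq_restore(flags);             /* don't hurt IRQ latency while spinning */
        cpu_pause();
        irq_disable();
    }
    this_cpu()->locks_held++;
    return flags;                       /* IRQs stay disabled while the lock is held */
}

void spinlock_release_irq(spinlock_t *lock, unsigned long flags)
{
    struct percpu_state *cpu = this_cpu();

    __sync_lock_release(lock);
    cpu->locks_held--;
    if (cpu->locks_held == 0 && cpu->switch_postponed) {
        cpu->switch_postponed = 0;
        scheduler_task_switch();        /* the postponed switch happens here */
    }
    irq_restore(flags);                 /* restore the previous interrupt flag */
}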
If you think about this, you'll notice that code running in a critical section inside an IRQ handler can ask the scheduler to do a task switch without any problem (the task switch will be delayed until it's safe).
For threads that block there's 2 different situations. The first is when the thread has been put on some sort of queue or data structure corresponding to some sort of resource, and something else will put the thread onto a "ready to run" queue when whatever it's waiting for occurs (e.g. a timer IRQ will unblock the thread eventually, another thread releasing a mutex will unblock the thread eventually, etc).
The other situation is that the thread is blocked for some reason where no queue or data structure is involved. In this case all you need is a set of flags for each thread that says what reason/s it's blocked. An example would be "waiting for IPC", where the kernel clears the thread's "waiting for IPC" flag when the thread receives some IPC and then puts the thread onto a "ready to run" queue if none of the flags are set.
For the "set of flags that say if/why a thread is blocked", I'd include one flag for all of the "thread is put on some sort of queue or data structure" cases. Then the scheduler can provide 2 generic functions - one to block the current task (e.g. "block_thread(int flags);") that sets the "why a thread is blocked" flag/s and causes a task switch; and another function (e.g. "clearFlag(threadID thread, int flags);") that clears flags and takes care of unblocking threads and pre-emption.
If you combine all of the above; you get the following primitives:
- 2 slightly different types of spinlocks
- 1 function to block threads
- 1 function to unblock threads
Mutexes/semaphores, per CPU timer queues (used for "sleep()", etc), waiting for IPC, and everything else is built on top of those primitives.
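For example, a mutex could be built on top of them something like this. Note that the window between dropping the mutex's spinlock and "block_thread()" actually switching away is exactly the ordering problem you're describing - a real implementation has to close that window somehow (e.g. by letting the scheduler take over while the wait queue's spinlock is still held), which this naive sketch deliberately doesn't do:

extern void wait_queue_append(struct thread **list, struct thread *t);
extern struct thread *wait_queue_pop(struct thread **list);

void mutex_acquire(struct mutex *m)
{
    struct thread *self = current_thread();

    spinlock_acquire(&m->queue_lock);
    if (m->owner == NULL) {
        m->owner = self;                          /* uncontended: just take it */
        spinlock_release(&m->queue_lock);
        return;
    }
    wait_queue_append(&m->waiters, self);         /* contended: go to sleep    */
    spinlock_release(&m->queue_lock);
    /* RACE WINDOW: if the owner releases the mutex on another CPU right here,
       it may try to unblock us before we've blocked - this is the ordering
       problem discussed above, glossed over in this sketch */
    block_thread(BLOCKED_ON_SOMETHING);
}

void mutex_release(struct mutex *m)
{
    struct thread *next;

    spinlock_acquire(&m->queue_lock);
    next = wait_queue_pop(&m->waiters);
    m->owner = next;                              /* hand it to the first waiter (or NULL) */
    spinlock_release(&m->queue_lock);
    if (next != NULL)
        clear_flags(next, BLOCKED_ON_SOMETHING);  /* unblock the new owner */
}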
Cheers,
Brendan