rdos wrote:The use of timer hardware also determines if timer queues are global (PIT) or per core (LAPIC).
If you don't have LAPIC, you also can't have SMP, so I don't think the distinction matters.
rdos wrote:Another bigger issue with SMP scheduling is how to assign threads to cores. One way is to have lists of threads per core (threads are assigned to a specific core). I currently use this method. This needs to be combined with load balancing that moves threads between cores.
Same here, although I currently pay no attention to the CPU a task was previously assigned to once it has gone to sleep. Yes, that likely means sub-par cache performance, but I haven't really noticed it so far.
rdos wrote:Threads can be woken up by other means than timers, and this needs to use IPIs if the thread is not assigned to the current core.
Currently, I have three OS-defined IPIs in use: One is for rescheduling, and causes the receiving CPU to set the timeout flag on the current task (as would an incoming timer interrupt). The second is for halting, and it just runs the HLT instruction in an infinite loop. That IPI is used for panic and shutdown. And finally, the third is the always awful TLB shootdown. There is just no way around it. I can minimize the situations I need it in, but some are just always going to remain.
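In rough outline, the handlers behind those three vectors could look something like the sketch below; the vector numbers, the timeout flag, and helpers such as current_task(), lapic_send_eoi() and flush_pending_mappings() are made-up names for illustration, not anyone's actual implementation:

```c
/* Sketch of three OS-defined IPI handlers of the kind described above.
 * All names and vector numbers are illustrative only. */
struct task { volatile int timeout; };

extern struct task *current_task(void);     /* assumed: the task running on this CPU */
extern void lapic_send_eoi(void);           /* assumed: acknowledge the local APIC */
extern void flush_pending_mappings(void);   /* assumed: invlpg the addresses queued for this CPU */

void ipi_reschedule(void)            /* e.g. vector 0xF0 */
{
    /* Same effect as an expiring timer: mark the current task as out of time,
     * so the scheduler runs on the way out of the interrupt. */
    current_task()->timeout = 1;
    lapic_send_eoi();
}

void ipi_halt(void)                  /* e.g. vector 0xF1, used for panic/shutdown */
{
    for (;;)
        __asm__ volatile ("cli; hlt");
}

void ipi_tlb_shootdown(void)         /* e.g. vector 0xF2 */
{
    flush_pending_mappings();
    lapic_send_eoi();
}
```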
rdos wrote:There is also a need to use spinlocks for IRQ protection, as cli/sti will not work for this. Generally, cli/sti are never safe on SMP.
It was this sentence that made me want to respond: who are you writing this response to? Because if it is to me, then I already know that. And if it is to the OP, then you are oversimplifying things a bit too much.
Basically, your failure here is to distinguish between concurrency and parallelism. Concurrency is a property of the source code: it both supports and is safe under multiple threads of execution. Parallelism is then an attribute of the hardware that allows those threads to actually execute simultaneously. Now obviously, source code that assumes that "CLI" is the same as taking a lock is not safe in that sense. To make it safe, you must protect every access to a global variable with some kind of synchronization that ensures only one access takes place at a time.
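To make the point concrete, here is a minimal sketch of why treating CLI as a lock breaks down on SMP (the global counter is purely illustrative):

```c
/* cli only masks interrupts on the CPU that executes it; it does nothing
 * to stop another CPU from running the very same code at the same time. */
static volatile unsigned long ticks;

void looks_safe_but_is_not(void)
{
    __asm__ volatile ("cli");
    ticks++;                              /* another core can race with this increment */
    __asm__ volatile ("sti");
}

void actually_safe(void)
{
    __atomic_fetch_add(&ticks, 1, __ATOMIC_SEQ_CST);   /* atomic RMW, or take a spinlock */
}
```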
I've made sure from day one that my kernel code is safe in that sense: every access to a global variable goes through an atomic operation or a lock. Spinlocks do clear the interrupt flag, yes, but that is to avoid deadlocks from taking the same spinlock in both system-call and interrupt context.
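For reference, the kind of spinlock I mean looks roughly like this; it's only a sketch, assuming x86-64 and GCC-style builtins, with made-up names:

```c
/* Spinlock that also saves and disables the interrupt flag, so the same lock
 * can safely be taken from both system-call and interrupt context. */
typedef struct { volatile unsigned char locked; } spinlock_t;

static inline unsigned long spin_lock_irqsave(spinlock_t *lk)
{
    unsigned long flags;
    __asm__ volatile ("pushfq; popq %0; cli" : "=r"(flags) : : "memory");
    while (__atomic_test_and_set(&lk->locked, __ATOMIC_ACQUIRE))
        __asm__ volatile ("pause");
    return flags;
}

static inline void spin_unlock_irqrestore(spinlock_t *lk, unsigned long flags)
{
    __atomic_clear(&lk->locked, __ATOMIC_RELEASE);
    __asm__ volatile ("pushq %0; popfq" : : "r"(flags) : "memory", "cc");
}
```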
rdos wrote:Perhaps the toughest issue to solve is how to synchronize the scheduler, particularly if it can be invoked from IRQs.
As I've learned from the musl mailing list, fine-grained locking is fraught with its own kind of peril, so I just have a big scheduler spinlock that protects all the lists, and that seems to work well enough for now.
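As a concrete (hypothetical) illustration, waking a thread under that scheme could look like this, reusing the spinlock sketch above; the thread structure and the helper functions are invented for the example:

```c
/* One coarse lock guards every scheduler list; no fine-grained locking. */
struct thread { int cpu; struct thread *next; };

extern spinlock_t sched_lock;
extern void enqueue_ready(struct thread *t);     /* assumed: put t on its CPU's ready list */
extern int  current_cpu_id(void);
extern void send_reschedule_ipi(int cpu);        /* assumed: raise the reschedule IPI */

void wake_thread(struct thread *t)
{
    unsigned long flags = spin_lock_irqsave(&sched_lock);
    enqueue_ready(t);
    spin_unlock_irqrestore(&sched_lock, flags);

    /* If the thread lives on another core, poke that core so it reschedules. */
    if (t->cpu != current_cpu_id())
        send_reschedule_ipi(t->cpu);
}
```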
rdos wrote:I've tried many different solutions, but the current one probably is the most effective (and doesn't consume any ordinary registers like fs or gs). I setup my GDT so the first few selectors are at the end of a page boundary, and then map the first part to different memory areas on different CPUs with paging. Thus, the GDT is still shared among cores and can be used for allocating shared descriptors up to the 8k limit. To get to the core area, the CPU will load a specific selector which will point directly to the core area. The core area also contains the linear address and the specific (shared) GDT selector allocated for each core.
This is for protected mode, but in long mode the CPU has to load a fixed linear address instead. Which is a bit less flexible.
See, this is precisely what I meant. Now you have a complicated memory model where the same kernel-space address refers to different things on different CPUs. Leaving aside that this requires separate kernel-space paging structures for each CPU, it also means you cannot necessarily share a pointer to a variable with another thread that might execute on a different core for whatever reason, and that complication was just unacceptable to me.
Also, FS and GS aren't ordinary registers, they are segment registers, and their only job in long mode is to point at something program-defined. So in kernel mode I make them point at CPU-specific data. That doesn't consume anything that wouldn't otherwise just sit there unused.
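To illustrate, a long-mode per-CPU setup through GS can be as small as the following sketch (IA32_GS_BASE is the real MSR; the struct layout and names are made up, and user/kernel transitions would additionally use swapgs):

```c
#include <stdint.h>

/* Hypothetical per-CPU block; the self pointer at offset 0 lets code recover
 * the block's linear address with a single gs-relative load. */
struct percpu {
    struct percpu *self;
    uint32_t       cpu_id;
    void          *current_task;
};

#define MSR_GS_BASE 0xC0000101u   /* IA32_GS_BASE */

static inline void wrmsr(uint32_t msr, uint64_t val)
{
    __asm__ volatile ("wrmsr" : : "c"(msr), "a"((uint32_t)val), "d"((uint32_t)(val >> 32)));
}

/* Called once per CPU during bring-up, with that CPU's own block. */
void percpu_init(struct percpu *block, uint32_t cpu_id)
{
    block->self = block;
    block->cpu_id = cpu_id;
    wrmsr(MSR_GS_BASE, (uint64_t)(uintptr_t)block);
}

/* Any kernel code can then reach its own CPU's data through GS,
 * without per-CPU address spaces and without using a general-purpose register. */
static inline struct percpu *this_cpu(void)
{
    struct percpu *p;
    __asm__ volatile ("movq %%gs:0, %0" : "=r"(p));
    return p;
}
```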