Hi,
Octacone wrote:I need to save EFlags because of virtual 8086 mode and some other flags such as carry and auxiliary that the CPU needs.
No, you don't.
Sometimes EFLAGS needs to be saved when the CPU switches from user-space to kernel (and loaded again when the CPU switches from kernel back to user-space), and for all of the cases where it's needed (IRQs and exceptions) the CPU automatically saves and loads EFLAGS for you. However; switching from user-space to kernel has nothing to do with task switches at all, and switching from kernel to user-space has nothing to do with task switches at all; and the code that does task switches never needs to save or load EFLAGS because (in cases where EFLAGS does need to be saved and loaded somewhere) it has already been taken care of by code that has nothing to do with task switching (as part of the "user-space <-> kernel" privilege level changes).
Octacone wrote:I also wanted to ask you about spinlocks, semaphores, mutex stuff. What is the best thing to use for a single core monolithic kernel?
For Kernel
For multi-CPU I'd have 2 types of spinlocks where one disables IRQs and the other doesn't, but where both postpone task switches. For single-CPU I'd do it exactly the same except that I wouldn't actually have any lock - e.g. just disable IRQs or not, and the "postpone task switches" logic. This makes it easier to support multi-CPU later because you can mostly just define a few macros and use conditional code (like "#ifdef SMP") to skip the lock itself (but I'd recommend adding support for multi-CPU as soon as you can because it's easier to find mistakes one at a time as you create them than it is to deal with many mistakes at once).
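If it helps, here's a rough sketch of what those macros might look like (the names, and the use of GCC's __sync built-ins and inline CLI/STI, are just placeholders and not something your kernel has to match):

/* Hypothetical sketch: the same critical section macros for single-CPU and multi-CPU builds.
   On single-CPU the lock itself disappears and only "disable IRQs or not" remains
   (the "postpone task switches" part is sketched after the next paragraph). */

#ifdef SMP
    typedef volatile int spinlock_t;
    #define SPIN_ACQUIRE(l)   while (__sync_lock_test_and_set((l), 1)) { /* spin */ }
    #define SPIN_RELEASE(l)   __sync_lock_release(l)
#else
    typedef int spinlock_t;                  /* placeholder, never actually used */
    #define SPIN_ACQUIRE(l)   ((void)(l))    /* no lock needed with only one CPU */
    #define SPIN_RELEASE(l)   ((void)(l))
#endif

/* Variant for data that is also touched by IRQ handlers: disable IRQs too.
   (A real version would save and restore EFLAGS.IF instead of blindly doing STI.) */
#define SPIN_ACQUIRE_IRQ(l)   do { __asm__ volatile ("cli"); SPIN_ACQUIRE(l); } while (0)
#define SPIN_RELEASE_IRQ(l)   do { SPIN_RELEASE(l); __asm__ volatile ("sti"); } while (0)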
For the "postpone task switching" logic; you'd increment a "task switches disabled" counter at the start of a critical section (where you'd acquire a lock if was multi-CPU); then before doing a task switch the scheduler would check if this counter is zero, and if the counter is not zero the scheduler would set a "task switch/es were postponed" flag instead of doing any task switch. Then at the end of a critical section (where you'd release a lock if was multi-CPU) you'd (atomically) decrement the "task switches disabled" counter and see if it was decremented to zero; and if it did become zero you'd tell the scheduler to do the task switch it postponed.
For mutexes and semaphores; they're mostly just a list of tasks waiting for the mutex/semaphore, where the list is protected by a spinlock. When you acquire the mutex/semaphore you begin by acquiring the spinlock, and then check if the mutex/semaphore can be acquired. If the mutex/semaphore can be acquired you acquire it and release the spinlock. If the mutex/semaphore can't be acquired you put the task on the list of tasks waiting for the mutex/semaphore, then block the task (where the task switch will be postponed), then release the spinlock (causing the postponed task switch to happen).
When you release the mutex/semaphore you begin by acquiring the spinlock again, then release the mutex/semaphore, then check if the list of tasks waiting for the mutex/semaphore is empty and remove and unblock a task if it's not, then release the spinlock.
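As a rough sketch (every helper here is a made-up name for whatever your kernel already has; note that this version hands the mutex directly to the woken task instead of marking it free first, which is just one of several workable designs):

/* Hypothetical sketch of a mutex built from a spinlock plus a list of waiting tasks. */

typedef struct task task_t;

typedef struct {
    spinlock_t  lock;            /* protects the fields below (see the earlier sketch) */
    int         locked;          /* 0 = free, 1 = acquired */
    task_t     *waiting;         /* list of tasks blocked waiting for this mutex */
} mutex_t;

/* Assumed to exist elsewhere in the kernel (names made up for this sketch) */
void    spinlock_acquire(spinlock_t *l);       /* also postpones task switches */
void    spinlock_release(spinlock_t *l);       /* does any postponed task switch */
task_t *current_task(void);
void    wait_list_add(task_t **list, task_t *t);
task_t *wait_list_remove(task_t **list);       /* returns NULL if the list is empty */
void    block_current_task(void);              /* marks the task blocked; the switch is postponed */
void    unblock_task(task_t *t);

void mutex_acquire(mutex_t *m) {
    spinlock_acquire(&m->lock);
    if (m->locked == 0) {
        m->locked = 1;                          /* it was free; take it */
        spinlock_release(&m->lock);
        return;
    }
    wait_list_add(&m->waiting, current_task()); /* it wasn't free; wait for it */
    block_current_task();
    spinlock_release(&m->lock);                 /* the postponed task switch happens here */
}

void mutex_release(mutex_t *m) {
    spinlock_acquire(&m->lock);
    task_t *t = wait_list_remove(&m->waiting);
    if (t != NULL) {
        unblock_task(t);        /* hand the mutex straight to the woken task (it stays "locked") */
    } else {
        m->locked = 0;          /* nobody was waiting */
    }
    spinlock_release(&m->lock);
}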
For User-Space
For user-space; spinlocks don't make sense. For mutexes/semaphores I'd have a kernel API function that does an "atomically check value in memory and block task if value in memory wasn't zero" operation that uses the same "list of tasks waiting for that mutex/semaphore" that is used for kernel's mutexes/semaphores (with some additional code to associate the list in kernel-space with the memory location in user-space, so the kernel can figure out which list corresponds to which mutex/semaphore). Then I'd have another kernel API function that does "unblock a task on the list of tasks associated with this memory location".
The user-space side of things would use these kernel API functions; so that if the mutex/semaphore can't be acquired the "atomically check value in memory and block task if value in memory wasn't zero" kernel API function is called; and when the mutex/semaphore is released it'd call the "unblock a task on the list of tasks associated with this memory location" kernel API function. The main idea here is that most of the time (if there's no contention) neither kernel API function would be used (the whole acquire and release would happen purely in user-space).
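As a very rough sketch of the user-space side (the two kernel API functions are made-up names for the calls described above, and the "waiters" counter is an extra detail I've added so the release path can tell whether it needs to enter the kernel at all):

/* Hypothetical sketch: user-space mutex where the kernel is only involved on contention. */

#include <stdint.h>

void sys_wait_if_nonzero(volatile int32_t *addr);  /* atomically: if (*addr != 0) block on addr's wait list */
void sys_wake_one(volatile int32_t *addr);         /* unblock one task from addr's wait list */

typedef struct {
    volatile int32_t value;     /* 0 = free, 1 = locked; this is the address the kernel knows about */
    volatile int32_t waiters;   /* how many tasks are in (or about to enter) the slow path */
} user_mutex_t;

void user_mutex_acquire(user_mutex_t *m) {
    if (__sync_lock_test_and_set(&m->value, 1) == 0) {
        return;                                    /* fast path: no contention, no kernel API call */
    }
    __sync_fetch_and_add(&m->waiters, 1);          /* slow path: tell releasers someone is waiting */
    while (__sync_lock_test_and_set(&m->value, 1) != 0) {
        sys_wait_if_nonzero(&m->value);            /* kernel blocks us (unless it became free already) */
    }
    __sync_fetch_and_sub(&m->waiters, 1);
}

void user_mutex_release(user_mutex_t *m) {
    __sync_lock_release(&m->value);                /* value = 0 */
    if (m->waiters != 0) {
        sys_wake_one(&m->value);                   /* only enter the kernel if there's contention */
    }
}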
Octacone wrote:They are used to lock the actual resource not the thread, right?
Yes - all kinds of locks (spinlocks, mutexes, semaphores) are used to protect data and not code (unless code is treated as data, in a "self-modifying code" way).
Octacone wrote:Since we're talking about thread blocking, is it as simple as having an enum with different "blocked reasons" and not letting the thread run until it gets unblocked?
That's the basic idea; but there's a whole pile of race conditions. For example, you don't want to check whether the lock can be acquired, have a different task release the lock and wake up any waiting tasks in between, and then block waiting for the lock to be released (after it's too late, so the task won't be unblocked for ages).
Octacone wrote:What about user mode threads, how hard is to implement that? Any significant changes required?
For user-space threads all of the multi-tasking remains the same; you just add extra/unrelated code to do the "user-space <-> kernel" switching and extra/unrelated code for the kernel API and extra/unrelated code for an executable loader.
Octacone wrote:What about stack alignment, currently my stacks are page aligned, is that okay? Though my main kernel stack isn't, could that cause a significant performance hit?
That depends on how unaligned it is. For a 32-bit kernel the stack probably only needs to be 4-byte aligned (unless you use SSE in the kernel and need stack to be 16-byte aligned for that, but using SSE in the kernel is a bad idea anyway).
Octacone wrote:Once I actually implement processes, do you think that mapping all the stacks to the same location (same virtual addresses, different physical addresses) would be okay?
For multiple (single-threaded) processes; putting the stacks at the same virtual address (in different virtual address spaces) is fine.
For multi-threaded processes normally you can't have 2 threads in the same process (in the same virtual address space) using the same address for their stacks; but that depends on your OS (e.g. whether you implement "thread specific storage" or just "thread local storage").
Octacone wrote:Also you didn't comment anything on my quantums, they're okay then?
They're fine for now. Later you'll probably change scheduling algorithms and/or make the quantums depend on how fast the CPU is and/or make other changes; and eventually you'll have enough user-space stuff done to be able to test the OS under various conditions; so you should assume that whatever you use now is a temporary place-holder.
Cheers,
Brendan