Hi,
rwosdev wrote:So this is the second method...
rwosdev wrote:1. Have a separate kernel stack set up and mapped in the SAME virtual location per-process (i.e. per-collection of threads, not per-thread)
The second method is "one kernel stack (per CPU)". You don't have multiple kernel stacks mapped at the same virtual location, you just have one kernel stack.
For this case you'd do something like this (without worrying about things like FPU/MMX/SSE/AVX again):
Code: Select all
exit_kernel:
    cli
    mov esi,[current_task_TCB]      ;esi = address of the task control block for the task we're returning to

    ;Change virtual address space

    mov eax,[esi+TCB.cr3]
    mov cr3,eax

    ;Prepare for IRET
    ; (this assumes DS/ES/FS/GS already hold user data segment selectors - reload them here if the kernel changed them)

    mov eax,[esi+TCB.esp]
    mov ebx,[esi+TCB.eflags]
    mov ecx,[esi+TCB.eip]
    push dword 0x00000023           ;User data segment with RPL=3 (for SS)
    push eax                        ;User-space ESP
    push ebx                        ;User-space EFLAGS (with the IF flag set, so IRETD re-enables IRQs)
    push dword 0x0000001B           ;User code segment with RPL=3 (for CS; RPL must be 3 to return to CPL=3)
    push ecx                        ;User-space EIP

    ;Load user-space state

    mov eax,[esi+TCB.eax]
    mov ebx,[esi+TCB.ebx]
    mov ecx,[esi+TCB.ecx]
    mov edx,[esi+TCB.edx]
    mov edi,[esi+TCB.edi]
    mov ebp,[esi+TCB.ebp]
    mov esi,[esi+TCB.esi]

    ;Return to user-space

    iretd
While a CPU is running user-space code its kernel stack is empty. When something causes a switch to CPL=0 the CPU switches to that CPU's kernel stack and pushes some stuff onto it; then the kernel has to do the reverse of the above to shift all of the user-space state into the task's "task control block", which removes the information that the CPU pushed (return SS:ESP, return EFLAGS, return EIP) so that the kernel stack is empty again.
For example, for a page fault exception it might be:
Code: Select all
page_fault_exception:
    push edi                        ;Save the original EDI so EDI can be used to hold the TCB address
    mov edi,[current_task_TCB]
    pop dword [edi+TCB.edi]         ;Move the saved EDI into the task's TCB

    ;Save the rest of the user-space state

    mov [edi+TCB.eax],eax
    mov [edi+TCB.ebx],ebx
    mov [edi+TCB.ecx],ecx
    mov [edi+TCB.edx],edx
    mov [edi+TCB.esi],esi
    mov [edi+TCB.ebp],ebp
    pop dword [edi+TCB.errorCode]   ;Error code pushed by the CPU
    mov eax,cr2
    mov [edi+TCB.cr2],eax           ;Linear address that caused the page fault
    pop dword [edi+TCB.eip]         ;Return EIP pushed by the CPU
    add esp,4                       ;Remove return CS
    pop dword [edi+TCB.eflags]      ;Return EFLAGS pushed by the CPU
    pop dword [edi+TCB.esp]         ;Return ESP pushed by the CPU
    add esp,4                       ;Remove return SS

    ;Add "handle page fault for this task" joblette to the kernel's prioritised queue/s of things to do

    mov eax,JOBLETTE_TYPE_PAGE_FAULT
    mov ebx,edi                     ;ebx = 1st piece of joblette data (address of TCB for task that needs its page fault handled)
    call add_new_joblette

    ;** At this point, all user-space state has been saved, the kernel stack is empty, and nothing useful is in any of the registers **

    ;Enter the kernel's "do whatever joblette is most important" loop

    sti
    jmp kernel_entry
Note that because there's only one kernel stack (per CPU) the "ESP0" field in the TSS never changes - instead it's set once during boot (e.g. during the AP CPU startup sequence).
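For a concrete picture, a minimal sketch of that one-time setup might look like this (the labels this_cpu_TSS and this_cpu_kernel_stack_top, and the TSS structure offsets, are assumptions for illustration):
Code: Select all
;Hypothetical one-time setup, done once per CPU during boot
; (this_cpu_TSS and this_cpu_kernel_stack_top are assumed names)
setup_kernel_stack:
    mov eax,[this_cpu_kernel_stack_top]
    mov [this_cpu_TSS+TSS.esp0],eax             ;ESP0 = top of this CPU's kernel stack (offset 4 in a 32-bit TSS)
    mov dword [this_cpu_TSS+TSS.ss0],0x00000010 ;SS0 = kernel data segment (offset 8 in a 32-bit TSS)
    ret
After this (and after loading the task register with that CPU's TSS selector) ESP0 is simply never touched again.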
While the kernel is running it only changes the "current_task_TCB" variable (which affects the task that the kernel would return to if the kernel has nothing more important to do).
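For example (a minimal sketch, assuming a TCB.priority field where a larger number means higher priority), when a task unblocks the kernel might do nothing more than this:
Code: Select all
;Hypothetical sketch: esi = address of the TCB for a task that just unblocked
; (the TCB.priority field and "larger number = higher priority" are assumptions)
task_unblocked:
    mov eax,[esi+TCB.priority]
    mov edi,[current_task_TCB]
    cmp eax,[edi+TCB.priority]
    jbe .done                       ;Not higher priority, so leave current_task_TCB alone
    mov [current_task_TCB],esi      ;Kernel will return to the unblocked task when it exits
.done:
    ret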
For the kernel's prioritised queue/s of things to do ("joblettes"): the kernel might do many joblettes for many different tasks (and for itself); then, when there are no more joblettes to do (and maybe also when a task that is ready to run has a higher priority than the highest priority remaining joblette), the kernel would leave its "joblette loop" and jump to the "exit_kernel" code above.
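A very rough sketch of that loop might look like this (get_highest_priority_joblette and dispatch_joblette are assumed helpers, not code from earlier):
Code: Select all
;Hypothetical sketch of the kernel's "do whatever joblette is most important" loop
; (get_highest_priority_joblette and dispatch_joblette are assumed helpers; a real
; loop might also leave early when a ready task has higher priority than all
; remaining joblettes)
kernel_entry:
    call get_highest_priority_joblette  ;eax = joblette type (zero if there are none), ebx = joblette data
    test eax,eax
    je exit_kernel                      ;No joblettes left; return to the task in current_task_TCB
    call dispatch_joblette              ;Handle one joblette (e.g. JOBLETTE_TYPE_PAGE_FAULT)
    jmp kernel_entry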
rwosdev wrote:Am I mostly correct?
No; either it's very wrong for "kernel stack per task" or it's very wrong for "one kernel stack (per CPU)".
I think (and hope) you're using "kernel stack per task" and not the second method ("one kernel stack (per CPU)"); and in that case you've ignored everything I said last time and still think that IRQs have something to do with task switching, when they do not.
I also think you're making a second mistake - thinking that the scheduler's timer (PIT) is the only thing that causes task switches. For all operating systems most task switches are caused by tasks blocking (e.g. because they have to wait for data from disk, from network, from user, from another task, etc) and by tasks unblocking (e.g. because the data that they were waiting for arrived).
For "kernel stack per task" you'd implement a low level "go to task" routine (like the example code I provided last time); then you'd implement a "find task to switch to and switch to it" routine, which figures out which task should get CPU time next and then calls the "go to task" routine. When a task blocks and you have to find another task to run, you'd call the "find task to switch to and switch to it" routine. When a task unblocks that has a higher priority than the currently running task, you'd call the "go to task" routine directly (bypassing the "find task to switch to and switch to it" code) so that the higher priority task preempts the currently running task immediately. When the currently running task has used too much CPU time, you'd call the "find task to switch to and switch to it" routine.
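As a sketch of how those routines might fit together (switch_to_task stands for the low level "go to task" routine from the earlier post and is assumed to take the target task's TCB address in esi; get_highest_priority_ready_task and TCB.priority are assumptions):
Code: Select all
;Hypothetical sketch; switch_to_task is the low level "go to task" routine
; (assumed to take the target task's TCB address in esi)
schedule:                                ;The "find task to switch to and switch to it" routine
    call get_highest_priority_ready_task ;Assumed helper; returns a TCB address in esi
    jmp switch_to_task                   ;Tail-call the low level "go to task" routine

block_task:                              ;Called when the current task has to wait for something
    ;...move the current task onto the relevant waiting list/queue here...
    jmp schedule                         ;Find another task to run

unblock_task:                            ;esi = TCB of the task that unblocked
    ;...move the task back onto the "ready to run" structures here...
    mov edi,[current_task_TCB]
    mov eax,[esi+TCB.priority]
    cmp eax,[edi+TCB.priority]           ;Assumes a larger number means higher priority
    jbe .done
    jmp switch_to_task                   ;Preempt immediately, bypassing "schedule"
.done:
    ret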
For that last part: the timer IRQ would occur; the timer interrupt handler would do stuff (wake up sleeping tasks, update a "ticks since boot" variable, etc); then (optionally, only for some kinds of scheduling) it would check if the currently running task has used too much CPU time and call the "find task to switch to and switch to it" routine if it has; then IRET. The timer interrupt handler would not do any of the actual task switching itself, would not save or load user-space state, and would not touch CR3.
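For example (a minimal sketch for the PIC; wake_sleeping_tasks, ticks_since_boot and time_slice_remaining are assumed names, and schedule is the "find task to switch to and switch to it" routine sketched above):
Code: Select all
;Hypothetical sketch of a timer IRQ handler for "kernel stack per task"
timer_IRQ_handler:
    pushad                              ;Save registers (the helpers may clobber them)
    inc dword [ticks_since_boot]
    call wake_sleeping_tasks            ;Move tasks whose sleep expired back to "ready to run"
    mov al,0x20
    out 0x20,al                         ;Send EOI to the master PIC
    ;Optionally, only for some kinds of scheduling:
    sub dword [time_slice_remaining],1
    jnz .done
    call schedule                       ;May switch tasks; returns when this task runs again
.done:
    popad
    iretd
Note that the EOI is sent before any task switch can happen, so that the next timer IRQ isn't blocked while some other task is running.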
Cheers,
Brendan