A simple preemptive scheduler and Brendan's tutorial

8infy · Post by **8infy** » Sat Jun 27, 2020 8:03 am

Hi, I'm currently working on my scheduler/multitasking and trying out the general pipeline. I'm using software task switching, so there is a single TSS and one kernel stack per thread.

Now, looking at Brendan's tutorial (https://wiki.osdev.org/Brendan%27s_Mult ... g_Tutorial), here is his switch_to_task example:

;C declaration:
;   void switch_to_task(thread_control_block *next_thread);
;
;WARNING: Caller is expected to disable IRQs before calling, and enable IRQs again after function returns
 
switch_to_task:
 
    ;Save previous task's state
 
    ;Notes:
    ;  For cdecl; EAX, ECX, and EDX are already saved by the caller and don't need to be saved again
    ;  EIP is already saved on the stack by the caller's "CALL" instruction
    ;  The task isn't able to change CR3 so it doesn't need to be saved
    ;  Segment registers are constants (while running kernel code) so they don't need to be saved
 
    push ebx
    push esi
    push edi
    push ebp
 
    mov edi,[current_task_TCB]    ;edi = address of the previous task's "thread control block"
    mov [edi+TCB.ESP],esp         ;Save ESP for previous task's kernel stack in the thread's TCB
 
    ;Load next task's state
 
    mov esi,[esp+(4+1)*4]         ;esi = address of the next task's "thread control block" (parameter passed on stack)
    mov [current_task_TCB],esi    ;Current task's TCB is the next task TCB
 
    mov esp,[esi+TCB.ESP]         ;Load ESP for next task's kernel stack from the thread's TCB
    mov eax,[esi+TCB.CR3]         ;eax = address of page directory for next task
    mov ebx,[esi+TCB.ESP0]        ;ebx = address for the top of the next task's kernel stack
    mov [TSS.ESP0],ebx            ;Adjust the ESP0 field in the TSS (used by CPU for CPL=3 -> CPL=0 privilege level changes)
    mov ecx,cr3                   ;ecx = previous task's virtual address space
 
    cmp eax,ecx                   ;Does the virtual address space need to be changed?
    je .doneVAS                   ; no, virtual address space is the same, so don't reload it and cause TLB flushes
    mov cr3,eax                   ; yes, load the next task's virtual address space
.doneVAS:
 
    pop ebp
    pop edi
    pop esi
    pop ebx
 
    ret                           ;Load next task's EIP from its kernel stack

Why are TCB.ESP and TCB.ESP0 different values? What is in ESP0? Aren't they supposed to be the same, since we want to land on the current top of the kernel stack when jumping from ring 3 to ring 0?
I've searched this forum, and some people suggest just setting it to the top of the stack and never touching it again. I tried both approaches. When I change it dynamically (writing the current stack top into the TSS each time before switching to the task), I get random page faults that I cannot trace. It is probably stack corruption, because EIP pointed at a pop gs in the syscall handler when the page fault happened.

When I never touch the ESP0 value, it works perfectly fine. Here's a screenshot (it's only set once at the beginning to the top of the stack, plus the IRET and schedule() frame, so top - 40 bytes).
As you can see the userland thread successfully returns to the syscall in which it was interrupted.

And here's me updating the ESP0 in the TSS before switching to the task each time:
As you can see, it's a page fault at a completely random address, and both the faulty address and the address of the instruction are different each time...
(That happens after the scheduler tries to switch from ring3 task to any other)

So my question is: what should I set ESP0 to? The current kernel stack top for the thread? When do I update it? Brendan's tutorial doesn't show any code that ever updates the ESP0 stored in the TCB.
Would really appreciate a detailed response on how to handle this, my brain is melting at this point...

UPD: after thinking about it more, I think I get it now. ESP0 always points to the beginning of the stack because whenever it's fetched from the TSS it's 100% empty, so it doesn't make sense for it to point anywhere other than the beginning of the stack. Because if it's not empty that means that we're already in ring0 therefore it's never fetched from the TSS. Is this correct?

nullplan · Post by **nullplan** » Sat Jun 27, 2020 11:17 am

8infy wrote:UPD: after thinking about it more, I think I get it now. ESP0 always points to the beginning of the stack because whenever it's fetched from the TSS it's 100% empty, so it doesn't make sense for it to point anywhere other than the beginning of the stack. Because if it's not empty that means that we're already in ring0 therefore it's never fetched from the TSS. Is this correct?

That is correct. That's why you only update ESP0 when switching tasks. Not when switching between kernel and user space.
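To make the reasoning concrete, here is a minimal user-space sketch of how the CPU picks the stack on interrupt entry. It is only a model under stated assumptions: `struct tss_view` and `interrupt_entry_esp` are hypothetical names invented for illustration, and the real TSS has many more fields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, stripped-down view of the TSS: only the field discussed here. */
struct tss_view {
    uint32_t esp0;  /* stack pointer the CPU loads on a ring 3 -> ring 0 entry */
};

/*
 * Models where the CPU starts pushing the interrupt frame.
 * cpl is the privilege level of the interrupted code (0 or 3 in this model).
 */
static uint32_t interrupt_entry_esp(unsigned cpl, uint32_t current_esp,
                                    const struct tss_view *tss)
{
    assert(cpl == 0 || cpl == 3);
    if (cpl == 3) {
        /* Privilege change: the kernel stack is empty at this point,
           so the frame goes at the very top, taken from TSS.ESP0. */
        return tss->esp0;
    }
    /* Already in ring 0: no stack switch, TSS.ESP0 is not consulted. */
    return current_esp;
}
```

In this model the value in the TSS only matters while the thread is in user mode, which is exactly when its kernel stack holds nothing, so the top of the stack is the only sensible value.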

8infy · Post by **8infy** » Sat Jun 27, 2020 11:23 am

nullplan wrote:
8infy wrote:UPD: after thinking about it more, I think I get it now. ESP0 always points to the beginning of the stack because whenever it's fetched from the TSS it's 100% empty, so it doesn't make sense for it to point anywhere other than the beginning of the stack. Because if it's not empty that means that we're already in ring0 therefore it's never fetched from the TSS. Is this correct?
That is correct. That's why you only update ESP0 when switching tasks. Not when switching between kernel and user space.

Thanks man, this realization cost me a few days and thousands of page faults

LukeyTheKid · Post by **LukeyTheKid** » Thu Jul 30, 2020 6:53 am

To clarify (question, not a statement):

ESP0 = top/beginning of the per-task kernel stack. It never changes for the process (once we set it in the TCB, it is never updated), but it is copied into the task state segment's TSS.ESP0 field on every task switch
ESP = actual per-task stack pointer, which will vary based on whatever work the task was doing when it was pre-empted

Is that what you mean?

nullplan · Post by **nullplan** » Thu Jul 30, 2020 9:36 am

LukeyTheKid wrote:Is that what you mean?

Yes, it is.
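The split between the two fields can be sketched in C as follows. This is an illustrative model, not code from the tutorial: `struct tcb`, `struct tss_model`, `tcb_init`, `switch_bookkeeping` and the 4 KiB `KSTACK_SIZE` are all assumptions, and the real register save/restore still has to happen in assembly as in switch_to_task.

```c
#include <assert.h>
#include <stdint.h>

#define KSTACK_SIZE 4096u  /* assumed kernel stack size per thread */

/* Hypothetical model of the one TSS field that matters here. */
struct tss_model {
    uint32_t esp0;
};

/* Hypothetical thread control block with the two fields from the thread. */
struct tcb {
    uint32_t esp;   /* saved kernel stack pointer, changes at every switch */
    uint32_t esp0;  /* top of this thread's kernel stack, set once */
};

/* Set up a thread whose kernel stack occupies [kstack_base, kstack_base + KSTACK_SIZE). */
static void tcb_init(struct tcb *t, uint32_t kstack_base)
{
    assert(kstack_base % 16 == 0);
    t->esp0 = kstack_base + KSTACK_SIZE;  /* the stack grows down from here */
    t->esp  = t->esp0;                    /* empty stack; a real kernel would push an initial frame */
}

/*
 * Bookkeeping half of a task switch: remember where prev stopped,
 * publish next's stack top in the TSS, and return the ESP to load.
 */
static uint32_t switch_bookkeeping(struct tcb *prev, uint32_t prev_esp_now,
                                   const struct tcb *next, struct tss_model *tss)
{
    prev->esp = prev_esp_now;  /* varies with whatever prev was doing */
    tss->esp0 = next->esp0;    /* constant per thread, copied on every switch */
    return next->esp;
}
```

Note that `esp0` is written only in `tcb_init`, while `esp` is rewritten on every switch, which matches the two definitions above.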
