OSDev.org

Posted: **Wed Aug 09, 2017 9:31 am**

Hello,

I know I've asked a lot about multitasking, but I still can't understand it fully. It shouldn't be this hard.

I have already finished implementing memory management (both physical and virtual), system calls, synchronization between tasks, etc. I've also implemented multitasking, but things get really complicated when I call the functions of my multitasking implementation from interrupt handlers.

My multitasking has two functions: switch_task(task_t* task) and find_and_switch_task().

switch_task simply saves the current task's state and loads the state of the task passed to it. It looks like this:

Code: Select all

pusha
... #push extra things, save stack to current task
mov esp, [target_task + 44] #target esp
popa
ret

find_and_switch_task() simply decides which task to switch to and calls switch_task.

But how can I call these in an interrupt handler? Calling them in an interrupt handler would look like this:

Code: Select all

push cs #pushed by cpu
push eip #pushed by cpu

push error
push int_no

cld
pusha

push esp

call interrupt_handler

# C code:

...

pusha
... #push extra things, save stack to current task
mov esp, [target_task + 44] #target esp
popa
ret

...

# End of C code

add esp, 4

popa

add esp, 8 #clear error code and int no

iret #pops cs and eip

After the ret, the new task will start and the instructions below the ret will be dead, which is going to result in a stack overflow. I don't want to make two versions of these functions: one for interrupt handlers and one for direct switches...

When I add switching between ring-0 and ring-3 tasks, things get much more complicated, as I need to calculate the value of esp when I interrupt ring-0 code.

Thanks in advance.

Posted: **Wed Aug 09, 2017 10:00 am**

When you call switch_task, you'll have the return EIP on top of the stack. So switch_task() does something like:

Code: Select all

# push preserved regs here
mov [current_task_struct + 44], esp
# now perform task switching

current_task_struct is the task_t of the calling process. Then, when the next task is found, you do:

Code: Select all

mov esp, [new_task_struct + 44]
# pop preserved regs here
ret

If switch_task() is called synchronously (outside an interrupt handler), then once it switches back to the calling process it will appear as a normal return (preserved registers will have been restored, volatile registers will be destroyed, stack pointer and EIP will be at the correct location).

In an interrupt handler, you:
1. push all volatile registers
2. Call switch_task() assuming that it will preserve the "preserved" registers.
3. pop the volatile registers back and iret

This way, ALL registers will be saved during an interrupt. Volatile vs preserved registers are ABI-specific (for the System V ABI on i386, "ebx", "esi", "edi" and "ebp" are preserved, while "eax", "ecx" and "edx" are volatile).

And you don't need 2 versions of the functions.
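To make the idea concrete, here is a toy model in Python rather than real kernel code. It assumes each task has its own stack (a list whose end is the top), that the "current ESP" is represented by which task is current, and that return addresses are just labels. All names (Task, Cpu, the label strings) are made up for illustration. Note that a near call pushes only the return EIP, so that is all the model pushes.

```python
# Toy model of a ret-based switch_task. Every name here is hypothetical.
CALLEE_SAVED = ("ebx", "esi", "edi", "ebp")  # preserved by the i386 System V ABI

class Task:
    def __init__(self, name, stack=None):
        self.name = name
        self.stack = stack if stack is not None else []  # top of stack = end of list

class Cpu:
    def __init__(self, task):
        self.regs = dict.fromkeys(CALLEE_SAVED, 0)
        self.current = task            # stands in for "ESP points into this stack"

def switch_task(cpu, new_task, return_address):
    stack = cpu.current.stack
    stack.append(return_address)       # pushed by `call switch_task` (EIP only)
    for reg in CALLEE_SAVED:           # push preserved regs
        stack.append(cpu.regs[reg])
    cpu.current = new_task             # save old ESP, load new ESP
    stack = new_task.stack
    for reg in reversed(CALLEE_SAVED): # pop preserved regs
        cpu.regs[reg] = stack.pop()
    return stack.pop()                 # `ret`: continue where new_task left off

# Task B was parked earlier inside an interrupt handler; task A yields voluntarily.
task_a = Task("A")
task_b = Task("B", ["irq_handler_after_call", 21, 22, 23, 24])
cpu = Cpu(task_a)
cpu.regs["ebx"] = 11
resumed_at = switch_task(cpu, task_b, "yield_after_call")
```

The same function is entered from a voluntary yield and leaves into the interrupt handler that parked task B, which is why one version is enough. Switching back to task A later pops its saved registers and returns to "yield_after_call".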

Posted: **Wed Aug 09, 2017 10:44 am**

mariuszp wrote:When you call switch_task, you'll have the return EIP on top of the stack. So switch_task() does something like:
Code: Select all
# push preserved regs here
mov [current_task_struct + 44], esp
# now perform task switching
current_task_struct is the task_t of the calling process. Then, when the next task is found, you do:
Code: Select all
mov esp, [new_task_struct + 44]
# pop preserved regs here
ret
If switch_task() is called synchronously (outside an interrupt handler), then once it switches back to the calling process it will appear as a normal return (preserved registers will have been restored, volatile registers will be destroyed, stack pointer and EIP will be at the correct location).

In an interrupt handler, you:
1. push all volatile registers
2. Call switch_task() assuming that it will preserve the "preserved" registers.
3. pop the volatile registers back and iret

This way, ALL registers will be saved during an interrupt. Volatile vs preserved registers are ABI-specific (for the System V ABI on i386, "ebx", "esi", "edi" and "ebp" are preserved, while "eax", "ecx" and "edx" are volatile).

And you don't need 2 versions of the functions.

That makes sense, but I didn't fully understand what you meant.

If switch_task() is called outside an interrupt handler, the stack looks like this:

Code: Select all

cs - pushed by call instruction
eip - pushed by call instruction

eax -
ecx  \
edx   \
ebx    |  pushed by pusha
esp    |  
ebp   /
esi  /
edi -

Then I switch to the new task's stack, which looks the same.
Then edi, esi, ebp, ebx, edx, ecx, eax get popped by popa. eip and cs get popped by ret.

All registers of the old task are saved and all registers of the new task are restored.

Is it really possible to do that without modifying the task switch code and the interrupt handler? What would the interrupt-specific version of switch_task look like?

Thanks in advance.

Posted: **Wed Aug 09, 2017 11:34 am**

You don't need to have a separate switch_task() for interrupt handlers. Just call it from the interrupt handler, as follows:

Code: Select all

pusha
call switch_task
popa
iret

Posted: **Wed Aug 09, 2017 11:35 am**

Agola wrote: After the ret, the new task will start and the instructions below the ret will be dead, which is going to result in a stack overflow.

Those instructions aren't dead, they're just sleeping. Eventually, some other task will call switch_task() and you'll load the original stack, and on that stack will be the return address pointing to them.
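As a loose analogy only (Python generators, not x86), the sketch below shows code after a suspension point running once the task is resumed. Here `yield` stands in for the call to switch_task, and the round-robin loop stands in for the scheduler. The function names and log strings are invented for the example.

```python
def task(name, log):
    log.append(f"{name}: pusha")
    yield                          # "call switch_task": this task is parked here
    log.append(f"{name}: popa")    # sleeping until the task is switched back in
    log.append(f"{name}: iret")

def run_round_robin(tasks):
    tasks = list(tasks)
    while tasks:
        current = tasks.pop(0)
        try:
            next(current)          # resume exactly where it last stopped
            tasks.append(current)  # still alive: schedule it again later
        except StopIteration:
            pass                   # task ran to completion

log = []
run_round_robin([task("A", log), task("B", log)])
```

After the run, the log contains A's and B's "pusha" entries first, followed by each task's "popa" and "iret": the tail of each handler runs when that task gets its turn again.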

Posted: **Wed Aug 09, 2017 1:28 pm**

mariuszp wrote: You don't need to have a separate switch_task() for interrupt handlers. Just call it from the interrupt handler, as follows:
Code: Select all
pusha
call switch_task
popa
iret

Calling switch_task() from the interrupt handler doesn't work. There is no task switch for 4-5 seconds, then it crashes. According to the Bochs dump, the execution flow looks like this:

Code: Select all

pusha

push esp
call interrupt_handler
...#long c code
add esp, 4

call switch_task ---> pusha
                      mov [current_task + 44], esp
                      mov esp, [next_task + 44]
                      popa
                      ret
                              
popa

add esp, 8
iret

And I still can't understand why this code should work. There's another popa after switch_task's popa, so all the registers that were just restored will be lost.

Posted: **Wed Aug 09, 2017 3:57 pm**

I do not understand what you are even trying to say here.

switch_task() should ONLY push edi, esi, ebx, ebp (the "preserved" registers). NEVER use pusha/popa as they needlessly push ESP.

Also, this should work because you are preserving "esp" in a task structure, then jumping to another task. When jumping into a task, you restore its ESP, which then points to the saved registers again, which you pop back.

And I have no idea what your diagram is supposed to be showing.

Also, the 4-5 second crash is probably related to how you switch the task. Do you save the current task pointer in a global variable and set it? Are your links ("next" pointers) valid and initialised? etc.

Posted: **Thu Aug 10, 2017 4:56 am**

mariuszp wrote:I do not understand what you are even trying to say here.

switch_task() should ONLY push edi, esi, ebx, ebp (the "preserved" registers). NEVER use pusha/popa as they needlessly push ESP.

Also, this should work because you are preserving "esp" in a task structure, then jumping to another task. When jumping into a task, you restore its ESP, which then points to the saved registers again, which you pop back.

And I have no idea what your diagram is supposed to be showing.

Also, the 4-5 second crash is probably related to how you switch the task. Do you save the current task pointer in a global variable and set it? Are your links ("next" pointers) valid and initialised? etc.

This is my interrupt handler. It is simply:

Code: Select all

push error_code
push int_no

cld

pusha

push esp
call irq_common
add esp, 4

popa

add esp, 8
iret

In irq_common, I call find_task(), which updates current_task and next_task.

If I call switch_task in my irq handler, these things happen:

Code: Select all

push error_code
push int_no

cld

pusha

push esp
call irq_common
add esp, 4

call switch_task ---> pusha
                      mov [current_task + 44], esp
                      mov esp, [next_task + 44]
                      popa
                      ret

popa

add esp, 8
iret

Stack before switch_task:

[cs] (pushed by cpu)
[eip] (pushed by cpu)
[flags] (pushed by cpu)
[error code]
[interrupt number]
[pusha regs]

Stack after switch_task's pusha instruction:

[cs] (pushed by cpu)
[eip] (pushed by cpu)
[flags] (pushed by cpu)
[error code]
[interrupt number]
[pusha regs]
[cs] (pushed by call)
[eip] (pushed by call)
[pusha regs (current task's regs?)] (duplicated)

--- Stack switch to new task's stack ---

Stack after stack switch:

[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
[cs] (next task's cs)
[eip] (next task's eip)
[pusha regs (next task's regs)]

Stack after switch_task:

[unknown]
[unknown]
[unknown]
[unknown]
[unknown]

--- Code execution starts in next task after switch_task ---

Code: Select all

popa

add esp, 8
iret

These instructions are dead, or maybe sleeping, as they will be executed when I switch to that task again...

Octocontrabass wrote:Those instructions aren't dead, they're just sleeping. Eventually, some other task will call switch_task() and you'll load the original stack, and on that stack will be the return address pointing to them.

Also, switch_task leaves the interrupt without using iret. I read that this isn't a good thing and affects NMIs:
"Concurrent NMIs are delivered to the CPU one by one. IRET signals to the NMI circuitry that another NMI can now be delivered. No other instruction can do this signalling." - https://stackoverflow.com/questions/104 ... -interrupt

And the Bochs dump shows that doing it this way corrupts the stack.

Lastly, this is the stack model of the tasks:

Code: Select all

cs - pushed by call instruction
eip - pushed by call instruction

eax -
ecx  \
edx   \
ebx    |  pushed by pusha
esp    |  
ebp   /
esi  /
edi -

What am I missing? I really can't understand.
This approach is suggested by many people on this forum who are very knowledgeable about OS development, so I'm sure I'm missing something important.

Thanks in advance.

Posted: **Thu Aug 10, 2017 6:18 am**

There are actually different ways to implement task switching. You could explicitly save and restore eip as well, inside switch_task, just after esp and exactly analogously to it. This lets you switch tasks from multiple functions if that is desirable for some reason, but it is a different approach, and I believe you are somehow conflating it with yours. Alternatively, in the simpler and more streamlined implementation that you show, switch_task is the only function that performs task switching. Every thread is effectively switched out at the mov esp line, and is already switched back in on the following line. Your code leaves several issues unaddressed, such as handling the gs selector (which usually points at per-CPU and per-thread state) and the virtual address space in cr3. They do not change the principle, however, because the kernel part of the address space is mostly shared, and thus switch_task will not be affected.

Let's suppose that my task is switched out at "mov esp, [next_task + 44]". Eip will be incremented by the size of the last decoded instruction as usual. This means that the following instruction in switch_task (popa in your case) will execute next and unwind the state of the newly switched-in task from the swapped-in stack. If at some much later point the original task is switched in again (after a timer interrupt, prioritization, etc.) by moving its stack pointer into esp, the net result of the entire switch will appear to it as a time discontinuity. The control flow will again be at the instruction following the one that last switched the task out (the same popa) and will begin to unwind its state.

Overall, you should remember that tasks do not really get scheduled in and out other than conceptually. In reality, there is just one task that executes at all times on the cpu and its instruction flow proceeds normally. What happens is that the logical flow is switched from one data set to another data set, and the instructions follow suit by means of some iret or ret (which choose their target based on the saved caller address on the restored stack). But from the cpu's point of view, changing the stack pointer does not really constitute a special event. It is as if the instruction stream schizophrenically changes its mind from time to time about what it was doing. But that doesn't stop the instruction decoder from proceeding as if nothing important has happened.

Edit: There is one special case that is not addressed in the discussion. New tasks need to be dealt with specifically. I removed some text about Linux forking, which is incorrect.

Posted: **Thu Aug 10, 2017 10:48 am**

Unfortunately, I was apparently confused about the technique used in Linux forking. In fact, my idea was very naive in hindsight. The kernel stack is not duplicated, but carefully populated with specific values in order to facilitate the desired unwinding outcome. I had suspicions, because copying the stack could cause issues with rbp-based frame addressing, which requires the stack addresses to stay permanent. Anyway, sorry for the confusion.

The actual code is here. You can see how the unwinding return address is pointed at ret_from_fork, so that when __switch_to_asm switches the new task in and the execution of __switch_to finishes, the return jumps into ret_from_fork. It is a much more sophisticated dance with the scheduler than I originally thought.
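The same trick can be sketched for the ret-based switch discussed in this thread. This is a toy model in Python, not the Linux code: it assumes a switch routine that pops four preserved registers and then executes ret, and it uses a label string in place of a real entry address. The helper names are invented for illustration.

```python
CALLEE_SAVED = ("ebx", "esi", "edi", "ebp")  # push order assumed for this sketch

def make_initial_stack(entry_point):
    """Build a stack (top = end of list) that looks as if the new task had
    already been switched out with all preserved registers set to zero."""
    stack = [entry_point]                   # slot consumed by the final `ret`
    stack.extend(0 for _ in CALLEE_SAVED)   # slots consumed by the register pops
    return stack

def switch_in(stack):
    """Model the tail of switch_task: pop preserved regs, then `ret`."""
    regs = {}
    for reg in reversed(CALLEE_SAVED):
        regs[reg] = stack.pop()
    return regs, stack.pop()                # (restored registers, jump target)

new_stack = make_initial_stack("task_entry")
regs, target = switch_in(new_stack)
```

The switch code does not need to know the task is new: it pops what was planted on the stack and "returns" into the entry point, much as the return address in Linux is pointed at ret_from_fork.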

Posted: **Fri Aug 11, 2017 3:19 pm**

Ah, thanks to everyone, I finally understand this "concept". I've fixed my implementation and everything works.

But there's one little thing left that I still don't understand:
This works with two or more ring0 tasks, but what should I do when I also add ring3 tasks? I will need to replace ret with iret as I need a privilege level switch, but iret also pops the eflags register from the stack.

After "call switch_task", the stack looks like this:

[cs]
[eip]

but if I want to use iret, the stack should look like this:

[eflags]
[cs]
[eip]

and if I want to make a privilege level switch, the stack should look like this:

[ss]
[esp]
[eflags]
[cs]
[eip]

so I will need to check whether I should push ss and esp to the stack. How should I edit my code to support both ring0 and ring3 tasks?

Posted: **Sat Aug 12, 2017 8:26 am**

If the switched-in task was originally switched out after being interrupted in user mode, the interrupt handler was the caller of switch_task. In that case, the return address on the new stack points to the instruction following the call in the handler. After switching the stack pointers and ultimately leaving switch_task, the handler will resume. In a nutshell, switch_task could be entered through any kernel code, and could exit through completely different kernel code. It could be called as part of an explicit kernel thread yield, but exit in an interrupt handler. The latter will terminate with iret, which pops the CS selector from the stack, with the privilege level being returned to in its low two bits (the RPL). If that level indicates user mode, iret will know that it has to restore ss and esp off the stack as well.
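A rough model of that decision, as a Python sketch of protected-mode iret. It ignores virtual-8086 mode and nested tasks, and treats the stack as a list whose end is the top. The selector values used in the example (0x08 for kernel code, 0x1B and 0x23 for user code and stack) are common conventions assumed here, not something from this thread.

```python
def iret(stack, current_cpl):
    """Pop an interrupt frame the way protected-mode IRET does (simplified)."""
    frame = {"eip": stack.pop(), "cs": stack.pop(), "eflags": stack.pop()}
    target_rpl = frame["cs"] & 0x3       # low two bits of the popped CS selector
    if target_rpl > current_cpl:         # returning to a less privileged ring
        frame["esp"] = stack.pop()       # only then are ESP and SS on the stack
        frame["ss"] = stack.pop()
    return frame

# Frame left by an interrupt that hit kernel code: EFLAGS, CS, EIP.
kernel_frame = iret([0x202, 0x08, 0x1000], current_cpl=0)

# Frame left by an interrupt that hit user code: SS, ESP, EFLAGS, CS, EIP.
user_frame = iret([0x23, 0xBFFFF000, 0x202, 0x1B, 0x4000], current_cpl=0)
```

So the handler's epilogue is identical in both cases. The CPU built the longer frame when it interrupted ring 3, and iret consumes the extra two words only when the popped CS says it is going back there.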

You need to deal with IF somehow, because the scheduler in switch_task could be called from an interrupt handler or from normal code. You don't want to switch into another thread that was running normal code but leave interrupts disabled. You also do not want to resume interrupt handling code with interrupts enabled, because this has a chance of causing a stack overflow. What Linux does is perform a sti before switch_task and a cli after (local_irq_enable/disable), but it prohibits reentry into the scheduler first. What I would do instead is remember the old IF value and save it on the stack in switch_task, because this avoids a temporary stack overflow hazard. In any case, you need to prevent reentrance into switch_task as well.

Note that you must handle the interrupt and acknowledge it to the APIC (signal EOI) before calling into switch_task. switch_task comes last in the interrupt handler. Also, you would never call it from nested interrupt handlers, which means that you need to check IF in EFLAGS of the interrupted code (EFLAGS pushed on the stack by the cpu).

In summary, the basic idea:

Code: Select all

IRQ:
  if (you successfully handle this IRQ)
    EOI
  if (interrupted code is NOT another interrupt)
    if (switch_task is not executing already)
      switch_task(IF=false)
  iret

kernel_yield()
  switch_task(IF=true)

Edit: All of this assumes that you don't use hardware interrupt stacks, i.e. interrupt stacks configured through IST. At least not for interrupts that call into switch_task.
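The two checks in the IRQ pseudocode could be sketched as below, in Python for illustration only. It assumes that the interrupted code's IF bit (bit 9 of the EFLAGS image the CPU pushed) is used as the "was this another interrupt handler?" test, as suggested above, and that a hypothetical flag records whether switch_task is already running.

```python
EFLAGS_IF = 1 << 9   # interrupt-enable flag, bit 9 of EFLAGS

def should_switch_from_irq(saved_eflags, switch_in_progress):
    """Decide whether the IRQ epilogue may call switch_task."""
    if not saved_eflags & EFLAGS_IF:
        # Interrupted code ran with interrupts disabled: treated here as
        # another handler (or a critical section), so do not switch.
        return False
    return not switch_in_progress        # never re-enter switch_task

normal_code = should_switch_from_irq(0x202, switch_in_progress=False)
nested_irq = should_switch_from_irq(0x002, switch_in_progress=False)
reentry = should_switch_from_irq(0x202, switch_in_progress=True)
```

Only the first case would go on to call switch_task; the other two fall straight through to iret.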

Posted: **Sat Aug 12, 2017 11:43 am**

I don't mean to be rude, but if this type of stuff is too difficult, then...

Again, multitasking
