Switching to Ring 0 task with iret

Lagor · Post by **Lagor** » Tue Jun 15, 2021 12:57 pm

Hi all, after a bit of a hiatus I went back to developing my OS.
I wrote a pretty good memory manager that seems to work well, and now the next logical step for me would be to implement processes.
Since I have no filesystem from which to load executables, I figured I'd start by implementing kernel (ring 0) processes. The thinking behind this is that the code to be executed is just one of the functions that already gets compiled into my kernel and, since the only privilege level I'll be dealing with is 0, I don't need to bother with per-process page directories and the like for user processes. I'll just use the one set up by the kernel for its own use.
My process struct is basically a string for the name, an int for the pid, a pointer to a page directory (the same one used by the kernel), some virtual memory regions allocated to the process (again, the same ones I use for the kernel), a struct containing all the register values (notably eip, for context switching) and a pointer to be used for the per-process stack.
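For illustration, a minimal C sketch of a process structure along the lines described above. The post does not show the real definition, so every name, field order and size limit here (`struct process`, `PROC_NAME_MAX`, `process_init` and so on) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PROC_NAME_MAX    32  /* assumed limit, not from the post */
#define PROC_MAX_REGIONS 8   /* assumed limit, not from the post */

/* Saved register state for a 32-bit x86 context. */
struct cpu_regs {
    uint32_t eax, ebx, ecx, edx;
    uint32_t esi, edi, ebp, esp;
    uint32_t eip, eflags;
};

/* One virtual memory region associated with the process. */
struct vm_region {
    uint32_t base;
    uint32_t length;
};

struct process {
    char             name[PROC_NAME_MAX];
    int              pid;
    uint32_t        *page_dir;   /* the kernel's page directory for ring 0 processes */
    struct vm_region regions[PROC_MAX_REGIONS];
    struct cpu_regs  regs;
    void            *stack;      /* this process's private stack */
};

/* Fill in a descriptor; the caller supplies the stack memory. */
void process_init(struct process *p, const char *name, int pid,
                  uint32_t *kernel_page_dir, void *stack, uint32_t entry_eip)
{
    assert(p != NULL && name != NULL);
    memset(p, 0, sizeof *p);
    strncpy(p->name, name, PROC_NAME_MAX - 1);  /* memset left the terminator in place */
    p->pid = pid;
    p->page_dir = kernel_page_dir;
    p->stack = stack;
    p->regs.eip = entry_eip;
    p->regs.eflags = 0x202;  /* IF (bit 9) set, plus reserved bit 1 which always reads as 1 */
}
```

The important field for the rest of this thread is `stack`: each process gets its own, rather than borrowing the stack the kernel booted on.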
I have set up and tested everything that surrounds the actual running of processes: a scheduler, an interrupt that fires every n milliseconds and calls the scheduler, and all the logic for creating, killing and putting processes to sleep.
The problem I have is that, at the moment, my timer/scheduler interrupt handler only goes from kernel code (outside any process context) to some other kernel code (also outside process context), and it's implemented with iret.
This is what I understand is happening:
The clock interrupt gets triggered and, no matter what was happening at the time, the CPU pushes a bunch of state belonging to the interrupted context so that I can restore it later.
On top of what the CPU has already pushed, I push some more state and call my handler routine.
My handler saves the info pushed on the stack earlier (as it belongs to something that was running and possibly needs to be scheduled away), chooses a process to run and substitutes that process's values on the stack, so that iret will switch to this process's context instead of the one previously interrupted.
If I didn't talk BS up until this point, here goes my problem.
Afaik, iret will NOT pop ss:esp from the stack if the privilege level it returns to (the RPL of the saved cs) is the same as my current one.
Since I'm already in ring 0 and I want to switch to some other code running in ring 0, no stack will be restored and, therefore, my new process will execute on the stack that was already in use by my kernel.
I managed to successfully run, stop, sleep and resume some code contained in my kernel (an ELF file) on the stack that I generally use for my 'regular kernel'.
This stops working after I run a few processes because, I assume, they all share the same stack and do NOT clean up after themselves (for example, if one gets interrupted before it returns from a call), resulting in corruption and the wrong memory regions being accessed.
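The iret behaviour described above can be written down as a small C model. This is only a sketch for 32-bit protected mode: the struct and function names are hypothetical, and the model ignores error codes and virtual-8086 mode. The rule it encodes is that iret pops ESP and SS only when the saved CS selects a less privileged ring than the current one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Frame the CPU pushes when the interrupt does not change privilege level. */
struct iret_frame_same_ring {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
};

/* Frame the CPU pushes when the interrupt arrives from a less privileged ring. */
struct iret_frame_ring_change {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
    uint32_t esp;  /* stack pointer of the interrupted code */
    uint32_t ss;   /* stack segment of the interrupted code */
};

/* The low two bits of a segment selector are its requested privilege level. */
static inline unsigned selector_rpl(uint16_t selector)
{
    return selector & 3u;
}

/* True when iret, executed at privilege level cpl, would also pop ESP and SS. */
bool iret_pops_stack(uint16_t saved_cs, unsigned cpl)
{
    return selector_rpl(saved_cs) > cpl;
}
```

With a conventional GDT layout (an assumption, the post does not give its selectors), `iret_pops_stack(0x08, 0)` is false for a ring 0 to ring 0 return and `iret_pops_stack(0x1B, 0)` is true for a return to ring 3. In the ring 0 case the handler has to change ESP itself if it wants the next task to run on a different stack.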
How am I supposed to do this? Do I need to move away from iret?
More importantly, am I missing some fundamental theory here?
Any help that would shed more light on this would be very much appreciated.

Cheers,
F

nexos · Post by **nexos** » Tue Jun 15, 2021 1:17 pm

Most OSes multitask kernel threads quite differently. Your method is the same as my old method, and many problems resulted from it. I could explain it for you, but it is already covered in the wiki at https://wiki.osdev.org/Brendan%27s_Mult ... g_Tutorial

rdos · Post by **rdos** » Tue Jun 15, 2021 2:44 pm

I think a more reasonable view of a process is that it is a new address space that happens to have a thread executing within it (and might create more). The initial thread of a new process is nothing special and should be viewed as similar to any other thread in the system.

I also think that before creating processes you should have userspace functionality. A process is basically a new userspace address space executing within your normal (shared) kernel space. Processes in kernel space don't make a lot of sense, since kernel space is usually mostly shared between processes.

Lagor · Post by **Lagor** » Tue Jun 15, 2021 4:17 pm

nexos wrote: Most OSes multitask kernel threads quite differently. Your method is the same as my old method, and many problems resulted from it. I could explain it for you, but it is already covered in the wiki at https://wiki.osdev.org/Brendan%27s_Mult ... g_Tutorial

Hi, thank you so much for your answer, I hadn't come across that tutorial yet.
I still have some doubts though, mainly about the context switch itself.
Basically, all I need to solve my problem is in the first part: the little piece of asm code that sets up the next task and rets to it.
It kind of makes sense to me if switch_to_task gets called by a running task itself. In his tutorial he starts with cooperative multitasking, so that assumption holds. The code pushes registers onto its own stack and then saves esp into its task struct so it can be restored later.
I struggle to understand how this could work when executed by the PIT handler. He calls switch_to_task inside his schedule() function.
Wouldn't this just save the registers of the timer handler (kernel code running outside any context) into the currently running process? If so, why?

Cheers,
F

nexos · Post by **nexos** » Tue Jun 15, 2021 5:58 pm

One thing to know is that an interrupt is different from a context switch. With the PIT, the interrupt handler gets called and the register state gets saved. It then calls a function that checks whether the running task should be preempted. If so, that function calls switch_to_task. Then, when the new task runs, it restores its own interrupted context. This may seem slower than what you are doing right now, but most context switches end up occurring because a task pauses itself, and in that case this method is faster.
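A minimal C sketch of the flow described above, assuming a round-robin ready list and a fixed time slice. Everything here is hypothetical (`timer_tick`, `TIME_SLICE_TICKS`, the `struct task` layout), and `switch_to_task` is only a stand-in that records the decision; the real routine in the tutorial is assembly and swaps kernel stacks.

```c
#include <assert.h>
#include <stddef.h>

#define TIME_SLICE_TICKS 3u  /* assumed quantum, purely illustrative */

struct task {
    int          id;
    unsigned     ticks_left;
    struct task *next;       /* circular ready list */
};

struct task *current_task;
unsigned switch_count;

/* Stand-in for the tutorial's assembly routine: it only records which
 * task was chosen, it does not swap stacks. */
static void switch_to_task(struct task *next)
{
    current_task = next;
    switch_count++;
}

/* Called from the timer interrupt handler, after the interrupted
 * registers have been saved on the current thread's kernel stack. */
void timer_tick(void)
{
    assert(current_task != NULL);
    if (current_task->ticks_left > 0)
        current_task->ticks_left--;

    /* Preempt only when the slice is used up and another task exists. */
    if (current_task->ticks_left == 0 && current_task->next != current_task) {
        struct task *next = current_task->next;
        next->ticks_left = TIME_SLICE_TICKS;
        switch_to_task(next);
    }
}
```

The point of the structure is that the interrupt handler never builds a fake iret frame for the next task. It makes an ordinary function call, and the same `switch_to_task` serves a task that blocks voluntarily.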

Lagor · Post by **Lagor** » Wed Jun 16, 2021 1:57 am

nexos wrote: One thing to know is that an interrupt is different from a context switch. With the PIT, the interrupt handler gets called and the register state gets saved. It then calls a function that checks whether the running task should be preempted. If so, that function calls switch_to_task. Then, when the new task runs, it restores its own interrupted context. This may seem slower than what you are doing right now, but most context switches end up occurring because a task pauses itself, and in that case this method is faster.

I have to say, I'm a little confused. I understood things just the way you explained them, but I don't see how the code in the tutorial implements these concepts.
When a PIT interrupt gets triggered, the CPU uses the IDT to locate the eip of my timer handler and then uses the value contained in TSS.esp0 to set up the stack for my kernel code (which should be my responsibility to keep updated every time I use the stack in an interrupt handler, right?).
Now, correct me if I'm wrong, my timer handler will never return to the code immediately after it. This is because I set up a stack with the eip of the next task and then ret to it.
So in the example I have interrupt code -> schedule() -> switch_to_task() and, if I understood correctly, switch_to_task will not just return to schedule(), which would return to the interrupt code, but will instead jump into the new task's code.

Maybe I can't read assembly that well, but in the prologue of switch_to_task there's this code:

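    push ebx                      ;save the callee-saved registers on the
    push esi                      ; current task's kernel stack
    push edi
    push ebp
    mov edi,[current_task_TCB]    ;edi = address of the previous task's "thread control block"
    mov [edi+TCB.ESP],esp         ;save ESP for the previous task's kernel stack in its TCB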

To me, that code saves the registers onto the kernel stack (I am in interrupt code) and then saves the resulting esp into the current task's state. Why? Does it need to be restored later somehow?
Besides, since switch_to_task gets called by schedule(), wouldn't ebx, esi and so forth contain the values set by the caller (schedule())?
Shouldn't we save the values the registers had as soon as the interrupt occurred (that is, the registers of the previously running, now preempted, task)?

Cheers,
F

Octocontrabass · Post by **Octocontrabass** » Wed Jun 16, 2021 8:54 am

It works like an ordinary function call. Eventually switch_to_task() will return to its caller. The caller expects those registers to be unchanged, so switch_to_task() has to save them.

Let's say thread A calls switch_to_task() to switch to thread B. Then thread B runs for a while and calls switch_to_task() to switch to thread A. From thread A's perspective, all that happened is that it called switch_to_task(), some time passed, and switch_to_task() returned.
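The "it just looks like a function call that returned later" view can be tried out in user space. The sketch below is not kernel code: it assumes a Linux/glibc host and uses the POSIX `ucontext` calls (`getcontext`, `makecontext`, `swapcontext`) as a stand-in for switch_to_task(). The names `thread_b`, `run_demo` and `note` are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <ucontext.h>

static ucontext_t ctx_a, ctx_b;
static char stack_b[64 * 1024];  /* thread B's private stack */
static char trace[16];
static size_t trace_len;

/* Append one character to the trace of what ran, in order. */
static void note(char c)
{
    if (trace_len < sizeof trace - 1)
        trace[trace_len++] = c;
}

static void thread_b(void)
{
    note('b');
    swapcontext(&ctx_b, &ctx_a);  /* B "calls switch_to_task" to go back to A */
    note('d');                    /* B resumes here the next time it is chosen */
}                                 /* returning resumes uc_link, i.e. thread A */

/* Plays the role of thread A; returns the order in which the steps ran. */
const char *run_demo(void)
{
    memset(trace, 0, sizeof trace);
    trace_len = 0;

    getcontext(&ctx_b);
    ctx_b.uc_stack.ss_sp = stack_b;
    ctx_b.uc_stack.ss_size = sizeof stack_b;
    ctx_b.uc_link = &ctx_a;
    makecontext(&ctx_b, thread_b, 0);

    note('a');
    swapcontext(&ctx_a, &ctx_b);  /* A switches to B ... */
    note('c');                    /* ... and later the call simply returns */
    swapcontext(&ctx_a, &ctx_b);  /* let B finish */
    note('e');
    return trace;
}
```

Each `swapcontext` call saves the caller's registers and stack pointer and loads the other side's, which is the same job the push/mov prologue and the matching epilogue of switch_to_task() do for kernel threads.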

Lagor · Post by **Lagor** » Wed Jun 16, 2021 10:46 am

Octocontrabass wrote:It works like an ordinary function call. Eventually switch_to_task() will return to its caller. The caller expects those registers to be unchanged, so switch_to_task() has to save them.

Let's say thread A calls switch_to_task() to switch to thread B. Then thread B runs for a while and calls switch_to_task() to switch to thread A. From thread A's perspective, all that happened is that it called switch_to_task(), some time passed, and switch_to_task() returned.

This is what I do not agree with.
switch_to_task does NOT get called by a thread. It gets called by my interrupt handler running in ring 0.
As I understand it, interrupt handlers are not processes, threads, tasks or whatever you want to call them. They are just kernel code that runs outside any context or, if you will, in interrupt context.
This is why I don't understand why we save data that belongs to the kernel into some process's TCB.
I am not developing a microkernel or a cooperative multitasking system.
I am trying to write a monolithic kernel with preemptive multitasking and, possibly, support for kernel threads.
Tell me if what I just wrote is totally wrong because, between not getting the technicalities right and perhaps having a gap in the theory, I'm going nuts.

Cheers,
FG

neon · Post by **neon** » Wed Jun 16, 2021 12:04 pm

Hi,

switch_to_task would get called by the current thread. Kernel code is shared between all threads that enter it. Just because a thread enters kernel code does not make it no longer a thread.

Lagor · Post by **Lagor** » Wed Jun 16, 2021 12:28 pm

Hi, can you elaborate a little more on that? I'm very confused.
How can a thread call switch_to_task? Isn't that what happens in coop multitasking?
What if a thread never calls it? Does it run on the cpu forever?

When an interrupt gets triggered, the CPU jumps to the entry point of my handler (found in the IDT) and loads the stack it finds in tss.esp0. To me, that is no longer the thread being executed; it is my handler.
So in any call to any function, the caller is my interrupt handler.
The CPU doesn't know anything about processes; they're just an abstraction that I created in software, essentially by updating a structure that contains info about the currently running thread.

Is this totally off?

Thank you (all) for your time.

Cheers,
F

thewrongchristian · Post by **thewrongchristian** » Wed Jun 16, 2021 12:51 pm

Lagor wrote:Hi, can you elaborate a little more on that? I'm very confused.
How can a thread call switch_to_task? Isn't that what happens in coop multitasking?
What if a thread never calls it? Does it run on the cpu forever?

When an interrupt gets triggered, the CPU jumps to the entry point of my handler (found in the IDT) and loads the stack it finds in tss.esp0. To me, that is no longer the thread being executed; it is my handler.
So in any call to any function, the caller is my interrupt handler.
The CPU doesn't know anything about processes; they're just an abstraction that I created in software, essentially by updating a structure that contains info about the currently running thread.

Is this totally off?

Thank you (all) for your time.

Cheers,
F

I think what is confusing you is how tss.esp0 is used. %esp is set to the value of tss.esp0 only when a privilege transition occurs. For example, if you're in ring 3, and the PIT fires, the IRQ transitions the CPU to ring 0 to handle the interrupt, and that transition loads the new %esp from tss.esp0 (privilege transition from ring 3 to ring 0.)

If you're already in ring 0, because you have kernel thread code running, then no privilege transition occurs, and %esp is just used as is and the interrupt context is formed on the existing kernel stack.
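As a rough C model of the two cases above, assuming 32-bit protected mode, a ring 0 handler and an interrupt that pushes no error code. The function name `interrupt_entry_esp` is made up; it returns the value ESP would hold when the handler's first instruction runs.

```c
#include <assert.h>
#include <stdint.h>

/* cpl:         privilege level of the interrupted code (0 or 3 here)
 * current_esp: ESP at the moment the interrupt arrived
 * tss_esp0:    value currently stored in tss.esp0 */
uint32_t interrupt_entry_esp(unsigned cpl, uint32_t current_esp, uint32_t tss_esp0)
{
    if (cpl != 0) {
        /* Privilege transition: the CPU loads ESP from tss.esp0, then
         * pushes SS, ESP, EFLAGS, CS and EIP (5 dwords = 20 bytes). */
        return tss_esp0 - 20u;
    }
    /* No transition: the CPU stays on the current stack and pushes
     * only EFLAGS, CS and EIP (3 dwords = 12 bytes). */
    return current_esp - 12u;
}
```

For a kernel thread the second branch is always taken, so tss.esp0 plays no part and the interrupt frame lands on whatever stack that thread was already using.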

Once tss.esp0 is set for a thread, it doesn't need to change until you switch to another thread that has its own kernel stack. Part of the task switch on x86 is to update tss.esp0 to point to the top of the desired kernel stack for the thread.
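A minimal sketch of that bookkeeping in C, assuming one CPU and therefore one TSS. The names `struct thread`, `kernel_stack_top` and `prepare_switch` are hypothetical, and only the first three fields of the 32-bit TSS are modelled; the real structure is 104 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Leading fields of the 32-bit TSS; the rest is omitted in this sketch. */
struct tss_min {
    uint32_t prev_task_link;
    uint32_t esp0;  /* stack pointer loaded on a transition into ring 0 */
    uint32_t ss0;   /* stack segment loaded on a transition into ring 0 */
};

struct thread {
    uint32_t saved_esp;         /* where the switch routine left this thread's ESP */
    uint32_t kernel_stack_top;  /* highest address of this thread's kernel stack */
};

struct tss_min tss;             /* the single TSS of a one-CPU system */
struct thread *current_thread;

/* Run as part of every task switch, before the stacks are swapped. */
void prepare_switch(struct thread *next)
{
    assert(next != NULL);
    /* From now on, an interrupt arriving while `next` runs in ring 3
     * starts at the top of next's own kernel stack. */
    tss.esp0 = next->kernel_stack_top;
    current_thread = next;
}
```

The per-thread value lives in the thread's control block; the TSS only ever holds the copy for whichever thread is running right now.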

Also, I'd get out of your head the idea that an interrupt handler is a separate thread. Sure, on x86 you can have task gates in the IDT, so that an interrupt does trigger a hardware task switch, but that's a non-portable construct, too complex and too slow. So it's best to think of an interrupt handler as having an undefined context: it operates within the context of whichever thread was interrupted. As such, it can rely on little other than static or global context.

neon · Post by **neon** » Wed Jun 16, 2021 1:23 pm

Hi,

Lagor wrote: How can a thread call switch_to_task? Isn't that what happens in coop multitasking?

Yep. Except with preemptive multitasking, the thread is told when its time is up. The interrupt interrupts the current thread so it continues execution in the interrupt handler. The only time a new thread will execute is when a new thread context is pushed on the stack for IRET. Each thread has its own user space and kernel space stack.

Lagor · Post by **Lagor** » Wed Jun 16, 2021 4:05 pm

It took me a few answers from you kind people but I feel I'm getting it now.

Two more questions: is it mandatory to set up a kernel stack for each user process? I think I've heard that some OSes don't do this, but I might be wrong.
If I choose (or have) to have a kernel stack for every user process, does this mean that I need to keep an ESP0 value for each process in its TCB and then update the value in the only TSS that my system has accordingly? If this is the case, I have a few issues picturing the situation in my mind, but I'll deal with that later...

Cheers,
F

neon · Post by **neon** » Wed Jun 16, 2021 4:19 pm

Hi,

Stacks are not per process, they are per thread. Each thread must have its own kernel stack.
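One way to give a brand-new kernel thread its own stack is to pre-fill that stack so it looks as if the thread had already called switch_to_task(). The C sketch below assumes the routine's epilogue mirrors the prologue quoted earlier in the thread (pop ebp, edi, esi, ebx, then ret); the names `struct kthread`, `KSTACK_WORDS` and `kthread_init` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KSTACK_WORDS 1024u  /* 4 KiB per thread; assumed size */

struct kthread {
    uint32_t  stack[KSTACK_WORDS];  /* this thread's private kernel stack */
    uint32_t *saved_sp;             /* stand-in for TCB.ESP */
};

/* Build the frame the switch routine expects to find: four saved
 * registers with the return address above them. */
void kthread_init(struct kthread *t, uint32_t entry_eip)
{
    assert(t != NULL);
    uint32_t *sp = &t->stack[KSTACK_WORDS];  /* one past the end: stacks grow down */
    *--sp = entry_eip;  /* consumed by ret, so the thread starts at its entry point */
    *--sp = 0;          /* ebx */
    *--sp = 0;          /* esi */
    *--sp = 0;          /* edi */
    *--sp = 0;          /* ebp, popped first */
    t->saved_sp = sp;
}
```

The first switch to such a thread loads `saved_sp` into ESP, pops four zeroed registers and "returns" into the entry function, running on a stack nothing else shares.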

Lagor · Post by **Lagor** » Wed Jun 16, 2021 4:26 pm

Yes yes, when I said process I think I meant task or, in general, any thread of execution that has its own context.
So how would that work exactly? I have one TSS (only 1 CPU) that holds one value of esp0, which gets loaded into esp every time an interrupt fires. Who's the owner of that stack? And how and when (and by whom) does it get updated/restored?
If you have any link that could explain this in detail, I'd appreciate it very much.
Thank you.

Cheers,
F

OSDev.org
