Context switch location

AliceNBob · Post by **AliceNBob** » Wed Jan 09, 2019 3:00 pm

I'm implementing preemptive context switching. Looking at how other hobby operating systems and the Linux kernel do it, they all have a switch_context routine that saves registers and changes stacks, and it can run either during an interrupt or when a kernel thread calls it directly. The downside is that kernel threads are not preemptible; they must yield the CPU.

I was thinking of making all context switches happen only in interrupts (e.g. there would be a yield syscall that a kernel thread would use). A context switch would involve changing stacks (to the trap frame of the new process) followed immediately by an iret. Interrupts would stay disabled during an interrupt, keeping the code simple, and kernel threads would be interruptible.

The only downside I see is that there can be no code in the interrupt after the context switch (because of iret) but I don't see a need for it either.
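A minimal sketch of the proposal above, assuming a common interrupt stub that pushes the registers, calls a C function with the address of the frame it built, and then loads esp from the return value before popping and executing iret. The names `trap_frame`, `thread`, `current_thread` and `interrupt_switch`, and the frame layout, are hypothetical and only illustrate the idea; the assembly stub itself is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ia32 trap frame for an interrupt taken in kernel mode.
   The software-saved set and its order are assumptions of this sketch. */
struct trap_frame {
    uint32_t edi, esi, ebp, ebx, edx, ecx, eax; /* pushed by the stub */
    uint32_t eip, cs, eflags;                   /* pushed by the CPU  */
};

struct thread {
    struct trap_frame *frame; /* where this thread's saved frame lives */
    struct thread *next;      /* circular run queue                    */
};

struct thread *current_thread;

/* Called by the stub with the frame it just pushed. The stub is assumed
   to load esp from the return value, pop the registers, then iret. */
struct trap_frame *interrupt_switch(struct trap_frame *saved)
{
    assert(current_thread != NULL);
    current_thread->frame = saved;         /* remember where we stopped */
    current_thread = current_thread->next; /* simple round-robin pick   */
    return current_thread->frame;          /* frame to resume from      */
}
```

Because the stub resumes from whatever frame is returned, nothing can run in the handler after the switch, which is the restriction mentioned above.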

What am I missing/what's the downside of this approach?

Octocontrabass · Post by **Octocontrabass** » Thu Jan 10, 2019 3:07 am

AliceNBob wrote:I'm implementing preemptive context switching. Looking at how other hobby operating systems and the Linux kernel do it, they all have a switch_context routine that saves registers and changes stacks, and it can run either during an interrupt or when a kernel thread calls it directly. The downside is that kernel threads are not preemptible; they must yield the CPU.

That's a design choice, not a limitation of the method. You can allow kernel threads to be preempted just by enabling interrupts.

Of course, you have to be careful to only enable interrupts when the kernel thread may be safely interrupted.
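One common way to keep that discipline is to bracket the unsafe regions with a save/restore pair. The sketch below simulates the interrupt flag with a plain variable so it can run in user space; in a real ia32 kernel the two helpers would read eflags and use cli/popf instead. The names `irq_save`, `irq_restore` and `run_queue_push` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the IF bit in eflags (an assumption of this sketch). */
static bool interrupts_enabled = true;

/* Disable interrupts and report whether they were enabled before. */
bool irq_save(void)
{
    bool was_enabled = interrupts_enabled;
    interrupts_enabled = false;
    return was_enabled;
}

/* Put the flag back to the state returned by the matching irq_save(). */
void irq_restore(bool was_enabled)
{
    interrupts_enabled = was_enabled;
}

int run_queue_length;

/* Example of a region where the kernel thread must not be preempted. */
void run_queue_push(void)
{
    bool flags = irq_save();
    assert(!interrupts_enabled); /* no preemption can happen here */
    run_queue_length++;
    irq_restore(flags);          /* preemptible again only if it was before */
}
```

Restoring the previous state rather than unconditionally enabling interrupts lets these regions nest safely.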

nullplan · Post by **nullplan** » Thu Jan 10, 2019 2:46 pm

The obvious downside is, you're adding an interrupt overhead to something that could be done with a simple function call. Why? I don't quite see the benefit of going from "switch_context()" to "int $0x21" (or whatever). Not even conceptually.

kzinti · Post by **kzinti** » Thu Jan 10, 2019 7:00 pm

nullplan wrote:you're adding an interrupt overhead to something that could be done with a simple function call.

Not all interrupts are software interrupts / system calls...

AliceNBob · Post by **AliceNBob** » Thu Jan 10, 2019 7:45 pm

Octocontrabass wrote:
AliceNBob wrote:I'm implementing preemptive context switching. Looking at how other hobby operating systems and the Linux kernel do it, they all have a switch_context routine that saves registers and changes stacks, and it can run either during an interrupt or when a kernel thread calls it directly. The downside is that kernel threads are not preemptible; they must yield the CPU.
That's a design choice, not a limitation of the method. You can allow kernel threads to be preempted just by enabling interrupts.

Of course, you have to be careful to only enable interrupts when the kernel thread may be safely interrupted.

nullplan wrote: The obvious downside is, you're adding an interrupt overhead to something that could be done with a simple function call. Why? I don't quite see the benefit of going from "switch_context()" to "int $0x21" (or whatever). Not even conceptually.

Well, if thread A turns on interrupts and does a context switch right after, then thread B will have interrupts on so the flags register has to be saved/loaded as well.

It just seems redundant to save (almost the same) registers during an interrupt and in a context switch but I guess it does give more flexibility.

kzinti · Post by **kzinti** » Thu Jan 10, 2019 9:04 pm

You only need to save 4 registers on ia32 for a context switch vs 17 for an interrupt. So it's not the same thing.

Context switch:
- You are already in kernel space and switching to another thread in kernel space. So the segments registers don't change
- eip is already saved on the stack when you call your context_switch() function
- assuming you are using the Sys V ABI, you don't need to save any scratch register
- you are left with only having to save (and restore) ebp, edi, esi and ebx.
- example: https://github.com/kiznit/rainbow-os/bl ... read.S#L34

Interrupt:
- The processor itself will save (ss), esp, eflags, cs, and eip on the stack (5 regs)
- You have to save every other register yourself (12 regs)
- example: https://github.com/kiznit/rainbow-os/bl ... rupt.S#L54
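The two register sets above can be written down as C structs to make the size difference concrete. This is only a sketch: the post does not list which 12 registers the handler saves, so the layout below assumes a stub that pushes the four data segment registers and then executes pusha (which stores eight slots, including a copy of esp). The struct names and the `REG_COUNT` macro are hypothetical, and the linked files may be organised differently.

```c
#include <assert.h>
#include <stdint.h>

/* Cooperative switch under the System V i386 ABI: only callee-saved
   registers. eip is already on the stack as the call's return address. */
struct switch_frame {
    uint32_t ebp, edi, esi, ebx;
};

/* Interrupt arriving from user mode, lowest address first. */
struct interrupt_frame {
    /* stored by pusha (assumed to be what the stub uses) */
    uint32_t edi, esi, ebp, esp_copy, ebx, edx, ecx, eax;
    /* data segment registers pushed by the stub (assumed order) */
    uint32_t ds, es, fs, gs;
    /* pushed by the CPU on a privilege change */
    uint32_t eip, cs, eflags, esp, ss;
};

#define REG_COUNT(type) (sizeof(type) / sizeof(uint32_t))

static_assert(REG_COUNT(struct switch_frame) == 4,
              "4 registers for a cooperative switch");
static_assert(REG_COUNT(struct interrupt_frame) == 17,
              "12 saved by software + 5 pushed by the CPU");
```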

Korona · Post by **Korona** » Fri Jan 11, 2019 1:25 am

AliceNBob wrote:The downside is that kernel threads are not preemptible; they must yield the CPU.

As said before, that's not a consequence of saving state in interrupt handlers. Linux saves all state on kernel threads and is still preemptible in the kernel.

kzinti wrote:You only need to save 4 registers on ia32 for a context switch vs 17 for an interrupt. So it's not the same thing.

That's true if you only ever want to restore the context by returning to the caller. For example, in this design, dispatching signals to a preempted user-space thread is not possible anymore. This is not necessarily a problem (they can still be dispatched after returning to the caller), but you might want to think about whether you want to be able to introspect (or modify) the registers of preempted threads.

You might also want to take into account that not all interrupts will run from per-thread kernel stacks and you might still want to context switch from those, so it can make sense to store the preempted state in a struct (with enough space for all registers) instead of the stack. You can still avoid saving all registers when the context switch does run from a per-thread kernel stack.
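As a rough illustration of keeping the preempted state in a struct rather than on the interrupted stack, the sketch below copies the handler's frame into the outgoing thread and overwrites the frame with the incoming thread's saved state, so the handler's normal return path resumes the new thread regardless of which stack the handler itself runs on. The names `cpu_state`, `task` and `switch_via_struct` and the register list are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical full register snapshot (ia32, abbreviated). */
struct cpu_state {
    uint32_t eax, ecx, edx, ebx, ebp, esi, edi;
    uint32_t esp, eip, eflags;
};

struct task {
    struct cpu_state saved; /* valid only while the task is not running */
    int id;
};

/* `frame` is the state the interrupt entry code captured. It does not
   matter which stack it lives on, because both directions are plain
   copies to and from the per-task structs. */
void switch_via_struct(struct cpu_state *frame,
                       struct task *from, struct task *to)
{
    assert(from != to);
    from->saved = *frame; /* park the preempted task's registers */
    *frame = to->saved;   /* handler exit will restore these     */
}
```

A side effect of this layout is that the registers of a preempted task can be inspected or modified through `saved` at any time.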

Last but not least, you can think of designs that avoid per-thread kernel stacks altogether. This is something I would not recommend: it makes blocking in the kernel much harder. Basically, all locations where a thread can block must be enumerated in some way because you need to be able to somehow save the state at that point. That is harder than it sounds. For example, kernels might sometimes want to block until enough memory is available to satisfy an internal allocation. That might be desirable in contexts where an allocation must not fail because there is no meaningful way to return failure to userspace, e.g., in some asynchronous driver operation.

AliceNBob · Post by **AliceNBob** » Fri Jan 11, 2019 11:14 am

Korona wrote: You might also want to take into account that not all interrupts will run from per-thread kernel stacks and you might still want to context switch from those, so it can make sense to store the preempted state in a struct (with enough space for all registers) instead of the stack. You can still avoid saving all registers when the context switch does run from a per-thread kernel stack.

When does that occur? I thought that after the kernel is set up, only user processes (with the kernel stack set in the TSS) and kernel threads are running.

Korona · Post by **Korona** » Fri Jan 11, 2019 11:31 am

It happens if you want to run some interrupts from IST (interrupt stack table) stacks. For example, NMIs and MCEs are generally run from IST stacks for several reasons (most importantly avoiding invalid stacks after syscall/before sysret, but also: reducing potential stack depth, allowing Linux-like NMI nesting, being able to output useful debugging information after kernel stack corruption). Depending on your kernel's design, you might want to run more interrupts from IST stacks, e.g. to allow smaller kernel stacks.
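For reference, the IST slots live in the 64-bit TSS, and an IDT gate selects one of them with a 3-bit index (0 means "do not switch to an IST stack"). The sketch below shows the architectural TSS layout; the helper `tss_set_ist` and the `IST_NMI`/`IST_MCE` index assignment are hypothetical, and `__attribute__((packed))` assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* x86-64 task state segment; packed because rsp[0] sits at offset 4. */
struct tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];      /* stacks for privilege levels 0..2 */
    uint64_t reserved1;
    uint64_t ist[7];      /* IST1..IST7 stack tops            */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iopb_offset; /* I/O permission bitmap offset     */
} __attribute__((packed));

static_assert(sizeof(struct tss64) == 104, "64-bit TSS is 104 bytes");
static_assert(offsetof(struct tss64, ist) == 36, "IST1 is at offset 36");

/* Hypothetical assignment of IST slots to vectors. */
enum { IST_NONE = 0, IST_NMI = 1, IST_MCE = 2 };

/* Store a stack top in IST slot `index` (1..7, as used in IDT gates). */
void tss_set_ist(struct tss64 *tss, unsigned index, uint64_t stack_top)
{
    assert(index >= 1 && index <= 7);
    tss->ist[index - 1] = stack_top;
}
```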

kzinti · Post by **kzinti** » Fri Jan 11, 2019 1:06 pm

Korona wrote:That's true if you only ever want to restore the context by returning to the caller. For example, in this design, dispatching signals to a preempted user-space thread is not possible anymore. This is not necessarily a problem (they can still be dispatched after returning to the caller), but you might want to think about whether you want to be able to introspect (or modify) the registers of preempted threads.

This doesn't seem to make sense to me. The context_switch() function only has one caller: the kernel scheduler. You can simply record signals in the ThreadBlock structure of the target thread and check for them in the scheduler after context_switch() returns (or anywhere else that is convenient). Register manipulations required for signal handling would be done when transitioning back from kernel space to user space (or if you are not going back to user mode, you handle them right then).
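A minimal sketch of that bookkeeping, assuming a per-thread pending-signal bitmask. The names `thread_block`, `signal_post` and `signal_take` are hypothetical, and a real kernel would need atomic operations or a lock around the mask, which this single-threaded example leaves out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-thread control block, reduced to the signal state. */
struct thread_block {
    uint32_t pending_signals; /* bit n set means signal n is pending */
};

/* Record a signal for a thread that may currently be switched out. */
void signal_post(struct thread_block *t, unsigned signo)
{
    assert(signo < 32);
    t->pending_signals |= UINT32_C(1) << signo;
}

/* Meant to be called after context_switch() returns, or on the way back
   to user mode. Returns and clears the lowest pending signal, or -1. */
int signal_take(struct thread_block *t)
{
    for (unsigned signo = 0; signo < 32; signo++) {
        uint32_t bit = UINT32_C(1) << signo;
        if (t->pending_signals & bit) {
            t->pending_signals &= ~bit;
            return (int)signo;
        }
    }
    return -1;
}
```

The register changes needed to enter a user-space handler would then be applied to the user-mode frame at the point where `signal_take` reports a signal.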

Korona wrote:You might also want to take into account that not all interrupts will run from per-thread kernel stacks and you might still want to context switch from those, so it can make sense to store the preempted state in a struct (with enough space for all registers) instead of the stack. You can still avoid saving all registers when the context switch does run from a per-thread kernel stack.

I saw your other post about NMI / MCE... Does it even make sense to do a context switch (calling the scheduler) inside a NMI/MCE interrupt handler?

Korona wrote:Last but not least, you can think of designs that avoid per-thread kernel stacks altogether. This is something I would not recommend: it makes blocking in the kernel much harder. Basically, all locations where a thread can block must be enumerated in some way because you need to be able to somehow save the state at that point. That is harder than it sounds. For example, kernels might sometimes want to block until enough memory is available to satisfy an internal allocation. That might be desirable in contexts where an allocation must not fail because there is no meaningful way to return failure to userspace, e.g., in some asynchronous driver operation.

I agree with this, but the OP explicitly stated that he is thinking of switching stacks between processes (threads).

OSDev.org
