Task switching difficulties between user and kernel

iman · Post by **iman** » Tue Sep 08, 2020 2:04 am

Hi.

I'm experimenting with switches from user mode to kernel mode and back. There are some pitfalls that I would like to ask about, and hopefully get some answers.
The processor is in protected mode.

1. It seems that setting up a valid TSS is necessary for the switch. If I simply deactivate the TSS, any attempt to change from user mode to kernel mode gives me a #GP.
So you literally need such a TSS in the first place to be able to switch between privilege levels. On the other hand, there are many explanations that a lot of modern operating systems don't bother using hardware context switches, because their performance is not even comparable with that of software context switches.
If you need to set up a TSS, neglecting many of its fields and only taking care of ESP0, SS0, and EIP, then you are touching the hardware context switch, aren't you?

2. It is advised to have both a user mode stack and a kernel mode stack for each task. When you switch from user mode to kernel mode, you only switch from the task's user mode stack to its kernel mode stack. If an interrupt or an exception happens, you then go from the task's kernel mode stack to your main kernel stack. Is that the way it works?
What if I simply don't have a kernel mode stack per task? Once an interrupt preempts the kernel while it is handling my task, the IRQ handler and the task handler share the same stack, so everything would be safe, true?

Best regards.
Iman.

Octocontrabass · Post by **Octocontrabass** » Tue Sep 08, 2020 2:43 am

iman wrote:If you need to set up a TSS, neglecting many of its fields and only taking care of ESP0, SS0, and EIP, then you are touching the hardware context switch, aren't you?

It's not a hardware task switch if you're not using a task gate.

The Intel manual, volume 3A section 7.3, lists the four situations where a hardware task switch occurs.

iman wrote:If an interrupt or an exception happens, you then go from the task's kernel mode stack to your main kernel stack. Is that the way it works?

That depends on how you set up the IDT for your interrupt handlers. If you use interrupt gates, they don't switch to a different stack.
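
To make the gate types concrete, here is a minimal sketch of how a 32-bit IDT entry encodes them. The struct layout and type values (0x5 task gate, 0xE interrupt gate, 0xF trap gate) follow the architecture; the names `idt_entry32`, `gate_type`, and `idt_make_gate` are made up for illustration and are not taken from anyone's kernel in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Gate type values as stored in the low 4 bits of the type/attribute byte. */
enum gate_type {
    GATE_TASK   = 0x5,  /* triggers a hardware task switch */
    GATE_INT32  = 0xE,  /* 32-bit interrupt gate: clears IF on entry */
    GATE_TRAP32 = 0xF   /* 32-bit trap gate: leaves IF unchanged */
};

/* One 8-byte protected-mode IDT entry. */
struct idt_entry32 {
    uint16_t offset_low;   /* handler address bits 0..15 */
    uint16_t selector;     /* code segment (or TSS selector for a task gate) */
    uint8_t  zero;         /* reserved, must be 0 */
    uint8_t  type_attr;    /* P (bit 7) | DPL (bits 5..6) | type (bits 0..3) */
    uint16_t offset_high;  /* handler address bits 16..31 */
};

/* Hypothetical helper: build a present gate of the given type and DPL. */
struct idt_entry32 idt_make_gate(uint32_t handler, uint16_t selector,
                                 enum gate_type type, unsigned dpl)
{
    assert(dpl <= 3);
    struct idt_entry32 e;
    e.offset_low  = (uint16_t)(handler & 0xFFFFu);
    e.selector    = selector;
    e.zero        = 0;
    e.type_attr   = (uint8_t)(0x80u | (dpl << 5) | (unsigned)type);
    e.offset_high = (uint16_t)(handler >> 16);
    return e;
}
```

With an interrupt or trap gate, the only stack switch the CPU performs is the SS0:ESP0 load on a privilege-level change. For a task gate the handler offset is ignored and the selector field names a TSS instead of a code segment.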

iman wrote:What if I simply don't have a kernel mode stack per task?

It's possible to use only one stack in the kernel, but I think it makes task switching more difficult. (It's one stack per CPU once you support multiple CPUs.)

iman wrote:Once an interrupt preempts the kernel while it is handling my task, the IRQ handler and the task handler share the same stack, so everything would be safe, true?

As long as the stack has enough space, yes.

thewrongchristian · Post by **thewrongchristian** » Tue Sep 08, 2020 2:51 am

iman wrote:Hi.

I'm experimenting with switches from user mode to kernel mode and back. There are some pitfalls that I would like to ask about, and hopefully get some answers.
The processor is in protected mode.

1. It seems that setting up a valid TSS is necessary for the switch. If I simply deactivate the TSS, any attempt to change from user mode to kernel mode gives me a #GP.
So you literally need such a TSS in the first place to be able to switch between privilege levels. On the other hand, there are many explanations that a lot of modern operating systems don't bother using hardware context switches, because their performance is not even comparable with that of software context switches.
If you need to set up a TSS, neglecting many of its fields and only taking care of ESP0, SS0, and EIP, then you are touching the hardware context switch, aren't you?

You just need SS0 and ESP0. EIP will come from the interrupt/trap gate.

More than likely, you'll have a single SS shared for all tasks (same as DS, ES etc.), so it'll only be ESP0 that will change when you switch tasks in software. As part of your software task switch, you'll just set the TSS.ESP0 for the current processor (each processor will need its own TSS.)

You're just using the TSS to hold the kernel stack pointer for the switch to supervisor mode. All the other aspects of the TSS, including the hardware task switching, can be ignored.
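
As a rough sketch of what that looks like in C: the field layout below is the architectural 32-bit TSS (104 bytes), but `MAX_CPUS`, `cpu_tss`, `tss_init`, and `tss_set_kernel_stack` are hypothetical names, and the kernel SS value you pass in depends on your own GDT. Loading the task register with `ltr` and creating the TSS descriptor in the GDT are not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit TSS as the CPU reads it. Without hardware task switching,
 * only ss0/esp0 (and iomap_base) are ever consulted. */
struct tss32 {
    uint16_t prev_task_link, reserved0;
    uint32_t esp0;                       /* kernel stack pointer for ring 0 */
    uint16_t ss0, reserved1;             /* kernel stack segment for ring 0 */
    uint32_t esp1;
    uint16_t ss1, reserved2;
    uint32_t esp2;
    uint16_t ss2, reserved3;
    uint32_t cr3, eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint16_t es, reserved4;
    uint16_t cs, reserved5;
    uint16_t ss, reserved6;
    uint16_t ds, reserved7;
    uint16_t fs, reserved8;
    uint16_t gs, reserved9;
    uint16_t ldt_selector, reserved10;
    uint16_t trap;                       /* bit 0: debug trap on task switch */
    uint16_t iomap_base;                 /* offset of the I/O permission bitmap */
};

_Static_assert(sizeof(struct tss32) == 104, "TSS must be 104 bytes");
_Static_assert(offsetof(struct tss32, esp0) == 4, "esp0 must be at offset 4");

#define MAX_CPUS 4                       /* assumption: BSP plus three APs */
static struct tss32 cpu_tss[MAX_CPUS];   /* one TSS per processor */

/* Zero the TSS and set the one field that never changes. */
void tss_init(unsigned cpu, uint16_t kernel_ss)
{
    assert(cpu < MAX_CPUS);
    memset(&cpu_tss[cpu], 0, sizeof cpu_tss[cpu]);
    cpu_tss[cpu].ss0 = kernel_ss;
    /* Base at or beyond the TSS limit means "no I/O bitmap present". */
    cpu_tss[cpu].iomap_base = (uint16_t)sizeof(struct tss32);
}

/* Called from the software task switch: point ESP0 at the top of the
 * incoming task's kernel stack. */
void tss_set_kernel_stack(unsigned cpu, uint32_t stack_top)
{
    assert(cpu < MAX_CPUS);
    cpu_tss[cpu].esp0 = stack_top;
}
```

The scheduler would call something like `tss_set_kernel_stack(this_cpu, next_task_kernel_stack_top)` just before returning to the next task; everything else in the structure can stay zero.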

iman wrote: 2. It is advised to have both a user mode stack and a kernel mode stack for each task. When you switch from user mode to kernel mode, you only switch from the task's user mode stack to its kernel mode stack. If an interrupt or an exception happens, you then go from the task's kernel mode stack to your main kernel stack. Is that the way it works?
What if I simply don't have a kernel mode stack per task? Once an interrupt preempts the kernel while it is handling my task, the IRQ handler and the task handler share the same stack, so everything would be safe, true?

If you get an interrupt while already in kernel mode, you won't switch stacks. You'll just build on the existing kernel stack, do the interrupt handling, and iret back to where you were before in kernel mode. The TSS is only consulted for a new stack for *changes* in privilege level.

Yes, you can switch stacks if you configure the interrupt as a task gate, but that would require that hardware task switching you want to avoid. This might be useful, for example, to handle faults that would relate to stack overflow events (in kernel mode, if you've overflowed the stack, how can you handle faults?)

The only other use for the TSS is the I/O port permission bitmap. I don't use this myself, but it's probably not too onerous to maintain this in software for processes that need it.
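
For reference, a simplified model of the check the CPU makes against that bitmap when CPL > IOPL: one bit per port, a set bit denies access, and every port covered by the access must be allowed. The function names here are invented for the sketch, and it glosses over hardware details (the real CPU reads two bitmap bytes per check, which is why the bitmap is conventionally terminated with an extra 0xFF byte).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Start from "everything denied": all bits set. */
void iopb_deny_all(uint8_t *bitmap, size_t bytes)
{
    memset(bitmap, 0xFF, bytes);
}

/* Allow a single port by clearing its bit. */
void iopb_grant(uint8_t *bitmap, size_t bytes, uint16_t port)
{
    assert((size_t)(port / 8) < bytes);
    bitmap[port / 8] &= (uint8_t)~(1u << (port % 8));
}

/* Would an access of `width` bytes (1, 2 or 4) at `port` be permitted?
 * Ports beyond the end of the bitmap count as denied. */
bool iopb_allows(const uint8_t *bitmap, size_t bytes,
                 uint16_t port, unsigned width)
{
    for (uint32_t p = port; p < (uint32_t)port + width; p++) {
        if (p / 8 >= bytes)
            return false;
        if (bitmap[p / 8] & (1u << (p % 8)))
            return false;
    }
    return true;
}
```

A kernel that keeps one bitmap per process would copy or remap the relevant bitmap into the current CPU's TSS on a task switch, which is the maintenance cost mentioned above.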

iman · Post by **iman** » Tue Sep 08, 2020 5:21 am

Octocontrabass wrote:The Intel manual, volume 3A section 7.3, lists the four situations where a hardware task switch occurs.

Now I see the border between hardware and software task switch.

Octocontrabass wrote:That depends on how you set up the IDT for your interrupt handlers. If you use interrupt gates, they don't switch to a different stack.

They are interrupt gates.

iman · Post by **iman** » Tue Sep 08, 2020 5:31 am

thewrongchristian wrote:As part of your software task switch, you'll just set the TSS.ESP0 for the current processor (each processor will need its own TSS.)

So if I have, say, three APs, I must set up four separate TSSs (one for the BSP and one for each AP). If the number of processor cores is even higher, then the same number of TSSs is required.

thewrongchristian wrote:The TSS is only consulted for a new stack for *changes* in privilege level.

Out of curiosity: does that mean that if I have a user mode task which owns the same stack as the main kernel, the CPU switches safely by checking TSS.ESP0?

thewrongchristian · Post by **thewrongchristian** » Tue Sep 08, 2020 7:40 am

iman wrote:
thewrongchristian wrote:The TSS is only consulted for a new stack for *changes* in privilege level.
Out of curiosity: does that mean that if I have a user mode task which owns the same stack as the main kernel, the CPU switches safely by checking TSS.ESP0?

If you mean the kernel stack corresponding to the user mode process, what do you mean by "main kernel"? Do you envisage a separate "main kernel" thread? Then no, they can't share stacks, especially if you have multiple CPUs which may be executing them concurrently.

If you mean can user processes somehow share a single kernel stack (per CPU), then yes. In fact, I believe some minimal microkernels do just that (you can check that in L4, for example), if all the kernel is doing is passing messages between user processes, or doing some privileged operation on behalf of a user process. Then, a context switch is just a case of pointing to the corresponding user context to restore upon exit from a syscall/interrupt, in which case you would only have to update the TSS to change the I/O permission bitmap for the next process, if required.

But it'd mean your kernel cannot sleep other than to idle waiting for the next interrupt.

iman · Post by **iman** » Tue Sep 08, 2020 8:29 am

thewrongchristian wrote:If you mean can user processes somehow share a single kernel stack (per CPU), ...

Yes, that was what I had in mind as an abstract example.

sj95126 · Post by **sj95126** » Tue Sep 08, 2020 9:20 am

thewrongchristian wrote:If you get an interrupt while already in kernel mode, you won't switch stacks. You'll just build on the existing kernel stack, do the interrupt handling, and iret back to where you were before in kernel mode.

I'm rather curious exactly how the CPU handles this. The Intel programming guide only says:

"If a stack switch occurred when calling the handler procedure, the IRET instruction switches back to the interrupted procedure’s stack on the return."

However, it doesn't specify HOW it decides whether or not to restore the stack.

Given that you could have the following:
- user mode process generates a fault (stack switch to pl0 stack)
- exception handler experiences a fault (say, a page fault) (no stack switch)
- page fault handler experiences another fault (also no stack switch)

(note that this sequence is NOT a double fault)

How does the IRET know whether to restore the stack or not? I assume it performs a privilege check of the saved CS (that's probably why CS/EIP is always on top) but if you were using the same segment for user and system code, CS will be the same. Is it tracking the exception "depth" so that the "oldest" return gets a stack switch?

nullplan · Post by **nullplan** » Tue Sep 08, 2020 11:33 am

sj95126 wrote:However, it doesn't specify HOW it decides whether or not to restore the stack.

It goes by the available data. What data is available to tell it whether or not a stack switch has happened? The interrupt frame only consists of CS, EIP, and EFLAGS. Which of these could identify whether a stack switch took place?

CS of course. A stack switch only (and always) happens when escalating to a higher level of privilege, so if a stack switch happened, the CS in the interrupt frame will belong to a lower privilege level. So when IRET sees that the new CS is of lower privilege, it knows to also look for stack information.

sj95126 wrote:Given that you could have the following:
- user mode process generates a fault (stack switch to pl0 stack)
- exception handler experiences a fault (say, a page fault) (no stack switch)
- page fault handler experiences another fault (also no stack switch)

(note that this sequence is NOT a double fault)

How does the IRET know whether to restore the stack or not?

In that case, the outermost stackframe will have the user's CS and the inner ones will have the kernel's CS.
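
A small model of that decision, as a sketch only: IRET compares the RPL (low two bits) of the CS it pops with the current CPL. The selector values used in the usage note (0x08 for kernel code, 0x1B for user code) are a common convention and an assumption here, and `iret_classify` is an invented name, not anything from the manual.

```c
#include <assert.h>
#include <stdint.h>

enum iret_action {
    IRET_SAME_LEVEL,   /* pop EIP, CS, EFLAGS only; keep the current stack */
    IRET_OUTER_LEVEL,  /* additionally pop ESP and SS: switch stacks back */
    IRET_FAULT_GP      /* returning to a more privileged level is refused */
};

/* Classify a protected-mode IRET from the saved CS and the current CPL. */
enum iret_action iret_classify(uint16_t saved_cs, unsigned cpl)
{
    assert(cpl <= 3);
    unsigned rpl = saved_cs & 3u;   /* requested privilege level of the target */

    if (rpl < cpl)
        return IRET_FAULT_GP;
    return rpl > cpl ? IRET_OUTER_LEVEL : IRET_SAME_LEVEL;
}
```

For the three nested frames in the example, the two inner frames hold the kernel CS, so `iret_classify(0x08, 0)` yields `IRET_SAME_LEVEL` and no stack is restored. The outermost frame holds the user CS, so `iret_classify(0x1B, 0)` yields `IRET_OUTER_LEVEL` and the saved user SS:ESP is popped as well.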

sj95126 wrote: I assume it performs a privilege check of the saved CS (that's probably why CS/EIP is always on top) but if you were using the same segment for user and system code, CS will be the same.

That is impossible. User and kernel CS must differ in the DPL so the RPL can be adjusted accordingly. If the kernel's CS is loaded, it must be loaded with RPL 0, and a successfully loaded CS RPL is the CPL. If you run user code with CPL 0, you just gave up all hardware protection. Also, if your user code does run at CPL 0, then even the outermost exception you showed here causes no stack switch.

Honestly, I don't remember how exactly DPL, RPL, and CPL work. All I do know is that you must have a user CS with a DPL of 3 and load it with an RPL of 3, in order to get both a CPL of 3 and no GPF. And since the DPL must be different, so must be the CS.

sj95126 · Post by **sj95126** » Tue Sep 08, 2020 2:46 pm

nullplan wrote:
sj95126 wrote:However, it doesn't specify HOW it decides whether or not to restore the stack.
It goes by the available data.

I figured that was the case, but you have to admit it's very unlike Intel not to specify that in nauseating detail. The 10-volume combined programmer's guide is over 5,000 pages. Sometimes they repeat the same conceptual sequence dozens of times.

Especially considering a decision based on CPL vs. RPL vs. DPL, where they usually go into an almost dizzying decision tree, I was very surprised they don't outline how this decision is made. "Isn't it obvious?" is not their usual way of things.

nullplan · Post by **nullplan** » Wed Sep 09, 2020 7:44 am

Indeed it is Intel's occasional tendency to witter on a bit that made me lose all memory of how protected mode rings work in detail. However, if you want nauseating detail, look at the description of IRET, which answers your question in the pseudo-code section: https://www.felixcloutier.com/x86/iret:iretd

So it is the RPL of the saved CS that makes it switch stacks!
