SSE / FPU instructions in scheduler code

Oxmose · Post by **Oxmose** » Thu Jul 04, 2024 1:35 pm

Hi everyone,

I recently implemented FPU / SSE context management (save / restore) using the NM exception to prevent useless save and restore.
However, my scheduling routine contains some FPU code, so I end up triggering the NM exception during scheduling.
The thing is, my kernel does not support nested interrupts. I could implement them, but what's the point if the scheduler itself uses SIMD? I would just end up saving / restoring the context at every task switch anyway.

I have the following solutions in mind, and I was wondering if someone with more experience could help me choose:

1. Slightly modify my thread's virtual CPU so that the context is saved on the thread's stack, which would let me have nested interrupts.
2. Always save/restore FPU/SSE context on task switch, don't use NM exception.
3. At the moment, every interrupt / exception triggers a scheduling pass. Should I decouple that and only schedule when the timer fires, otherwise returning from the interrupt to the same thread? (Here I am only talking about IRQs; a thread might actually ask to be rescheduled, for instance because it blocked on a semaphore.)
4. Force the scheduler code to avoid FPU / SSE. I am doing 64-bit divisions to compute the current CPU load, maybe I could do something better here? -> That is what I went with for the moment. BUT what if an interrupt handler uses FPU / SSE code? (Maybe 3. would be the second part of the solution?)

Sorry for the long text, I hope it is clear enough.

Thanks!
Alexy

Octocontrabass · Post by **Octocontrabass** » Thu Jul 04, 2024 5:30 pm

Oxmose wrote: ↑Thu Jul 04, 2024 1:35 pm I recently implemented FPU / SSE context management (save / restore) using the NM exception to prevent useless save and restore.

It's not always useless. If most of your programs are using SSE registers, using #NM could take more time than saving and restoring those registers during the context switch.

Oxmose wrote: ↑Thu Jul 04, 2024 1:35 pm I am doing 64-bit divisions to compute the current CPU load, maybe I could do something better here?

Using 64-bit integer division shouldn't be a problem. In 32-bit mode, you'll need libgcc.
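
As an illustration only, here is a minimal sketch of a load figure computed purely with integer arithmetic. The function name `cpu_load_permille` and the tick counters are hypothetical, not taken from the original poster's kernel. It assumes `busy_ticks` is small enough that `busy_ticks * 1000` does not overflow 64 bits. On a 32-bit x86 target, the 64-bit division compiles to a libgcc helper call, which is why libgcc is needed there.

```c
#include <stdint.h>

/*
 * Hypothetical helper: CPU load in tenths of a percent (0..1000),
 * using only general-purpose registers (no FPU / SSE).
 * Assumes busy_ticks * 1000 fits in 64 bits.
 */
static inline uint32_t cpu_load_permille(uint64_t busy_ticks,
                                         uint64_t total_ticks)
{
    if (total_ticks == 0) {
        return 0; /* no samples yet, avoid dividing by zero */
    }
    if (busy_ticks > total_ticks) {
        busy_ticks = total_ticks; /* clamp inconsistent counters */
    }
    /* Scale before dividing so the fractional part is not lost. */
    return (uint32_t)((busy_ticks * 1000u) / total_ticks);
}
```

A fixed-point result like this can be printed as `load / 10` and `load % 10` without ever touching a floating-point register.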

Oxmose wrote: ↑Thu Jul 04, 2024 1:35 pm BUT what if an interrupt handler uses FPU / SSE code?

Normally kernels are built using -mgeneral-regs-only to prevent that.

nullplan · Post by **nullplan** » Thu Jul 04, 2024 9:26 pm

Oxmose wrote: ↑Thu Jul 04, 2024 1:35 pm I recently implemented FPU / SSE context management (save / restore) using the NM exception to prevent useless save and restore.

If you plan to implement multiple CPU support later on, you should at the very least implement eager FPU saving. You can still use the NM exception to restore the registers, and to set a flag that tells you to save them when the task gets switched out. Eagerly saving them means that you can more easily migrate a task to a different CPU.

You could implement a pull architecture, where a CPU notices it is about to run out of work, sees another runnable task lying around, and simply picks it up. With lazy FPU saving, that CPU would first have to send an IPI to get the other CPU to save the FPU state, and the other CPU is already busy with other tasks, so this is not helping matters!
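
The eager-save / lazy-restore scheme described above can be sketched roughly as follows. This is a user-space model, not kernel code: `hw_fpu_regs`, `hw_cr0_ts`, `hw_fxsave` and `hw_fxrstor` are stand-ins for the real register file, the CR0.TS bit, and the FXSAVE / FXRSTOR instructions, and all names (`struct fpu_task`, `fpu_switch_to`, `fpu_nm_handler`) are hypothetical. On real hardware the save area also has to be 16-byte aligned.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FPU_AREA_SIZE 512u /* size of an FXSAVE area */

/* Hypothetical per-task FPU bookkeeping. */
struct fpu_task {
    uint8_t fpu_area[FPU_AREA_SIZE]; /* saved FPU / SSE state */
    bool    fpu_used;                /* set by #NM: state is live in the CPU */
};

/* Stand-ins for the hardware so the sketch runs in user space. */
static uint8_t hw_fpu_regs[FPU_AREA_SIZE]; /* models the live registers */
static bool    hw_cr0_ts;                  /* models the CR0.TS bit */

static void hw_fxsave(uint8_t *area)
{
    memcpy(area, hw_fpu_regs, FPU_AREA_SIZE);
}

static void hw_fxrstor(const uint8_t *area)
{
    memcpy(hw_fpu_regs, area, FPU_AREA_SIZE);
}

static struct fpu_task *fpu_current;

/* Task switch: save eagerly if the outgoing task used the FPU, then arm #NM. */
static void fpu_switch_to(struct fpu_task *next)
{
    struct fpu_task *prev = fpu_current;

    if (prev != NULL && prev->fpu_used) {
        hw_fxsave(prev->fpu_area); /* state is now in memory, task can migrate */
        prev->fpu_used = false;
    }
    hw_cr0_ts = true;              /* next FPU / SSE instruction raises #NM */
    fpu_current = next;
}

/* #NM handler: restore lazily and remember that a save is needed later. */
static void fpu_nm_handler(void)
{
    hw_cr0_ts = false;                 /* what CLTS does on real hardware */
    hw_fxrstor(fpu_current->fpu_area);
    fpu_current->fpu_used = true;
}
```

Because the state is written back at switch-out, any CPU can pick the task up afterwards without asking the previous CPU for anything.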

Oxmose wrote: ↑Thu Jul 04, 2024 1:35 pm 3. At the moment, every interrupt / exception triggers a scheduling pass. Should I decouple that and only schedule when the timer fires, otherwise returning from the interrupt to the same thread? (Here I am only talking about IRQs; a thread might actually ask to be rescheduled, for instance because it blocked on a semaphore.)

Not necessarily "only" when the timer triggers, but also not on every IRQ. I have a flag in my task structure (easily reachable from assembler) that tells me to swap out the task. The flag gets set by the system timer, but also whenever a different task becomes runnable (yes, yes, in future this will be for a higher-priority task only, but at the moment I am not fiddling with priorities). On interrupt return, I check whether that flag is set and preemption is not disabled (there are situations where handling an interrupt is fine, but preemption is not), and only then do I schedule.
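
A rough sketch of that control flow, under assumptions: the structure layout, the preemption counter, and every name here (`struct sched_task`, `need_resched`, `preempt_disable_count`, `interrupt_return_hook`) are made up for illustration and are not nullplan's actual code. `schedule()` is a stub that only counts calls so the logic can be exercised outside a kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical task structure holding the "please swap me out" flag. */
struct sched_task {
    volatile bool need_resched;          /* set by timer / wakeups */
    uint32_t      preempt_disable_count; /* > 0 means preemption is off */
};

static struct sched_task *current_task;
static unsigned schedule_calls; /* stub bookkeeping for this sketch */

/* Stub scheduler: a real one would pick and switch to another task. */
static void schedule(void)
{
    schedule_calls++;
    current_task->need_resched = false;
}

/* Called from the system timer interrupt. */
static void timer_tick(void)
{
    current_task->need_resched = true;
}

/* Called when some other task becomes runnable. */
static void task_became_runnable(void)
{
    current_task->need_resched = true;
}

/* Called on the interrupt return path, just before returning to the task. */
static void interrupt_return_hook(void)
{
    if (current_task->need_resched &&
        current_task->preempt_disable_count == 0) {
        schedule();
    }
    /* Otherwise fall through and resume the interrupted task. */
}
```

With this shape, an IRQ that neither ticks the timer nor wakes a task returns straight to the interrupted thread, and a pending reschedule is simply deferred while preemption is disabled.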

Oxmose wrote: ↑Thu Jul 04, 2024 1:35 pm 4. Force the scheduler code to avoid FPU / SSE. I am doing 64-bit divisions to compute the current CPU load, maybe I could do something better here? -> That is what I went with for the moment. BUT what if an interrupt handler uses FPU / SSE code? (Maybe 3. would be the second part of the solution?)

Kernels have to avoid using FPU / SSE at all, or else have to save the FPU state on every context switch (i.e. on every interrupt entry and return). That's why most people compile their kernels with -mgeneral-regs-only, as Octo wrote before. That is generally a good way to do it!

Oxmose · Post by **Oxmose** » Fri Jul 05, 2024 4:22 am

Thanks to both of you for your answers! That's exactly what I was hoping for

I will use -mgeneral-regs-only from now on.
I already have multicore support, so always saving and restoring the context seems better. I will also update my scheduling control flow to avoid switching tasks on every single interrupt (that's really a flaw I have intended to fix for ages but never found the time to).

Also, I was tired and was talking about 64-bit division, but it was actually a floating-point division.
