Hi,
thxbb12 wrote: Thanks for the link, but I'm not sure I understand how to handle it.
What steps do I need to implement in order to handle it properly?
From the Intel manuals, it's not clear at all what actions need to be taken.
In general; when a hardware task switch occurs the CPU doesn't touch FPU state and only sets the TS flag (in CR0). If an FPU instruction is executed while the TS flag is set it causes a "device not available" exception (#NM, vector 7).
Note: For software task switching you can do everything described below by setting the TS flag yourself during task switches.
For single-CPU; when this exception occurs you'd check if "FPUowner == currentTask", and if it's not you'd save the previous owner's FPU state, load the current task's FPU state and set "FPUowner" to the current task. In either case you'd clear the TS flag. The idea of this is to avoid loading/saving the FPU state when it can be avoided - e.g. if you have 5 tasks where only 2 of them use FPU, then you might do 3 task switches (between tasks that don't use FPU) and avoid loading/saving the FPU state 3 times; and if only one task uses FPU then you might never load/save FPU state because that task is always the "FPU owner".
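To make the single-CPU steps a bit more concrete, here's a rough sketch in C. It's only a model that runs in user space: "hw_fpu" and "hw_ts" are made-up variables standing in for the real FPU registers and CR0.TS, "use_fpu()" stands in for a task executing an FPU instruction, and the "memcpy()" calls are where a real kernel would use something like FXSAVE/FXRSTOR. All of the names are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: the "hardware" is plain data so the logic can run anywhere. */
#define FPU_STATE_SIZE 512                    /* size of an FXSAVE area */

struct task {
    unsigned char fpu_state[FPU_STATE_SIZE];  /* saved copy of the task's FPU state in RAM */
};

static unsigned char hw_fpu[FPU_STATE_SIZE];  /* stands in for the real FPU registers */
static bool hw_ts;                            /* stands in for CR0.TS */
static struct task *fpu_owner;                /* task whose state is currently in the FPU */
static struct task *current_task;
static unsigned saves, loads;                 /* counters, for illustration only */

/* Task switch: leave the FPU state alone and just arm the trap. */
void switch_to(struct task *next)
{
    current_task = next;
    hw_ts = true;                             /* real kernel: set CR0.TS */
}

/* "Device not available" (#NM) handler: runs on the first FPU use after a switch. */
void nm_handler(void)
{
    if (fpu_owner != current_task) {
        if (fpu_owner != NULL) {
            memcpy(fpu_owner->fpu_state, hw_fpu, FPU_STATE_SIZE);  /* real: FXSAVE */
            saves++;
        }
        memcpy(hw_fpu, current_task->fpu_state, FPU_STATE_SIZE);   /* real: FXRSTOR */
        loads++;
        fpu_owner = current_task;
    }
    hw_ts = false;                            /* real kernel: CLTS */
}

/* Models a task executing an FPU instruction that changes one byte of state. */
void use_fpu(unsigned char value)
{
    if (hw_ts)
        nm_handler();                         /* real hardware raises #NM here */
    hw_fpu[0] = value;
}
```

With this model, switching away from a task and back again (without any other task touching the FPU in between) costs no saves or loads at all, which is the whole point.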
For multi-CPU this doesn't work; because a task that was running on CPU#0 last time might be run on CPU#1 next time, and that task's FPU state might still be in CPU#0 when it's needed by CPU#1. To work around that you have to do it differently. When the exception occurs you load the current task's state, clear the TS flag and set some sort of "FPU state was loaded" flag (e.g. in your thread data structure or whatever), and when you do a task switch you check that "FPU state was loaded" flag, and if it was set you save the FPU state and clear the flag. That means you still avoid some FPU state loading/saving; but a thread's FPU state is in RAM if/when it's needed by a different CPU.
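The multi-CPU variant could look something like the sketch below. Again it's just a user-space model with made-up names: "struct cpu" holds a fake per-CPU FPU register file and a fake TS flag, and "fpu_loaded" is the "FPU state was loaded" flag in the thread structure. A real kernel would use FXSAVE/FXRSTOR (or similar) where this uses "memcpy()".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FPU_STATE_SIZE 512

struct thread {
    unsigned char fpu_state[FPU_STATE_SIZE];  /* saved copy in RAM */
    bool fpu_loaded;                          /* "FPU state was loaded" flag */
};

struct cpu {
    unsigned char fpu[FPU_STATE_SIZE];        /* stands in for this CPU's FPU registers */
    bool ts;                                  /* stands in for this CPU's CR0.TS */
    struct thread *current;
};

/* #NM handler: always load the current thread's state and remember that we did. */
void smp_nm_handler(struct cpu *c)
{
    memcpy(c->fpu, c->current->fpu_state, FPU_STATE_SIZE);  /* real: FXRSTOR */
    c->current->fpu_loaded = true;
    c->ts = false;                                          /* real: CLTS */
}

/* Task switch: save the outgoing thread's state only if it was actually loaded. */
void smp_switch_to(struct cpu *c, struct thread *next)
{
    struct thread *prev = c->current;

    if (prev != NULL && prev->fpu_loaded) {
        memcpy(prev->fpu_state, c->fpu, FPU_STATE_SIZE);    /* real: FXSAVE */
        prev->fpu_loaded = false;
    }
    c->current = next;
    c->ts = true;                                           /* real: set CR0.TS */
}

/* Models the current thread on this CPU executing an FPU instruction. */
void smp_use_fpu(struct cpu *c, unsigned char value)
{
    if (c->ts)
        smp_nm_handler(c);                                  /* real hardware raises #NM */
    c->fpu[0] = value;
}
```

Because the state is written back to RAM before the thread stops running, it doesn't matter which CPU picks the thread up next.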
For both of these cases (single-CPU and multi-CPU); when you create a thread you could just set a "thread has no FPU state" flag (and avoid creating an initial FPU state for the thread). Then, instead of just loading the thread's FPU state you'd check if it has an FPU state first, and if it doesn't you'd create the thread's "initial FPU state" and change the "thread has no FPU state" flag. That way if there are lots of threads that never use FPU you avoid creating unnecessary data when creating the thread (and the first time the state is needed you get to do "FNINIT" instead of loading).
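A sketch of the "no FPU state yet" check is below. The names are made up and the FPU is modelled by a tiny struct ("model_fpu") so it can run in user space; "model_fninit()" mimics what FNINIT does to the control word (it resets it to 0x037F). I've used a "has_fpu_state" flag, which is just the opposite sense of "thread has no FPU state".

```c
#include <stdbool.h>
#include <stdint.h>

/* Tiny stand-in for the FPU register file (hypothetical, for illustration). */
struct fpu_regs {
    uint16_t control_word;
    double st0;
};

struct lazy_thread {
    bool has_fpu_state;       /* false until the thread first uses the FPU */
    struct fpu_regs saved;    /* only meaningful when has_fpu_state is true */
};

static struct fpu_regs model_fpu;             /* stands in for the real FPU */

/* Mimics FNINIT: control word back to its default, registers cleared. */
static void model_fninit(void)
{
    model_fpu.control_word = 0x037F;
    model_fpu.st0 = 0.0;
}

/* Called where the handler would otherwise just load the thread's state. */
void lazy_load_fpu(struct lazy_thread *t)
{
    if (!t->has_fpu_state) {
        model_fninit();                       /* first use: initialise instead of loading */
        t->has_fpu_state = true;
    } else {
        model_fpu = t->saved;                 /* real kernel: FXRSTOR */
    }
}

/* Called where the thread's state would be saved; nothing to do if it never used the FPU. */
void lazy_save_fpu(struct lazy_thread *t)
{
    if (t->has_fpu_state)
        t->saved = model_fpu;                 /* real kernel: FXSAVE */
}
```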
Also for both cases (single-CPU and multi-CPU); the exception handler can keep track of how often a thread uses the FPU; and if you know a thread always (or nearly always) uses the FPU then you can load the thread's FPU state during task switches to avoid the overhead of an "extremely likely" exception later.
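How you track "how often" is up to you. One possible heuristic (my own invention for illustration, not anything standard) is a small per-thread score: each exception adds to it, each task switch to the thread takes a little away, and the state is loaded eagerly while the score is high enough. The decay matters because once you load eagerly there's no exception to tell you whether the thread really used the FPU, so the score has to drop back occasionally to re-check. The constants are arbitrary guesses.

```c
#include <stdbool.h>

/* Hypothetical tuning values; a real kernel would want to measure these. */
#define FAULT_GAIN      4   /* added to the score by each #NM exception */
#define SCORE_MAX       8   /* cap, so an old habit can't dominate forever */
#define EAGER_THRESHOLD 3   /* load eagerly while score >= this */

struct fpu_usage {
    unsigned score;
};

/* Called from the #NM handler: the thread really did use the FPU. */
void on_fpu_fault(struct fpu_usage *u)
{
    u->score += FAULT_GAIN;
    if (u->score > SCORE_MAX)
        u->score = SCORE_MAX;
}

/*
 * Called on every task switch to the thread. Returns true if the FPU state
 * should be loaded now (and TS left clear) instead of waiting for #NM.
 * The score decays on every switch so the decision is re-checked over time.
 */
bool on_switch_in(struct fpu_usage *u)
{
    bool eager = (u->score >= EAGER_THRESHOLD);

    if (u->score > 0)
        u->score--;
    return eager;
}
```

With these numbers a thread that uses the FPU every time it runs ends up taking roughly one exception per four time slices, and a thread that never uses the FPU is never loaded eagerly.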
However...
If you do all of this then it gets a little complicated and the overhead of loading/saving the FPU state is fairly small anyway; so all the complications probably won't make any difference to performance; especially when a task switch involves a lot of work (calculating/updating the amount of time the thread has spent executing, loading/saving debug register state, loading/saving performance monitoring counter state, etc) that makes the FPU state saving/loading seem even more insignificant.
It's not until you include SSE and AVX state that it starts looking beneficial because there's a lot more state being saved/loaded (and the overhead you're avoiding is much higher). For example, for AVX2 (in 64-bit code) there's 16 (256-bit) registers which adds up to 512 bytes of data (plus a little more for status/control) being saved/loaded. For AVX512 there's 32 (512-bit) registers which adds up to 2 KiB (and there's a set of 8 "opmask registers" on top of that).
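If you want to check the arithmetic, it's just "number of registers * bits per register / 8". The helper below is a made-up name, and the register counts are the 64-bit mode ones (16 XMM/YMM registers, 32 ZMM registers, 8 opmask registers stored as 64 bits each).

```c
#include <stddef.h>

/* Bytes needed to hold 'count' registers of 'bits_each' bits. */
size_t reg_file_bytes(size_t count, size_t bits_each)
{
    return count * bits_each / 8;
}

/* Register-file sizes in 64-bit mode (status/control words not included). */
size_t sse_bytes(void)    { return reg_file_bytes(16, 128); }  /* XMM0-XMM15 */
size_t avx_bytes(void)    { return reg_file_bytes(16, 256); }  /* YMM0-YMM15 */
size_t avx512_bytes(void) { return reg_file_bytes(32, 512); }  /* ZMM0-ZMM31 */
size_t opmask_bytes(void) { return reg_file_bytes(8, 64); }    /* k0-k7 */
```

That works out to 256 bytes for SSE, 512 bytes for AVX/AVX2, and 2048 + 64 bytes for AVX512 with its opmask registers.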
Cheers,
Brendan