matute81 wrote:But I still haven't got an answer: must I do an FPU context switch as described in Intel's Software Developer's Manual?
You have about 5 choices:
1. do the "automatically saving FPU/MMX/SSE state on task switches" thing as described in Intel's manual; this does not work for multi-CPU
2. do the same thing, but use IPIs to force it to work for multi-CPU
3. adapt it so that the state is saved during the task switch if it was used, and loaded during the "device not available" exception when it's actually needed (not during task switches)
4. adapt it so that the state is saved during the task switch if it was used, and loaded during the task switch too (if the task has used the FPU/MMX/SSE previously)
5. just save and restore the FPU/MMX/SSE state during every task switch (regardless of whether it's used or not), and forget about the TS flag and the "device not available" exception
For all of these choices, performance will vary. For example, with the second choice, if all tasks always use FPU/MMX/SSE then you pay the overhead of the "device not available" exception and of the IPIs for no reason (but if no tasks use FPU/MMX/SSE then you avoid that overhead). With the last choice, if no tasks use FPU/MMX/SSE then you pay the overhead of saving the state for no reason (but if all tasks always use FPU/MMX/SSE then you avoid the exception overhead).
Of course you can pick more than one of these choices. For example, you could use one method on single-CPU and a different method on multi-CPU; or maybe even dynamically switch between methods based on current load.
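The first choice (Intel's fully lazy scheme, single-CPU only) can be simulated in plain C so the control flow is visible; CR0.TS and FXSAVE/FXRSTOR are modelled here with ordinary variables and comments, and task_t, fpu_owner, and the other names are invented for illustration, not taken from any real kernel:

```c
#include <stdbool.h>
#include <stddef.h>

/* Simulated lazy FPU switching. CR0.TS and the FXSAVE area are
 * modelled with plain variables; all names are illustrative. */

typedef struct task {
    unsigned char fpu_state[512];  /* would be the 16-byte-aligned FXSAVE area */
    bool used_fpu;                 /* has this task ever touched the FPU? */
} task_t;

static task_t *fpu_owner = NULL;   /* task whose state is live in the FPU */
static task_t *current   = NULL;
static bool cr0_ts = false;        /* simulated CR0.TS */

/* Task switch: never touch FPU state here; just arm the #NM trap
 * if the incoming task does not already own the FPU. */
void switch_to(task_t *next)
{
    current = next;
    cr0_ts = (fpu_owner != next);
}

/* Simulated #NM handler: save the old owner's state, load the new
 * owner's state (if it has any), hand over ownership, clear TS. */
void nm_handler(void)
{
    if (fpu_owner != NULL) {
        /* fxsave(fpu_owner->fpu_state); in a real kernel */
    }
    if (current->used_fpu) {
        /* fxrstor(current->fpu_state); */
    }
    fpu_owner = current;
    current->used_fpu = true;
    cr0_ts = false;
}

/* First FPU instruction in a task: traps if TS is set. */
void use_fpu(void)
{
    if (cr0_ts)
        nm_handler();
    /* ... actual FPU work ... */
}
```

The payoff is visible in the simulation: switching between tasks that never call use_fpu() never runs nm_handler(), so FPU-free workloads pay almost nothing. On multi-CPU this breaks down because fpu_owner's state may be live in another core's FPU, which is exactly what the IPIs in the second choice are for.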
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
It's simple. You can clear TS during task switching, but it is not efficient. You would use FXSAVE/FXRSTOR, but only after a CLTS instruction. It seems to me that the Intel manual has an error in its action sequence.
The Intel docs look correct to me. Where is the error ?
If a trainstation is where trains stop, what is a workstation ?
Intel doc wrote:On a task switch, the operating system task switching code must execute the following pseudo-code to set the TS flag according to the current owner of the x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 state. If the new task (task B in this example) is not the current owner of this state, the TS flag is set to 1; otherwise, it is set to 0.
IF Task_Being_Switched_To ≠ x87FPU_MMX_XMM_MXCSR_StateOwner
THEN
CR0.TS ← 1;
ELSE
CR0.TS ← 0;
FI;
Intel doc wrote:
If a new task attempts to access an x87 FPU, MMX, XMM, or MXCSR register while the TS flag is set to 1, a device-not-available exception (#NM) is generated. The device-not-available exception handler executes the following pseudo-code.
FXSAVE “To x87FPU/MMX/XMM/MXCSR State Save Area for Current x87FPU_MMX_XMM_MXCSR_StateOwner”;
FXRSTOR “x87FPU/MMX/XMM/MXCSR State From Current Task’s x87FPU/MMX/XMM/MXCSR State Save Area”;
x87FPU_MMX_XMM_MXCSR_StateOwner ← Current_Task;
CR0.TS ← 0;
This looks okay to me.
Brendan wrote:
For all of the choices, performance will vary. For example, for the second choice, if all tasks always use FPU/MMX/SSE then you'd have the overhead of the "device not available" exception and the overhead of the IPI/s for no reason (but if no tasks use FPU/MMX/SSE then you avoid overhead); and for the last choice, if all tasks don't use FPU/MMX/SSE then you'd have the overhead saving the FPU/MMX/SSE state for no reason (but if all tasks always use FPU/MMX/SSE then you avoid overhead).
If you try to "lock" tasks to a CPU, then the expense of IPIs should be significantly reduced (they should only be needed if a thread gets migrated). I think, overall, I would use an algorithm for context switches which accounts for the following:
Moving average of FPU use by this thread (0 = no FPU use, 1 = FPU used in every timeslice)
Moving average of thread FPU use on this core in average reporting period (0 = no threads use the FPU, 1 = one thread uses the FPU in reporting period, and so on)
Realtime nature of thread
To the following ends:
For realtime threads, the FPU state is always saved immediately; this allows the thread to start executing on whichever core is most promptly available without requiring IPIs to fetch state.
For threads which have an average FPU usage of 0.5 or greater (or whatever is experimentally determined to be most efficient), FPU state is restored at the start of the time slice
For threads on cores which have an average FPU usage of 1.5 or greater (or whatever is experimentally determined to be most efficient), FPU state is saved at the end of a time slice
It would obviously be your responsibility to select the granularity of a "reporting period" for efficiency, and to figure out the best way of detecting whether the FPU is actually used by tasks whose FPU state is restored at the start of the time slice. One option would be to slowly erode a task's average FPU usage each time a restoration occurs, so that such tasks are occasionally probed again.
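The bookkeeping behind those rules could be sketched as an exponentially-weighted moving average per thread; DECAY, the threshold constant, and every identifier here are illustrative choices, not anyone's actual scheduler:

```c
#include <stdbool.h>

/* Sketch of the heuristic above: an exponentially-weighted moving
 * average of FPU use per thread, updated once per timeslice.
 * All names and constants are illustrative. */

#define DECAY 0.9              /* weight given to history vs. the new sample */
#define RESTORE_THRESHOLD 0.5  /* the "0.5 or greater" cutoff from the post */

typedef struct thread {
    double fpu_avg;            /* 0.0 = never uses the FPU, 1.0 = every slice */
    bool used_fpu_this_slice;
    bool realtime;
} thread_t;

/* Called at the end of each timeslice. */
void account_fpu(thread_t *t)
{
    double sample = t->used_fpu_this_slice ? 1.0 : 0.0;
    t->fpu_avg = DECAY * t->fpu_avg + (1.0 - DECAY) * sample;
    t->used_fpu_this_slice = false;
}

/* Should the next timeslice begin with an eager FXRSTOR? */
bool restore_eagerly(const thread_t *t)
{
    return t->realtime || t->fpu_avg >= RESTORE_THRESHOLD;
}
```

The probing idea from the post fits in naturally: multiplying fpu_avg by an extra decay factor on every eager restore slowly pushes an idle thread back below the threshold, so it gets re-checked via the #NM path now and then.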
Thank you all for the replies!
I understand a little bit more about the FPU now.
My kernel is specifically designed for a dual-core Intel CPU, but in my system (it MUST be SIL4, for rail automation) tasks are locked to a specific core and scheduled sequentially, without time slices! It is probable that I will need the FPU only on one core and only for one task (probable, but not certain)... so I'd like to pick a scheme that handles FPU context switching without too much overhead.
I think that the best choice is number 4 of Brendan's list:
adapt the "automatically saving FPU/MMX/SSE state on task switches" thing described in Intel's manual, so that the state is always saved during the task switch if it was used, and loaded during the task switch (if the task has used the FPU/MMX/SSE previously)
What do you think about it?
Any other suggestions for an efficient implementation?
To close the topic:
I think that the best way for my kernel is to supply two APIs (one to save and one to restore the FPU context); tasks that use floating point will call these APIs in their code.
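A minimal shape for that API pair might look like this. The hardware side is stubbed out with memcpy on a simulated register file; a real build would execute FXSAVE/FXRSTOR on the 16-byte-aligned, 512-byte save area, and fpu_save, fpu_restore, and fpu_context_t are invented names:

```c
#include <string.h>

/* Sketch of a cooperative save/restore API pair. In a real kernel the
 * two memcpy calls would be FXSAVE/FXRSTOR on a 16-byte-aligned
 * 512-byte area; here the FPU register file is simulated so the
 * calling convention can be shown. All names are illustrative. */

#define FPU_AREA_SIZE 512

static unsigned char simulated_fpu[FPU_AREA_SIZE];  /* stands in for the FPU */

typedef struct fpu_context {
    unsigned char area[FPU_AREA_SIZE] __attribute__((aligned(16)));
} fpu_context_t;

/* Called by a task before it yields, if it used floating point. */
void fpu_save(fpu_context_t *ctx)
{
    memcpy(ctx->area, simulated_fpu, FPU_AREA_SIZE);  /* fxsave(ctx->area) */
}

/* Called by a task when it resumes, before any FPU instruction. */
void fpu_restore(const fpu_context_t *ctx)
{
    memcpy(simulated_fpu, ctx->area, FPU_AREA_SIZE);  /* fxrstor(ctx->area) */
}
```

Since the scheduler never preempts, the kernel never has to guess: only tasks that actually use floating point pay for the two calls, which matches the "one task, one core" expectation above.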
My final answer is:
in my system tasks always run to completion, without being interrupted by the kernel, so why should they save the FPU context at all? It's enough if they initialize the FPU and disable it (by setting CR0.EM) at the end of their cycle.
My tests confirm this, for now.
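That run-to-completion discipline amounts to a pair of hypothetical helpers each FPU-using task brackets its cycle with; cr0_em below stands in for the real CR0.EM bit, and everything here is an illustrative sketch rather than working kernel code:

```c
#include <stdbool.h>

/* Sketch of the cooperative scheme above: each task that needs floating
 * point initialises the FPU itself and disables it again when its cycle
 * ends. CR0.EM is simulated; all names are illustrative. */

static bool cr0_em = true;     /* FPU emulation bit: set = FPU "off" */

/* Called by a task at the start of its cycle, before any FPU use. */
void task_fpu_init(void)
{
    cr0_em = false;            /* clear EM; a real kernel would also FNINIT */
}

/* Called by the task at the end of its cycle. */
void task_fpu_shutdown(void)
{
    cr0_em = true;             /* any later stray FPU use would now trap */
}

bool fpu_enabled(void)
{
    return !cr0_em;
}
```

Leaving EM set between cycles has a nice safety property for a SIL4 system: any task that uses the FPU without first initialising it faults immediately instead of silently reading another task's stale registers.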
Read the previous post: it doesn't. He's apparently using cooperative scheduling.
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]