Context switching and FPU

Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Context switching and FPU

Post by Brendan »

Hi,
matute81 wrote: But I have a question: must I do the FPU context switch as described in Intel's Software Developer's Manual?
You have about 5 choices:
  • do the "automatically saving FPU/MMX/SSE state on task switches" thing as described in Intel's manual; which does not work for multi-CPU
  • do the "automatically saving FPU/MMX/SSE state on task switches" thing as described in Intel's manual; but use IPIs to force it to work for multi-CPU
  • adapt the "automatically saving FPU/MMX/SSE state on task switches" thing described in Intel's manual, so that the state is saved during the task switch if it was used, and the state is loaded during the "device not available" exception if it's needed (not during task switches)
  • adapt the "automatically saving FPU/MMX/SSE state on task switches" thing described in Intel's manual, so that the state is always saved during the task switch if it was used, and loaded during the task switch (if the task has used the FPU/MMX/SSE previously)
  • just save the FPU/MMX/SSE state during task switches (regardless of whether it's used or not); and forget about the TS flag and "device not available" exception
For all of the choices, performance will vary. For example, for the second choice, if all tasks always use FPU/MMX/SSE then you'd have the overhead of the "device not available" exception and the overhead of the IPI/s for no reason (but if no tasks use FPU/MMX/SSE then you avoid overhead); and for the last choice, if no tasks use FPU/MMX/SSE then you'd have the overhead of saving the FPU/MMX/SSE state for no reason (but if all tasks always use FPU/MMX/SSE then you avoid overhead).
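To make that trade-off concrete, here is a rough cost model in C that counts saves, restores and #NM traps on a single CPU for the first choice (Intel's lazy scheme) and the last choice (always save/restore). It is a sketch under simplifying assumptions: one CPU, so no IPIs are modelled; a task's first load is counted as a restore; and all type and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative cost counters for a single CPU (names are hypothetical). */
typedef struct {
    unsigned saves;    /* FXSAVE-equivalents executed */
    unsigned restores; /* FXRSTOR-equivalents executed (first load counted too) */
    unsigned nm_traps; /* "device not available" (#NM) exceptions taken */
} fpu_cost;

/* One timeslice of a trace: which task ran and whether it touched FPU/MMX/SSE. */
typedef struct {
    int task;
    int uses_fpu;
} slice;

/* Last choice: save and restore on every task switch, used or not. */
fpu_cost cost_eager(const slice *trace, size_t n) {
    assert(trace != NULL || n == 0);
    fpu_cost c = {0, 0, 0};
    for (size_t i = 1; i < n; i++) {
        if (trace[i].task != trace[i - 1].task) {
            c.saves++;
            c.restores++;
        }
    }
    return c;
}

/* First choice, single CPU: TS is set when the incoming task is not the
   owner, so its first FPU use traps; the #NM handler saves the old owner's
   state (if any) and loads the current task's. */
fpu_cost cost_lazy(const slice *trace, size_t n) {
    assert(trace != NULL || n == 0);
    fpu_cost c = {0, 0, 0};
    int owner = -1; /* -1: no task's state is in the registers yet */
    for (size_t i = 0; i < n; i++) {
        if (trace[i].uses_fpu && trace[i].task != owner) {
            c.nm_traps++;
            if (owner != -1)
                c.saves++;
            c.restores++;
            owner = trace[i].task;
        }
    }
    return c;
}
```

With a trace where tasks 0 and 1 alternate and only task 0 uses the FPU, the lazy model takes one trap and one load in total, while the eager model pays a save and a restore on each of the three switches. If both tasks use the FPU in every slice, the lazy model takes a trap on every slice on top of roughly the same number of saves and restores.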

Of course you can pick more than one of these choices. For example, you could use one method on single-CPU and a different method on multi-CPU; or maybe even dynamically switch between methods based on current load.
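Picking a method per machine, or switching based on load, might look something like the sketch below. The policy itself and the 0.5 trap-ratio threshold are hypothetical placeholders, not something stated in this thread; real values would have to come from measurement.

```c
#include <assert.h>

typedef enum { FPU_LAZY, FPU_EAGER } fpu_strategy;

/* Hypothetical threshold: fraction of task switches that end up taking #NM. */
#define EAGER_TRAP_RATIO 0.5

/* Hypothetical policy: on multi-CPU systems save eagerly so state never has
   to be fetched from another CPU with an IPI; on one CPU stay lazy unless
   most switches trap anyway. */
fpu_strategy pick_strategy(unsigned cpu_count, unsigned switches, unsigned nm_traps) {
    assert(cpu_count >= 1);
    if (cpu_count > 1)
        return FPU_EAGER;
    if (switches == 0)
        return FPU_LAZY; /* no statistics yet: default to lazy */
    double trap_ratio = (double)nm_traps / (double)switches;
    return trap_ratio >= EAGER_TRAP_RATIO ? FPU_EAGER : FPU_LAZY;
}
```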


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
gerryg400
Member
Posts: 1801
Joined: Thu Mar 25, 2010 11:26 pm
Location: Melbourne, Australia

Re: Context switching and FPU

Post by gerryg400 »

egos wrote: It's simple. You can clear TS during task switching, but it is not effective. You should use FXSAVE/FXRSTOR only after the "clts" instruction. It seems to me that the Intel manual has an error in the sequence of actions.
The Intel docs look correct to me. Where is the error ?
If a trainstation is where trains stop, what is a workstation ?
egos
Member
Posts: 612
Joined: Fri Nov 16, 2007 1:59 pm

Re: Context switching and FPU

Post by egos »

I saw that the step using the "clts" instruction was specified after the steps for the context change. If I find it, I will point it out.
If you have seen bad English in my words, tell me what's wrong, please.
gerryg400
Member
Posts: 1801
Joined: Thu Mar 25, 2010 11:26 pm
Location: Melbourne, Australia

Re: Context switching and FPU

Post by gerryg400 »

You mean this, from Intel SDM Vol. 3A, section 13.5.1?
Intel doc wrote:On a task switch, the operating system task switching code must execute the following pseudo-code to set the TS flag according to the current owner of the x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 state. If the new task (task B in this example) is not the current owner of this state, the TS flag is set to 1; otherwise, it is set to 0.

IF Task_Being_Switched_To ≠ x87FPU_MMX_XMM_MXCSR_StateOwner
   THEN
      CR0.TS ← 1;
   ELSE
      CR0.TS ← 0;
FI;
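Written in C, the switch-time part of that pseudocode could look like the sketch below. CR0 is simulated with an ordinary variable so the logic can be exercised outside ring 0; in a kernel the flag would be changed with a MOV to CR0 (or CLTS to clear it). TS is bit 3 of CR0. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CR0_TS (1u << 3) /* Task Switched flag, bit 3 of CR0 */

/* Hypothetical per-CPU state. */
struct cpu {
    uint32_t cr0;  /* simulated CR0 */
    int fpu_owner; /* task whose state is in the FPU/MMX/XMM/MXCSR registers, -1 if none */
};

/* Called by the scheduler when switching this CPU to next_task. */
void set_ts_on_switch(struct cpu *cpu, int next_task) {
    assert(cpu);
    if (next_task != cpu->fpu_owner)
        cpu->cr0 |= CR0_TS;  /* next FPU/MMX/SSE instruction raises #NM */
    else
        cpu->cr0 &= ~CR0_TS; /* registers already hold next_task's state */
}
```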
Intel doc wrote: If a new task attempts to access an x87 FPU, MMX, XMM, or MXCSR register while the TS flag is set to 1, a device-not-available exception (#NM) is generated. The device-not-available exception handler executes the following pseudo-code.

FXSAVE “To x87FPU/MMX/XMM/MXCSR State Save Area for Current x87FPU_MMX_XMM_MXCSR_StateOwner”;
FXRSTOR “x87FPU/MMX/XMM/MXCSR State From Current Task’s x87FPU/MMX/XMM/MXCSR State Save Area”;
x87FPU_MMX_XMM_MXCSR_StateOwner ← Current_Task;
CR0.TS ← 0;
This looks okay to me.
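A C sketch of the #NM handler follows, again with CR0 and the register file simulated and all names hypothetical. One detail the pseudocode leaves implicit, and which may be what egos was pointing at: FXSAVE and FXRSTOR themselves raise #NM while CR0.TS = 1, so a real handler has to clear TS (CLTS) before saving or restoring, not after. The simulated fxsave/fxrstor assert on exactly that.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CR0_TS (1u << 3)
#define MAX_TASKS 4
#define FX_AREA_SIZE 512 /* FXSAVE/FXRSTOR use a 512-byte, 16-byte-aligned area */

/* Hypothetical per-task save area. A real kernel would fill it with a clean
   default image (e.g. captured after FNINIT) before the task's first use. */
struct task {
    _Alignas(16) uint8_t fx_area[FX_AREA_SIZE];
};

static struct task tasks[MAX_TASKS];
static uint8_t fpu_regs[FX_AREA_SIZE]; /* stands in for the live FPU/MMX/XMM/MXCSR registers */
static uint32_t cr0 = CR0_TS;          /* simulated CR0 */
static int fpu_owner = -1;             /* x87FPU_MMX_XMM_MXCSR_StateOwner; -1 = none */
static int current_task = 0;

/* Simulated privileged operations; the asserts model the #NM that real
   FXSAVE/FXRSTOR raise when CR0.TS = 1. */
static void clts(void) { cr0 &= ~CR0_TS; }
static void fxsave(uint8_t *area) {
    assert(!(cr0 & CR0_TS));
    memcpy(area, fpu_regs, FX_AREA_SIZE);
}
static void fxrstor(const uint8_t *area) {
    assert(!(cr0 & CR0_TS));
    memcpy(fpu_regs, area, FX_AREA_SIZE);
}

/* #NM handler: the same steps as the pseudocode, but with TS cleared first. */
void nm_handler(void) {
    clts();
    if (fpu_owner == current_task)
        return; /* registers already hold this task's state */
    if (fpu_owner != -1)
        fxsave(tasks[fpu_owner].fx_area);
    fxrstor(tasks[current_task].fx_area);
    fpu_owner = current_task;
}
```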
Owen
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom

Re: Context switching and FPU

Post by Owen »

Brendan wrote: For all of the choices, performance will vary. For example, for the second choice, if all tasks always use FPU/MMX/SSE then you'd have the overhead of the "device not available" exception and the overhead of the IPI/s for no reason (but if no tasks use FPU/MMX/SSE then you avoid overhead); and for the last choice, if no tasks use FPU/MMX/SSE then you'd have the overhead of saving the FPU/MMX/SSE state for no reason (but if all tasks always use FPU/MMX/SSE then you avoid overhead).
If you try to "lock" tasks to a CPU, then the cost of IPIs should be greatly reduced (they should only be needed when a thread gets migrated). I think that, overall, I would use a context-switch algorithm that takes the following into account:
  • Moving average of FPU use by this thread (0 = no FPU use, 1 = FPU used in every timeslice)
  • Moving average of thread FPU use on this core per reporting period (0 = no threads use the FPU, 1 = one thread uses the FPU in the reporting period, and so on)
  • Realtime nature of thread
To the following ends:
  • For realtime threads, the FPU state is always saved immediately; this allows the thread to start executing on whichever node is most promptly available without requiring IPIs to fetch state.
  • For threads which have an average FPU usage of 0.5 or greater (or whatever is experimentally determined to be most efficient), FPU state is restored at the start of the time slice
  • For threads on cores which have an average FPU usage of 1.5 or greater (or whatever is experimentally determined to be most efficient), FPU state is saved at the end of a time slice
It would obviously be your responsibility to choose the granularity of a "reporting period" for efficiency, and to figure out the best way of determining whether the FPU is actually used by tasks whose FPU state is restored at the start of the time slice. One option would be to slowly erode a task's average FPU usage each time a restoration occurs, causing such tasks to be probed occasionally.
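The heuristic above can be sketched with exponential moving averages. Everything here is an assumption for illustration: the EMA weight, the erosion step and all names. The 0.5 and 1.5 thresholds are the example values from the post and, as noted there, should really be determined experimentally. Whether a thread used the FPU during a slice is only observable when its state was not restored eagerly (TS was left set), which is why the erosion step exists.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tuning constants. */
#define EMA_WEIGHT        0.25 /* weight given to the newest sample */
#define THREAD_RESTORE_AT 0.5  /* thread FPU-use average for eager restore */
#define CORE_SAVE_AT      1.5  /* core FPU-thread average for eager save */
#define EROSION           0.05 /* subtracted on each eager restore */

struct thread_stats { double fpu_avg; bool realtime; };
struct core_stats   { double fpu_threads_avg; };

static double ema(double avg, double sample) {
    return avg + EMA_WEIGHT * (sample - avg);
}

/* End of a timeslice that ran with TS set, so FPU use (a #NM) was observable. */
void thread_slice_done(struct thread_stats *t, bool used_fpu) {
    t->fpu_avg = ema(t->fpu_avg, used_fpu ? 1.0 : 0.0);
}

/* End of a reporting period: how many threads on this core used the FPU. */
void core_period_done(struct core_stats *c, unsigned fpu_threads) {
    c->fpu_threads_avg = ema(c->fpu_threads_avg, (double)fpu_threads);
}

/* Start of a timeslice: restore eagerly for heavy FPU users. Each eager
   restore erodes the average a little, so the thread eventually drops below
   the threshold, runs lazily once, and gets re-measured. */
bool restore_at_slice_start(struct thread_stats *t) {
    assert(t);
    if (t->fpu_avg < THREAD_RESTORE_AT)
        return false;
    t->fpu_avg -= EROSION;
    return true;
}

/* End of a timeslice: realtime threads always save; otherwise save eagerly
   only on cores where FPU use is common. */
bool save_at_slice_end(const struct thread_stats *t, const struct core_stats *c) {
    return t->realtime || c->fpu_threads_avg >= CORE_SAVE_AT;
}
```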
matute81
Member
Posts: 33
Joined: Tue Sep 28, 2010 2:47 am

Re: Context switching and FPU

Post by matute81 »

Thank you all for replies!
I now understand the FPU a bit better.
My kernel is specifically designed for a dual-core Intel CPU, but in my system (it MUST be SIL4, for rail automation) tasks are locked to a specific core and scheduled sequentially, without time slices! I will probably need the FPU on only one core and for only one task; probably, but not certainly... so I'd like to pick a scheme that can handle FPU context switching without too much overhead.
I think that the best choice is number 4 of Brendan's list:
adapt the "automatically saving FPU/MMX/SSE state on task switches" thing described in Intel's manual, so that the state is always saved during the task switch if it was used, and loaded during the task switch (if the task has used the FPU/MMX/SSE previously)
What do you think about it?
Any other suggestions for an efficient implementation?
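For a design where tasks are locked to a core and run one after another, choice 4 might be sketched as follows: a per-task flag records whether the task has ever used the FPU; a switch saves and restores only for such tasks, and every other task runs with TS set so its first FPU instruction traps and marks it. CR0, the register file and all names are simulated or hypothetical, and only one core is modelled (each core would keep its own copy of this state).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CR0_TS (1u << 3)
#define FX_SIZE 512

struct task {
    bool uses_fpu; /* set on the task's first #NM */
    _Alignas(16) uint8_t fx_area[FX_SIZE];
};

static uint8_t fpu_regs[FX_SIZE]; /* simulated FPU/MMX/XMM/MXCSR registers */
static uint32_t cr0;              /* simulated CR0 of this core */
static struct task *current;

static void clts(void) { cr0 &= ~CR0_TS; }
static void set_ts(void) { cr0 |= CR0_TS; }
static void fxsave(uint8_t *a) { assert(!(cr0 & CR0_TS)); memcpy(a, fpu_regs, FX_SIZE); }
static void fxrstor(const uint8_t *a) { assert(!(cr0 & CR0_TS)); memcpy(fpu_regs, a, FX_SIZE); }
/* Crude stand-in for resetting to a clean default state (e.g. FNINIT). */
static void fpu_reset(void) { memset(fpu_regs, 0, FX_SIZE); }

void switch_to(struct task *next) {
    /* TS is always clear while a known FPU user runs, so FXSAVE is safe. */
    if (current != NULL && current->uses_fpu)
        fxsave(current->fx_area);
    if (next->uses_fpu) {
        clts();
        fxrstor(next->fx_area);
    } else {
        set_ts(); /* first FPU instruction traps, telling us it is an FPU user */
    }
    current = next;
}

void nm_handler(void) {
    clts();
    fpu_reset();
    current->uses_fpu = true;
}
```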
matute81
Member
Posts: 33
Joined: Tue Sep 28, 2010 2:47 am

Re: Context switching and FPU

Post by matute81 »

To close the topic:
I think that the best way for my kernel is to provide two API calls (one to save and one to restore the FPU context), which tasks that use floating point will call in their own code.
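Such a two-call API could be as small as the sketch below. The names fpu_context_save and fpu_context_restore are hypothetical, and the register file is simulated with memcpy where a kernel would execute FXSAVE/FXRSTOR on a 16-byte-aligned 512-byte area. Because scheduling is cooperative, nothing else can run between a task's restore and its save.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FX_SIZE 512

/* Hypothetical per-task FPU context. */
struct fpu_context { _Alignas(16) uint8_t area[FX_SIZE]; };

static uint8_t fpu_regs[FX_SIZE]; /* simulated FPU/MMX/XMM/MXCSR registers */

void fpu_context_save(struct fpu_context *ctx) {          /* FXSAVE in a real kernel */
    assert(ctx);
    memcpy(ctx->area, fpu_regs, FX_SIZE);
}

void fpu_context_restore(const struct fpu_context *ctx) { /* FXRSTOR in a real kernel */
    assert(ctx);
    memcpy(fpu_regs, ctx->area, FX_SIZE);
}

/* A floating-point task brackets its FPU work with the two calls. */
void fp_task_cycle(struct fpu_context *ctx, uint8_t work) {
    fpu_context_restore(ctx);
    fpu_regs[0] += work; /* placeholder for real floating-point work */
    fpu_context_save(ctx);
}
```

Interleaving two tasks' cycles shows that each keeps its own state even though they share the (simulated) registers.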

My final point is:
in my system, tasks always do everything they need to do without being interrupted by the kernel, so why should they save the FPU context at all? It's enough for them to initialize the FPU and then disable it (by setting EM) at the end of the cycle.
So far, my tests confirm this.
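The "initialize at the start, disable with EM at the end" cycle can be sketched like this, with CR0 simulated and all names hypothetical. One x86 detail worth knowing: with CR0.EM = 1, x87 instructions raise #NM but SSE instructions raise #UD, so a stray floating-point instruction outside the cycle is trapped either way, just by different handlers. The scheme only holds as long as nothing else (for example an interrupt handler) touches the FPU while a task is mid-cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR0_EM (1u << 2) /* Emulation flag, bit 2 of CR0 */

static uint32_t cr0 = CR0_EM; /* simulated CR0; EM = 1 means the FPU is "disabled" */
static bool fpu_initialized;

/* Real code: clear EM, then FNINIT (and LDMXCSR if SSE is used). */
static void fpu_enable_and_init(void) { cr0 &= ~CR0_EM; fpu_initialized = true; }
static void fpu_disable(void) { cr0 |= CR0_EM; fpu_initialized = false; }
static bool fpu_usable(void) { return !(cr0 & CR0_EM) && fpu_initialized; }

/* Hypothetical cooperative cycle: the task runs to completion, so no other
   task can observe its FPU registers between enable and disable. */
double run_fp_cycle(double x) {
    fpu_enable_and_init();
    assert(fpu_usable());
    double y = x * 1.5; /* placeholder floating-point work */
    fpu_disable();
    return y;
}
```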
Owen
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom

Re: Context switching and FPU

Post by Owen »

How can they call them when the kernel takes the time slice away from them? ...
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Re: Context switching and FPU

Post by Combuster »

Read the previous post: it doesn't. He's apparently using cooperative scheduling.
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
matute81
Member
Posts: 33
Joined: Tue Sep 28, 2010 2:47 am

Re: Context switching and FPU

Post by matute81 »

Combuster wrote:Read the previous post: it doesn't. He's apparently using cooperative scheduling.
Yes! Cooperative scheduling. I'm sorry, I sometimes have difficulty expressing myself in English; I'm rusty!