matute81 wrote:But I still haven't got an answer: must I do an FPU context switch as described in Intel's Software Developer's Manual?
You have about 5 choices:
1. do the "automatically saving FPU/MMX/SSE state on task switches" thing as described in Intel's manual; this does not work for multi-CPU
2. do the same thing, but use IPIs to force it to work for multi-CPU
3. adapt it so that the state is saved during the task switch if it was used, and loaded during the "device not available" exception when it's actually needed (not during task switches)
4. adapt it so that the state is saved during the task switch if it was used, and loaded during the task switch too (if the task has used the FPU/MMX/SSE previously)
5. just save and restore the FPU/MMX/SSE state during every task switch (regardless of whether it's used or not), and forget about the TS flag and the "device not available" exception
For all of these choices, performance will vary. For example, with the second choice, if all tasks always use FPU/MMX/SSE then you pay the overhead of the "device not available" exception and of the IPIs for no reason (but if no tasks use FPU/MMX/SSE then you avoid that overhead). With the last choice, if no tasks use FPU/MMX/SSE then you pay the overhead of saving the state for no reason (but if all tasks always use FPU/MMX/SSE then you avoid the exception overhead).
Of course you can pick more than one of these choices. For example, you could use one method on single-CPU and a different method on multi-CPU; or maybe even dynamically switch between methods based on current load.
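The first choice (Intel's fully lazy scheme, single-CPU only) can be simulated in plain C so the control flow is visible; CR0.TS and FXSAVE/FXRSTOR are modelled here with ordinary variables and comments, and task_t, fpu_owner, and the other names are invented for illustration, not taken from any real kernel:

```c
#include <stdbool.h>
#include <stddef.h>

/* Simulated lazy FPU switching. CR0.TS and the FXSAVE area are
 * modelled with plain variables; all names are illustrative. */

typedef struct task {
    unsigned char fpu_state[512];  /* would be the 16-byte-aligned FXSAVE area */
    bool used_fpu;                 /* has this task ever touched the FPU? */
} task_t;

static task_t *fpu_owner = NULL;   /* task whose state is live in the FPU */
static task_t *current   = NULL;
static bool cr0_ts = false;        /* simulated CR0.TS */

/* Task switch: never touch FPU state here; just arm the #NM trap
 * if the incoming task does not already own the FPU. */
void switch_to(task_t *next)
{
    current = next;
    cr0_ts = (fpu_owner != next);
}

/* Simulated #NM handler: save the old owner's state, load the new
 * owner's state (if it has any), hand over ownership, clear TS. */
void nm_handler(void)
{
    if (fpu_owner != NULL) {
        /* fxsave(fpu_owner->fpu_state); in a real kernel */
    }
    if (current->used_fpu) {
        /* fxrstor(current->fpu_state); */
    }
    fpu_owner = current;
    current->used_fpu = true;
    cr0_ts = false;
}

/* First FPU instruction in a task: traps if TS is set. */
void use_fpu(void)
{
    if (cr0_ts)
        nm_handler();
    /* ... actual FPU work ... */
}
```

The payoff is visible in the simulation: switching between tasks that never call use_fpu() never runs nm_handler(), so FPU-free workloads pay almost nothing. On multi-CPU this breaks down because fpu_owner's state may be live in another core's FPU, which is exactly what the IPIs in the second choice are for.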
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
It's simple. You can clear TS during task switching, but it is not efficient. You would use FXSAVE/FXRSTOR, but only after a CLTS instruction. It seems to me that the Intel manual has an error in its action sequence.
The Intel docs look correct to me. Where is the error ?
If a trainstation is where trains stop, what is a workstation ?
Intel doc wrote:On a task switch, the operating system task switching code must execute the following pseudo-code to set the TS flag according to the current owner of the x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 state. If the new task (task B in this example) is not the current owner of this state, the TS flag is set to 1; otherwise, it is set to 0.
IF Task_Being_Switched_To ≠ x87FPU_MMX_XMM_MXCSR_StateOwner
THEN
CR0.TS ← 1;
ELSE
CR0.TS ← 0;
FI;
Intel doc wrote:
If a new task attempts to access an x87 FPU, MMX, XMM, or MXCSR register while the TS flag is set to 1, a device-not-available exception (#NM) is generated. The device-not-available exception handler executes the following pseudo-code.
FXSAVE “To x87FPU/MMX/XMM/MXCSR State Save Area for Current x87FPU_MMX_XMM_MXCSR_StateOwner”;
FXRSTOR “x87FPU/MMX/XMM/MXCSR State From Current Task’s x87FPU/MMX/XMM/MXCSR State Save Area”;
x87FPU_MMX_XMM_MXCSR_StateOwner ← Current_Task;
CR0.TS ← 0;
This looks okay to me.
Brendan wrote:
For all of the choices, performance will vary. For example, for the second choice, if all tasks always use FPU/MMX/SSE then you'd have the overhead of the "device not available" exception and the overhead of the IPI/s for no reason (but if no tasks use FPU/MMX/SSE then you avoid overhead); and for the last choice, if all tasks don't use FPU/MMX/SSE then you'd have the overhead saving the FPU/MMX/SSE state for no reason (but if all tasks always use FPU/MMX/SSE then you avoid overhead).
If you try to "lock" tasks to a CPU, then the expense of IPIs should be significantly reduced (they should only be needed if a thread gets migrated). I think, overall, I would use an algorithm for context switches which accounts for the following:
Moving average of FPU use by this thread (0 = no FPU use, 1 = FPU used in every timeslice)
Moving average of thread FPU use on this core in average reporting period (0 = no threads use the FPU, 1 = one thread uses the FPU in reporting period, and so on)
Realtime nature of thread
To the following ends:
For realtime threads, the FPU state is always saved immediately; this allows the thread to start executing on whichever core is most promptly available without requiring IPIs to fetch state.
For threads which have an average FPU usage of 0.5 or greater (or whatever is experimentally determined to be most efficient), FPU state is restored at the start of the time slice
For threads on cores which have an average FPU usage of 1.5 or greater (or whatever is experimentally determined to be most efficient), FPU state is saved at the end of a time slice
It would obviously be your responsibility to select the granularity of a "reporting period" for efficiency, and to figure out the best way of detecting whether the FPU is actually used by tasks whose FPU state is restored at the start of the time slice. One option would be to slowly erode a task's average FPU usage each time a restoration occurs, so that such tasks are occasionally probed again.
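The bookkeeping behind those rules could be sketched as an exponentially-weighted moving average per thread; DECAY, the threshold constant, and every identifier here are illustrative choices, not anyone's actual scheduler:

```c
#include <stdbool.h>

/* Sketch of the heuristic above: an exponentially-weighted moving
 * average of FPU use per thread, updated once per timeslice.
 * All names and constants are illustrative. */

#define DECAY 0.9              /* weight given to history vs. the new sample */
#define RESTORE_THRESHOLD 0.5  /* the "0.5 or greater" cutoff from the post */

typedef struct thread {
    double fpu_avg;            /* 0.0 = never uses the FPU, 1.0 = every slice */
    bool used_fpu_this_slice;
    bool realtime;
} thread_t;

/* Called at the end of each timeslice. */
void account_fpu(thread_t *t)
{
    double sample = t->used_fpu_this_slice ? 1.0 : 0.0;
    t->fpu_avg = DECAY * t->fpu_avg + (1.0 - DECAY) * sample;
    t->used_fpu_this_slice = false;
}

/* Should the next timeslice begin with an eager FXRSTOR? */
bool restore_eagerly(const thread_t *t)
{
    return t->realtime || t->fpu_avg >= RESTORE_THRESHOLD;
}
```

The probing idea from the post fits in naturally: multiplying fpu_avg by an extra decay factor on every eager restore slowly pushes an idle thread back below the threshold, so it gets re-checked via the #NM path now and then.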
Thank you all for the replies!
I understand a little bit more about the FPU now.
My kernel is specifically designed for a dual-core Intel CPU, but in my system (it MUST be SIL4, for rail automation) tasks are locked to a specific core and scheduled sequentially, without time slices! It is probable that I will need the FPU only on one core and only for one task (probable, but not certain)... so I'd like to pick a scheme that handles FPU context switching without too much overhead.
I think that the best choice is number 4 of Brendan's list:
adapt the "automatically saving FPU/MMX/SSE state on task switches" thing described in Intel's manual, so that the state is always saved during the task switch if it was used, and loaded during the task switch (if the task has used the FPU/MMX/SSE previously)
What do you think about it?
Any other suggestions for an efficient implementation?
To close the topic:
I think that the best way for my kernel is to supply two APIs (one to save and one to restore the FPU context); tasks that use floating point will call these APIs in their code.
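A minimal shape for that API pair might look like this. The hardware side is stubbed out with memcpy on a simulated register file; a real build would execute FXSAVE/FXRSTOR on the 16-byte-aligned, 512-byte save area, and fpu_save, fpu_restore, and fpu_context_t are invented names:

```c
#include <string.h>

/* Sketch of a cooperative save/restore API pair. In a real kernel the
 * two memcpy calls would be FXSAVE/FXRSTOR on a 16-byte-aligned
 * 512-byte area; here the FPU register file is simulated so the
 * calling convention can be shown. All names are illustrative. */

#define FPU_AREA_SIZE 512

static unsigned char simulated_fpu[FPU_AREA_SIZE];  /* stands in for the FPU */

typedef struct fpu_context {
    unsigned char area[FPU_AREA_SIZE] __attribute__((aligned(16)));
} fpu_context_t;

/* Called by a task before it yields, if it used floating point. */
void fpu_save(fpu_context_t *ctx)
{
    memcpy(ctx->area, simulated_fpu, FPU_AREA_SIZE);  /* fxsave(ctx->area) */
}

/* Called by a task when it resumes, before any FPU instruction. */
void fpu_restore(const fpu_context_t *ctx)
{
    memcpy(simulated_fpu, ctx->area, FPU_AREA_SIZE);  /* fxrstor(ctx->area) */
}
```

Since the scheduler never preempts, the kernel never has to guess: only tasks that actually use floating point pay for the two calls, which matches the "one task, one core" expectation above.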
My final answer is:
in my system tasks always run to completion, without being interrupted by the kernel, so why should they save the FPU context at all? It's enough if they initialize the FPU and disable it (by setting CR0.EM) at the end of their cycle.
My tests confirm this, for now.
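That run-to-completion discipline amounts to a pair of hypothetical helpers each FPU-using task brackets its cycle with; cr0_em below stands in for the real CR0.EM bit, and everything here is an illustrative sketch rather than working kernel code:

```c
#include <stdbool.h>

/* Sketch of the cooperative scheme above: each task that needs floating
 * point initialises the FPU itself and disables it again when its cycle
 * ends. CR0.EM is simulated; all names are illustrative. */

static bool cr0_em = true;     /* FPU emulation bit: set = FPU "off" */

/* Called by a task at the start of its cycle, before any FPU use. */
void task_fpu_init(void)
{
    cr0_em = false;            /* clear EM; a real kernel would also FNINIT */
}

/* Called by the task at the end of its cycle. */
void task_fpu_shutdown(void)
{
    cr0_em = true;             /* any later stray FPU use would now trap */
}

bool fpu_enabled(void)
{
    return !cr0_em;
}
```

Leaving EM set between cycles has a nice safety property for a SIL4 system: any task that uses the FPU without first initialising it faults immediately instead of silently reading another task's stale registers.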
Read the previous post: it doesn't. He's apparently using cooperative scheduling.
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]