OSDev.org

Posted: **Sat Jan 09, 2016 9:30 am**

Hello everyone,

I wrote a mini OS that runs in protected mode and uses hardware-based task switching using TSS (user tasks run in ring 3).
It's very basic and only runs a single task at a time.
I'm now trying to use floating point code in my user apps.
I use the code below to initialize the FPU (from Brendan's post here: http://forum.osdev.org/viewtopic.php?f=1&t=21813):

; return 1 if successfully initialized or 0 if no FPU is present
; int fpu_init()
fpu_init:
		mov    eax,cr0            ; eax = CR0
		and    al,~6              ; clear the EM and MP flags (just in case)
		mov    cr0,eax            ; set CR0
		fninit                    ; reset FPU status word
		mov    word [temp],0x5A5A ; make sure temporary word is non-zero
		fnstsw [temp]             ; save the FPU status word in the temp word
		cmp    word [temp],0      ; was the correct status written to the temp word?
		jne    .noFPU             ; no, no FPU present
		fnstcw [temp]             ; save the FPU control word in the temp word
		mov    ax,[temp]          ; ax = saved FPU control word
		and    ax,0x103F          ; ax = bits to examine
		cmp    ax,0x003F          ; are the bits to examine correct?
		jne    .noFPU             ; no, no FPU present
		mov    eax,1
		ret

.noFPU: mov    eax,0
		ret

temp dw 0

The FPU is initialized by the kernel early in the initialization stage.
Later, when I switch to a task that has FPU code, I get a "No math coprocessor" exception.
Does anyone have an idea why?

Thanks in advance for your help.

Posted: **Sat Jan 09, 2016 9:49 am**

Does your debugging show that the EM bit is still clear when the exception happens?

Posted: **Sat Jan 09, 2016 10:58 am**

The value of CR0 is 0x60000019, so yes, the EM bit (bit 2) is zero.

Posted: **Sat Jan 09, 2016 11:03 am**

You may have the CR0.TS bit set. Read about how it works in conjunction with FPU and TSS-based task switching.

Posted: **Sat Jan 09, 2016 2:20 pm**

It appears that you do have the TS bit set, which will cause an exception:

The TS bit of CR0 helps to determine when the context of the coprocessor does not match that of the task being executed by the 80386 CPU. The 80386 sets TS each time it performs a task switch (whether triggered by software or by hardware interrupt). If, when interpreting one of the ESC instructions, the CPU finds TS already set, it causes exception 7.

( https://pdos.csail.mit.edu/6.828/2007/r ... s11_01.htm )
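
As a quick check, the CR0 value reported earlier in the thread (0x60000019) can be decoded with a few lines of C. This is only an illustrative sketch: the macro names and the helper `fpu_insn_faults` are made up here, and the bit positions are the ones given in the Intel manuals. It encodes the rule that an x87 instruction raises exception 7 when either EM or TS is set.

```c
#include <stdint.h>

/* Low CR0 flag bits, positions as documented in the Intel manuals. */
#define CR0_PE (1u << 0)  /* protected mode enable */
#define CR0_MP (1u << 1)  /* monitor coprocessor */
#define CR0_EM (1u << 2)  /* emulation (no FPU available) */
#define CR0_TS (1u << 3)  /* task switched */
#define CR0_ET (1u << 4)  /* extension type */

/* Hypothetical helper: returns 1 if an x87 instruction would raise
   exception 7 with this CR0 value, 0 otherwise. */
int fpu_insn_faults(uint32_t cr0)
{
    return (cr0 & (CR0_EM | CR0_TS)) != 0;
}
```

For 0x60000019 the low bits that are set are PE, TS and ET, so EM is clear but `fpu_insn_faults` still returns 1 because of TS.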

Posted: **Sat Jan 09, 2016 3:57 pm**

Thanks for the link, but I'm not sure I understand how to handle it.
What steps do I need to implement in order to handle it properly?
From the Intel manuals, it's not clear at all what actions need to be taken.

Posted: **Sat Jan 09, 2016 4:46 pm**

I think the idea is that a task switch means that the context of the FPU has changed. I guess this means that, when the exception is triggered, you need to save the floating-point register values for the old task, load the values for the current task, and clear the TS bit. I presume the reason for this is that, whereas you always save the integer registers on a task switch, there is no need to do so for the floating-point registers unless the new task actually uses them; it's just a question of minimizing the work that needs to be done when switching tasks.

Posted: **Sat Jan 09, 2016 5:21 pm**

thxbb12 wrote: Thanks for the link, but I'm not sure I understand how to handle it.
What steps do I need to implement in order to handle it properly?
From the Intel manuals, it's not clear at all what actions need to be taken.

When you get that exception, you know that you need to save the FPU context of some previously run task, load the FPU context of the current task, and execute CLTS to clear CR0.TS before returning from the exception handler. Finding where to load the new FPU context from is no problem, because you know the current task (and the current thread, if tasks have threads). But you don't immediately know which task owns the old FPU context that's still sitting in the FPU. Which one is it?

For a moment, think not of the task that used the FPU before the current task, but of the task that will use the FPU next. From that task's point of view, which task does the old context belong to? It's the task that's current now. IOW, it's the task in whose context the exception occurred last time. So your exception handler needs to store somewhere (a global kernel variable?) some identifier of the task in whose context the exception occurs. It could be the task selector, a pointer to the TSS, or some other unique per-task structure that you have in the kernel.

Now, what do you do on the first exception, when there's no prior exception and you don't have the ID of the task that used the FPU before? Well, you could define and maintain a flag (another kernel global?) that tells you whether it's the first exception. If it is, you don't save the old context anywhere; you just load the new one on top, then flip the flag (and do CLTS). That's pretty much it.
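
The bookkeeping described in this post can be tried out in user space with a small model. The sketch below is not kernel code: `fpu_regs` and `cr0_ts` stand in for the physical FPU and CR0.TS, `memcpy` stands in for FNSAVE/FRSTOR, and all names (`struct task`, `nm_handler`, `last_fpu_task`, `fpu_used_before`) are invented for illustration. It also assumes each task's save area already holds a valid image to load.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FPU_STATE_SIZE 108  /* FNSAVE image size in 32-bit protected mode */

struct task {
    int id;
    unsigned char fpu_area[FPU_STATE_SIZE];  /* per-task save area */
};

/* Simulated hardware, so the logic can run outside a kernel. */
static unsigned char fpu_regs[FPU_STATE_SIZE];  /* the physical FPU */
static int cr0_ts;                              /* CR0.TS */

/* The two kernel globals suggested in the post. */
static struct task *last_fpu_task;  /* task the last exception occurred in */
static int fpu_used_before;         /* 0 until the first exception */

/* The CPU does this by itself on a TSS-based task switch. */
void on_task_switch(void) { cr0_ts = 1; }

/* Body of the exception 7 handler, running in the context of `current`. */
void nm_handler(struct task *current)
{
    assert(current != NULL);
    if (fpu_used_before)  /* old context belongs to the last faulting task */
        memcpy(last_fpu_task->fpu_area, fpu_regs, FPU_STATE_SIZE);
    memcpy(fpu_regs, current->fpu_area, FPU_STATE_SIZE);  /* load new one */
    last_fpu_task = current;   /* remember the owner for next time */
    fpu_used_before = 1;       /* flip the first-exception flag */
    cr0_ts = 0;                /* CLTS */
}
```

If the same task faults twice in a row, the save followed by the load leaves the registers unchanged, so the literal scheme is still correct, just not minimal.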

Posted: **Sat Jan 09, 2016 11:06 pm**

Hi,

thxbb12 wrote: Thanks for the link, but I'm not sure I understand how to handle it.
What steps do I need to implement in order to handle it properly?
From the Intel manuals, it's not clear at all what actions need to be taken.

In general; when a hardware task switch occurs the CPU doesn't touch FPU state and only sets the TS flag. If the FPU is used when the TS flag is set it causes an exception. Note: For software task switching you can do everything described below by setting the TS flag yourself during task switches.

For single-CPU; when this exception occurs you'd check if "FPUowner == currentTask", and if it's not you'd save the previous task's FPU state and load the current task's FPU state. In either case you'd clear the TS flag. The idea of this is to avoid loading/saving the FPU state when it can be avoided - e.g. if you have 5 tasks where only 2 of them use FPU, then you might do 3 task switches (between tasks that don't use FPU) and avoid loading/saving the FPU state 3 times; and if only one task uses FPU then you might never load/save FPU state because that task is always the "FPU owner".
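
To see how much work the "FPU owner" check saves, here is a rough user-space model that only counts how often state would be moved. Everything in it is an assumption made for the sketch: `ts_flag` stands in for CR0.TS, `use_fpu` stands in for a task executing an FPU instruction (and thus possibly faulting), and the counters replace the real save/load.

```c
#include <assert.h>
#include <stddef.h>

struct task { int id; };

static const struct task *fpu_owner;  /* task whose state is in the FPU */
static int ts_flag;                   /* stands in for CR0.TS */
static int state_saves, state_loads;  /* how often state really moved */

/* Forget all bookkeeping, to start a fresh experiment. */
void lazy_fpu_reset(void)
{
    fpu_owner = NULL;
    ts_flag = 0;
    state_saves = state_loads = 0;
}

/* A task switch only sets TS; no FPU state is touched. */
void task_switch(void) { ts_flag = 1; }

/* Simulate `current` executing one FPU instruction. */
void use_fpu(const struct task *current)
{
    assert(current != NULL);
    if (!ts_flag)
        return;                  /* TS clear: no exception is raised */
    if (fpu_owner != current) {
        if (fpu_owner != NULL)
            state_saves++;       /* save the previous owner's state */
        state_loads++;           /* load the current task's state */
        fpu_owner = current;
    }
    ts_flag = 0;                 /* clear TS in either case */
}
```

Running two round-robin passes over five tasks (ten task switches) where only the first two call `use_fpu` gives 4 loads and 3 saves in this model; if only one task uses the FPU, it is 1 load and 0 saves.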

For multi-CPU this doesn't work; because a task that was running on CPU#0 last time might be run on CPU#1 next time, and that task's FPU state might still be in CPU#0 when it's needed by CPU#1. To work around that you have to do it differently. When the exception occurs you load the current task's state and set some sort of "FPU state was loaded" flag (e.g. in your thread data structure or whatever), and when you do a task switch you check that "FPU state was loaded" and save the FPU state if it was set. That means you still avoid some FPU state loading/saving; but a thread's FPU state is in RAM if/when it's needed by a different CPU.
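
The multi-CPU variant can be modelled the same way. In this sketch (again user-space only, with invented names such as `struct cpu`, `switch_to` and `fpu_loaded`), each CPU has its own simulated FPU registers and TS flag, the exception handler loads the state and marks it as loaded, and the task switch saves it back if the mark is set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FPU_STATE_SIZE 108  /* FNSAVE image size in 32-bit protected mode */

struct thread {
    unsigned char fpu_area[FPU_STATE_SIZE];  /* state while not running */
    int fpu_loaded;                          /* "FPU state was loaded" */
};

struct cpu {
    unsigned char fpu_regs[FPU_STATE_SIZE];  /* this CPU's simulated FPU */
    int ts;                                  /* this CPU's CR0.TS */
    struct thread *current;
};

/* Task switch: write the outgoing thread's state back to RAM if it was
   loaded on this CPU, so any other CPU can pick it up later. */
void switch_to(struct cpu *c, struct thread *next)
{
    struct thread *prev = c->current;
    if (prev != NULL && prev->fpu_loaded) {
        memcpy(prev->fpu_area, c->fpu_regs, FPU_STATE_SIZE);
        prev->fpu_loaded = 0;
    }
    c->current = next;
    c->ts = 1;
}

/* Exception 7 handler: always load, then remember that we did. */
void nm_handler(struct cpu *c)
{
    assert(c->current != NULL);
    memcpy(c->fpu_regs, c->current->fpu_area, FPU_STATE_SIZE);
    c->current->fpu_loaded = 1;
    c->ts = 0;
}
```

A thread that used the FPU on one CPU and is then scheduled on another finds its state in `fpu_area`, because the switch away from it already saved it.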

For both of these cases (single-CPU and multi-CPU); when you create a thread you could just set a "thread has no FPU state" flag (and avoid creating an initial FPU state for the thread). Then, instead of just loading the thread's FPU state you'd check if it has an FPU state first, and if it doesn't you'd create the thread's "initial FPU state" and change the "thread has no FPU state" flag. That way if there's lots of threads that never use FPU you avoid creating unnecessary data when creating the thread (and the first time the state is needed you get to do "FNINIT" instead of loading).
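
A minimal sketch of that "create the initial state on first use" idea follows. It is a model, not kernel code: `sim_fninit` imitates FNINIT on a simulated FPU (control word 0x037F, status word 0, tag word 0xFFFF), the state is abridged to three words, and the flag is stored in the positive sense as `has_fpu_state`. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Abridged FPU state: just the three words FNINIT is known to reset. */
struct fpu_state { uint16_t control, status, tag; };

static struct fpu_state fpu;  /* simulated physical FPU */

/* Imitates FNINIT: default control word, clear status, all tags empty. */
static void sim_fninit(void)
{
    fpu.control = 0x037F;
    fpu.status  = 0x0000;
    fpu.tag     = 0xFFFF;
}

struct thread {
    int has_fpu_state;       /* 0 for a freshly created thread */
    struct fpu_state saved;  /* only meaningful once has_fpu_state is 1 */
};

/* Make `t`'s state current on the FPU.
   Returns 1 if a fresh state was created, 0 if a saved one was loaded. */
int fpu_make_current(struct thread *t)
{
    assert(t != NULL);
    if (!t->has_fpu_state) {
        sim_fninit();            /* first use: initialise instead of load */
        t->has_fpu_state = 1;
        return 1;
    }
    fpu = t->saved;              /* later uses: load the saved state */
    return 0;
}
```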

Also for both cases (single-CPU and multi-CPU); the exception handler can keep track of how often a thread uses the FPU; and if you know a thread always (or nearly always) uses the FPU then you can load the thread's FPU state during task switches to avoid the overhead of an "extremely likely" exception later.

However...

If you do all of this then it gets a little complicated and the overhead of loading/saving the FPU state is fairly small anyway; so all the complications probably won't make any difference to performance; especially when a task switch involves a lot of work (calculating/updating the amount of time the thread has spent executing, loading/saving debug register state, loading/saving performance monitoring counter state, etc) that makes the FPU state saving/loading seem even more insignificant.

It's not until you include SSE and AVX state that it starts looking beneficial, because there's a lot more state being saved/loaded (and the overhead you're avoiding is much higher). For example, for AVX2 there are 16 (256-bit) registers in 64-bit mode, which adds up to 512 bytes of data (plus a little more for status/control) being saved/loaded. For AVX-512 this grows to 32 (512-bit) registers, or 2 KiB (and there's a set of 8 opmask registers on top of that).
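
These totals are just register count times register width. A tiny helper (the name `regfile_bytes` is invented for this sketch) makes it easy to recompute them for whatever register file a given CPU mode exposes; for instance, 32 registers of 512 bits come to 2048 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to hold `count` registers of `bits` bits each.
   `bits` is expected to be a multiple of 8. */
size_t regfile_bytes(size_t count, size_t bits)
{
    assert(bits % 8 == 0);
    return count * (bits / 8);
}
```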

Cheers,

Brendan

Posted: **Sun Jan 10, 2016 8:58 pm**

Thank you everyone, especially Brendan for your very detailed and clear explanations.

FPU code triggering a no math coprocessor exception...
