Math coprocessor

Post by Candamir »

I have googled a bit about Intel's math coprocessor (x87), but I still have some doubts. Is the coprocessor invoked automatically by the CPU when it is asked to perform a floating-point operation, or must I write a "math server" for my microkernel in order to use this device?

Candamir
durand
Member
Posts: 193
Joined: Wed Dec 21, 2005 12:00 am
Location: South Africa

Re:Math coprocessor

Post by durand »

The math coprocessor is called automatically, BUT, just like the CPU, it has only one set of working registers, and its state needs to be initialized, saved and restored for each thread that uses it.

So, for each thread, you need to provide a 512-byte memory area into and out of which you can FXSAVE and FXRSTOR (fast versions of the FPU save and restore instructions; these need to be enabled in CR4). The FPU needs to be initialized with FINIT before a thread first uses it. Initialization only needs to occur at the launch of the thread. Thereafter, you save the FPU state to the buffer when you switch out of the thread and restore it from the buffer when you switch back in.
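A rough sketch of what that per-thread buffer could look like in C. The names `fpu_state`, `thread_ctx`, `fpu_save` and `fpu_restore` are made up for illustration; the only hard facts assumed are that the FXSAVE area is 512 bytes and must be 16-byte aligned. The inline assembly is only compiled with GCC/Clang on x86.

```c
#include <stdalign.h>
#include <stdint.h>

/* FXSAVE/FXRSTOR operate on a 512-byte area that must be 16-byte aligned. */
struct fpu_state {
    alignas(16) uint8_t area[512];
};

/* Hypothetical per-thread record; only the FPU-related fields are shown. */
struct thread_ctx {
    struct fpu_state fpu;
    int fpu_initialised;   /* set once FINIT has been run for this thread */
};

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Store the live FPU/MMX/SSE state into the thread's buffer. */
static inline void fpu_save(struct fpu_state *st)
{
    __asm__ volatile("fxsave %0" : "=m"(*st));
}

/* Reload the FPU/MMX/SSE state from the thread's buffer. */
static inline void fpu_restore(const struct fpu_state *st)
{
    __asm__ volatile("fxrstor %0" : : "m"(*st));
}
#endif
```

Embedding the area in the thread structure keeps the alignment requirement in the type, so any allocator that respects the structure's alignment hands back a usable buffer.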

You can also set some bits in CR0 (bits 1 and 5) to generate a device-not-available exception whenever a thread attempts to use the FPU. Using this exception, you can avoid having to FINIT, FSAVE and FRSTOR until a thread actually accesses the FPU. When it does access the FPU and causes an exception, you can FINIT if it's the first time, and then start saving and restoring the FPU state for that particular thread. In this way, you avoid the overhead of providing a feature which only a few threads will use, and a thread incurs only a slight penalty the first time it uses an FPU instruction.

The FXSAVE and FXRSTOR instructions also save and restore the SSE, SSE2, MMX and 3DNow! registers to and from this 512-byte buffer. You'll be providing access to all these CPU features if you implement FPU support.
kataklinger
Member
Posts: 381
Joined: Fri Nov 04, 2005 12:00 am
Location: Serbia

Re:Math coprocessor

Post by kataklinger »

durand wrote: You can also set some bits in CR0 (bits 1 and 5) to generate a device-not-available exception whenever a thread attempts to use the FPU. Using this exception, you can avoid having to FINIT, FSAVE and FRSTOR until a thread actually accesses the FPU. When it does access the FPU and causes an exception, you can FINIT if it's the first time, and then start saving and restoring the FPU state for that particular thread. In this way, you avoid the overhead of providing a feature which only a few threads will use, and a thread incurs only a slight penalty the first time it uses an FPU instruction.
On an SMP system, this is a little harder to do. And I still haven't got any good solutions for that problem...

Re:Math coprocessor

Post by durand »

I don't think there is a problem. A thread can only run on one CPU at a time even if there are multiple CPUs. An exception is generated on the same CPU that the offending thread is running on.

It's just up to the exception handler to do the FINIT, FXRSTOR and FXSAVE stuff...

What problems are you having?
FlashBurn

Re:Math coprocessor

Post by FlashBurn »

On a uniprocessor system you would save and load the FPU context only when FPU instructions are actually used. So a thread could use an FPU instruction -> exception -> load the FPU environment, and then run a few more FPU instructions. Other threads are then scheduled, and when the old thread runs again it can keep using the live FPU environment, because no other thread has used FPU instructions in the meantime. This isn't possible in SMP configurations, because you cannot be sure that the thread will be on the same CPU the next time it runs. So you have to save the FPU environment every time it was used, but you only load it when FPU instructions are about to be used (via the same exception).

I hope my explanation was clear.

Re:Math coprocessor

Post by durand »

Ah I see. Sorry, my explanation was unclear. I never meant to imply that you shouldn't save the FPU state. I just meant you could delay initialization, saving and restoring until you were sure the thread needed it.

If a thread uses the FPU for the first time, an exception is generated and that thread is marked as an FPU-user. Then each time that thread is switched in or out of context, the FPU state is restored and saved.

You can add whatever other fancy scheme you like on top of this to reduce the amount of work. Maybe only automatically save/restore state if the thread has caused several exceptions in a row? But the principle is the same.

I didn't mean to imply that you should leave the FPUs in unknown states without saving the registers, relying on exceptions to trigger the saving. That requires threads to be bound to a CPU, and an exception every time the FPU is used for the first time after a thread is switched in. I don't know how expensive that would be compared to just restoring for those threads which are known to be FPU users. The FXSAVE and FXRSTOR instructions are designed to be fast.
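To make the "mark the thread as an FPU user on its first exception" idea concrete, here is a small simulation in plain C. Nothing in it touches real hardware: the counters stand in for FXSAVE/FXRSTOR, `fpu_trap_enabled` stands in for the CR0 trap setting, and all names (`thread`, `fpu_first_use`, `switch_to`) are invented for the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

struct thread {
    bool uses_fpu;   /* set on the thread's first FPU trap */
    int  saves;      /* counts simulated FXSAVEs for this thread */
    int  restores;   /* counts simulated FXRSTORs for this thread */
};

static struct thread *current;        /* thread on this simulated CPU */
static bool fpu_trap_enabled = true;  /* stands in for the CR0 trap bit */

/* Device-not-available handler: first FPU use by the current thread. */
void fpu_first_use(void)
{
    current->uses_fpu = true;   /* real code would also FINIT here */
    fpu_trap_enabled = false;   /* let the thread use the FPU freely */
}

/* Context switch: save/restore only for threads marked as FPU users. */
void switch_to(struct thread *next)
{
    if (current != NULL && current->uses_fpu)
        current->saves++;            /* FXSAVE into current's buffer */

    current = next;

    if (next->uses_fpu) {
        next->restores++;            /* FXRSTOR from next's buffer */
        fpu_trap_enabled = false;
    } else {
        fpu_trap_enabled = true;     /* trap on its first FPU use */
    }
}
```

Threads that never touch the FPU cost nothing here beyond one flag check per switch, which is the whole point of delaying the work.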
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re:Math coprocessor

Post by Brendan »

Hi,
durand wrote: The FXSAVE and FXRSTOR instructions also save and restore the SSE, SSE2, MMX and 3DNow! registers to and from this 512-byte buffer. You'll be providing access to all these CPU features if you implement FPU support.
Almost - you'd also need to provide an exception handler for the SIMD floating-point exception (and set the OSXMMEXCPT flag in CR4 so the CPU knows this exception is handled), and also set the OSFXSR flag in CR4 so that the CPU knows the OS's context switching code uses FXSAVE/FXRSTOR.
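For reference, a sketch of the two CR4 flags as C constants. The bit positions (9 and 10) are the ones given in the Intel manuals; the helper name `cr4_with_sse_support` is made up, and actually reading or writing CR4 needs privileged code that is not shown.

```c
#include <stdint.h>

#define CR4_OSFXSR     (1u << 9)   /* OS uses FXSAVE/FXRSTOR; enables SSE */
#define CR4_OSXMMEXCPT (1u << 10)  /* OS handles the SIMD FP exception */

/* Returns the CR4 value with both flags set; other bits are untouched. */
uint32_t cr4_with_sse_support(uint32_t cr4)
{
    return cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT;
}
```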
kataklinger wrote: On an SMP system, this is a little harder to do. And I still haven't got any good solutions for that problem...
On an SMP system it is still partially possible. The idea is to always save the state during a context switch if it could have changed (so that other CPUs find the correct state in RAM), and delay loading the new state until it's necessary.

For the "device not available" exception handler, use something like this:

Code:

   TS_flag = 0;               // Clear TS first (CLTS), or the load below would fault again
   loadState(currentTask);    // Load the task's FPU/MMX/SSE state from RAM
And in the context switch code, do something like:

Code:

// Save old task's state

    if( TS_flag == 0) {    // If FPU/MMX/SSE state could have changed..
        saveState(currentTask);
        TS_flag = 1;
    }

// Load new task's state

    currentTask = newTask;
This gives most of the benefits (especially when several tasks don't use the FPU/MMX/SSE state), while still making sure things remain consistent for other CPUs.
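The same idea as a self-contained simulation, in case it helps to see it run. This is only a model: `struct cpu`, `struct task`, `task_fpu_write` and the single `int` standing in for the whole register file are assumptions made for the sketch, with `ts` playing the role of CR0.TS on each CPU.

```c
#include <stdbool.h>
#include <stddef.h>

struct task {
    int saved_value;   /* stands in for the task's save area in RAM */
};

struct cpu {
    bool ts;               /* stands in for CR0.TS on this CPU */
    int  fpu_regs;         /* stands in for the live FPU/MMX/SSE state */
    struct task *current;
};

/* "Device not available" handler: clear TS, then load the state. */
void device_not_available(struct cpu *c)
{
    c->ts = false;                          /* CLTS */
    c->fpu_regs = c->current->saved_value;  /* FXRSTOR */
}

/* Task code touching the FPU: traps first if TS is set. */
void task_fpu_write(struct cpu *c, int value)
{
    if (c->ts)
        device_not_available(c);
    c->fpu_regs = value;
}

/* Context switch: save only if the state could have changed. */
void context_switch(struct cpu *c, struct task *next)
{
    if (!c->ts) {
        if (c->current != NULL)
            c->current->saved_value = c->fpu_regs;  /* FXSAVE */
        c->ts = true;   /* next FPU use on this CPU traps */
    }
    c->current = next;  /* new state is loaded lazily, on first use */
}
```

Because the state is written back to RAM on every switch where it could have changed, a task that later lands on a different CPU finds the correct values there.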

You could probably improve on this though - e.g. always load a new task's FPU/MMX/SSE state during the context switch if you know that the new task always uses FPU/MMX/SSE (which would save the cost of entering/leaving the "device not available" exception handler later). I'm not sure what is best if a task only occasionally uses FPU/MMX/SSE though - some sort of algorithm to predict the future would be nice... ;)


Cheers,

Brendan
Candamir

Re:Math coprocessor

Post by Candamir »

durand wrote: The FXSAVE and FXRSTOR instructions also save and restore the SSE, SSE2, MMX and 3DNow! registers to and from this 512-byte buffer. You'll be providing access to all these CPU features if you implement FPU support.
FXSAVE and FXRSTOR do apply to all these coprocessors, but is this also valid for FINIT? And if I set bits 5 and 1 in CR0, will that also generate exceptions for these devices?

Thank you

Candamir

BTW: When looking at the Intel docs, I also found that bits 2 and 3 had something to do with the issue... There is a table that describes different combinations of bits 3, 2 and 1, but not of bit 5... Also, can the EM flag (bit 2) be used to detect a coprocessor's presence? Or would that be better done with CPUID?
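For what it's worth, here is how I read that table, written down as C so the combinations can be checked. Treat it as a sketch: the bit positions and names (MP, EM, TS, NE) are from the Intel manuals, but the helper functions are my own invention and only model the trap conditions, they don't touch CR0. As far as I can tell, bit 5 (NE) only selects how x87 errors are reported and has no effect on the device-not-available trap, and EM is set by software, so CPUID looks like the better presence test.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_MP (1u << 1)  /* Monitor coprocessor: WAIT/FWAIT honour TS */
#define CR0_EM (1u << 2)  /* Emulation: x87 instructions raise #NM */
#define CR0_TS (1u << 3)  /* Task switched: x87/MMX/SSE raise #NM */
#define CR0_NE (1u << 5)  /* Numeric error: native x87 error reporting */

/* Would an x87 instruction (other than WAIT/FWAIT) raise #NM? */
bool x87_raises_nm(uint32_t cr0)
{
    return (cr0 & (CR0_EM | CR0_TS)) != 0;
}

/* Would WAIT/FWAIT raise #NM? This needs both MP and TS set. */
bool wait_raises_nm(uint32_t cr0)
{
    return (cr0 & (CR0_MP | CR0_TS)) == (CR0_MP | CR0_TS);
}

/* CPUID leaf 1, EDX bit 0 reports an on-chip x87 FPU. */
bool cpuid_edx_has_fpu(uint32_t edx)
{
    return (edx & 1u) != 0;
}
```

So with only bits 1 and 5 set, `x87_raises_nm` is false; it is TS (bit 3) that makes the lazy scheme trap.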