FPU initialization

~
Member
Posts: 1228
Joined: Tue Mar 06, 2007 11:17 am
Libera.chat IRC: ArcheFire

Post by ~ »

I was trying to find a way to reliably initialize the FPU. So far, for a safe start, the only way to do it without risking unhandled exceptions seems to be to rely exclusively on legacy x87 instructions.

It seems that setting CR4 "properly" only works on a Pentium III or later. I have a 200 MHz Pentium I, and executing these lines makes the machine reset:

Code:

mov eax,cr4
or eax,0x200
mov cr4,eax
Also, if that makes it reset, instructions like FXSAVE and FXRSTOR will do the same. I understand that it has something to do with SSE support on that Pentium I (SSE instructions are only supposed to be supported from the Pentium III onward).

That being so, I think the code in the wiki should look like the code below. Right now it initializes the FPU assuming that it can set CR4 and that an FPU is present, or else skips initialization altogether as if there were no FPU. That is very bad for machines older than the Pentium III, which to me still seem current and widely used enough to be worth handling:

http://wiki.osdev.org/Fpu

Code:

#include <stdint.h>
#include <stddef.h>

// cpuid_features and cr4_osfxsr_supported are assumed to be set up elsewhere
void setup_x87_fpu(const uint16_t cw)
{
    size_t cr4; // backup of CR4

    if(cpuid_features.FPU) // checks for the FPU flag
    {
        if(cr4_osfxsr_supported) // checks if CR4 is suitable for this change
        {
            // place CR4 into our variable
            __asm__ __volatile__("mov %%cr4, %0;" : "=r" (cr4));

            // set the OSFXSR bit
            cr4 |= 0x200;

            // reload CR4
            __asm__ __volatile__("mov %0, %%cr4;" : : "r"(cr4));
        }

        // INIT the FPU (FINIT)
        __asm__ __volatile__("finit;");

        // FLDCW = Load FPU Control Word: sets the FPU control word to "cw"
        __asm__ __volatile__("fldcw %0;" : : "m"(cw));
    }
}
What do you think? Is this approach acceptable for a safe, legacy-compatible start, adding more checks later (once it works) to support more advanced FPU features?
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: FPU initialization

Post by Brendan »

Hi,
That entire wiki page is incredibly dodgy.

SSE, the OSFXSR flag in CR4 and the OSXMMEXCPT flag in CR4 all have nothing to do with the FPU or FPU initialisation. You should only touch the OSFXSR flag in CR4 or the OSXMMEXCPT flag in CR4 if CPUID says that the CPU supports FXSAVE/FXRSTOR and SSE (which are completely different CPUID feature flags from the "CPU has a built-in FPU" feature flag).
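As a minimal sketch of that rule, assuming you already have the EDX value returned by CPUID leaf 1: the helper name cr4_sse_bits is hypothetical, while the bit positions (FPU = bit 0, FXSR = bit 24, SSE = bit 25 of CPUID.01H:EDX; OSFXSR = bit 9 and OSXMMEXCPT = bit 10 of CR4) come from Intel's manuals. Note that the FPU flag plays no part in the decision.

```c
#include <stdint.h>

/* CPUID.01H:EDX feature bits */
#define CPUID_EDX_FPU   (1u << 0)   /* x87 FPU on chip (irrelevant here) */
#define CPUID_EDX_FXSR  (1u << 24)  /* FXSAVE/FXRSTOR supported */
#define CPUID_EDX_SSE   (1u << 25)  /* SSE supported */

/* CR4 control bits */
#define CR4_OSFXSR      (1u << 9)   /* 0x200: OS uses FXSAVE/FXRSTOR */
#define CR4_OSXMMEXCPT  (1u << 10)  /* 0x400: OS handles SIMD FP exceptions */

/* Hypothetical helper: returns the CR4 bits that may be ORed in, given
 * CPUID leaf 1 EDX. On a Pentium I (no FXSR, no SSE) this returns 0,
 * so the reserved CR4 bits are never touched. */
static uint32_t cr4_sse_bits(uint32_t cpuid1_edx)
{
    uint32_t bits = 0;
    if ((cpuid1_edx & CPUID_EDX_FXSR) && (cpuid1_edx & CPUID_EDX_SSE))
        bits |= CR4_OSFXSR | CR4_OSXMMEXCPT;
    return bits;
}
```

The caller would then do something like `cr4 |= cr4_sse_bits(edx);` before writing CR4 back, and should only set OSXMMEXCPT once a SIMD exception handler actually exists.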

The first step of initialising the FPU is to detect if an FPU is present, and what type it is. If the CPU supports CPUID and the "CPU has a built-in FPU" feature flag is set, then you know there's an FPU built into the CPU. Otherwise, you need to detect if an FPU is present (or not) with something like this (based on example code from Intel's "Processor Identification" application note):

Code:

    mov eax,cr0                    ;eax = CR0
    and al,~6                      ;Clear the EM and MP flags (just in case)
    mov cr0,eax                    ;Set CR0
    fninit                         ;Reset FPU status word
    mov word [temp], 0x5A5A        ;Make sure temporary word is non-zero
    fnstsw [temp]                  ;Save the FPU status word in the temporary word
    cmp word [temp],0              ;Was the correct status written to the temporary word?
    jne .noFPU                     ; no, no FPU present
    fnstcw [temp]                  ;Save the FPU control word in the temporary word
    mov ax, [temp]                 ;ax = saved FPU control word
    and ax,0x103F                  ;ax = bits to examine
    cmp ax,0x003F                  ;Are the bits to examine correct?
    jne .noFPU                     ; no, no FPU present
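The pass/fail logic of that probe can be restated in C as a pure function over the two words that FNSTSW and FNSTCW wrote, which makes it easy to test off-target. This is only a sketch: fpu_probe_passed is a hypothetical name, and it assumes the temporary word was pre-filled with the 0x5A5A sentinel, so that with no FPU present the stores never happen and the sentinel survives.

```c
#include <stdbool.h>
#include <stdint.h>

#define PROBE_SENTINEL 0x5A5Au  /* value written before FNINIT/FNSTSW */

/* Returns true if the words stored after FNINIT look like a real x87:
 * the status word must be 0, and the control word ANDed with 0x103F
 * (exception masks in bits 0-5 plus bit 12) must equal 0x003F. */
static bool fpu_probe_passed(uint16_t status_word, uint16_t control_word)
{
    if (status_word != 0)
        return false;  /* sentinel left untouched, or unexpected status */
    return (control_word & 0x103Fu) == 0x003Fu;
}
```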
If the CPU is an 80486 or later and an FPU is present, then the FPU is built into the CPU. Otherwise it's an 80386 or older CPU. I assume nobody cares about the 80286 or older, so I won't go into those cases. For an 80386 the FPU can be an 80387 or an 80287, and if you care you can check whether the FPU knows the difference between positive infinity and negative infinity (the 80387 does know the difference; for the 80287, "+infinity == -infinity").

Now you should know if an FPU is present, and if it's built into the CPU or not, and if it's 80287 or 80387 or later.
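Put together, those three results map onto the cases above roughly like this. A sketch only: the enum and function names are hypothetical, and the three inputs are assumed to come from the FNINIT probe, from some CPU-type check (not shown) and from the infinity test.

```c
#include <stdbool.h>

/* Hypothetical classification following the decision order above. */
enum fpu_kind {
    FPU_NONE,          /* probe failed: no FPU, emulate (CR0.EM) */
    FPU_BUILTIN,       /* 80486 or later with an FPU: on-chip */
    FPU_EXTERNAL_287,  /* 80386 + 80287: +infinity == -infinity */
    FPU_EXTERNAL_387   /* 80386 + 80387: +infinity != -infinity */
};

static enum fpu_kind classify_fpu(bool probe_passed, bool cpu_is_486_or_later,
                                  bool distinguishes_infinities)
{
    if (!probe_passed)
        return FPU_NONE;
    if (cpu_is_486_or_later)
        return FPU_BUILTIN;
    return distinguishes_infinities ? FPU_EXTERNAL_387 : FPU_EXTERNAL_287;
}
```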

If there is no FPU, then set the EM flag in CR0 and clear the MP flag in CR0. Otherwise, clear the EM flag in CR0 and set the MP flag in CR0; and if the FPU is built into the CPU then set the NE flag in CR0 so that FPU errors are reported as an exception (and so that FPU errors aren't reported using IRQ13 via the PIC, which is slower and can cause problems with race conditions, etc). In practice, it's easier to refuse to boot if the CPU is 80386 (or older) and always use the "native FPU exception" mechanism (and skip the "is it 80387 or 80287" check too).
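The CR0 rules above boil down to a few bit operations. A minimal sketch with a hypothetical helper name, using the CR0 bit positions from Intel's manuals (MP = bit 1, EM = bit 2, NE = bit 5); every other CR0 bit is passed through unchanged.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_MP (1u << 1)  /* Monitor coProcessor */
#define CR0_EM (1u << 2)  /* EMulation: x87 instructions raise #NM */
#define CR0_NE (1u << 5)  /* Numeric Error: native #MF instead of IRQ13 */

/* Hypothetical helper: returns the CR0 value to load back. */
static uint32_t cr0_for_fpu(uint32_t cr0, bool fpu_present, bool fpu_builtin)
{
    if (!fpu_present) {
        cr0 |= CR0_EM;       /* trap x87 instructions for emulation */
        cr0 &= ~CR0_MP;
    } else {
        cr0 &= ~CR0_EM;
        cr0 |= CR0_MP;
        if (fpu_builtin)
            cr0 |= CR0_NE;   /* report FPU errors as an exception */
    }
    return cr0;
}
```

For example, starting from CR0 = 0x11 (PE and ET set) with a built-in FPU, this yields 0x33.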

Next, to initialise the FPU use the "FNINIT" instruction (not the "FINIT" instruction). Don't bother loading the FPU control word (with "FLDCW") because "FNINIT" sets the FPU control word to sane defaults anyway.
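To see why the FNINIT value 0x037F counts as sane, here is a small decoder for the control word fields. It is a sketch: the macro and function names are hypothetical, while the bit positions are those in Intel's manuals. For 0x037F it reports all six exceptions masked, extended (64-bit) precision and round-to-nearest.

```c
#include <stdint.h>

#define FNINIT_CONTROL_WORD 0x037Fu  /* control word after FNINIT */

/* Field extractors for the x87 control word. */
#define CW_EXC_MASKS(cw) ((cw) & 0x3Fu)          /* IM DM ZM OM UM PM, bits 0-5 */
#define CW_PC(cw)        (((cw) >> 8) & 0x3u)    /* precision control, bits 8-9 */
#define CW_RC(cw)        (((cw) >> 10) & 0x3u)   /* rounding control, bits 10-11 */

static const char *pc_name(unsigned pc)
{
    switch (pc & 3u) {
    case 0:  return "single (24-bit)";
    case 2:  return "double (53-bit)";
    case 3:  return "extended (64-bit)";
    default: return "reserved";
    }
}

static const char *rc_name(unsigned rc)
{
    static const char *const names[4] = {
        "round to nearest", "round down", "round up", "round toward zero"
    };
    return names[rc & 3u];
}
```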

That's it... :)


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
~

Re: FPU initialization

Post by ~ »

I think I grasped the ideas of what you mentioned. However, it seems to me that the code should test the Status Word for the value 0x37F and the Control Word for the value 0, as stated in the Intel Pentium 4 Reference Volume 2, page 280 -- or "3-242" --, document 24547112.pdf (June 2005).

At least, that code didn't work for me, so I tried to write a similar one.

I tried using the BIOS Data Area: bit 1 of the byte (officially a word) at 410h is set to 1 when an FPU is present. It has worked correctly on all of my machines, with or without an FPU.
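The BDA test itself is a single bit check on the BIOS equipment word. A sketch with a hypothetical function name, assuming the caller has already read the word at 0000:0410 (how to read it depends on the environment, and firmware other than a PC BIOS may not provide it at all):

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 1 of the BIOS equipment word (physical 0x410) reports a math
 * coprocessor. 'equipment_word' is assumed to be read by the caller. */
static bool bda_reports_fpu(uint16_t equipment_word)
{
    return (equipment_word & 0x0002u) != 0;
}
```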

What do you think? This is an outline of what I did to detect the FPU; it doesn't yet determine whether the FPU is built in, or whether it's a 387 or a 287:

Code:

org 100h
bits 16


START:
 $AX=0;
 $BX=$DS;
 $DS=$AX;

 asm{
  ;See if bit 1 of word at 410h of
  ;the BIOS Data Area is set to 1,
  ;indicating the presence of an FPU
  ;(It Worked in my 7 machines, 386SX
  ;with no FPU, Pentiums 1, 3 and Athlon,
  ;AMD64, and Intel Dual):
  ;;
    test byte[410h],00000010b
     mov ds,bx
      jz ._BDA_no_FPU_flag;
 }

 //INIT: Start determining FPU presence
 //INIT: Start determining FPU presence
 //INIT: Start determining FPU presence

    //Take CR0 and try to clear
    //the EM and MP flags:
    ///
     $EAX  = $CR0;
      $AL &= 11111001b;
     $CR0  = $EAX;


     //Set our test value to a non-zero
     //value:
     ///
      word[tmpfpucw]=0xDEAD;

     //Set other variables to default
     //values:
     ///
      word[tmpfpusw]=0xDEAD;
      byte[fpu_present]=0;



     //Reset the FPU:
     ///
      asm fninit;


     //Fetch the control word after resetting
     //the FPU to defaults:
     ///
      asm fnstcw [tmpfpucw];

     //If the control word we fetched
     //is the default Intel-defined value of
     //0x37F, it means that there's
     //an FPU present. Otherwise, if it's not so,
     //assume there's no (suitable) FPU:
     ///
      if(word[tmpfpucw]==0x37F)
      {
         //Fetch the status word:
         ///
          asm fnstsw [tmpfpusw];
          $AX=[tmpfpusw];

         //Be careful as a caution and clear
         //the BUSY bit of the status word;
         //all other bits should be 0 or we won't
         //account it as a good FPU:
         ///
            $AX &= 0111111111111111b;


         //See if the status bits are all 0.
         //After reset with FNINIT,
         //they all should be, or we will consider
         //it as an error and as a non-detected FPU.
          if($AX==0)
          {
            byte[fpu_present]++;


           //Set CR0 with proper values for a present FPU:
           ///
            $EAX  = $CR0;
             $AL &= 11111011b;  //Clear EM flag
             $AL |= 00000010b;  //Set MP flag
            $CR0  = $EAX;



            //Ring the bell to indicate FPU found.
            //Also show message in DOS:
            //(REMOVE for production
            //32-bit or 64-bit code)
            ///
              $AX=0xE07;
              asm int 10h;

              $DX=fpu_found;
              $AH=9;
              asm int 21h;             
          }
      }


  ._BDA_no_FPU_flag:
  if(byte[fpu_present]==0)
  {
    //If no FPU detected, prepare CR0 for emulation:
    ///
     $EAX  = $CR0;
     $AL |= 00000100b;  //Set EM flag
     $AL &= 11111101b;  //Clear MP flag
     $CR0  = $EAX;
  }

 //END:  Start determining FPU presence
 //END:  Start determining FPU presence
 //END:  Start determining FPU presence







  $AH=4Ch;
  asm int 21h;





tmpfpucw dw 0xDEAD
tmpfpusw dw 0xDEAD

fpu_present db 0

fpu_found db "FPU Found!",0x0D,0x0A,'$'
Last edited by ~ on Fri Apr 09, 2010 11:43 am, edited 1 time in total.
Brendan

Re: FPU initialization

Post by Brendan »

Hi,
~ wrote:I think I grasped the ideas of what you mentioned. However, it seems to me that the code should test the Status Word for the value 0x37F and the Control Word for the value 0, as stated in the Intel Pentium 4 Reference Volume 2, page 280 -- or "3-242" --, document 24547112.pdf (June 2005).
My copy of the Intel manuals say that after FNINIT the status word is set to 0 and the control word is set to 0x037F (not the other way around).

I don't know why the code I posted didn't work for you. It's the same as Intel's code, and almost the same as the code I've been using without any problem for years. I'd be tempted to wonder if the problem was a bug in your implementation rather than my code (e.g. accidentally getting the Status Word and the Control Word around the wrong way).

Intel's example code says "AND with 0x103F" then "compare with 0x003F". If you ignore the AND then you would need to compare it with 0x037F, but then you'd need to worry about why Intel said to do the AND in the first place. If you take a look at the FPU control word bits, the AND ignores all the "reserved" bits and also ignores the rounding control and precision control fields. I'd ignore all the reserved bits (there's no easy way to predict what value they'd be in non-Intel CPUs/FPUs and no way to determine what future Intel CPUs might do with them).
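A quick way to see what the 0x103F mask buys: the sketch below (hypothetical helper names) varies the rounding control (bits 10-11) and precision control (bits 8-9) of the post-FNINIT value 0x037F. The masked comparison still passes, whereas a plain comparison against 0x037F would fail as soon as either field differs; bit 6 (reserved) is ignored too.

```c
#include <stdbool.h>
#include <stdint.h>

#define CW_PROBE_MASK 0x103Fu  /* exception masks (bits 0-5) + bit 12 */

/* Replace the RC (bits 10-11) and PC (bits 8-9) fields of a control word. */
static uint16_t cw_with_rc_pc(uint16_t cw, unsigned rc, unsigned pc)
{
    cw &= (uint16_t)~0x0F00u;
    cw |= (uint16_t)(((rc & 3u) << 10) | ((pc & 3u) << 8));
    return cw;
}

/* Intel's check: AND with 0x103F, compare with 0x003F. */
static bool cw_probe_ok(uint16_t cw)
{
    return (cw & CW_PROBE_MASK) == 0x003Fu;
}
```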

Also note that according to Intel the busy flag (bit 15) of the status word is "for 8087 compatibility only. It reflects the contents of the ES (Error Summary Status) flag (bit 7)". It doesn't make sense to mask the busy flag without also masking the ES (Error Summary Status) flag (and doesn't really make sense to mask either of these flags).
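To illustrate why masking only the busy flag buys nothing: since B (bit 15) mirrors ES (bit 7), any status word with B set also has ES set, so "mask B, then require zero" still fails. The only input where the mask changes the answer is B set with ES clear, which per the quote above should not occur. A sketch with hypothetical names:

```c
#include <stdbool.h>
#include <stdint.h>

#define SW_ES (1u << 7)   /* Error Summary status */
#define SW_B  (1u << 15)  /* Busy: mirrors ES, for 8087 compatibility */

/* Mirrors the earlier approach: clear B, then require all other bits 0. */
static bool status_clean_masking_busy(uint16_t sw)
{
    return (sw & (uint16_t)~SW_B) == 0;
}
```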

For the flag in the BDA, there should be no need to test it, and I normally refuse to rely on the (potentially buggy) BIOS when it can be avoided (partly because I can't test every BIOS ever written, and partly because I want to be able to boot without problems on EFI, coreboot, OpenFirmware, etc. one day).

You also don't seem to be setting the "Native FPU Exceptions" (NE) flag in CR0, although that could just be an oversight. For multi-CPU you must set it (if several CPUs try to generate an IRQ13 at the same time then you'll only get one IRQ13 and lose the FPU errors from other CPUs) and for single-CPU it's much better.


Cheers,

Brendan
~

Re: FPU initialization

Post by ~ »

Brendan wrote:Hi,

My copy of the Intel manuals say that after FNINIT the status word is set to 0 and the control word is set to 0x037F (not the other way around).

Yes, I actually mixed the two up. The previous code was already checking them in the right order.

Now I've written another version following Intel's recommendations. It seems to work, but I don't have any external FPUs, so I couldn't really test the built-in/external or 287/387 distinctions.

In any case, it seems to be OK so far (for 386SX and later only, though), since it appears to check properly for presence, built-in status and type. The detection order is easiest to follow in the "main" block:

Code:

bits 16
org 100h

START:


///INIT: main()
///INIT: main()
///INIT: main()
///INIT: main()

  //Try to set sane 16-bit stack pointers:
  ///
   $EBP &= 0xFFFF;
   $ESP &= 0xFFFF;


  //Reset variables:
  ///
   byte[fpu_found]=0;
   byte[fpu_builtin]=0;
   byte[fputype]=0;



  //Try to find the FPU:
  ///
   cdecl x87_fpu_find();
   if($AL==1)  //FPU found?
   {
    byte[fpu_found]=1;

    //Set CR0 with proper values for a present FPU:
    ///
      $EAX  = $CR0;
       $AL &= 11111011b;  //Clear EM flag
       $AL |= 00000010b;  //Set MP flag
      $CR0  = $EAX;



     //Try to see if CPUID is supported:
     ///
       check_cpuid_support();
       if($AL==1) //CPUID supported?
       {
          //Try to see if the FPU is built-in
          //via CPUID instruction:
          ///
            cpuid_says_fpu_builtin();
            if($AL==1) //FPU built-in?
            {
              byte[fpu_builtin]=1;

              $EAX  = $CR0;
               $AL |= 00100000b;  //Set NE flag
              $CR0  = $EAX;
            }
       }


    //See if the FPU is 287 or 387:
    ///
     cdecl fpu_type();
     byte[fputype]=$AL;
   }
    else  //FPU not found?
    {
      //If no FPU detected, prepare CR0 for emulation:
      ///
       $EAX  = $CR0;
        $AL |= 00000100b;  //Set EM flag
        $AL &= 11111101b;  //Clear MP flag
       $CR0  = $EAX;
    }



  //Show results:
  ///
    if(byte[fpu_found]==1) //FPU found?
    {
       $DX=fpustr;
       $AH=9;
       asm int 21h;

          if(byte[fpu_builtin]==1) //FPU built-in?
          {
             $DX=fpubstr;
             $AH=9;
             asm int 21h;
          }
           else  //FPU not built-in?
           {
             $DX=fpuxstr;
             $AH=9;
             asm int 21h;
           }


       if(byte[fputype]==2) //80287 FPU?
       {
        $DX=fpu287;
        $AH=9;
        asm int 21h;
       }
        else if(byte[fputype]==3) //80387 FPU?
        {
         $DX=fpu387;
         $AH=9;
         asm int 21h;
        }
    }



  //End program:
  ///
    $AH=4Ch;
    asm int 21h;




fpu_found   db 0
fpu_builtin db 0
fputype     db 0


fpustr  db "FPU found!",0x0D,0x0A,'$'
fpubstr db "FPU builtin!",0x0D,0x0A,'$'
fpuxstr db "FPU is external",0x0D,0x0A,'$'


fpu287 db "FPU is 80287",0x0D,0x0A,'$'
fpu387 db "FPU is 80387+",0x0D,0x0A,'$'

///END:  main()
///END:  main()
///END:  main()
///END:  main()


//;//;//;//;//;//;//;//;//;//;//;//;//;//;//;//;//;
//;//;//;//;//;//;//;//;//;//;//;//;//;//;//;//;//;
//;//;//;//;//;//;//;//;//;//;//;//;//;//;//;//;//;
//;//;//;//;//;//;//;//;//;//;//;//;//;//;//;//;//;
//;//;//;//;//;//;//;//;//;//;//;//;//;//;//;//;//;


//Try to find the x87 FPU; the result is returned
//in AL:
//
//        AL == 1 -- found
//        AL == 0 -- not found
//
// All other registers are destroyed.
///
cdecl function x87_fpu_find()
{
 //Declare local variables:
 ///
  stackwideword tmpfpucw;
  stackwideword tmpfpusw;


 //OS specific:
 //Don't allow interrupts; otherwise we would constantly
 //have to reload segment registers with proper values
 //everywhere on IRQs, etc., so consider it critical enough
 //as to not to allow interrupts:
 ///
  asm pushf
  asm cli;


 //Point DS to SS for this routine:
 ///
  push $DS;
  push $SS;
   pop $DS;



  //Take CR0 and try to clear
  //the EM and MP flags:
  ///
   $EAX  = $CR0;
    $AL &= 11111001b;
   $CR0  = $EAX;


   //Set our test value to a non-zero
   //value:
   ///
    stackword[tmpfpucw]=0x5A5A;

   //Set other variables to default
   //values:
   ///
    stackword[tmpfpusw]=0x5A5A;



   //Reset the FPU:
   ///
    asm fninit;


   //Fetch the control word after resetting
   //the FPU to defaults:
   ///
    $EBX=stackword tmpfpucw;   //point to stack variable
    asm fnstcw [ebx];

   //Fetch the status word as well:
   ///
    $EBX=stackword tmpfpusw;   //point to stack variable
    asm fnstsw [ebx];


   //Apply the Intel-recommended AND mask
   //to the Control Word:
   ///
    stackword[tmpfpucw] &= 0x103F;


   //If the FPU status word is 0 and if the
   //ANDed Control Word is 0x3F, then everything
   ///is OK:
   ///
    if(stackword[tmpfpusw]==0 && stackword[tmpfpucw]==0x3F)
    {
      $EAX=1;  //Indicate that we found the FPU
    }
     else $EAX=0; //otherwise, indicate no FPU found


 //Restore segment values:
 ///
  pop $DS;


 //Try to allow interrupts again, if they were enabled
 //in the first place:
 ///
  asm popf;
}










//This will return the status of
//support for CPUID in AL, by trying
//to flip the ID bit (bit 21) in EFLAGS:
//
// AL == 1 -- supported
// AL == 0 -- not supported
///
function check_cpuid_support()
{
 push $EBX;
 push $ECX;
 push $EDX;


  // EAX == 32-bit EFLAGS
  // EBX == backup of EAX
  // ECX == mask for ID bit
  // EDX == inverted mask for ID bit
  ///
   asm pushfd;
   pop $EAX;
   $EBX=$EAX;
   $ECX=00000000001000000000000000000000b;
   $EDX=11111111110111111111111111111111b;


  //Check the ID bit:
  ///
   $EAX &= $ECX;


  if($EAX==0) //Is ID bit set to 0?
  {
      $EAX=$EBX;   //Get full EFLAGS again
      $EAX|=$ECX;  //Set the ID bit

        push $EAX;   //Put this value in stack
        asm popfd;   //Place in EFLAGS

          asm pushfd;  //EFLAGS to stack again
          pop $EAX;    //Put EFLAGS in EAX again

      $EAX &= $ECX; //Check ID bit again


     //If ID was modified, then there's
     //support for CPUID...
     ///
      if($EAX!=0)
      {
       $EAX >>= 21;
      }
  }
   else //Is ID bit set to 1?
   {
      $EAX=$EBX;   //Get full EFLAGS again
      $EAX&=$EDX;  //Clear the ID bit

        push $EAX;   //Put this value in stack
        asm popfd;   //Place in EFLAGS

          asm pushfd;  //EFLAGS to stack again
          pop $EAX;    //Put EFLAGS in EAX again

      $EAX &= $ECX; //Check ID bit again

     //If ID was modified, then there's
     //support for CPUID...
     ///
      if($EAX==0)
      {
       $AX++;
      }
       else
       {
        $EAX=0;
       }
   }


 pop $EDX;
 pop $ECX;
 pop $EBX;
}












//Indicates in AL if the x87 FPU is built-in
//in the main CPU:
//
//          AL == 1 -- built-in
//          AL == 0 -- external
//
//All other registers may be destroyed,
//especially EBX, ECX and EDX.
///
function cpuid_says_fpu_builtin()
{

  $EAX=0;     //Basic CPUID function
  asm cpuid;  //Execute this


  //See if the maximum function for CPUID
  //is at least 1; otherwise, we won't be able
  //to determine if the FPU is builtin and
  //consider it non-builtin.
  ///
   if($EAX>=1)
   {
      $EAX=1;     //CPUID function 1 (feature information)
      asm cpuid;  //Execute this

      //The feature information found in bit 0 of
      //EDX tells if the FPU is built-in, if it's
      //set to 1:
       $EDX &= 1;

       if($DL==1)
       {
        $EAX=$EDX; //Indicate that the FPU is built-in
       }
        else
        {
         $EAX^=$EAX; //Set AL to 0 to indicate non-builtin FPU
        }
   }
    else
    {
     $EAX^=$EAX; //Set AL to 0 to indicate non-builtin FPU
    }

}










//Returns the value in AL
//
//          AL == 3 -- 387+
//          AL == 2 -- 287-
///
cdecl function fpu_type()
{
 //Declare a local stack variable:
 ///
  stackwideword fpu_status;


 //Disable interrupts for more safety:
 ///
  asm pushf
  asm cli

  push $EDX;


 //Make DS==SS for our routine
  push $DS;
  push $SS;
   pop $DS;

 $EBX=stackword fpu_status; //point to stack variable

 $EDX=2; //Default: indicate 80287- FPU

 asm{
  ;See if the FPU can differentiate between
  ;-infinity and +infinity. If not, it must be 287-;
  ;if yes, it must be 387+
  ;;
  ;;;
    fld1         ;Load number +1.0
    fldz         ;Load number +0.0
    fdivp st1,st0 ;Divide 1/0 into ST(1) and pop, leaving +infinity in ST(0)

    fld st0      ;Copy the result at the top of stack in ST(0)...
    fchs         ;...and change the sign of value in ST(0) for negative infinity

    fcompp       ;Compare +infinity with -infinity
    fstsw [ebx]  ;Store the status word
    mov ax,[ebx] ;fpu_status contents in AX

    sahf         ;Put flags in AH into SF, ZF, AF, PF and CF flags of EFLAGS
                 ;The C3 status flag of the FPU
                 ;corresponds to the ZF flag of
                 ;EFLAGS, so if it's a 287, C3 thinks
                 ;that +infinity == -infinity
                 ;and C3 (which is located at the
                 ;corresponding location of ZF in EFLAGS
                 ;will set ZF)
                 ;
                 ;Otherwise, if it's a 387 FPU, C3
                 ;will be 0 as a result of FCOMPP indicating
                 ;+infinity > -infinity or in short
                 ;+infinity != -infinity, and ZF will
                 ;be cleared to 0, which will cause
                 ;to consider it to be a 387+


    jz ._z287    ;If it's "the same", there's no differentiation so must be 287
     inc edx     ;Indicate 387+
    ._z287:
 }


 $EAX=$EDX;   //Save the result to return

 pop $DS;
 pop $EDX;
 asm popf  ;try to re-enable interrupts (if they were)
}

It tries to detect FPU presence, then checks whether CPUID is supported. If so, it checks whether the FPU is built in. (If CPUID is not supported, it assumes the FPU is not built in. That may be wrong for a 486 with a built-in FPU but no CPUID support, but I don't know how to work around that yet.)
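The CPUID half of that logic (what cpuid_says_fpu_builtin() does) can be written as a pure function for testing. A sketch with a hypothetical name: max_basic_leaf is assumed to be EAX from CPUID leaf 0, and leaf1_edx EDX from CPUID leaf 1. Like the original, it treats "can't tell" as "not built in", so it says nothing about CPUs without CPUID.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if CPUID reports an on-chip FPU (CPUID.01H:EDX bit 0). */
static bool cpuid_fpu_builtin(uint32_t max_basic_leaf, uint32_t leaf1_edx)
{
    if (max_basic_leaf < 1)
        return false;              /* leaf 1 unavailable: assume external */
    return (leaf1_edx & 1u) != 0;
}
```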

Then it sees how it treats infinity. If it doesn't differentiate +infinity and -infinity then it's a 287; otherwise it's a 387+.
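The decision hinges on how SAHF maps the FPU condition codes: SAHF loads AH (status word bits 8-15) into the low byte of EFLAGS, so C0 lands in CF, C2 in PF and C3 in ZF. After FCOMPP of -infinity against +infinity, a 387 reports "less than" (C0 = 1, C3 = 0), while a 287 reports "equal" (C3 = 1). A sketch with hypothetical names that follows fpu_type()'s return convention (2 or 3):

```c
#include <stdint.h>

/* x87 condition-code bits in the status word */
#define SW_C0 (1u << 8)
#define SW_C2 (1u << 10)
#define SW_C3 (1u << 14)

/* ZF as SAHF would set it: AH bit 6, i.e. status word bit 14 (C3). */
static unsigned zf_after_sahf(uint16_t sw)
{
    uint8_t ah = (uint8_t)(sw >> 8);
    return (ah >> 6) & 1u;
}

/* 2 = 80287 style (infinities compare equal), 3 = 80387 or later. */
static int fpu_type_from_status(uint16_t sw_after_fcompp)
{
    return zf_after_sahf(sw_after_fcompp) ? 2 : 3;
}
```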


Is there still something wrong here or left to do for FPU initialization?
Attachments: fpu.zip (.ASM sources, 5.66 KiB)
54616E6E6572
Member
Posts: 47
Joined: Tue Aug 18, 2009 12:52 pm
Location: Kansas City

Re: FPU initialization

Post by 54616E6E6572 »

I suggest you go read Chapter 19: Architecture Compatibility of the Intel® 64 and IA-32 Architectures Software Developer’s Manual -- Volume 3A: System Programming Guide, Part 1.

Specifically look at Section 18: x87 FPU and Section 20: FPU and Math Coprocessor Initialization. They describe the differences between the 8087/287/387/487 and the Pentium+ integrated chip, how to detect which chip is being used, and how to initialize each chip.

I also suggest you look at Section 16: New Flags in the EFLAGS Register and Section 17: Stack Operations, as they describe the differences between the 8086, 80286, 80386, 80486, and Pentium processors and how to detect which processor is being used.
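For the "486 without CPUID" case raised earlier, a common technique (the one Intel's processor identification note uses) is to try flipping EFLAGS.AC (bit 18) with PUSHFD/POPFD: a 386 cannot change it, a 486 can. Then try EFLAGS.ID (bit 21): if it flips, CPUID exists. The sketch below only models the decision; the name classify_by_eflags is hypothetical, and "writable" is assumed to be the set of bits observed to flip (the toggling itself needs assembly, much like check_cpuid_support()).

```c
#include <stdint.h>

#define EFLAGS_AC (1u << 18)  /* Alignment Check: cannot be set on a 386 */
#define EFLAGS_ID (1u << 21)  /* ID: toggleable iff CPUID is available */

enum cpu_class { CPU_386, CPU_486_NO_CPUID, CPU_HAS_CPUID };

static enum cpu_class classify_by_eflags(uint32_t writable)
{
    if (!(writable & EFLAGS_AC))
        return CPU_386;
    if (!(writable & EFLAGS_ID))
        return CPU_486_NO_CPUID;
    return CPU_HAS_CPUID;
}
```

Combined with the earlier rule that an FPU present on a 486 or later is built in, this avoids falling back to "not built in" when CPUID is missing.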

Then you should probably skim through the chapter just to learn a few interesting things about the various x86 chips, how to detect which is being used and how to initialize them.

Enjoy!
The 2nd Doctor: "I have no doubt that you could augment an earwig to the point where it could understand nuclear physics, but it would still be a very stupid thing to do!"