Using FPU

DavidCooper · Post by **DavidCooper** » Fri Jul 15, 2011 12:48 pm

Every time I have a go at using the FPU it fails to work. In Bochs I always get this:-

math_abort: MSDOS compatibility FPU exception

That happens even just from an attempt to load the FPU with a number. I assume I need some kind of exception handler, even though I can't see that it has anything important to do, but my attempts to create one (just pointing IDT entries at an iret) make absolutely no difference. I've spent ages trawling through useless information, but maybe I'm just using the wrong search terms. Any suggestions would be welcome.

DavidCooper · Post by **DavidCooper** » Fri Jul 15, 2011 1:11 pm

Ah - this'll be the answer, I expect:-

http://wiki.osdev.org/FPU

Don't know why my search of the wiki didn't find it before - using more than one search term at a time must have blocked it.

Gigasoft · Post by **Gigasoft** » Fri Jul 15, 2011 1:36 pm

You have to clear CR0.EM (bit 2), and you should also set NE (bit 5) to turn off MS-DOS compatibility mode.
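For reference, a minimal sketch of the bit arithmetic described here. Python only models the value; on real hardware the change has to be made in ring 0 with `mov` from and to CR0. The helper name `cr0_for_native_fpu` and the sample input value are assumptions chosen for illustration.

```python
# Bit positions in CR0, as documented in the Intel manuals.
CR0_EM = 1 << 2   # Emulation: when set, x87 instructions raise #NM
CR0_NE = 1 << 5   # Numeric Error: when set, x87 errors are reported natively


def cr0_for_native_fpu(cr0: int) -> int:
    """Return the CR0 value with EM cleared and NE set (hypothetical helper)."""
    return (cr0 & ~CR0_EM & 0xFFFFFFFF) | CR0_NE


# Sample input: EM already clear, NE clear.
print(hex(cr0_for_native_fpu(0x60000011)))  # 0x60000031
```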

DavidCooper · Post by **DavidCooper** » Fri Jul 15, 2011 1:47 pm

It's working now after adding this code: F 20 D8 (CR4 into EAX) D 0 2 0 0 (set bit 9) F 22 D8 (EAX into CR4), DB E3 (fninit) DB E4 (FPU instruction to set protected mode - probably unnecessary). Hopefully I can now try to add a scientific calculator to my OS.

Gigasoft wrote:You have to clear CR0.EM (bit 2), and you should also set NE (bit 5) to turn off MS-DOS compatibility mode.

I haven't done either of those and it now works, though I've yet to see what happens when I do something more complex than just loading a number into the FPU and back to a different memory location. I'll have to look at CR0 to see what those bits are set to at the moment...

Edit: CR0 is 60000011, so bit 2 is already clear after running the above code, but bit 5 is also clear.

Edit: and it was also 60000011 before running it, so running that code didn't change it.

Edit: seems to be fine anyway - it's now squaring numbers for me.
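The CR0 value quoted in the edits above can be checked by decoding it bit by bit. This is only a sketch: the table lists a subset of the CR0 flags, and `decode_cr0` is a hypothetical helper name.

```python
# A subset of the CR0 flags, keyed by bit position.
CR0_FLAGS = {
    0: "PE", 1: "MP", 2: "EM", 3: "TS", 4: "ET", 5: "NE",
    29: "NW", 30: "CD", 31: "PG",
}


def decode_cr0(cr0):
    """List the names of the known flags that are set, lowest bit first."""
    return [name for bit, name in sorted(CR0_FLAGS.items()) if cr0 & (1 << bit)]


print(decode_cr0(0x60000011))  # ['PE', 'ET', 'NW', 'CD']
```

Neither EM (bit 2) nor NE (bit 5) appears in the result, which matches the observation that both bits are clear.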

Solar · Post by **Solar** » Fri Jul 15, 2011 2:24 pm

DavidCooper wrote:Edit: seems to be fine anyway...

Famous last words.

DavidCooper · Post by **DavidCooper** » Fri Jul 15, 2011 4:24 pm

Solar wrote:Famous last words.

Well, I've now checked to see what these bits actually do. Bit 2 simply tells you if there's an FPU or not, so there's no point in changing it, but it'll obviously be useful for finding out if a 386 or 486 has one. Bit 5 seems to be for enabling exceptions, so I can probably just leave it clear as I intend to avoid generating errors to begin with, but I'll have to try it out and see what happens if I divide by zero, after creating an interrupt routine to handle that. I think from memory that's the same interrupt as the system timer, so that could complicate things a bit.

Anyway - thanks to Gigasoft for pointing me towards that.

Brendan · Post by **Brendan** » Fri Jul 15, 2011 11:37 pm

Hi,

I'm not sure if you're reading every third word of the manual and then filling in the blanks with fan fiction or something, but...

DavidCooper wrote:It's working now after adding this code: F 20 D8 (CR4 into EAX) D 0 2 0 0 (set bit 9) F 22 D8 (EAX into CR4)

Bit 9 of CR4 is only for SSE and has nothing to do with FPU.

DavidCooper wrote:DB E3 (fninit)

That's recommended (for OS initialisation code).

DavidCooper wrote:DB E4 (FPU instruction to set protected mode - probably unnecessary).

For "32-bit FPUs" (80386 and later), the FSETPM instruction is treated as a NOP - it has no effect. For 80287 it mattered (it affected loads/stores done by the FPU), and for older FPUs it wasn't supported. Your code is probably "80386 or later" anyway, so the "FSETPM" instruction is a waste of time.

DavidCooper wrote:Well, I've now checked to see what these bits actually do. Bit 2 simply tells you if there's an FPU or not, so there's no point in changing it, but it'll obviously be useful for finding out if a 386 or 486 has one.

Bit 2 doesn't tell you if there's an FPU or not. It determines how the WAIT instruction interacts with hardware task switching.

DavidCooper wrote:Bit 5 seems to be for enabling exceptions, so I can probably just leave it clear as I intend to avoid generating errors to begin with, but I'll have to try it out and see what happens if I divide by zero, after creating an interrupt routine to handle that. I think from memory that's the same interrupt as the system timer, so that could complicate things a bit.

Um, no.

Originally, when there was a floating point error, the FPU (then a separate chip) used IRQ13 and the PIC chip to tell the CPU that an error had occurred. This was stupid (it causes race conditions, etc.) and fails completely in systems with multiple CPUs. For CPUs that have built-in FPUs (80486 and later) it makes far more sense for a floating point error to be treated like any other error and cause an exception instead, where IRQs and the PIC chips aren't involved at all. However, Intel couldn't suddenly change how floating point errors are handled because that would've caused backward compatibility problems for old software (e.g. DOS); so they added the NE flag in CR0. If this flag is set then floating point errors trigger an exception, and if the flag is clear then floating point errors get delivered as IRQ13 instead.

The only things that determine which conditions cause floating point errors are the 6 exception mask flags in the FPU control word. Applications can change the FPU control word to whatever they like any time they like, so an OS/kernel can't assume anything about which FPU errors are enabled or disabled. In general (for applications), "precision", "underflow" and "denormal operand" errors can be safely disabled (unless you care a lot about the accuracy of calculations); and all other errors indicate serious problems with your software (and should therefore be enabled).
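As a rough illustration of that policy, the sketch below computes a control word that leaves "precision", "underflow" and "denormal operand" masked and unmasks the other three. The bit positions and the post-FNINIT default of 0x037F are from the Intel manuals; the constant and function names are hypothetical, and the result would still have to be loaded with FLDCW.

```python
# Exception mask bits in the x87 control word.
# A set bit masks (disables) the corresponding exception.
FCW_IM = 1 << 0  # invalid operation
FCW_DM = 1 << 1  # denormal operand
FCW_ZM = 1 << 2  # zero divide
FCW_OM = 1 << 3  # overflow
FCW_UM = 1 << 4  # underflow
FCW_PM = 1 << 5  # precision

FCW_AFTER_FNINIT = 0x037F  # default after FNINIT: all six exceptions masked


def suggested_control_word(fcw=FCW_AFTER_FNINIT):
    """Unmask invalid, zero-divide and overflow; leave the other masks as they are."""
    unmask = FCW_IM | FCW_ZM | FCW_OM
    return fcw & ~unmask & 0xFFFF


print(hex(suggested_control_word()))  # 0x372
```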

Cheers,

Brendan

DavidCooper · Post by **DavidCooper** » Sat Jul 16, 2011 3:45 pm

Hi Brendan, and thanks for your reply.

Brendan wrote:I'm not sure if you're reading every third word of the manual and then filling in the blanks with fan fiction or something, but...

DavidCooper wrote:It's working now after adding this code: F 20 D8 (CR4 into EAX) D 0 2 0 0 (set bit 9) F 22 D8 (EAX into CR4)
Brendan wrote:Bit 9 of CR4 is only for SSE and has nothing to do with FPU.

I was going by http://wiki.osdev.org/FPU, and this bit in particular:-

Setting the 9th bit (OSFXSR) in the CR4 tells the CPU that we intend on using the FXSAVE, FXRSTOR, and SSEx instructions. If this bit is not set, a #UD exception will be generated on use of the FPU or any SSE instructions.

It seems that that information is wrong (the OSDev wiki being a hotbed of fan fiction). Following it did appear at the time to make the FPU start to function, but the real cure was simply the fninit instruction (DB E3), which I added at the same time after finding the correct hex values for it on an x86 instruction set website. I previously had the wrong values for those two bytes due to an error in a book borrowed from the library years ago (not sure which, but it might have been Barry B. Brey's Programming the 80286, 80386, 80486 and Pentium-Based Personal Computer, or more likely Peter Norton's Programmer's Guide to the IBM PC - the notes I made from those two books are usually the first place I look for things). I'm not even sure I actually made any change to CR4, as I can't find any information on the correct rrr value to select it, so I may have changed bit 9 in CR3 instead. No one provides comprehensive lists of machine code instructions in hex form, so I sometimes have to guess what they are and use a bit of trial and error. (It's ridiculous, but it seems that I'll have to get hold of an assembler and spend weeks trying to work out how to use it, just to get the hex values of a handful of rare instructions.)

Brendan wrote:For "32-bit FPUs" (80386 and later), the FSETPM instruction is treated as a NOP - it has no effect. For 80287 it mattered (it affected loads/stores done by the FPU), and for older FPUs it wasn't supported. Your code is probably "80386 or later" anyway, so the "FSETPM" instruction is a waste of time.

It's harmless to use it though. I thought it might be necessary as information I copied from one of those books years ago suggested it might matter on a 386 and perhaps on a 486, though that may well be wrong.

Brendan wrote:Bit 2 doesn't tell you if there's an FPU or not. It determines how the WAIT instruction interacts with hardware task switching.

The fan-fiction source I used for that was http://en.wikipedia.org/wiki/Control_register, the relevant part being this:-

2 EM Emulation If set, no x87 floating point unit present, if clear, x87 FPU present

I know that Wikipedia isn't entirely reliable, but if you don't google for things you tend to get attacked by people here, so there's a pressure on everyone just to experiment with unreliable data rather than daring to ask any experts here if they already know of a fully reliable source that they could maybe point to.

DavidCooper wrote:Bit 5 seems to be for enabling exceptions, so I can probably just leave it clear as I intend to avoid generating errors to begin with, but I'll have to try it out and see what happens if I divide by zero, after creating an interrupt routine to handle that. I think from memory that's the same interrupt as the system timer, so that could complicate things a bit.
Brendan wrote:Um, no.

Um, indeed no. In this case I was going by the same Wikipedia page where it said:-

5 NE Numeric error Enable internal x87 floating point error reporting when set, else enables PC style x87 error detection

but I didn't understand it (as it doesn't appear to make a real distinction of any kind), so I guessed at its meaning. I was in a hurry and it wasn't immediately important - I was just happy to have got the FPU to speak to me at last and was keen to start experimenting with it. I also mis-remembered where the system timer interrupt's IDT entry is, as it's 8 entries further on from the divide by zero exception, and now it isn't clear to me whether the divide by zero exception is even used by the FPU at all.

Brendan wrote:Originally, when there was a floating point error, the FPU (then a separate chip) used IRQ13 and the PIC chip to tell the CPU that an error had occurred. This was stupid (it causes race conditions, etc.) and fails completely in systems with multiple CPUs. For CPUs that have built-in FPUs (80486 and later) it makes far more sense for a floating point error to be treated like any other error and cause an exception instead, where IRQs and the PIC chips aren't involved at all. However, Intel couldn't suddenly change how floating point errors are handled because that would've caused backward compatibility problems for old software (e.g. DOS); so they added the NE flag in CR0. If this flag is set then floating point errors trigger an exception, and if the flag is clear then floating point errors get delivered as IRQ13 instead.

That makes a lot of sense. I'm still trying to see through the fog here, but it's beginning to look as if all the FPU exceptions trigger an exception using the 17th IDT entry (vector 16) if the NE flag is set. The divide by zero (vector 0) and overflow (vector 4) which I had assumed could be triggered by the FPU are, I now suspect, restricted to integer arithmetic done in the main processor, but I've never found any source of information that spells this kind of thing out (unless it's cunningly hidden deep within tons of other information that I don't need).
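The routing summarised above can be sketched as a small function. This assumes the conventional BIOS PIC mapping, where IRQ8-15 start at vector 0x70; a kernel that remaps the PICs would pass its own base. The function name and the `slave_pic_base` parameter are hypothetical.

```python
CR0_NE = 1 << 5   # Numeric Error flag in CR0
MF_VECTOR = 16    # #MF, the x87 floating-point error exception
FPU_IRQ = 13      # legacy FPU error interrupt line, on the slave PIC (IRQ8-15)


def fpu_error_vector(cr0, slave_pic_base=0x70):
    """Return the IDT vector that an unmasked x87 error would arrive on.

    slave_pic_base is the vector assigned to IRQ8 (assumed 0x70 by default).
    """
    if cr0 & CR0_NE:
        return MF_VECTOR                      # native reporting: exception #MF
    return slave_pic_base + (FPU_IRQ - 8)     # legacy reporting: IRQ13 via the PIC


print(fpu_error_vector(0x60000031))        # 16
print(hex(fpu_error_vector(0x60000011)))   # 0x75
```

Vectors 0 (#DE) and 4 (#OF) never appear here: they belong to integer division and the INTO instruction.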

Brendan wrote:In general (for applications), "precision", "underflow" and "denormal operand" errors can be safely disabled (unless you care a lot about the accuracy of calculations); and all other errors indicate serious problems with your software (and should therefore be enabled).

So I really need to enable them all, but start with them disabled and enable just one at a time while I try to write code to handle it.

Anyway, with your help I think I'm beginning to get there at last, so thanks again.

Combuster · Post by **Combuster** » Sat Jul 16, 2011 4:35 pm

DavidCooper wrote:I was going by http://wiki.osdev.org/FPU, and this bit in particular:-
Setting the 9th bit (OSFXSR) in the CR4 tells the CPU that we intend on using the FXSAVE, FXRSTOR, and SSEx instructions. If this bit is not set, a #UD exception will be generated on use of the FPU or any SSE instructions.
It seems that that information is wrong (the OSDev wiki being a hotbed of fan fiction)

Well, no. If I simply substitute "or" with "for", or substitute "the" with "these", it is factually accurate, and you should have realized something was amiss, since the first and second sentences contradict each other to some extent.

That said, the FPU page is rather poor since it does not go anywhere into best practices and just assumes a system with no need for FPU handling and SSE, and that the FPU is not left in a dodgy state. But it is not blatantly wrong.

DavidCooper · Post by **DavidCooper** » Sat Jul 16, 2011 5:08 pm

berkus wrote:
DavidCooper wrote:no one provides comprehensive lists of machine code instructions in hex form,
Mmm, not true. Afaik Intel manuals have the hex form of the insns as well. Or you could look them up in nasm sources as a nifty table.

I have copies of Intel manuals which are not clear about the hex of some instructions, and I'm fed up of trawling through different manuals looking for the ones that might. I found finit/fninit by doing a search just as you did - that wasn't a problem, and I did that when I realised the info in the book had given me values for finit that were the same as another instruction which couldn't possibly share them. It isn't nearly so easy to find the second byte value for instructions like mov eax,CR4, though it doesn't look as if I need it at the moment. I guessed years ago that 192 was the right value for CR0, but I don't know if I should add three lots of 8 to that or four to get the value for accessing CR4. Anyway, looking at the source code for nasm might be a viable solution, though I don't know how easy it is to read through. I suspect it would still be easier to download an assembler and try to find out how to get it to convert mnemonics into hex. Would you recommend NASM for that?

DavidCooper · Post by **DavidCooper** » Sat Jul 16, 2011 5:46 pm

Combuster wrote:
DavidCooper wrote:I was going by http://wiki.osdev.org/FPU, and this bit in particular:-
Setting the 9th bit (OSFXSR) in the CR4 tells the CPU that we intend on using the FXSAVE, FXRSTOR, and SSEx instructions. If this bit is not set, a #UD exception will be generated on use of the FPU or any SSE instructions.
It seems that that information is wrong (the OSDev wiki being a hotbed of fan fiction)
Well, no. If I simply substitute "or" with "for", or substitute "the" with "these", it is factually accurate, and you should have realized something was amiss, since the first and second sentences contradict each other to some extent.

If you already know how things work, turning "or" into "for" might seem obvious, but it wasn't to me. I still can't work out which "the" to turn into "these" - the only one that would survive grammatically would make no real change to the meaning. I'm also failing to pick out the contradiction due, no doubt, to lack of some knowledge that probably doesn't relate to what I need to do with the FPU. The difficulty is always trying to gain enough knowledge to be able to recognise whether information is correct or not, and yet you have to acquire this expertise from sources containing all manner of errors (and which tend to assume tons of knowledge which you don't yet have and often don't need at that stage). We all come into this by different routes and have different errors and holes in our knowledge (and some of us with more than others), but that's where a forum helps as people can put each other right. So, thanks again to the people in this thread (and others) for steering me in the right direction. Your help is much appreciated.

Combuster · Post by **Combuster** » Sat Jul 16, 2011 5:51 pm

If you are concerned with accuracy, limit yourself to the official manuals. Problem solved.

Also did some updates to the FPU page to avoid further confusion.

DavidCooper · Post by **DavidCooper** » Sat Jul 16, 2011 7:07 pm

Combuster wrote:Also did some updates to the FPU page to avoid further confusion.

That is truly excellent - it's rare to find anything spelt out so clearly. Thank you.

Brendan · Post by **Brendan** » Sat Jul 16, 2011 7:29 pm

Hi,

DavidCooper wrote:
Bit 2 doesn't tell you if there's an FPU or not. It determines how the WAIT instruction interacts with hardware task switching.
The fan-fiction source I used for that was http://en.wikipedia.org/wiki/Control_register, the relevant part being this:-

My apologies - I was (incorrectly) thinking of the MP flag and not the EM flag when I wrote that.

Bit 2 is the EM flag. It doesn't say if an FPU is present or not. It only tells the CPU if it should treat FPU instructions as valid or not. Nothing says that anything left this flag in a sane state.

Also, it's possible that in some situations you'd want the EM flag to be set even when there is an FPU. Examples include avoiding problems with faulty FPUs (e.g. Pentium's FDIV bug) by emulating FPU instructions even though an FPU is present; and providing extended functionality (e.g. maybe the kernel has support for high precision 256-bit floating point or something, and sets the EM flag as a way to seamlessly enable "high precision" mode for processes that want it).

For code that detects if an FPU is present or not, see Intel's example code in "Intel Processor Identification and the CPUID Instruction Application Note 485" (search for "fnstsw" to find it fast).

DavidCooper wrote:I have copies of Intel manuals which are not clear about the hex of some instructions, and I'm fed up of trawling through different manuals looking for the ones that might.

Just use NASM to convert the instructions into machine code and then use "hexdump" to find out what the opcodes are. Better yet, just use NASM.

The only reason I can think of for anyone to use machine code is as a way of seeking attention - something to brag about when among programmers. It's a bit like setting your testicles on fire at the local pub - your peers/friends will seem impressed, but it can be hard to know if they're impressed by your skills or impressed at how stupid you are, and it'd be a mistake to assume the former when it'd be the latter most of the time.

Cheers,

Brendan

DavidCooper · Post by **DavidCooper** » Sat Jul 16, 2011 9:00 pm

Those were interesting points about why you'd want to bypass the FPU.

Brendan wrote: Just use NASM to convert the instructions into machine code and then use "hexdump" to find out what the opcodes are. Better yet, just use NASM.

I've actually just been downloading nasm and working out how to use it. I've downloaded an assembler before which took an hour to install, gave dire warnings about how it might trash my machine, and which I then never managed to work out how to do anything with. By way of contrast, nasm downloaded in the blink of an eye and produced this shortly afterwards:-

     1 00000000 0F20C0                  mov eax,cr0
     2 00000003 0F20C8                  mov eax,cr1
     3 00000006 0F20D0                  mov eax,cr2
     4 00000009 0F20D8                  mov eax,cr3
     5 0000000C 0F20E0                  mov eax,cr4

So my expected weeks of effort learning how to use it were a bit off target. That's that problem solved.
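The listing above follows a regular pattern: the third byte is a ModRM byte with mod=11, the control register number in the reg field and EAX (000) in the r/m field, so each register adds 8 to the base value 192 (C0 hex) and CR4 comes out at E0. A minimal sketch of that pattern follows; the function names are hypothetical, and the 0F 22 opcode for the reverse direction is the one used earlier in the thread.

```python
def mov_eax_from_cr(n):
    """Encode `mov eax, crN` as 0F 20 /r (mod=11, reg=N, rm=EAX)."""
    if not 0 <= n <= 7:
        raise ValueError("control register number must be 0..7")
    return bytes([0x0F, 0x20, 0xC0 | (n << 3)])


def mov_cr_from_eax(n):
    """Encode `mov crN, eax` as 0F 22 /r (mod=11, reg=N, rm=EAX)."""
    if not 0 <= n <= 7:
        raise ValueError("control register number must be 0..7")
    return bytes([0x0F, 0x22, 0xC0 | (n << 3)])


def control_register_of(code):
    """Return N for a 3-byte move to or from crN."""
    if len(code) != 3 or code[0] != 0x0F or code[1] not in (0x20, 0x22):
        raise ValueError("not a mov to/from a control register")
    return (code[2] >> 3) & 7


for n in range(5):
    print(n, mov_eax_from_cr(n).hex().upper())

print(control_register_of(bytes.fromhex("0F20D8")))  # 3
```

By the same decoding, the `F 20 D8` and `F 22 D8` sequences posted earlier in the thread address CR3, not CR4.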

Brendan wrote:The only reason I can think of for anyone to use machine code is as a way of seeking attention - something to brag about when among programmers. It's a bit like setting your testicles on fire at the local pub - your peers/friends will seem impressed, but it can be hard to know if they're impressed by your skills or impressed at how stupid you are, and it'd be a mistake to assume the former when it'd be the latter most of the time.

Then think again - I always point out that it's easier to program in machine code than to use assembler (so long as you have an indexing system). I'm impressed by masochists who use assembler, so from my point of view they are the ones setting their testicles on fire. As I've also said before, my original reason for wanting to program directly in machine code was that that is the most obvious way for A.I. to write its code - I see no point in it going through a mnemonic stage as it's far simpler for it to edit machine code directly in memory where it will run and miss out any unnecessary translation stages. Programming in machine code is nothing to boast about, but if you want to see me boast about something, just wait twelve months.
