Hi Brendan, and thanks for your reply.
Brendan wrote: I'm not sure if you're reading every third word of the manual and then filling in the blanks with fan fiction or something, but...
DavidCooper wrote: It's working now after adding this code: F 20 D8 (CR4 into EAX) D 0 2 0 0 (set bit 9) F 22 D8 (EAX into CR4)
Bit 9 of CR4 is only for SSE and has nothing to do with FPU.
I was going by http://wiki.osdev.org/FPU, and this bit in particular:-
Setting the 9th bit (OSFXSR) in the CR4 tells the CPU that we intend on using the FXSAVE, FXRSTOR, and SSEx instructions. If this bit is not set, a #UD exception will be generated on use of the FPU or any SSE instructions.
It seems that that information is wrong (the OSDev wiki being a hotbed of fan fiction), but following it did appear at the time to make the FPU start to function. In reality the cure was simply the fninit instruction (DB E3), which I added at the same time after finding the correct hex values for it on an x86 instruction set website. I previously had the wrong values for those two bytes due to an error in a book borrowed from the library years ago (not sure which, but it might have been Barry B. Brey's Programming the 80286, 80386, 80486 and Pentium-Based Personal Computer, or more likely Peter Norton's Programmer's Guide to the IBM PC; the notes I made from those two books are usually the first place I look for things). I'm not even sure I actually made any change to CR4, as I can't find any information on the correct rrr value to select it, so I may have changed bit 9 in CR3 instead. No one provides comprehensive lists of machine code instructions in hex form, so I sometimes have to guess what they are and use a bit of trial and error. (It's ridiculous, but it seems that I'll have to get hold of an assembler and spend weeks working out how to use it just to get the hex values of a handful of rare instructions.)
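For anyone else hunting for the rrr value: a rough sketch of how the register-select byte for the control register moves is put together. It assumes the standard encodings 0F 20 /r (control register into a general register) and 0F 22 /r (general register into a control register), with a ModRM byte of the form 11 rrr mmm, where rrr is the control register number and mmm is the general register (EAX = 0). It also assumes a 32-bit code segment, so that 0D is OR EAX, imm32. The helper names are made up for illustration.

```python
# Sketch: assemble "set bit 9 (OSFXSR) in CR4" by hand.
# Assumes 32-bit code; helper names are hypothetical.

EAX = 0            # general register number used in the mmm field
OSFXSR = 1 << 9    # bit 9 of CR4

def modrm(cr, reg=EAX):
    """ModRM byte 11 rrr mmm: rrr = control register, mmm = general register."""
    return 0xC0 | (cr << 3) | reg

def mov_from_cr(cr, reg=EAX):
    """MOV r32, CRn  ->  0F 20 /r"""
    return bytes([0x0F, 0x20, modrm(cr, reg)])

def mov_to_cr(cr, reg=EAX):
    """MOV CRn, r32  ->  0F 22 /r"""
    return bytes([0x0F, 0x22, modrm(cr, reg)])

def or_eax_imm32(value):
    """OR EAX, imm32  ->  0D followed by the value, low byte first"""
    return bytes([0x0D]) + value.to_bytes(4, "little")

def hexstr(code):
    return " ".join(f"{b:02X}" for b in code)

set_osfxsr = mov_from_cr(4) + or_eax_imm32(OSFXSR) + mov_to_cr(4)
print(hexstr(set_osfxsr))        # 0F 20 E0 0D 00 02 00 00 0F 22 E0
print(f"{modrm(3):02X}")         # D8, i.e. rrr = 3 selects CR3
```

Under these assumptions a final byte of D8 selects CR3 rather than CR4, which fits the suspicion that the earlier code never touched CR4 at all; the CR4 forms end in E0.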
For "32-bit FPUs" (80386 and later), the FSETPM instruction is treated as a NOP - it has no effect. For 80287 it mattered (it affected loads/stores done by the FPU), and for older FPUs it wasn't supported. Your code is probably "80386 or later" anyway, so the "FSETPM" instruction is a waste of time.
It's harmless to use it though. I thought it might be necessary as information I copied from one of those books years ago suggested it might matter on a 386 and perhaps on a 486, though that may well be wrong.
Bit 2 doesn't tell you if there's an FPU or not. It determines how the WAIT instruction interacts with hardware task switching.
The fan-fiction source I used for that was http://en.wikipedia.org/wiki/Control_register, the relevant part being this:-
2 EM Emulation If set, no x87 floating point unit present, if clear, x87 FPU present
I know that Wikipedia isn't entirely reliable, but if you don't google for things you tend to get attacked by people here, so there's a pressure on everyone just to experiment with unreliable data rather than daring to ask any experts here if they already know of a fully reliable source that they could maybe point to.
DavidCooper wrote: Bit 5 seems to be for enabling exceptions, so I can probably just leave it clear as I intend to avoid generating errors to begin with, but I'll have to try it out and see what happens if I divide by zero, after creating an interrupt routine to handle that. I think from memory that's the same interrupt as the system timer, so that could complify things a bit.
Um, no.
Um, indeed no. In this case I was going by the same Wikipedia page where it said:-
5 NE Numeric error Enable internal x87 floating point error reporting when set, else enables PC style x87 error detection
but I didn't understand it (as it doesn't appear to make a real distinction of any kind), so I guessed at its meaning. I was in a hurry and it wasn't immediately important: I was just happy to have got the FPU to speak to me at last and was keen to start experimenting with it. I also mis-remembered where the system timer interrupt's IDT entry is, as it's 8 entries further on from the divide by zero exception, and now it isn't clear to me whether the divide by zero exception is even used by the FPU at all.
Originally, when there's a floating point error the FPU (in a separate chip) used IRQ13 and the PIC chip to tell the CPU that an error occurred. This was stupid (causes race conditions, etc) and fails completely in systems with multiple CPUs. For CPUs that have built-in FPUs (80486 and later) it makes far more sense for a floating point error to be treated like any other error and cause an exception instead, where IRQs and the PIC chips aren't involved at all. However, Intel couldn't suddenly change how floating point errors are handled because that would've caused backward compatibility problems for old software (e.g. DOS); so they added the NE flag in CR0. If this flag is set then floating point errors trigger an exception, and if the flag is clear then floating point errors get delivered as IRQ13 instead.
That makes a lot of sense. I'm still trying to see through the fog here, but it's beginning to look as if all the FPU errors trigger a single exception using the 17th IDT entry (vector 16) if the NE flag is set. The divide by zero (vector 0) and overflow (vector 4) exceptions, which I had assumed could be triggered by the FPU, are, I now suspect, restricted to integer arithmetic done in the main processor, but I've never found any source of information that spells this kind of thing out (unless it's cunningly hidden deep within tons of other information that I don't need).
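To pin down the routing described above, here is a small sketch that decodes only the NE flag from a CR0 value and reports where an unmasked FPU error would be delivered. It assumes NE is bit 5 of CR0 and that the FPU error exception is vector 16, as discussed; the function and constant names are invented for illustration.

```python
# Sketch: where does an unmasked x87 error go, given a CR0 value?
# Names are hypothetical; only the NE flag (bit 5) is examined.

CR0_NE = 1 << 5          # Numeric Error flag
FPU_ERROR_VECTOR = 16    # 17th IDT entry
FPU_ERROR_IRQ = 13       # legacy route through the PIC

def fpu_error_route(cr0):
    """Return a short description of how FPU errors are reported."""
    if cr0 & CR0_NE:
        return f"exception, vector {FPU_ERROR_VECTOR}"
    return f"IRQ{FPU_ERROR_IRQ} via the PIC"

def with_native_errors(cr0):
    """Return the CR0 value with NE set, leaving all other bits alone."""
    return cr0 | CR0_NE

print(fpu_error_route(0x00000011))                      # IRQ13 via the PIC
print(fpu_error_route(with_native_errors(0x00000011)))  # exception, vector 16
```

Either way the FPU's own divide by zero does not arrive on vector 0; that vector stays with integer DIV/IDIV in the main processor.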
In general (for applications), "precision", "underflow" and "denormal operand" errors can be safely disabled (unless you care a lot about the accuracy of calculations); and all other errors indicate serious problems with your software (and should therefore be enabled).
So I really need to enable them all, but start with them disabled and enable just one at a time while I try to write code to handle it.
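As a sketch of that one-at-a-time plan: the exception enables live in the low six bits of the FPU control word, where a set bit means the error is masked (disabled). The code below assumes the usual bit order (invalid operation, denormal operand, zero divide, overflow, underflow, precision in bits 0 to 5) and the control word value 037F that fninit leaves behind, with everything masked. The names are made up for illustration, and the resulting value would still have to be loaded into the FPU with FLDCW.

```python
# Sketch: compute x87 control word values with selected errors unmasked.
# A set mask bit means "masked" (no exception raised). Names are hypothetical.

IM, DM, ZM, OM, UM, PM = (1 << n for n in range(6))
# IM invalid operation, DM denormal operand, ZM zero divide,
# OM overflow, UM underflow, PM precision

FNINIT_CW = 0x037F   # control word after fninit: all six errors masked

def unmask(cw, *flags):
    """Clear the given mask bits so those errors can raise an exception."""
    for flag in flags:
        cw &= ~flag
    return cw & 0xFFFF

# Step 1: enable only zero divide while writing its handler.
print(f"{unmask(FNINIT_CW, ZM):04X}")          # 037B

# End state suggested above: precision, underflow and denormal stay masked.
print(f"{unmask(FNINIT_CW, IM, ZM, OM):04X}")  # 0372
```

Starting from 037F and clearing one bit per experiment keeps every other error quiet until its handler exists.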
Anyway, with your help I think I'm beginning to get there at last, so thanks again.