Hi,
First, the FPU has nothing to do with MMX, 3DNow! or SSE - they should all be treated as entirely separate features. This is partly because older CPUs may have some features (e.g. FPU) and not others (e.g. no SSE), and partly because future CPUs may remove some features (e.g. no FPU) while keeping others (e.g. SSE). Also don't forget that some CPUs have significant bugs/errata in some features (e.g. Pentium and FPU/FDIV), where you might want to refuse to use a feature even though CPUID says it's supported, or even split the feature in two (e.g. "precise_FPU" and "imprecise_FPU").
So, the first thing I'd do is rip out all the SSE stuff (and maybe put it into an "init_SSE" routine instead). Also note that you don't need FXSAVE/FXRSTOR for "FPU only" (you'd use FNSAVE/FRSTOR if FXSAVE/FXRSTOR aren't supported).
Code:
init_fpu:
push ebx
mov eax, 1
cpuid
;** Only required for SSE **
; ; check CPUID for FXSR bit
; test edx, 0x01000000
; jne .nofx
;** Only required for SSE **
; ; set OSFXSR bit in CR4
; mov eax, cr4
; or eax, 0x200
; mov cr4, eax
;** Set the "can_use_fpu" flag - the results of any of the tests are ignored **
mov eax, 1
mov [can_use_fpu], eax
;** Only determines if FPU is built-in, and can be wrong if FPU is external **
; check CPUID for FPU bit
test edx, 0x00000001
;** Branch is around the wrong way - if the CPU has a built-in FPU you jump to ".nofpu"????
jne .nofpu
;** 0x37F masks (disables) all exceptions in the FPU control word, but this only runs when there's no FPU built into the CPU **
; set FPU control word
.cw:
mov ax, 0x37F
mov [.cw], ax ;** WARNING: Self modifying code - bad for performance, and you're screwed if other CPUs attempt to call this code
fldcw [.cw]
.nofpu:
;** Only required for SSE **
; ; check CPUID for SSE bit
; test edx, 0x02000000
; jne .nosse
;** First part actually "initialises" FPU flags in CR0 - nothing to do with SSE **
;** Um, only ever runs when a CPU has SSE but doesn't have built-in FPU (!)
; ; initialize SSE
mov eax, cr0
and eax, 0xFFFFFFFB ;Clear the EM flag
or eax, 0x2 ;Set the MP flag
;Forgot about the NE flag???
mov cr0, eax
; mov eax, cr4
; or eax, 0x00000600
; mov cr4, eax
;.nosse:
;.nofx:
pop ebx
ret
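On the FNSAVE/FRSTOR point above, here is a rough sketch of how the state save/restore could pick the right instruction pair. It assumes 32-bit protected mode and that EDI points to a 512-byte, 16-byte aligned save area (large enough for either format). The "can_use_fxsr" flag is a made-up name; the idea is that something like an "init_SSE" routine would set it after checking the FXSR bit in CPUID and setting OSFXSR in CR4. Treat it as an illustration rather than tested code.

```nasm
bits 32

section .data
can_use_fxsr: dd 0            ; hypothetical flag, set elsewhere (e.g. by "init_SSE")

section .text

; Input: EDI = address of a 512-byte, 16-byte aligned save area
save_fpu_state:
    cmp dword [can_use_fxsr], 0
    je .legacy
    fxsave [edi]              ; saves x87/MMX/SSE state, leaves the FPU state intact
    ret
.legacy:
    fnsave [edi]              ; saves 108 bytes of x87 state, then re-initialises the FPU
    ret

; Input: EDI = address of a save area previously filled by save_fpu_state
restore_fpu_state:
    cmp dword [can_use_fxsr], 0
    je .legacy
    fxrstor [edi]
    ret
.legacy:
    frstor [edi]
    ret
```

Note that the two formats aren't interchangeable: an area written by FNSAVE must be reloaded with FRSTOR, and one written by FXSAVE with FXRSTOR, so the flag shouldn't change after the first save.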
NickJohnson wrote: Does anyone see something immediately wrong with this routine? The problem may be elsewhere.
The code to detect if an FPU is present is dodgy (bad branches and irrelevant branches, assumes the CPUID instruction is supported, etc). The code to initialise the FPU is dodgy too - it doesn't set the EM and MP flags in CR0 properly (even if there's no FPU these need to be configured properly), doesn't do an "FNINIT", and doesn't reliably mask the FPU exceptions that are almost always ignored (e.g. the "precision exception"), because the FLDCW is skipped when CPUID reports an FPU. Also, if an FPU is built into the CPU, then you should use "native exception handling" (set the NE flag in CR0 so FPU exceptions are reported as exception 16 instead of being routed through the legacy PIC chips).
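To make the above more concrete, here's a rough sketch of what an "FPU only" init routine might look like. It's untested and makes several assumptions: 32-bit protected mode at CPL=0, one CPU at a time, and "can_use_fpu" being a dword as in the original code. The "fpu_probe" and "fpu_cw" variables are names I made up for this example. It doesn't deal with errata (e.g. FDIV) or with the ET flag on 80386.

```nasm
bits 32

section .data
can_use_fpu: dd 0             ; from the original code: 1 if the FPU may be used
fpu_probe:   dw 0x5A5A        ; hypothetical scratch word for the "no CPUID" probe
fpu_cw:      dw 0x037F        ; hypothetical control word: all exceptions masked

section .text
init_fpu:
    push ebx                  ; CPUID clobbers EBX

    ; Is CPUID supported? Try to toggle the ID flag (bit 21) in EFLAGS
    pushfd
    pop eax
    mov ecx, eax              ; ECX = original EFLAGS
    xor eax, 0x00200000
    push eax
    popfd
    pushfd
    pop eax
    push ecx
    popfd                     ; restore original EFLAGS
    xor eax, ecx
    test eax, 0x00200000
    jz .no_cpuid              ; ID flag didn't change, so no CPUID

    mov eax, 1
    cpuid
    test edx, 0x00000001      ; FPU bit
    jz .no_fpu                ; bit clear = no built-in FPU

    ; Built-in FPU: clear EM and TS, set MP and NE (native exception handling)
    mov eax, cr0
    and eax, 0xFFFFFFF3       ; clear EM (bit 2) and TS (bit 3)
    or eax, 0x00000022        ; set MP (bit 1) and NE (bit 5)
    mov cr0, eax
    jmp .have_fpu

.no_cpuid:
    ; Old CPU: probe for an (external) FPU using "no wait" instructions
    mov eax, cr0
    and eax, 0xFFFFFFF3       ; clear EM and TS so the probe can't cause #NM
    mov cr0, eax
    fninit
    mov word [fpu_probe], 0x5A5A
    fnstsw [fpu_probe]        ; a real FPU stores a status word of zero after FNINIT
    cmp word [fpu_probe], 0
    jne .no_fpu
    mov eax, cr0
    or eax, 0x00000002        ; set MP (NE left alone, it doesn't exist on 80386)
    mov cr0, eax

.have_fpu:
    fninit                    ; known state (control word becomes 0x037F)
    fldcw [fpu_cw]            ; loaded from data, not from the code itself
    mov dword [can_use_fpu], 1
    jmp .done

.no_fpu:
    ; No FPU: set EM and clear MP so FPU instructions cause #NM
    mov eax, cr0
    and eax, 0xFFFFFFFD       ; clear MP
    or eax, 0x00000004        ; set EM
    mov cr0, eax
    mov dword [can_use_fpu], 0

.done:
    pop ebx
    ret
```

The FLDCW after FNINIT is redundant with the value used here (FNINIT already leaves the control word at 0x037F); it's only there to show where you'd load a different control word if you wanted some exceptions unmasked.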
Cheers,
Brendan