FPU problems

Questions about which tools to use, bugs, the best way to implement a function, etc. should go here. Don't forget to check whether your question is answered in the wiki first! When in doubt, post here.
NickJohnson
Member
Posts: 1249
Joined: Tue Mar 24, 2009 8:11 pm
Location: Sunnyvale, California

FPU problems

Post by NickJohnson »

I recently discovered that my FPU code was broken (my test code was being optimized out, and I haven't actually needed the FPU until now), but I can't find the problem. I use CPUID to check for FXSAVE/FXRSTOR, FPU and SSE support, which seems to indicate that the FPU exists. However, the EM flag is set when the VM boots (indicating that it does not, IIRC), and clearing EM causes floating point exceptions when the FPU is used. All tests are being done in userspace. I'm using QEMU 0.12.4 with "-cpu qemu32" for testing, and I haven't debugged the FPU code on real hardware yet, although my OS otherwise runs fine on real machines.

Here is the key FPU initialization routine (it sets the dword at can_use_fpu, which is initially zero, according to its findings):

Code:

init_fpu:
	push ebx

	; check CPUID for FXSR bit
	mov eax, 1
	cpuid
	test edx, 0x01000000
	jne .nofx

	; set OSFXSR bit in CR4
	mov eax, cr4
	or eax, 0x200
	mov cr4, eax

	mov eax, 1
	mov [can_use_fpu], eax

	; check CPUID for FPU bit
	test edx, 0x00000001
	jne .nofpu
	
	; set FPU control word
.cw:
	mov ax, 0x37F
	mov [.cw], ax
	fldcw [.cw]

.nofpu:
	; check CPUID for SSE bit
	test edx, 0x02000000
	jne .nosse

	; initialize SSE
	mov eax, cr0
	and eax, 0xFFFFFFFB
	or eax, 0x2
	mov cr0, eax
	mov eax, cr4
	or eax, 0x00000600
	mov cr4, eax

.nosse:
.nofx:
	pop ebx
	ret
Does anyone see something immediately wrong with this routine? The problem may be elsewhere.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: FPU problems

Post by Brendan »

Hi,

First, the FPU has nothing to do with MMX, 3DNow! or SSE; they should all be treated as entirely separate features. Partly this is because older CPUs may have some features (e.g. an FPU) and not others (e.g. no SSE), and partly because future CPUs may remove some features (e.g. no FPU) while keeping others (e.g. SSE). Also don't forget that some CPUs have significant bugs/errata in some features (e.g. the Pentium FDIV bug), where you might want to refuse to use a feature even though CPUID says it's supported, or even split the feature (e.g. "precise_FPU" and "imprecise_FPU").
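To illustrate treating each feature separately, here is a minimal C sketch of decoding CPUID leaf 1 EDX, assuming the EDX value has already been obtained with the `cpuid` instruction (and that CPUID itself was confirmed to exist first). The struct, enum and function names are hypothetical. Note the polarity: a feature is present when its bit is set, so in assembly the "absent" path is `test edx, mask` followed by `jz`, not `jne`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1, EDX feature bits */
#define CPUID_EDX_FPU  (1u << 0)   /* x87 FPU on chip */
#define CPUID_EDX_MMX  (1u << 23)  /* MMX */
#define CPUID_EDX_FXSR (1u << 24)  /* FXSAVE/FXRSTOR */
#define CPUID_EDX_SSE  (1u << 25)  /* SSE */

/* Hypothetical feature record: each feature is tracked on its own. */
struct fpu_features {
    bool fpu, mmx, fxsr, sse;
};

/* Hypothetical choice of state save mechanism. */
enum fpu_save_method { SAVE_NONE, SAVE_FNSAVE, SAVE_FXSAVE };

/* A feature is present when its bit is SET. */
static struct fpu_features decode_cpuid1_edx(uint32_t edx)
{
    struct fpu_features f;
    f.fpu  = (edx & CPUID_EDX_FPU)  != 0;
    f.mmx  = (edx & CPUID_EDX_MMX)  != 0;
    f.fxsr = (edx & CPUID_EDX_FXSR) != 0;
    /* Defensive choice: only treat SSE as usable if FXSAVE exists,
     * because FNSAVE cannot save the XMM registers. */
    f.sse  = (edx & CPUID_EDX_SSE)  != 0 && f.fxsr;
    return f;
}

/* Prefer FXSAVE/FXRSTOR; fall back to FNSAVE/FRSTOR for "FPU only". */
static enum fpu_save_method pick_save_method(struct fpu_features f)
{
    if (f.fxsr)
        return SAVE_FXSAVE;
    if (f.fpu)
        return SAVE_FNSAVE;
    return SAVE_NONE;
}
```

An "FPU only" CPU (EDX = 0x00000001) decodes to FPU present, no FXSR and no SSE, and falls back to FNSAVE/FRSTOR.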

So, the first thing I'd do is rip out all the SSE stuff (and maybe move it into a separate "init_SSE" routine). Also note that you don't need FXSAVE/FXRSTOR for "FPU only"; you'd use FNSAVE/FRSTOR if FXSAVE/FXRSTOR aren't supported.

Code:

init_fpu:
	push ebx

	mov eax, 1
	cpuid

;** Only required for SSE **
;	; check CPUID for FXSR bit
;	test edx, 0x01000000
;	jne .nofx

;** Only required for SSE **
;	; set OSFXSR bit in CR4
;	mov eax, cr4
;	or eax, 0x200
;	mov cr4, eax

;** Set the "can_use_fpu" flag - the results of any of the tests are ignored **
	mov eax, 1
	mov [can_use_fpu], eax

;** Only determines if FPU is built-in, and can be wrong if FPU is external **
	; check CPUID for FPU bit
	test edx, 0x00000001
;** Branch is around the wrong way - if the CPU has a built-in FPU you jump to ".nofpu"????
	jne .nofpu


;** This loads 0x37F (the FNINIT default, which MASKS all FPU exceptions) into the control word (but only when there's no FPU built into the CPU) **
	; set FPU control word
.cw:
	mov ax, 0x37F
	mov [.cw], ax     ;** WARNING: Self-modifying code - bad for performance, and you're screwed if other CPUs attempt to call this code
	fldcw [.cw]

.nofpu:

;** Only required for SSE **
;	; check CPUID for SSE bit
;	test edx, 0x02000000
;	jne .nosse

;** First part actually "initialises" FPU flags in CR0 - nothing to do with SSE **
;** Um, in your original this only ever runs when the CPU has neither FXSR nor SSE (those branches are inverted too) **

;	; initialize SSE
	mov eax, cr0
	and eax, 0xFFFFFFFB       ;Clear the EM flag
	or eax, 0x2               ;Set the MP flag
                                ;Forgot about the NE flag???
	mov cr0, eax


;	mov eax, cr4
;	or eax, 0x00000600
;	mov cr4, eax

;.nosse:
;.nofx:
	pop ebx
	ret
NickJohnson wrote:Does anyone see something immediately wrong with this routine? The problem may be elsewhere.
The code to detect whether an FPU is present is dodgy: bad branches, irrelevant branches, it assumes the CPUID instruction is supported, etc. The code to initialise the FPU is dodgy too: it doesn't set the EM and MP flags in CR0 properly (even if there's no FPU these need to be configured properly), doesn't do an "FNINIT", and doesn't mask FPU exceptions that are almost always ignored (e.g. the "precision exception"). Also, if an FPU is built into the CPU, then you should use native exception handling (set the NE flag in CR0 so that FPU exceptions aren't routed through the legacy PIC chips).
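A minimal C sketch of the CR0 and control word values described above, assuming the kernel reads CR0 into a variable, passes it through this logic, and writes the result back with `mov cr0, eax` in ring 0. The helper names are hypothetical. The order matters: EM must already be clear before `FNINIT` and `FLDCW` run, otherwise those x87 instructions raise #NM, and any exception left unmasked needs a #MF (vector 16) handler once NE is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CR0 bits involved in FPU setup */
#define CR0_MP (1u << 1)  /* monitor coprocessor: WAIT/FWAIT also honour TS */
#define CR0_EM (1u << 2)  /* emulation: x87 instructions raise #NM */
#define CR0_NE (1u << 5)  /* native x87 error reporting via #MF */

/* x87 control word exception mask bits (1 = masked) */
#define FPU_CW_DEFAULT 0x037Fu  /* what FNINIT loads: all six exceptions masked */
#define FPU_CW_IM (1u << 0)     /* invalid operation */
#define FPU_CW_ZM (1u << 2)     /* zero divide */
#define FPU_CW_PM (1u << 5)     /* precision (inexact result) */

/* Hypothetical helper: compute the CR0 value to write back. */
static uint32_t cr0_for_fpu(uint32_t cr0, bool fpu_present)
{
    if (fpu_present) {
        cr0 &= ~CR0_EM;          /* execute x87 instructions natively */
        cr0 |= CR0_MP | CR0_NE;  /* built-in FPU: native exception handling */
    } else {
        cr0 |= CR0_EM;           /* trap x87 instructions with #NM */
        cr0 &= ~CR0_MP;
    }
    return cr0;
}

/* Hypothetical helper: start from the FNINIT default and unmask the
 * requested exceptions, but never unmask the precision exception. */
static uint16_t fpu_control_word(uint16_t unmask)
{
    return (uint16_t)(FPU_CW_DEFAULT & ~(unmask & ~FPU_CW_PM));
}
```

For example, a CR0 of 0x15 (PE, EM and ET set) becomes 0x33 when an FPU is present, and asking to unmask the zero-divide exception yields a control word of 0x37B.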


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
NickJohnson
Member
Posts: 1249
Joined: Tue Mar 24, 2009 8:11 pm
Location: Sunnyvale, California

Re: FPU problems

Post by NickJohnson »

Ah, I thought FPU implied FX*, but I guess it's probably that QEMU is set up with FPU but no SSE, so it only implies F* (and I have nothing to support that). I was trying to group the FPU and SSE because they were saved/loaded with the same instruction. I guess I had more problems than I thought.

Thanks for your help!
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: FPU problems

Post by Brendan »

Hi,
NickJohnson wrote:Ah, I thought FPU implied FX*, but I guess it's probably that QEMU is set up with FPU but no SSE, so it only implies F* (and I have nothing to support that). I was trying to group the FPU and SSE because they were saved/loaded with the same instruction. I guess I had more problems than I thought.
If I remember correctly, the timeline went like this. The FPU always had FSAVE/FNSAVE/FRSTOR (which used a 108-byte structure). Then Intel added MMX and AMD added 3DNow!, which both just used FSAVE/FNSAVE/FRSTOR. After that (Pentium II) Intel added FXSAVE/FXRSTOR and promoted it as a faster method of saving/loading FPU state (faster mostly due to better alignment of the FPU register storage). They designed it so that it could be extended later, or probably just told people to reserve 512 bytes of space even though it probably only used 160 bytes initially. Later (Pentium III) they added SSE and used the extra space.

This means you can have FPU/MMX/3DNow! without FXSAVE/FXRSTOR and without SSE, FPU/MMX/3DNow! with FXSAVE/FXRSTOR but without SSE, or FPU with FXSAVE/FXRSTOR and with SSE.
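Since any of these combinations can occur, a kernel has to size the per-thread save area and set the CR4 bits per feature. A minimal C sketch, assuming 32-bit protected mode (where the FNSAVE image is 108 bytes) and that the CR4 result is written back in ring 0; the struct and helper names are hypothetical. CR4 bit 9 is OSFXSR and bit 10 is OSXMMEXCPT, which together make up the 0x600 used in the original post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FSAVE_AREA_SIZE  108u  /* FNSAVE/FRSTOR image in 32-bit protected mode */
#define FXSAVE_AREA_SIZE 512u  /* FXSAVE/FXRSTOR image */

#define CR4_OSFXSR     (1u << 9)   /* OS uses FXSAVE/FXRSTOR (also enables SSE) */
#define CR4_OSXMMEXCPT (1u << 10)  /* OS handles SIMD FP exceptions (#XM) */

/* Hypothetical per-thread save area sized for the larger format.
 * FXSAVE/FXRSTOR fault (#GP) if the operand isn't 16-byte aligned. */
struct fpu_save_area {
    _Alignas(16) uint8_t bytes[FXSAVE_AREA_SIZE];
};
static_assert(sizeof(struct fpu_save_area) == FXSAVE_AREA_SIZE,
              "save area must be exactly 512 bytes");

/* Bytes actually written by whichever save instruction is in use. */
static size_t fpu_save_size(bool has_fxsr)
{
    return has_fxsr ? FXSAVE_AREA_SIZE : FSAVE_AREA_SIZE;
}

/* Set each CR4 bit only for the feature that needs it. */
static uint32_t cr4_for_features(uint32_t cr4, bool has_fxsr, bool has_sse)
{
    if (has_fxsr)
        cr4 |= CR4_OSFXSR;
    if (has_fxsr && has_sse)
        cr4 |= CR4_OSXMMEXCPT;
    return cr4;
}
```

A Pentium II style CPU (FXSR but no SSE) would get only OSFXSR (0x200), while an SSE-capable CPU gets both bits (0x600).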


Cheers,

Brendan