A little background:
I have been developing a kernel for a while and have decided to put some effort into a video driver. My plan is for the driver itself to offer a few basic drawing primitives, so that hardware acceleration can be added later. The idea is that it will work a little like a display list: the user-space app queues up "draw rectangle(...); draw line(...); draw bitmap(...);" and then commits that list to the video driver.
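A rough sketch of the queue-then-commit idea, assuming a fixed-size command array. All names here (draw_cmd, display_list, dl_push, dl_commit) are hypothetical and not part of any existing driver; the commit loop only counts commands where a real driver would rasterize them or hand them to hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command types for a display-list style video driver. */
typedef enum { CMD_RECT, CMD_LINE, CMD_BITMAP } cmd_type;

typedef struct {
    cmd_type type;
    int x0, y0, x1, y1;      /* rectangle corners or line endpoints */
    uint32_t color;
    const uint32_t *pixels;  /* only used by CMD_BITMAP */
} draw_cmd;

#define MAX_CMDS 64

typedef struct {
    draw_cmd cmds[MAX_CMDS];
    size_t count;
} display_list;

/* User side: queue a command. Returns 0 on success, -1 if the list is full. */
static int dl_push(display_list *dl, draw_cmd cmd) {
    assert(dl != NULL);
    if (dl->count >= MAX_CMDS)
        return -1;
    dl->cmds[dl->count++] = cmd;
    return 0;
}

/* Driver side: walk the queued commands in order, then empty the list.
 * The case bodies are placeholders for software or hardware rendering. */
static size_t dl_commit(display_list *dl) {
    size_t executed = 0;
    for (size_t i = 0; i < dl->count; i++) {
        switch (dl->cmds[i].type) {
        case CMD_RECT:   /* fill rectangle here */ break;
        case CMD_LINE:   /* draw line here */      break;
        case CMD_BITMAP: /* blit bitmap here */    break;
        }
        executed++;
    }
    dl->count = 0;
    return executed;
}
```

Usage would be a few dl_push calls followed by one dl_commit. Batching like this keeps the per-primitive syscall cost out of the picture and gives a hardware-accelerated backend a whole frame's worth of work at once.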
In addition to this, I have a fairly complete libc and libstdc++ for my kernel land. The libc is something on the order of 80-90% complete and as standards-compliant as possible. This includes the FPU ops: for example, cos/sin/tan all boil down to the correct x86 FPU instructions. I've tested them in user space and they work nicely.
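For reference, this is roughly what "sin boils down to the x87 instruction" can look like with GCC/Clang inline assembly. It is a sketch, not the author's libc code: the "t" constraint names the top of the x87 register stack, so FSIN works on ST(0) in place. The fallback to libm sin() on non-x86 targets is an assumption added only so the snippet compiles everywhere.

```c
#include <math.h>

/* Compute sin(x) with the x87 FSIN instruction.
 * Note: FSIN only accepts |x| < 2^63; out-of-range inputs set C2 and are
 * left unchanged, so a real libc would reduce the argument first. */
static double x87_sin(double x) {
#if defined(__i386__) || defined(__x86_64__)
    long double r = x;
    __asm__("fsin" : "=t"(r) : "0"(r));  /* ST(0) = sin(ST(0)) */
    return (double)r;
#else
    return sin(x);
#endif
}
```

Using long double for the operand matches the x87's native 80-bit registers, so no precision is lost before the final conversion back to double.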
The question
Anyway, I have been attempting to implement the VESA-based software version of this and have had good success so far, except for anything involving the FPU.

Lines, squares, pixels and bitmaps all work fine, because they only need integer arithmetic. Circles are another story.

I've been attempting to implement some circle-drawing code, which requires (at the least) a floating-point division, sine and arccos. Unfortunately, VMware just locks up.
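As a point of comparison (a swapped-in technique, not what the post describes): plain circle outlines can be rasterized with the integer-only midpoint circle algorithm, which sidesteps the FPU entirely. Arcs with start and end angles may still need trig. The framebuffer struct and put_pixel helper below are hypothetical stand-ins for whatever the VESA driver actually exposes.

```c
#include <stdint.h>

/* Hypothetical linear framebuffer; a real driver would fill this in
 * from the VESA mode info. */
typedef struct {
    uint32_t *pixels;
    int width, height;
    int pitch;  /* pixels per row */
} framebuffer;

static void put_pixel(framebuffer *fb, int x, int y, uint32_t color) {
    if (x < 0 || y < 0 || x >= fb->width || y >= fb->height)
        return;  /* clip silently */
    fb->pixels[y * fb->pitch + x] = color;
}

/* Midpoint circle: only integer adds, subtracts and compares.
 * Walks one octant and mirrors each point into the other seven. */
static void draw_circle(framebuffer *fb, int cx, int cy, int r, uint32_t color) {
    int x = r, y = 0;
    int err = 1 - r;  /* decision variable: sign says whether to step x inward */
    while (x >= y) {
        put_pixel(fb, cx + x, cy + y, color);
        put_pixel(fb, cx + y, cy + x, color);
        put_pixel(fb, cx - y, cy + x, color);
        put_pixel(fb, cx - x, cy + y, color);
        put_pixel(fb, cx - x, cy - y, color);
        put_pixel(fb, cx - y, cy - x, color);
        put_pixel(fb, cx + y, cy - x, color);
        put_pixel(fb, cx + x, cy - y, color);
        y++;
        if (err <= 0) {
            err += 2 * y + 1;
        } else {
            x--;
            err += 2 * (y - x) + 1;
        }
    }
}
```

This does not answer why the FPU misbehaves, but it keeps basic circle primitives working while that is tracked down.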

double d = sqrt(2.0);
printf("%f\n", d);
QEMU: prints the correct answer (1.414214).
Bochs: prints nan.
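One way to narrow down where a nan like this comes from is to read the x87 status word right after the suspect call. Since sqrt(2.0) has a valid result, a nan there points at an invalid-operation condition; if the stack-fault flag is also set, the x87 register stack itself over- or underflowed, which with masked exceptions produces the "real indefinite" NaN. The helper names below are hypothetical; the bit positions follow the Intel SDM status-word layout.

```c
#include <stdint.h>
#include <stdio.h>

/* x87 status word bits (Intel SDM Vol. 1, x87 FPU status register). */
#define FSW_IE (1u << 0)  /* invalid operation */
#define FSW_ZE (1u << 2)  /* divide by zero */
#define FSW_SF (1u << 6)  /* stack fault */
#define FSW_C1 (1u << 9)  /* with SF set: 1 = stack overflow, 0 = underflow */
#define FSW_TOP(sw) (((sw) >> 11) & 7u)  /* current top-of-stack index */

typedef enum {
    FPU_NO_FAULT, FPU_STACK_OVERFLOW, FPU_STACK_UNDERFLOW,
    FPU_INVALID, FPU_ZERO_DIVIDE
} fpu_fault;

static const char *const fpu_fault_names[] = {
    "no IE/ZE flags", "stack overflow", "stack underflow",
    "invalid operand", "divide by zero"
};

/* Read the live status word with FNSTSW (returns 0 on non-x86 builds). */
static uint16_t fpu_read_status(void) {
    uint16_t sw = 0;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ volatile("fnstsw %0" : "=m"(sw));
#endif
    return sw;
}

/* Classify the sticky exception flags in a status word. */
static fpu_fault fpu_classify(uint16_t sw) {
    if ((sw & FSW_IE) && (sw & FSW_SF))
        return (sw & FSW_C1) ? FPU_STACK_OVERFLOW : FPU_STACK_UNDERFLOW;
    if (sw & FSW_IE)
        return FPU_INVALID;
    if (sw & FSW_ZE)
        return FPU_ZERO_DIVIDE;
    return FPU_NO_FAULT;
}

static void fpu_dump(const char *where) {
    uint16_t sw = fpu_read_status();
    printf("%s: FSW=%04x TOP=%u (%s)\n", where, (unsigned)sw,
           (unsigned)FSW_TOP(sw), fpu_fault_names[fpu_classify(sw)]);
}
```

The exception flags are sticky until cleared by FNCLEX or FNINIT, so clearing them just before the sqrt call and calling fpu_dump right after makes the result easier to attribute.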
Is there anything special I need to do to set up a sane FPU environment in kernel land? I do an fsave/frstor on task switch, and I do an finit early in my kernel's "main" function. What am I missing?
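Besides FINIT, the other piece of x87 setup that kernels commonly do is adjusting CR0. Below is a sketch assuming a 32-bit kernel. The pure cr0_enable_fpu helper computes the new value; the ring-0 part is guarded by a hypothetical KERNEL_BUILD macro because writing CR0 faults in user space. Whether CR0 is actually the cause of the Bochs nan is not established here.

```c
#include <stdint.h>

/* CR0 bits relevant to the x87 FPU (Intel SDM Vol. 3, control registers). */
#define CR0_MP (1u << 1)  /* monitor coprocessor: WAIT/FWAIT honour TS */
#define CR0_EM (1u << 2)  /* emulation: if set, x87 instructions raise #NM */
#define CR0_TS (1u << 3)  /* task switched: next x87 instruction raises #NM */
#define CR0_NE (1u << 5)  /* native #MF error reporting instead of IRQ 13 */

/* Given the current CR0, return a value with the x87 enabled natively. */
static uint32_t cr0_enable_fpu(uint32_t cr0) {
    cr0 &= ~(CR0_EM | CR0_TS);  /* run x87 instructions, no #NM trap */
    cr0 |= CR0_MP | CR0_NE;     /* usual settings for a real x87 */
    return cr0;
}

#if defined(__i386__) && defined(KERNEL_BUILD)  /* KERNEL_BUILD: hypothetical */
/* Ring 0 only: apply the new CR0, then reset the FPU. */
static void fpu_init(void) {
    uint32_t cr0;
    __asm__ volatile("mov %%cr0, %0" : "=r"(cr0));
    cr0 = cr0_enable_fpu(cr0);
    __asm__ volatile("mov %0, %%cr0" : : "r"(cr0));
    __asm__ volatile("fninit");  /* control word 0x037F: all exceptions masked */
}
#endif
```

Keeping the bit manipulation in a pure function makes it easy to check the intended value separately from the privileged register write.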