Bochs: math_abort: MSDOS compatibility FPU exception

Post by spix »

Hi

I am wondering if anyone gets this error message when dividing things?

Code:

math_abort: MSDOS compatibility FPU exception

I have recently ported e2fsck to my OS, and when running in Bochs, I get a lot of those messages and then it trashes my filesystem.

Qemu works perfectly. I haven't tried it on a real PC because I'm not sure I won't mess up the hard drive.

Does anyone know if this is just a quirk in bochs? Or am I doing something wrong?

Andrew

Post by durand »

I grep'ed the bochs source code for the log entry and found it in fpu/fpu.cc.

Basically, it looks like the FPU specific flags in your CR0 are set up in a way that bochs doesn't support. In particular, your CR0 NE flag is set to 0. Bochs freaks out when it's set to 0.

Just investigate what bits you're setting and see if it's what you want, etc.. :)

Post by Brendan »

Hi,
durand wrote: I grep'ed the bochs source code for the log entry and found it in fpu/fpu.cc.

Basically, it looks like the FPU specific flags in your CR0 are set up in a way that bochs doesn't support. In particular, your CR0 NE flag is set to 0. Bochs freaks out when it's set to 0.

Just investigate what bits you're setting and see if it's what you want, etc.. :)

Once upon a time (80386 and older CPUs) the FPU was a separate chip, which generated IRQ 13 via the PIC when it got an error. When Intel put the FPU on the same chip as the CPU this became rather dumb - it's much faster and easier to generate an exception instead, and IRQ 13 doesn't work when you've got multiple CPUs.

To fix this Intel invented exception #16 (FPU error). The problem is that this would break backwards compatibility, so they also added the NE flag in CR0. This means if the NE flag is clear you get the old IRQ from the PIC, and if it's set you get the new exception instead. Of course for backwards compatibility the BIOS will leave the flag clear so that it doesn't break old stuff.
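
To make the CR0 flags concrete, here is a minimal sketch of the bit manipulation involved. The bit positions are the architectural ones; the helper names are made up for illustration, and the functions only compute values, since actually reading or writing CR0 needs a privileged "mov" in kernel code.

```c
#include <stdint.h>

/* Architectural CR0 bits that concern the FPU. */
#define CR0_MP (1u << 1)  /* Monitor coProcessor: WAIT/FWAIT honour TS */
#define CR0_EM (1u << 2)  /* EMulation: FPU opcodes raise exception 7 */
#define CR0_TS (1u << 3)  /* Task Switched: next FPU opcode raises exception 7 */
#define CR0_NE (1u << 5)  /* Numeric Error: 1 = exception 16, 0 = IRQ 13 */

/* Hypothetical helper: the CR0 value a kernel with its own exception 16
 * handler would probably want, given the value the BIOS left behind. */
static uint32_t cr0_for_native_fpu_errors(uint32_t cr0)
{
    cr0 &= ~CR0_EM;  /* an FPU is present, so do not emulate */
    cr0 |= CR0_MP;   /* let WAIT/FWAIT trap when TS is set */
    cr0 |= CR0_NE;   /* report FPU errors as exception 16, not IRQ 13 */
    return cr0;
}

/* Returns 1 if FPU errors arrive as exception 16, 0 if routed via IRQ 13. */
static int fpu_errors_use_exception(uint32_t cr0)
{
    return (cr0 & CR0_NE) != 0;
}
```

In a kernel you would read CR0, pass it through something like cr0_for_native_fpu_errors(), and write the result back during early init, before any FPU instruction runs.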

Now for Bochs... For 80486 and later Bochs does behave correctly regardless of whether the NE flag is set or cleared, although if the NE flag is clear it'll show a message in the log. The comments in "fpu/fpu.cc" are completely wrong though, and so is the conditional code - they assume that 80286 and 80386 used the new exception rather than the old IRQ. In practice Bochs doesn't support 80386 or older CPUs, so this bug doesn't affect anyone... ;)

None of this solves your problem though - why is the FPU generating an exception in the first place?

In the FPU control register there's a set of flags to enable/disable certain kinds of FPU exceptions (invalid operation, divide by zero, denormalized operand, overflow, underflow, inexact result). If the type of exception is disabled the FPU will continue without generating any exception (and will do some sort of default action instead, which is intended to cause the least trouble later) - for example, for divide by zero it might set the result to positive or negative infinity and ignore the problem. Some of these exceptions are extremely likely - for example, doing "1 / 3" will result in an inexact result exception because the result can't be perfectly represented.
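
As a rough illustration of those flags, the sketch below decodes the six mask bits in the low byte of the x87 control word (a set bit means the exception is masked, i.e. disabled). 0x037F is the value FINIT leaves behind, with everything masked. The function and array names are invented for this example.

```c
#include <stdint.h>
#include <stddef.h>

#define FPU_CW_DEFAULT 0x037Fu  /* control word after FINIT: all six masked */

/* Control word bits 0..5, in order; a set bit masks that exception. */
static const char *const fpu_exception_names[6] = {
    "invalid operation", "denormalized operand", "divide by zero",
    "overflow", "underflow", "inexact result"
};

/* Fills out[] with the names of exceptions that are NOT masked (and would
 * therefore be reported to the OS) and returns how many there are. */
static size_t fpu_unmasked_exceptions(uint16_t cw, const char *out[6])
{
    size_t n = 0;
    for (unsigned bit = 0; bit < 6; bit++) {
        if (!(cw & (1u << bit)))
            out[n++] = fpu_exception_names[bit];
    }
    return n;
}
```

With FPU_CW_DEFAULT nothing is reported; with 0x037B only "divide by zero" would trap. Dumping the control word a program is actually running with (FNSTCW) and passing it through something like this is one way to check the first two possibilities listed in this post.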

Given this, I'd guess that one or more of these possibilities may be the problem:
  - the code you're running enables some FPU exceptions in the FPU control register and expects the OS to handle the resulting exceptions in some way (e.g. perhaps the code expects to receive a SIGFPE signal from the OS).
  - your OS doesn't initialize the FPU correctly, or the code you're running expects that certain FPU exceptions are disabled in the FPU control register when they aren't.
  - your task switch code doesn't save and restore the FPU state correctly
  - the code you're running is faulty or is relying on incorrect data (i.e. generating FPU exceptions for other reasons)

Hope something here helps....


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.

Post by spix »

Brendan wrote:
- your OS doesn't initialize the FPU correctly, or the code you're running expects that certain FPU exceptions are disabled in the FPU control register when they aren't.
- your task switch code doesn't save and restore the FPU state correctly

I wasn't doing anything with the FPU before... so I guess both those are the problem.

I looked at the Intel manuals and the Linux code (2.2). It seems that when cr0 has the TS flag set and an app tries to do a floating point thing, it will cause exception #7, and that is where Linux initializes the FPU. So that's what I did.

At each task switch, if the FPU has been initialized it saves the FPU state in the task structure, and restores it next time exception #7 occurs.

Now, linux 2.2 on i386 appears to use hardware task switching, so I am wondering as I am using software task switching, will TS be set on a task switch? I am thinking no, so I am setting TS manually on a task switch, is that ok?
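
For reference, the lazy save/restore scheme described above can be modelled in plain C roughly as follows. This is only a user-space sketch: CR0 and the FPU are simulated with variables, the sim_* functions stand in for fnsave/frstor/fninit/clts, and every name here is hypothetical. It does reflect that with software task switching nothing sets TS for you, so the switch code sets it by hand.

```c
#include <stdint.h>
#include <string.h>

#define CR0_TS (1u << 3)

struct fpu_state { uint8_t bytes[108]; };  /* size of a 32-bit FSAVE image */

struct task {
    struct fpu_state fpu;
    int fpu_initialized;  /* task has used the FPU at least once */
    int fpu_loaded;       /* task's state is currently live in the FPU */
};

/* Simulated hardware so the logic can run outside a kernel. */
static uint32_t sim_cr0;
static struct fpu_state sim_fpu;
static struct task *current;

static void sim_fnsave(struct fpu_state *dst) { *dst = sim_fpu; }
static void sim_frstor(const struct fpu_state *src) { sim_fpu = *src; }
static void sim_fninit(void) { memset(&sim_fpu, 0, sizeof sim_fpu); }

/* Software task switch: save the outgoing FPU state if it was used,
 * then set TS manually so the next FPU instruction traps. */
static void switch_to(struct task *next)
{
    if (current && current->fpu_loaded) {
        sim_fnsave(&current->fpu);
        current->fpu_loaded = 0;
    }
    current = next;
    sim_cr0 |= CR0_TS;
}

/* Exception 7 handler: first FPU use since the last switch. */
static void device_not_available(void)
{
    sim_cr0 &= ~CR0_TS;  /* CLTS on real hardware */
    if (current->fpu_initialized) {
        sim_frstor(&current->fpu);
    } else {
        sim_fninit();
        current->fpu_initialized = 1;
    }
    current->fpu_loaded = 1;
}

/* Model a task executing FPU instructions that write or read FPU state. */
static void task_fpu_write(uint8_t value)
{
    if (sim_cr0 & CR0_TS) device_not_available();
    sim_fpu.bytes[0] = value;
}

static uint8_t task_fpu_read(void)
{
    if (sim_cr0 & CR0_TS) device_not_available();
    return sim_fpu.bytes[0];
}
```

Switching between two tasks that each write a different value and then reading it back after returning to each task is a quick sanity check that neither sees the other's state.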

Having done the above, I've stopped Bochs giving those messages (I also set NE=1), and I am not getting the exception 16 you mentioned (though I will now add a SIGFPE generator to that ISR, thanks).

I am still baffled as to why Qemu worked correctly.

Thanks for the help,
Andrew

Post by Pype.Clicker »

QEMU is quite a different beast from Bochs. Bochs emulates, QEMU virtualizes (or at least, qemu-fast does). That means QEMU happily runs your code as user-mode code on your host system and only switches to emulator code when you do something the hosting environment doesn't support (such as writing to memory beyond userspace, using privileged instructions, etc.) by means of the SIGSEGV handler (or whatever similar mechanism exists on Windows).

Now, that (together with the fact that it may emulate a newer CPU than your Bochs does) could explain why it would have the NE bit set (wasn't the MP bit the one for "Math coprocessor present"? I don't have my 386 handbook at hand, so I can't recall what "NE" stands for). But indeed, that doesn't explain why it does FPU things in the first place. Maybe you're running data rather than code somewhere?

Post by spix »

Thanks Pype,

I think I have worked out the problem: you can't share hard drive images between Bochs and Qemu.

On Qemu, the hard drive works perfectly in Linux & my OS; on Bochs, e2fsck trashes it on both Linux & my OS.

I did e2fsck on linux on qemu, and tested and had a clean filesystem, then i booted the same linux on bochs and forced e2fsck to run (it thought it was clean as it should) but it found hundreds of duplicate blocks

So my operating system was doing the correct thing by wrecking everything :-/

Post by Pype.Clicker »

spix wrote: I think I have worked out the problem, you cant share hard drive images between bochs and qemu.
"Sharing" ?? you mean, like running bochs and QEMU at the same time on the same HDD.img file so that you could format/populate it with linux and read it with YourOS?

That wouldn't be a wise thing to do, indeed.

Post by spix »

Well, I did that a few times ;D I thought if I unmounted it first...

but this last test i shut down qemu before starting bochs.. maybe the image was no good because of my past indiscretions.

[edit]
I was wrong on a couple of things, linux 2.2 does use software task switching, and bochs seems to work fine with qemu images.
[/edit]

Post by spix »

Sorry for bringing this up again, I'm still having problems and hope someone can help..

I've fixed the FPU exception: each time an IRQ fires, my IRQ handler checks if the FPU has been used and, if it has, saves the state. Then it sets the task switch (TS) flag in cr0 again, so the next FPU use restores the state via the int 7 (device not available) fault.

in qemu this seems to work, adding some printf to the int 7 handler i can see it is getting called when it should. In bochs, it does a similar thing, but bochs' timer seems to be on steroids, so it does the int 7 handler routine a lot more.

Bochs still trashes the filesystem, yet qemu does not

At first I thought, what if e2fsck (the offending program) was in the middle of a floating point operation when the kernel interrupted, and the kernel wanted to do a floating point operation in that interrupt. So I have done some tests, and no that doesn't appear to be the case.

So, I am no longer getting FPU exceptions, e2fsck thinks it finished with a job well done and marks the filesystem clean. After that the filesystem is almost completely messed up.

Would it be reasonable to assume that the FPU isn't the cause of this problem?

Andrew

Post by Candy »

spix wrote: I've fixed the FPU exception: each time an IRQ fires, my IRQ handler checks if the FPU has been used and, if it has, saves the state. Then it sets the task switch (TS) flag in cr0 again, so the next FPU use restores the state via the int 7 (device not available) fault.
Does that actually fix it?

spix wrote: in qemu this seems to work, adding some printf to the int 7 handler i can see it is getting called when it should. In bochs, it does a similar thing, but bochs' timer seems to be on steroids, so it does the int 7 handler routine a lot more.
Bochs' handler isn't on steroids, it's set to 18.2 hertz in the given IPS. If your actual IPS is much higher, it'll fire too often. If your handler takes longer than the IPS / clocktime you'll get a stack overflow (oops on that).
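
To put numbers on that: the 18.2 Hz figure is the PIT's 1193182 Hz input clock divided by the default divisor of 65536. The sketch below works out the divisor for another rate and, assuming (as described above) that emulated time is derived purely from the configured IPS, how many emulated instructions pass between ticks. The function names and the example IPS value are made up for illustration.

```c
#include <stdint.h>

#define PIT_BASE_HZ 1193182u  /* input clock of the 8253/8254 PIT */

/* Divisor to program for a requested tick rate, rounded to nearest.
 * hz must be non-zero; rates below about 19 Hz do not fit in 16 bits. */
static uint32_t pit_divisor_for(uint32_t hz)
{
    return (PIT_BASE_HZ + hz / 2) / hz;
}

/* Tick rate in millihertz for a divisor (the PIT treats 0 as 65536). */
static uint32_t pit_rate_millihertz(uint32_t divisor)
{
    if (divisor == 0) divisor = 65536;
    return (uint32_t)((uint64_t)PIT_BASE_HZ * 1000 / divisor);
}

/* Emulated instructions between two ticks when the emulator advances
 * virtual time by instruction count at the configured ips. */
static uint64_t instructions_per_tick(uint64_t ips, uint32_t divisor)
{
    if (divisor == 0) divisor = 65536;
    return ips * divisor / PIT_BASE_HZ;
}
```

With a hypothetical ips of 1000000, the default divisor gives a tick roughly every 54925 emulated instructions, and a 100 Hz divisor (11932) one every 10000. If the host actually executes far more instructions per second than the configured value, those ticks arrive much faster in wall-clock time, which would look like a timer "on steroids".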

spix wrote: At first I thought, what if e2fsck (the offending program) was in the middle of a floating point operation when the kernel interrupted, and the kernel wanted to do a floating point operation in that interrupt. So I have done some tests, and no that doesn't appear to be the case.
What would e2fsck do with floating point? As far as I'm concerned, that's too unreliable in rounding, especially for a bit-level correct thing as a file system.

spix wrote: So, I am no longer getting FPU exceptions, e2fsck thinks it finished with a job well done and marks the filesystem clean. After that the filesystem is almost completely messed up.

Would it be reasonable to assume that the FPU isn't the cause of this problem?
Might it be an exception mapped to the IRQ there? Did you remap IRQs? Could it be that you're receiving exception 13 instead?

Post by spix »

Candy wrote: Does that actually fix it?
Yes. I started out not doing anything with the FPU and I was getting IRQ 13, which caused the message in the topic with Bochs. I modified cr0 to give a floating point exception instead of an external IRQ. I have also set the TS bit so that when a floating point operation happens it generates a device not available exception (int 7), which is where I clear the TS bit and restore the FPU state. At the next context switch the FPU state is saved and the TS bit is set again. I no longer receive the IRQs, and I have set the FPU exception handler to generate a SIGFPE for the task involved, which should exit the program (or, if it has a handler, run that). So yeah, it *seems* to be fixed.

Candy wrote: Bochs' handler isn't on steroids, it's set to 18.2 hertz in the given IPS. If your actual IPS is much higher, it'll fire too often. If your handler takes longer than the IPS / clocktime you'll get a stack overflow (oops on that).
Well, it seems to run a lot faster than it should. I recalibrate the timer to 100Hz and use the CPU timer to count the time. It always gains too much time in Bochs.

Candy wrote: What would e2fsck do with floating point? As far as I'm concerned, that's too unreliable in rounding, especially for a bit-level correct thing as a file system.
I have no idea why it wants to use the FPU. My ls command I made uses the FPU to calculate human readable sizes.

Candy wrote: Might it be an exception mapped to the IRQ there? Did you remap IRQs? Could it be that you're receiving exception 13 instead?
No, I am pretty sure it's not generating IRQ 13. It was doing that at the beginning, as I had the NE bit cleared. Yeah, IRQs are remapped.

I'm really not sure what the problem is, I'm just trying to work out what the problem isn't. I don't think it is the FPU as it's no longer generating exceptions, and like you said, what does e2fsck need that for anyway.

Thanks for the reply,
I might try this on a real PC and see what happens.

Andrew

Post by Candy »

spix wrote:
Candy wrote: What would e2fsck do with floating point? As far as I'm concerned, that's too unreliable in rounding, especially for a bit-level correct thing as a file system.
I have no idea why it wants to use the FPU. My ls command I made uses the FPU to calculate human readable sizes.

Human-readable sizes without FPU:

Code: Select all

#include <stdio.h>
#include <stdint.h>

// change to 1000 for SI-compatible numbers
#define KILOBYTE 1024

// Formats size right-aligned into buffer[0..n-1] using integer math only
// and returns a pointer to the first character of the result.
// clean up the interface if you're using this: n must be at least 10
char *human_readable(char *buffer, int n, uint64_t size) {
  static const char indexes[] = " kMGTPE";
  int idx = 0;
  int cidx = n-1;
  // scale down until at most five digits remain
  while (size > 10*KILOBYTE) { size /= KILOBYTE; idx++; }
  buffer[cidx--] = '\0';
  buffer[cidx--] = 'B';
  if (idx) buffer[cidx--] = indexes[idx];
  buffer[cidx--] = ' ';
  if (size == 0) {
    buffer[cidx--] = '0';
  } else {
    // emit decimal digits from least to most significant
    while (size > 0) {
      buffer[cidx--] = (char)('0' + size % 10);
      size /= 10;
    }
  }
  return buffer + cidx + 1;
}

void try(uint64_t long_num) {
  char buffer[10];
  // print the 64-bit value as two 32-bit halves, high half first
  printf("%s = %08x %08x\n", human_readable(buffer, 10, long_num),
         (unsigned)(long_num >> 32), (unsigned)long_num);
}

int main() {
  try(239587);
  try(4294967296ULL);
  try(23507012394780145ULL);
  try(1677721600000000000ULL);
  try(0);
  try(42);
  return 0;
}

spix wrote: I'm really not sure what the problem is, I'm just trying to work out what the problem isn't. I don't think it is the FPU as it's no longer generating exceptions, and like you said, what does e2fsck need that for anyway.
Just ffs, assume it does use the FPU. Does e2fsck init its own FPU or does it assume it's inited when it starts up?

Post by spix »

Candy wrote: Human-readable sizes without FPU:
Thanks.. the part that uses the FPU is when the human readable size is only 1 digit, then it uses a float, for example 1k would appear 1.0k (or 1.3k or whatever.)
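
That one-decimal case can also be done with integers only. The following is a rough sketch rather than the code from my ls: the function name is made up, it assumes 1024-based units, and it truncates the tenths digit instead of rounding.

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Writes e.g. "1.3k", "10k" or "512" into out using integer math only.
 * Returns what snprintf returns (the length that was needed). */
static int human_size_one_decimal(char *out, size_t n, uint64_t size)
{
    static const char units[] = " kMGTPE";
    uint64_t whole = size, rem = 0;
    int idx = 0;

    /* Scale down by 1024 per step, remembering the last remainder. */
    while (whole >= 1024) {
        rem = whole % 1024;
        whole /= 1024;
        idx++;
    }
    if (idx == 0)  /* plain byte count, no unit letter */
        return snprintf(out, n, "%u", (unsigned)whole);
    if (whole < 10) {
        /* single digit: add one decimal taken from the remainder */
        unsigned tenth = (unsigned)(rem * 10 / 1024);
        return snprintf(out, n, "%u.%u%c", (unsigned)whole, tenth, units[idx]);
    }
    return snprintf(out, n, "%u%c", (unsigned)whole, units[idx]);
}
```

For example, 1024 comes out as "1.0k", 1400 as "1.3k" and 10240 as "10k", with no FPU instructions involved.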

Candy wrote: Just ffs, assume it does use the FPU. Does e2fsck init its own FPU or does it assume it's inited when it starts up?
I don't understand what you mean by this. My understanding is, when you use code that requires floating point arithmetic, gcc compiles in the necessary FPU op-codes, and e2fsck is written in C. I understand it is the job of the OS to init the FPU.

Post by Candy »

Does your OS initialize (FINIT) the FPU before use? I would say it either does or your ls would also have failed. It could be that the default settings for the FPU that GCC assumes are different from your setup.

Post by spix »

Candy wrote: Does your OS initialize (FINIT) the FPU before use? I would say it either does or your ls would also have failed. It could be that the default settings for the FPU that GCC assumes are different from your setup.
Yeah my OS does initialize the FPU.

I think the FPU thing might have been a red herring. I'm pretty sure I've fixed the FPU but I'm still having the problem with the filesystem. I am thinking it might be my IDE driver that is at fault. I installed my OS on a real PC, and that didn't even mount the filesystem. I think maybe Qemu is really tolerant with disk I/O, Bochs perhaps less so.

Anyway, I'm in the process of rewriting the IDE stuff, so maybe that will fix it.

Thanks for your help.