OSDev.org

The Place to Start for Operating System Developers

All times are UTC - 6 hours




 Post subject: Bochs: math_abort: MSDOS compatibility FPU exception
Posted: Thu Sep 07, 2006 12:04 pm
Member

Joined: Mon Jun 26, 2006 8:41 am
Posts: 128
Location: Millicent, South Australia
Hi

I'm wondering whether anyone else gets this error message when dividing things:

Code:
math_abort: MSDOS compatibility FPU exception


I have recently ported e2fsck to my OS, and when running in bochs, I get a lot of those messages and then it trashes my filesystem.

QEMU works perfectly. I haven't tried it on a real PC because I'm worried I might mess up the hard drive.

Does anyone know if this is just a quirk in bochs? Or am I doing something wrong?

Andrew

_________________
Mort OS - Blog


 
 Post subject: Re:Bochs: math_abort: MSDOS compatibility FPU exception
Posted: Thu Sep 07, 2006 5:48 pm
Member

Joined: Wed Dec 21, 2005 12:00 am
Posts: 193
Location: South Africa
I grep'ed the bochs source code for the log entry and found it in fpu/fpu.cc.

Basically, it looks like the FPU-specific flags in your CR0 are set up in a way that Bochs doesn't support. In particular, your CR0 NE flag is set to 0, and Bochs freaks out when it's set to 0.

Just investigate which bits you're setting and check that they're what you want, etc. :)


 
 Post subject: Re:Bochs: math_abort: MSDOS compatibility FPU exception
Posted: Thu Sep 07, 2006 9:53 pm
Member

Joined: Sat Jan 15, 2005 12:00 am
Posts: 8561
Location: At his keyboard!
Hi,

durand wrote:
I grep'ed the bochs source code for the log entry and found it in fpu/fpu.cc.

Basically, it looks like the FPU specific flags in your CR0 are set up in a way that bochs doesn't support. In particular, your CR0 NE flag is set to 0. Bochs freaks out when it's set to 0.

Just investigate what bits you're setting and see if it's what you want, etc.. :)


Once upon a time (80386 and older CPUs) the FPU was a separate chip, which generated IRQ 13 via the PIC when it got an error. When Intel put the FPU on the same chip as the CPU this became rather dumb - it's much faster and easier to generate an exception instead, and IRQ 13 doesn't work when you've got multiple CPUs.

To fix this Intel invented exception #16 (FPU error). The problem is that this would break backwards compatibility, so they also added the NE flag in CR0: if the NE flag is clear you get the old IRQ from the PIC, and if it's set you get the new exception instead. Of course, for backwards compatibility the BIOS leaves the flag clear so that it doesn't break old stuff.
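A minimal sketch of the CR0 side of this, assuming a 486-or-later CPU with an on-chip FPU. The bit positions are from the Intel manuals; the macro names and the `cr0_native_fpu` helper are hypothetical. Actually reading and writing CR0 (`mov %cr0, ...`) has to happen in ring 0, so the helper only computes the new value:

```c
#include <stdint.h>

/* CR0 bits relevant to the FPU (bit positions per the Intel manuals). */
#define CR0_MP (1u << 1)  /* Monitor coprocessor: WAIT/FWAIT also honor TS */
#define CR0_EM (1u << 2)  /* Emulation: FPU instructions raise #7 (no FPU present) */
#define CR0_TS (1u << 3)  /* Task switched: next FPU instruction raises #7 */
#define CR0_NE (1u << 5)  /* Numeric error: report FPU errors as exception #16, not IRQ 13 */

/* Hypothetical helper: given the current CR0, return the value to load for a
 * native FPU with native (exception #16) error reporting. Other bits, such as
 * PE and PG, are left untouched. */
uint32_t cr0_native_fpu(uint32_t cr0) {
  cr0 &= ~CR0_EM;          /* there is a real FPU, so don't trap for emulation */
  cr0 |= CR0_MP | CR0_NE;  /* FWAIT respects TS; FPU errors raise #16 */
  return cr0;
}
```

EM clear with MP set is, as far as I know, the combination the Intel manuals list for a CPU with an integrated FPU; NE is the bit this thread is about.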

Now for Bochs... For 80486 and later, Bochs does behave correctly regardless of whether the NE flag is set or cleared, although if the NE flag is clear it'll show a message in the log. The comments in "fpu/fpu.cc" are completely wrong though, and so is the conditional code - they assume that 80286 and 80386 used the new exception rather than the old IRQ. In practice Bochs doesn't support 80386 or older CPUs, so this bug doesn't affect anyone... ;)

None of this solves your problem though - why is the FPU generating an exception in the first place?

In the FPU control register (the x87 control word) there's a set of flags to enable/disable (mask) certain kinds of FPU exceptions: invalid operation, divide by zero, denormalized operand, overflow, underflow and inexact result. If a type of exception is masked, the FPU continues without generating an exception and instead takes a default action, which is intended to cause the least trouble later - for example, for divide by zero it sets the result to positive or negative infinity and ignores the problem. Some of these exceptions are extremely likely - for example, doing "1 / 3" produces an inexact result exception because the result can't be represented exactly.
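To make the masking concrete, here's a small sketch that decodes the exception-mask bits of the x87 control word. The layout (bits 0 to 5 are invalid, denormal, zero-divide, overflow, underflow and precision; a set bit means masked) is from the Intel manuals, and FINIT/FNINIT loads 0x037F, which masks all six. The `FCW_*` names and both helpers are made up for illustration:

```c
#include <stdint.h>

/* x87 control word exception-mask bits: a set bit masks (disables) that exception. */
enum {
  FCW_IM = 1 << 0,  /* invalid operation */
  FCW_DM = 1 << 1,  /* denormalized operand */
  FCW_ZM = 1 << 2,  /* divide by zero */
  FCW_OM = 1 << 3,  /* overflow */
  FCW_UM = 1 << 4,  /* underflow */
  FCW_PM = 1 << 5,  /* precision (inexact result) */
  FCW_ALL_MASKS = 0x3F
};

/* Exceptions that are unmasked (enabled) in `cw`, i.e. the ones that cause an
 * FPU error (IRQ 13 or exception #16, depending on CR0.NE) when they occur. */
uint16_t fpu_unmasked(uint16_t cw) { return (uint16_t)(~cw & FCW_ALL_MASKS); }

/* Control word with the exceptions in `which` unmasked, other bits unchanged. */
uint16_t fpu_unmask(uint16_t cw, uint16_t which) { return (uint16_t)(cw & ~which); }
```

With the FINIT default nothing is unmasked, so "1 / 3" merely sets the precision flag in the status word. Code that unmasks, say, divide by zero and loads the result with FLDCW is the kind of program that expects the OS to turn the resulting exception into a SIGFPE.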

Given this, I'd guess that one or more of these possibilities may be the problem:
    - the code you're running enables some FPU exceptions in the FPU control register and expects the OS to handle the resulting exceptions in some way (for example, perhaps the code expects to receive a SIGFPE signal from the OS).
    - your OS doesn't initialize the FPU correctly, or the code you're running expects that certain FPU exceptions are disabled in the FPU control register when they aren't.
    - your task switch code doesn't save and restore the FPU state correctly
    - the code you're running is faulty or is relying on incorrect data (i.e. generating FPU exceptions for other reasons)

Hope something here helps....


Cheers,

Brendan

_________________
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.


 
 Post subject: Re:Bochs: math_abort: MSDOS compatibility FPU exception
Posted: Fri Sep 08, 2006 2:16 am
Member

Joined: Mon Jun 26, 2006 8:41 am
Posts: 128
Location: Millicent, South Australia
Quote:
- your OS doesn't initialize the FPU correctly, or the code you're running expects that certain FPU exceptions are disabled in the FPU control register when they aren't.
- your task switch code doesn't save and restore the FPU state correctly


I wasn't doing anything with the FPU before... so I guess both of those are the problem.

I looked at the Intel manuals and the Linux (2.2) code. It seems that when CR0 has the TS flag set and an application executes a floating-point instruction, the CPU raises exception #7, and that's where Linux initializes the FPU. So that's what I did.

At each task switch, if the FPU has been initialized, my kernel saves the FPU state in the task structure and restores it the next time exception #7 occurs.

Now, Linux 2.2 on i386 appears to use hardware task switching. Since I'm using software task switching, will TS be set on a task switch? I'm thinking no, so I'm setting TS manually on each task switch. Is that OK?
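On the TS question: as far as I know, the CPU only sets CR0.TS by itself during a hardware task switch, so with software task switching the kernel has to set it explicitly. The scheme described above (save on switch if used, set TS, restore on #7) can be modelled in plain user-space C to check the bookkeeping. Everything below is a hypothetical simulation, not Linux or Mort OS code: the FPU registers are a single `int`, and CR0.TS, FSAVE/FRSTOR, FNINIT and the #7 trap are stand-ins:

```c
#include <stdbool.h>
#include <stddef.h>

struct task {
  int  fpu_state;  /* stand-in for the task's FSAVE area */
  bool fpu_used;   /* has this task ever executed an FPU instruction? */
};

static int          fpu_regs;          /* stand-in for the live FPU registers */
static bool         cr0_ts = true;     /* TS set before the first task runs */
static struct task *fpu_owner = NULL;  /* task that trapped into #7 this time slice */
static int          nm_traps = 0;      /* number of simulated #7 faults */

/* Simulated #7 handler: CLTS, then FRSTOR the task's state, or FNINIT on first use. */
static void device_not_available(struct task *cur) {
  nm_traps++;
  cr0_ts = false;                                /* CLTS */
  if (cur->fpu_used) fpu_regs = cur->fpu_state;  /* FRSTOR */
  else { fpu_regs = 0; cur->fpu_used = true; }   /* FNINIT */
  fpu_owner = cur;
}

/* Software context switch away from `prev`: FSAVE only if prev touched the FPU
 * during this slice, then set TS because software switching never sets it. */
void context_switch_from(struct task *prev) {
  if (fpu_owner == prev) prev->fpu_state = fpu_regs;  /* FSAVE */
  fpu_owner = NULL;
  cr0_ts = true;
}

/* Any FPU instruction traps to #7 first while TS is set. */
void fpu_store(struct task *cur, int v) { if (cr0_ts) device_not_available(cur); fpu_regs = v; }
int  fpu_load(struct task *cur)         { if (cr0_ts) device_not_available(cur); return fpu_regs; }

int fpu_trap_count(void) { return nm_traps; }
```

With two tasks alternating FPU use (A stores, B stores, A loads, B loads, with a switch between each step), each task traps once per time slice in which it touches the FPU, four traps in total, and each gets its own value back.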

Having done the above, I've stopped Bochs giving those messages (I also set NE=1), and I'm not getting the exception #16 you mentioned (though I will now add a SIGFPE generator to that ISR, thanks).
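For the SIGFPE generator, the #16 handler has to work out which exception fired. A sketch, assuming the handler has already read the status word (FNSTSW) and control word (FNSTCW): only flags that are set and not masked can have caused the fault. The `FPE_*` values are the standard POSIX `si_code` constants from `<signal.h>`; the helper name, the priority order when several are pending, and folding "denormal" into underflow are assumptions for illustration:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose the FPE_* si_code constants */
#endif
#include <signal.h>
#include <stdint.h>

/* Hypothetical helper: choose a SIGFPE si_code from the x87 status word (sw)
 * and control word (cw). Returns -1 if no unmasked exception is pending. */
int x87_fault_si_code(uint16_t sw, uint16_t cw) {
  /* Exception flags: status word bits 0-5. Masks: control word bits 0-5. */
  uint16_t pending = (uint16_t)(sw & ~cw & 0x3F);
  if (pending & 0x01) return FPE_FLTINV;  /* invalid operation */
  if (pending & 0x04) return FPE_FLTDIV;  /* divide by zero */
  if (pending & 0x08) return FPE_FLTOVF;  /* overflow */
  if (pending & 0x12) return FPE_FLTUND;  /* denormal (0x02) or underflow (0x10) */
  if (pending & 0x20) return FPE_FLTRES;  /* precision (inexact result) */
  return -1;
}
```

The handler also needs to clear the exception flags (FNCLEX) before the task resumes; otherwise the same pending exception is raised again by the next waiting FPU instruction.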

I am still baffled as to why QEMU worked correctly.

Thanks for the help,
Andrew

_________________
Mort OS - Blog


 
 Post subject: Re:Bochs: math_abort: MSDOS compatibility FPU exception
Posted: Fri Sep 08, 2006 3:20 am
Member

Joined: Wed Oct 18, 2006 2:31 am
Posts: 5964
Location: In a galaxy, far, far away
QEMU is quite a different beast from Bochs. Bochs emulates, QEMU virtualizes (or at least, qemu-fast does). That means QEMU happily runs your code as user-mode code on your host system and only switches to emulator code when you do something the host environment doesn't support (such as writing to memory beyond user space, using privileged instructions, etc.), by means of the SIGSEGV handler (or some similar mechanism on Windows).

Now, that (together with the fact that it may emulate a newer CPU than your Bochs does) could explain why it would have the NE bit set (isn't the MP bit the one for "Math coprocessor Present"? I don't have my 386 handbook at hand, so I can't recall what "NE" stands for). But indeed, that doesn't explain why it does FPU things in the first place. Maybe you're running data rather than code somewhere?

_________________
May the source be with you.


 
 Post subject: Re:Bochs: math_abort: MSDOS compatibility FPU exception
Posted: Fri Sep 08, 2006 4:47 am
Member

Joined: Mon Jun 26, 2006 8:41 am
Posts: 128
Location: Millicent, South Australia
Thanks Pipe,

I think I have worked out the problem: you can't share hard drive images between Bochs and QEMU.

On QEMU, the hard drive works perfectly in Linux and my OS; on Bochs, e2fsck trashes it on both Linux and my OS.

I ran e2fsck under Linux on QEMU and tested it, and had a clean filesystem. Then I booted the same Linux on Bochs and forced e2fsck to run (it thought the filesystem was clean, as it should), but it found hundreds of duplicate blocks.

So my operating system was doing the correct thing by wrecking everything :-/

_________________
Mort OS - Blog


 
 Post subject: Re:Bochs: math_abort: MSDOS compatibility FPU exception
Posted: Fri Sep 08, 2006 5:41 am
Member

Joined: Wed Oct 18, 2006 2:31 am
Posts: 5964
Location: In a galaxy, far, far away
spix wrote:
I think I have worked out the problem, you cant share hard drive images between bochs and qemu.


"Sharing" ?? you mean, like running bochs and QEMU at the same time on the same HDD.img file so that you could format/populate it with linux and read it with YourOS?

That wouldn't be a wise thing to do, indeed.

_________________
May the source be with you.


 
 Post subject: Re:Bochs: math_abort: MSDOS compatibility FPU exception
Posted: Fri Sep 08, 2006 6:46 am
Member

Joined: Mon Jun 26, 2006 8:41 am
Posts: 128
Location: Millicent, South Australia
Well, I did that a few times ;D I thought if I unmounted it first...

But for this last test I shut down QEMU before starting Bochs... Maybe the image was already bad because of my past indiscretions.

Edit: I was wrong on a couple of things: Linux 2.2 does use software task switching, and Bochs seems to work fine with QEMU images.

_________________
Mort OS - Blog


 
 Post subject: Re:Bochs: math_abort: MSDOS compatibility FPU exception
Posted: Sun Sep 10, 2006 10:52 am
Member

Joined: Mon Jun 26, 2006 8:41 am
Posts: 128
Location: Millicent, South Australia
Sorry for bringing this up again; I'm still having problems and hope someone can help.

I've fixed the FPU exception. Each time an IRQ fires, my IRQ handler checks whether the FPU has been used and, if it has, saves the state. Then it sets the TS (task switched) flag in CR0 again, so the next time the FPU is used the state is restored via the int 7 (device not available) fault.

In QEMU this seems to work: adding some printf calls to the int 7 handler, I can see it is getting called when it should. Bochs does a similar thing, but Bochs' timer seems to be on steroids, so it runs the int 7 handler a lot more often.

Bochs still trashes the filesystem, yet QEMU does not.

At first I thought, what if e2fsck (the offending program) was in the middle of a floating point operation when the kernel interrupted, and the kernel wanted to do a floating point operation in that interrupt. So I have done some tests, and no that doesn't appear to be the case.

So, I am no longer getting FPU exceptions, e2fsck thinks it finished with a job well done and marks the filesystem clean. After that the filesystem is almost completely messed up.

Would it be reasonable to assume that the FPU isn't the cause of this problem?

Andrew

_________________
Mort OS - Blog


 
 Post subject: Re:Bochs: math_abort: MSDOS compatibility FPU exception
Posted: Sun Sep 10, 2006 11:22 am
Member

Joined: Tue Oct 17, 2006 11:33 pm
Posts: 3882
Location: Eindhoven
spix wrote:
I've fixed the FPU exception, each time an IRQ fires my irq fault handler checks if the FPU has been used and if it has, saves the state. Then it resets the task switch flag in cr0 and so next time it restores via the int 7 device missing fault.

Does that actually fix it?
Quote:
in qemu this seems to work, adding some printf to the int 7 handler i can see it is getting called when it should. In bochs, it does a similar thing, but bochs' timer seems to be on steroids, so it does the int 7 handler routine a lot more.

Bochs' timer isn't on steroids: it's set to 18.2 Hz in terms of the IPS value you configured. If your actual IPS is much higher, it'll fire too often. If your handler takes longer than one tick (IPS / clock rate instructions), you'll get a stack overflow (oops on that).
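Put as arithmetic, under a simplified model (an assumption, not taken from the Bochs source): with instruction-count timing, one emulated second passes every `ips` instructions, so the wall-clock timer rate scales with how fast the host really executes guest instructions. The function name is hypothetical:

```c
#include <stdint.h>

/* Simplified model: a timer programmed for tick_hz ticks per emulated second,
 * with one emulated second = configured_ips instructions, fires this many times
 * per wall-clock second when the host actually executes actual_ips. */
uint64_t ticks_per_wall_second(uint64_t tick_hz, uint64_t configured_ips,
                               uint64_t actual_ips) {
  return tick_hz * actual_ips / configured_ips;
}
```

For example, a 100 Hz timer with `ips` set to 1,000,000 on a host that really manages 4,000,000 guest instructions per second fires about 400 times per real second. If I remember right, Bochs 2.2 and later have a `clock: sync=realtime` bochsrc option aimed at this; check the bochsrc documentation for your version.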
Quote:
At first I thought, what if e2fsck (the offending program) was in the middle of a floating point operation when the kernel interrupted, and the kernel wanted to do a floating point operation in that interrupt. So I have done some tests, and no that doesn't appear to be the case.

What would e2fsck do with floating point? As far as I'm concerned, that's too unreliable in rounding, especially for something that has to be bit-exact, like a file system.
Quote:
So, I am no longer getting FPU exceptions, e2fsck thinks it finished with a job well done and marks the filesystem clean. After that the filesystem is almost completely messed up.

Would it be reasonable to assume that the FPU isn't the cause of this problem?

Might it be an exception mapped to the IRQ there? Did you remap IRQs? Could it be that you're receiving exception 13 instead?


 
 Post subject: Re:Bochs: math_abort: MSDOS compatibility FPU exception
Posted: Sun Sep 10, 2006 12:00 pm
Member

Joined: Mon Jun 26, 2006 8:41 am
Posts: 128
Location: Millicent, South Australia
Quote:
Does that actually fix it?

Yes. I started out not doing anything with the FPU, and I was getting IRQ 13, which caused the message in the topic title in Bochs. I set the NE bit in CR0 to get a floating-point exception instead of an external IRQ, and I also set the TS bit so that when a floating-point operation happens it generates a device-not-available exception (#7), which is where I clear the TS bit and restore the FPU state. On the next context switch the FPU state is saved and the TS bit is set again. I no longer receive the IRQs, and I have set the FPU exception handler to generate a SIGFPE for the task involved, which should exit the program (or, if it has a handler, run that). So yeah, it *seems* to be fixed.

Quote:
Bochs' handler isn't on steroids, it's set to 18.2 hertz in the given IPS. If your actual IPS is much higher, it'll fire too often. If your handler takes longer than the IPS / clocktime you'll get a stack overflow (oops on that).


Well, it seems to run a lot faster than it should. I set the timer phase to 100 Hz and use the CPU timer to count the time, and it always gains too much time in Bochs.
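For reference, setting the PIT to 100 Hz means loading channel 0 with a divisor of its roughly 1.193182 MHz input clock. A sketch of the arithmetic (the helper name is hypothetical); the divisor is then typically sent by writing command byte 0x36 (channel 0, low byte then high byte, mode 3) to port 0x43, followed by the low and high bytes to port 0x40:

```c
#include <stdint.h>

#define PIT_INPUT_HZ 1193182u  /* PIT input clock: about 1.193182 MHz */

/* Hypothetical helper: reload value for PIT channel 0 giving roughly `hz`
 * interrupts per second (hz must be non-zero), rounded to the nearest divisor.
 * A return value of 0 means 65536 to the PIT, the slowest rate (~18.2 Hz). */
uint16_t pit_divisor(uint32_t hz) {
  uint32_t d = (PIT_INPUT_HZ + hz / 2) / hz;
  if (d < 1) d = 1;
  if (d > 65535) d = 0;
  return (uint16_t)d;
}
```

For 100 Hz this gives 11932, i.e. about 99.998 Hz, so a clock that counts exactly 10 ms per tick drifts slightly even on real hardware; a much larger gain under Bochs points at emulated time running faster than wall time.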

Quote:
What would e2fsck do with floating point? As far as I'm concerned, that's too unreliable in rounding, especially for a bit-level correct thing as a file system.


I have no idea why it wants to use the FPU. The ls command I wrote uses the FPU to calculate human-readable sizes.

Quote:
Might it be an exception mapped to the IRQ there? Did you remap IRQs? Could it be that you're receiving exception 13 instead?


No, I'm pretty sure it's not generating IRQ 13. It was doing that at the beginning, as I had the NE bit cleared. And yes, the IRQs are remapped.
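On the remapping question: the vector an IRQ arrives on depends on the offsets programmed into the two 8259 PICs. The PC BIOS default is 0x08 for the master and 0x70 for the slave, which puts IRQ 0 to 7 on vectors 8 to 15, right on top of the CPU exceptions (IRQ 5 would arrive as vector 13, the general protection fault vector). A sketch of the mapping; the helper is hypothetical, and 0x20/0x28 is just a common remap choice, not necessarily what Mort OS uses:

```c
#include <stdint.h>

/* Hypothetical helper: vector on which a legacy 8259 PIC IRQ (0-15) arrives,
 * given the vector offsets programmed into the master and slave PICs via ICW2
 * (the low 3 bits of each offset must be zero). */
uint8_t irq_to_vector(unsigned irq, uint8_t master_base, uint8_t slave_base) {
  if (irq < 8)
    return (uint8_t)(master_base + irq);     /* IRQ 0-7: master PIC */
  return (uint8_t)(slave_base + (irq - 8));  /* IRQ 8-15: slave PIC, cascaded on IRQ 2 */
}
```

With the 0x20/0x28 remap, a stray FPU IRQ 13 would show up on vector 0x2D; with the BIOS default it would show up on 0x75.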

I'm really not sure what the problem is; I'm just trying to work out what the problem isn't. I don't think it's the FPU, as it's no longer generating exceptions, and like you said, what does e2fsck need it for anyway?

Thanks for the reply,
I might try this on a real PC and see what happens.

Andrew

_________________
Mort OS - Blog


 
 Post subject: Re:Bochs: math_abort: MSDOS compatibility FPU exception
Posted: Sun Sep 10, 2006 2:32 pm
Member

Joined: Tue Oct 17, 2006 11:33 pm
Posts: 3882
Location: Eindhoven
spix wrote:
Quote:
What would e2fsck do with floating point? As far as I'm concerned, that's too unreliable in rounding, especially for a bit-level correct thing as a file system.


I have no idea why it wants to use the FPU. My ls command I made uses the FPU to calculate human readable sizes.

Human-readable sizes without FPU:
Code:
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

// change to 1000 for SI-compatible numbers
#define KILOBYTE 1024

// Writes a size such as "233 kB" right-aligned into buffer[0..n-1] and
// returns a pointer to its first character. Needs n >= 9: up to 5 digits,
// a space, a unit prefix, 'B' and the terminating '\0'.
char *human_readable(char *buffer, int n, uint64_t size) {
  static const char indexes[] = " kMGTPE";
  int idx = 0;
  int cidx = n - 1;
  // Scale down until at most 5 digits are left.
  while (size > 10 * KILOBYTE) { size /= KILOBYTE; idx++; }
  buffer[cidx--] = '\0';
  buffer[cidx--] = 'B';
  if (idx) buffer[cidx--] = indexes[idx];
  buffer[cidx--] = ' ';
  if (size == 0) {
    buffer[cidx--] = '0';
  } else {
    // Emit decimal digits from least to most significant.
    while (size > 0) {
      buffer[cidx--] = (char)('0' + size % 10);
      size /= 10;
    }
  }
  return buffer + cidx + 1;
}

static void try(uint64_t long_num) {
  char buffer[10];
  // Print the formatted size next to the raw 64-bit value in hex
  // (the original passed a 64-bit value to two %08x conversions).
  printf("%s = 0x%016" PRIx64 "\n",
         human_readable(buffer, (int)sizeof buffer, long_num), long_num);
}

int main(void) {
  try(239587);
  try(4294967296ULL);
  try(23507012394780145ULL);
  try(1677721600000000000ULL);
  try(0);
  try(42);
  return 0;
}


Quote:
I'm really not sure what the problem is, I'm just trying to work out what the problem isn't. I don't think it is the FPU as it's no longer generating exceptions, and like you said, what does e2fsck need that for anyway.


Just ffs, assume it does use the FPU. Does e2fsck init its own fpu or does it assume it's inited when it starts up?


 
 Post subject: Re:Bochs: math_abort: MSDOS compatibility FPU exception
Posted: Sun Sep 10, 2006 3:14 pm
Member

Joined: Mon Jun 26, 2006 8:41 am
Posts: 128
Location: Millicent, South Australia
Quote:
Human-readable sizes without FPU:


Thanks. The part that uses the FPU is when the human-readable size has only one digit; then it uses a float, so, for example, 1k would appear as 1.0k (or 1.3k or whatever).
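The one-decimal case can also be done with integers only, by working in tenths of the unit and rounding to nearest. A sketch under that approach; `size_in_tenths` and `format_tenths` are hypothetical helpers, not part of the code above or of any existing ls:

```c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

#define KILOBYTE 1024

/* Returns the size in tenths of the largest unit that keeps the whole part
 * below KILOBYTE, rounded to nearest, and stores that unit's index into
 * " kMGTPE" in *prefix_index. Integer arithmetic only, so no FPU use. */
uint64_t size_in_tenths(uint64_t size, int *prefix_index) {
  uint64_t unit = 1;
  int idx = 0;
  while (size / unit >= KILOBYTE && idx < 6) { unit *= KILOBYTE; idx++; }
  *prefix_index = idx;
  /* (size * 10 + unit / 2) / unit, split up so that size * 10 cannot overflow */
  return (size / unit) * 10 + ((size % unit) * 10 + unit / 2) / unit;
}

/* Formats e.g. 1331 as "1.3 kB"; sizes below one kilobyte print as "500 B". */
void format_tenths(char *buf, size_t len, uint64_t size) {
  static const char prefixes[] = " kMGTPE";
  int idx;
  uint64_t tenths = size_in_tenths(size, &idx);
  if (idx == 0)
    snprintf(buf, len, "%" PRIu64 " B", size);
  else
    snprintf(buf, len, "%" PRIu64 ".%" PRIu64 " %cB",
             tenths / 10, tenths % 10, prefixes[idx]);
}
```

An ls could call something like this only when the whole part is a single digit, as it does now with the float.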

Quote:
Just ffs, assume it does use the FPU. Does e2fsck init its own fpu or does it assume it's inited when it starts up?


I don't understand what you mean by this. My understanding is that when you write code that needs floating-point arithmetic, GCC compiles in the necessary FPU opcodes, and e2fsck is written in C. I understand it is the job of the OS to initialize the FPU.

_________________
Mort OS - Blog


 
 Post subject: Re:Bochs: math_abort: MSDOS compatibility FPU exception
Posted: Mon Sep 11, 2006 10:52 am
Member

Joined: Tue Oct 17, 2006 11:33 pm
Posts: 3882
Location: Eindhoven
Does your OS initialize (FINIT) the FPU before use? I'd guess it does, or your ls would also have failed. It could be that the default FPU settings GCC assumes are different from your setup.
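One way to check that assumption is to compare against what FNINIT leaves behind. A sketch for GCC or Clang on x86 using inline assembly (AT&T syntax); the function name is hypothetical, and a kernel would do the FNINIT part when a task first uses the FPU rather than call it like this:

```c
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
/* Reset the x87 FPU (FNINIT is FINIT without the preceding FWAIT) and read
 * back the control word it leaves behind with FNSTCW. */
static inline uint16_t fpu_reset_and_read_cw(void) {
  uint16_t cw;
  __asm__ volatile("fninit\n\tfnstcw %0" : "=m"(cw));
  return cw;
}
#endif
```

This should return 0x037F: all six exceptions masked, 64-bit precision, round to nearest. As far as I know, the System V i386 ABI specifies the same initial control word, so that's what GCC-compiled programs normally expect to start with.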


 
 Post subject: Re:Bochs: math_abort: MSDOS compatibility FPU exception
Posted: Mon Sep 11, 2006 12:22 pm
Member

Joined: Mon Jun 26, 2006 8:41 am
Posts: 128
Location: Millicent, South Australia
Quote:
Does your OS initialize (FINIT) the FPU before use? I would say it either does or your ls would also have failed. It could be that the default settings for the FPU that GCC assumes is different from your setup.


Yeah my OS does initialize the FPU.

I think the FPU thing might have been a red herring. I'm pretty sure I've fixed the FPU, but I'm still having the problem with the filesystem. I'm thinking it might be my IDE driver that is at fault. I installed my OS on a real PC, and it didn't even mount the filesystem. I think maybe QEMU is really tolerant with disk I/O, and Bochs perhaps less so.

Anyway, I'm in the process of rewriting the IDE stuff, so maybe that will fix it.

Thanks for your help.

_________________
Mort OS - Blog

