
A "mov" instruction can produce a Divide-by-zero Error?

Posted: Sat Jan 11, 2014 12:10 am
by windows8
This is the statement that produced a divide-by-zero error.

Code: Select all

      ret = (*info->handler)(reg,info->data);
"info->handler" has been checked, and it is correct.
When I run it in QEMU and Bochs, nothing goes wrong.
But when I run it in VirtualBox, it sometimes produces a divide-by-zero error.
(Only sometimes, but it is in the doIRQ function, so it is called many times!)

The asm instruction which produced a Divide-by-zero Error is:

Code: Select all

(0) [0x000000102ddd] 0008:0000008000102ddd (unk. ctxt): mov rax, qword ptr ds:[rax] ; 488b00

(This listing is from Bochs, but Bochs never produced this exception.)
When this code runs, %rax is 0x80001187c0, which is also correct.
Of course, %ds should be correct too.

Why did this instruction produce a divide-by-zero error?
Thanks!

(You can see the exception in the attachment.)

Re: A "mov" instruction can produce a Divide-by-zero Error?

Posted: Sat Jan 11, 2014 12:25 am
by Brendan
Hi,

It's impossible for a "mov" instruction (on its own) to result in a divide error.
windows8 wrote:The asm instruction which produced a Divide-by-zero Error is:

Code: Select all

(0) [0x000000102ddd] 0008:0000008000102ddd (unk. ctxt): mov rax, qword ptr ds:[rax] ; 488b00
I think you mean "the instruction that was overwritten by trash used to be:"...


Cheers,

Brendan

Re: A "mov" instruction can produce a Divide-by-zero Error?

Posted: Sat Jan 11, 2014 1:27 am
by windows8
Brendan wrote: I think you mean "the instruction that was overwritten by trash used to be:"...
Thanks. I just tried running the code below in the exception handler.

Code: Select all

printk(" 0x%x",*(u32 *)regs->rip);
It shows "0x48008b48".
The CPU is little-endian, so these bytes are stored in memory in the following order:

Code: Select all

48 8b 00 48
The bytes "48 8b 00" are the same as "mov rax, qword ptr ds:[rax]" (according to Bochs).

I wrote main.S :

Code: Select all

_start:
   .int 0x48008b48
Next,

Code: Select all

as -o main.o main.S
ld -o main -e_start main.o
objdump -S main
It prints:

Code: Select all

  4000e8:       48 8b 00                mov    (%rax),%rax
That is also the same as "mov rax, qword ptr ds:[rax]".

So this code was not overwritten by trash?
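
The byte-order argument above can be checked with a small C sketch. The helper name little_endian_bytes is hypothetical (it is not part of the poster's kernel); it writes out the bytes of a 32-bit value in the order a little-endian CPU stores them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write the bytes of a 32-bit value in the order a
 * little-endian CPU (such as x86-64) stores them, that is, with the least
 * significant byte at the lowest address. */
static void little_endian_bytes(uint32_t value, uint8_t out[4])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(value >> (8 * i));
}
```

Calling it with 0x48008b48 fills the array with 48 8b 00 48, the same order shown in the post, so the first three bytes are the MOV encoding.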

Re: A "mov" instruction can produce a Divide-by-zero Error?

Posted: Sat Jan 11, 2014 2:53 am
by Brendan
windows8 wrote:
Brendan wrote:I think you mean "the instruction that was overwritten by trash used to be:"...
Thanks. I just tried running the code below in the exception handler.
windows8 wrote:So this code was not overwritten by trash?
It definitely looks like the instruction wasn't overwritten (but this doesn't necessarily prove it wasn't).

Let's start from the start. We only really know 3 facts:
  • There are only 2 instructions that can possibly cause a divide error (DIV and IDIV)
  • Something has caused your interrupt handler to be invoked
  • The interrupt handler says that the "return CS:RIP" points to the MOV instruction
We don't actually know that your interrupt handler was started by a divide error. For example, the instruction before the MOV could have been "int 0x00" and you'd probably get the same symptoms. In theory it could also have been caused by an IRQ (if the PIC chips and/or APICs are misconfigured). Another alternative is that your IDT is messed up (e.g. maybe somehow the "divide error" exception handler got installed in IDT entry 0x0E where the page fault handler should be, and the MOV causes a page fault that starts your divide error exception handler).

We don't know what the virtual address was, we only know the offset within the segment. It's possible that CS base got messed up somehow, and the instruction at "messed up CS base + offset" is a DIV instruction, even though the instruction at "correct CS base + offset" is MOV (and even though the instruction at "correct DS base + offset" is the same MOV). It's extremely likely that when the CPU started the exception handler it loaded a new value into CS (which would hide the problem) and also likely that your exception handler loads DS.

Next (assuming it was a divide error, and assuming that the virtual address is right), we only know that the instruction was a MOV some time before the exception occurred and that the instruction is a MOV after the exception occurred. We don't actually know that the instruction wasn't a DIV at the exact instant that the CPU tried to execute it; and (at least in theory) there are a few things that can cause this. Failing to invalidate a TLB entry at the right time is one of them (e.g. the CPU gets the DIV instruction using a stale TLB entry, but the TLB entry happens to be "fixed" by the time the exception is started). Another random (and extremely unlikely) possibility is some strange interaction with a device (e.g. a bus mastering PCI device that happens to corrupt and then fix the instruction).

Now; I'm not saying any of these things are likely (they all sound very unlikely to me, and I'm sure they all sound very unlikely to you too). However, all of these things are more likely than a MOV instruction causing a divide error.
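
One way to test the "it must have been DIV or IDIV" reasoning is to have the exception handler inspect the bytes at the reported RIP. The sketch below is an illustration under stated assumptions, not code from this thread: looks_like_div and is_legacy_prefix are hypothetical helpers, and the decoder is deliberately simplified (it skips legacy prefixes and one REX prefix, then checks for opcode F6 or F7 with a ModRM.reg field of 6 or 7).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Legacy prefixes that may precede an opcode. */
static bool is_legacy_prefix(uint8_t b)
{
    switch (b) {
    case 0x26: case 0x2e: case 0x36: case 0x3e: /* ES, CS, SS, DS overrides */
    case 0x64: case 0x65:                       /* FS, GS overrides */
    case 0x66: case 0x67:                       /* operand/address size */
    case 0xf0: case 0xf2: case 0xf3:            /* LOCK, REPNE, REP */
        return true;
    default:
        return false;
    }
}

/* Hypothetical helper: return true if code[0..len) starts with DIV or IDIV
 * in 64-bit mode, i.e. opcode F6 or F7 with ModRM.reg equal to 6 or 7. */
static bool looks_like_div(const uint8_t *code, size_t len)
{
    size_t i = 0;

    assert(code != NULL);
    while (i < len && is_legacy_prefix(code[i]))
        i++;
    if (i < len && (code[i] & 0xf0) == 0x40) /* optional REX prefix */
        i++;
    if (i + 1 >= len)                        /* need opcode and ModRM */
        return false;
    if (code[i] != 0xf6 && code[i] != 0xf7)
        return false;

    uint8_t reg = (code[i + 1] >> 3) & 7;    /* ModRM.reg selects the operation */
    return reg == 6 || reg == 7;
}
```

For the bytes 48 8b 00 48 from the earlier post it returns false, while 48 f7 f1 (div rcx) returns true. A false result in the handler only says what memory holds when the handler reads it, which is exactly the limitation described above.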


Cheers,

Brendan

Re: A "mov" instruction can produce a Divide-by-zero Error?

Posted: Sat Jan 11, 2014 3:30 am
by windows8
Brendan wrote:For example, the instruction before the MOV could have been "int 0x00"
I'm not sure about that. I am going to check it soon.
Brendan wrote:Another alternative is that your IDT is messed up
I don't think that happened. All exceptions are handled in handleException.
I can get the exception number from regs.exceptionNumber.
My OS has produced "General Protection Fault" and "Page Fault" before, and they were reported without problems.
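
Printing the vector by name makes it easier to confirm which IDT entry actually fired. A minimal sketch, assuming the handler receives the vector in something like regs.exceptionNumber; exception_name is a hypothetical helper, and only a few of the architecturally defined x86 vectors are listed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: map an x86 exception vector to a readable name.
 * Only a handful of vectors are covered in this sketch. */
static const char *exception_name(uint8_t vector)
{
    switch (vector) {
    case 0:  return "#DE divide error";
    case 6:  return "#UD invalid opcode";
    case 8:  return "#DF double fault";
    case 13: return "#GP general protection";
    case 14: return "#PF page fault";
    default: return "other";
    }
}
```

If the handler reports vector 0 here while the bytes at the return address are a MOV, the cause has to lie somewhere other than the instruction itself.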
Brendan wrote:It's possible that CS base got messed up
But if that happened, the exception handler couldn't work either.
(doIRQ runs in kernel mode, so an exception never changes cs [at least in my OS].)
(My OS has only two code selectors, and one of them is used by user programs.)
Brendan wrote:Failing to invalidate a TLB entry at the right time is one of them
Addresses >= 512GB belong to the kernel page tables, and I set them as global pages.
They never change, even when my OS enters user mode.

Thanks again.

Re: A "mov" instruction can produce a Divide-by-zero Error?

Posted: Sat Jan 11, 2014 6:11 am
by bluemoon
windows8 wrote:

Code: Select all

printk(" 0x%x",*(u32 *)regs->rip);
Are you sure that regs has the correct value? I suggest you do breakpoint debugging.
windows8 wrote: Addresses >= 512GB belong to the kernel page tables, and I set them as global pages.
They never change, even when my OS enters user mode.
The TLB is a limited resource. The global flag only stops an entry from being flushed on a CR3 reload; it does not mean the entry is never evicted.

Re: A "mov" instruction can produce a Divide-by-zero Error?

Posted: Wed Jan 15, 2014 3:30 am
by windows8
bluemoon wrote: Are you sure that regs has the correct value? I suggest you do breakpoint debugging.
If 'regs' held the wrong values, nothing would work.
The whole exception handler depends on 'regs'.

I want to do breakpoint debugging, too!
But as far as I know, Bochs can debug asm code and QEMU + gdb can debug C code.
Neither of them ever produces this exception. Only VirtualBox does.
I don't know how to do breakpoint debugging in VirtualBox...
Thanks!

Re: A "mov" instruction can produce a Divide-by-zero Error?

Posted: Wed Jan 15, 2014 4:35 am
by Brendan
Hi,
windows8 wrote:I want to do breakpoint debugging, too!
But as far as I know, Bochs can debug asm code and QEMU + gdb can debug C code.
Neither of them ever produces this exception. Only VirtualBox does.
I don't know how to do breakpoint debugging in VirtualBox...
VirtualBox does have a built-in debugger (sort of like the one in Bochs, but less good). Sadly, it's also a very poorly documented feature - the only information I've found is in this part of the VirtualBox manual.


Cheers,

Brendan

Re: A "mov" instruction can produce a Divide-by-zero Error?

Posted: Wed Jan 15, 2014 6:38 am
by Combuster
windows8 wrote:Neither of them ever produces this exception. Only VirtualBox does.
That symptom would be typical for a case of uninitialized memory.