VirtualBox causing random OpCode Exception
Posted: Mon Mar 18, 2013 9:34 pm
by greyOne
I'm not even sure this is a question at this point.
I've run into the strangest problem while using VirtualBox and launching ELF files.
I set up a test case that first runs an ELF executable, then switches page directories,
and finally runs the ELF again in the new directory.
The ELF file in question was compiled with my toolchain in C++.
The whole case runs just fine in Bochs, in VMware, and on 2 real machines as well.
However, in VirtualBox, I get the strangest issues I've ever run into.
After all the pages requested by the ELF file have been mapped
(I'm sure of this: I traced the output, and I have a printf statement between loading and executing),
it requests anywhere from 1 to 7 completely bogus pages (random each time, and not sequential)
and dies with an Invalid Opcode exception.
As far as I know, VirtualBox doesn't provide sufficient tools to debug this.
I'm afraid a minor bug somewhere is behind it, and that it may come up again later.
Has anyone run into anything similar?
Cheers.
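One way to check whether the extra page requests are really bogus is to compare each request against the pages the ELF's PT_LOAD segments actually cover. The sketch below is a hypothetical helper: the `load_segment` struct only mirrors the `p_vaddr` and `p_memsz` fields of `Elf32_Phdr`, and the function names are made up. It rounds each segment out to 4 KiB page boundaries and reports whether a requested address falls inside one. It uses `p_memsz` rather than `p_filesz` because the zero-filled `.bss` part of a segment needs pages too.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u

/* Hypothetical stand-in for the Elf32_Phdr fields that matter here (PT_LOAD only). */
struct load_segment {
    uint32_t p_vaddr;
    uint32_t p_memsz;
};

/* First page touched by the segment: round p_vaddr down to a page boundary. */
uint32_t seg_first_page(const struct load_segment *s) {
    return s->p_vaddr & ~(PAGE_SIZE - 1);
}

/* One past the last page touched: round p_vaddr + p_memsz up to a page boundary. */
uint32_t seg_end_page(const struct load_segment *s) {
    return (s->p_vaddr + s->p_memsz + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
}

/* True if the page containing addr belongs to one of the n loadable segments. */
bool page_expected(const struct load_segment *segs, size_t n, uint32_t addr) {
    uint32_t page = addr & ~(PAGE_SIZE - 1);
    for (size_t i = 0; i < n; i++) {
        if (segs[i].p_memsz == 0)
            continue; /* empty segments map nothing */
        if (page >= seg_first_page(&segs[i]) && page < seg_end_page(&segs[i]))
            return true;
    }
    return false;
}
```

Logging every page request for which `page_expected` returns false would show exactly which requests fall outside the executable's own layout.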
Re: VirtualBox causing random OpCode Exception
Posted: Mon Mar 18, 2013 9:36 pm
by Kazinsal
Before blaming the tools (which, 99 times out of 100, aren't at fault), double-check your code. Load it up in Bochs' debugger and single-step through the ELF loader.
My crystal ball suggests you're smashing the stack somewhere.
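If stack smashing is the suspect, a cheap check is a guard pattern at the bottom of the kernel stack that is verified periodically (for example, on every interrupt). This is a minimal sketch with hypothetical names and an arbitrary magic value. It only catches the stack growing past its bottom, not every kind of corruption: a buffer overrun that overwrites a return address will not touch the guard.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GUARD_WORDS 4
#define GUARD_MAGIC 0x57AC6E11u /* arbitrary, hypothetical marker value */

/* x86 stacks grow downwards, so a stack that gets too deep overwrites
 * these lowest words before running off the end of its memory. */
void stack_guard_install(uint32_t *stack_bottom) {
    for (size_t i = 0; i < GUARD_WORDS; i++)
        stack_bottom[i] = GUARD_MAGIC;
}

/* Returns false as soon as any guard word has been overwritten. */
bool stack_guard_intact(const uint32_t *stack_bottom) {
    for (size_t i = 0; i < GUARD_WORDS; i++)
        if (stack_bottom[i] != GUARD_MAGIC)
            return false;
    return true;
}
```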
Re: VirtualBox causing random OpCode Exception
Posted: Mon Mar 18, 2013 9:43 pm
by greyOne
Blacklight wrote: Before blaming the tools (which, 99 times out of 100, aren't at fault), double-check your code. Load it up in Bochs' debugger and single-step through the ELF loader.
My crystal ball suggests you're smashing the stack somewhere.
But it runs just fine in Bochs...
I'll give it a try.
EDIT: My stack looks just fine, so that doesn't seem to be the issue.
I've got some odd values in some of the registers, though...
Trying to figure out how they get there.
EDIT: Nope, those are fine too.
Re: VirtualBox causing random OpCode Exception
Posted: Tue Mar 19, 2013 1:42 am
by xenos
greyOne wrote: But it runs just fine in Bochs...
That doesn't mean anything. Some of the bugs in my kernel became visible only in AMD SimNow!, while the kernel was running fine in Bochs, QEMU and VirtualBox. And recently I had a problem that occurred only when the kernel was loaded with GRUB2, while it was running fine in all of those simulators when loaded with GRUB Legacy:
I got a page fault in kernel mode whenever a thread started using the FPU / SSE. It turned out that the "device not available" handler tried to reload the SSE state from some bogus address, even though there was no saved SSE state yet. This was caused by an uninitialized pointer to the saved SSE state. The "device not available" handler checks this pointer to decide whether there is a saved state to reload or whether to freshly initialize the FPU / SSE. When the kernel was loaded by GRUB Legacy, the pointer was NULL "by accident", because that part of memory had never been used before (and had been set to 0 by Bochs). But GRUB2 used it, so my pointer contained leftover garbage.
Conclusion: Even if your code works on n machines, it might crash on a different one. (And never use uninitialized pointers...)
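The bug described above comes down to a lazy FPU/SSE scheme whose "no saved state yet" check relied on memory happening to be zero. Below is a minimal sketch of the fix, assuming a hypothetical `struct thread` and a handler reduced to a pure decision function. A real #NM handler would also clear CR0.TS and then execute FXRSTOR or FNINIT; that part is left out here. The pointer is cleared explicitly when the thread is created, so the outcome no longer depends on what the bootloader left in memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-thread save area; FXSAVE needs 512 bytes, 16-byte aligned. */
struct fpu_state {
    uint8_t bytes[512];
};

struct thread {
    int id;
    struct fpu_state *fpu_state; /* NULL until the thread has saved FPU/SSE state */
};

enum nm_action { NM_INIT_FRESH, NM_RESTORE_SAVED };

/* Thread setup: set every field explicitly instead of trusting memory contents. */
void thread_init(struct thread *t, int id) {
    t->id = id;
    t->fpu_state = NULL;
}

/* The decision a lazy "device not available" handler makes. */
enum nm_action nm_handler_decide(const struct thread *t) {
    return t->fpu_state == NULL ? NM_INIT_FRESH : NM_RESTORE_SAVED;
}
```

Allocating the thread structure from memory pre-filled with a pattern like 0xCC, as in the test, is a quick way to confirm that nothing relies on leftover zeroes.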
Re: VirtualBox causing random OpCode Exception
Posted: Tue Mar 19, 2013 8:44 am
by HugeCode
I'm just going to take a shot in the dark... Do you have a loop or halt at the end of your code? I had the same problem not long ago, and I spent about an hour before I found out that the only thing wrong was a missing stopping instruction... There could also be a cli missing before the hlt.
Re: VirtualBox causing random OpCode Exception
Posted: Tue Mar 19, 2013 8:59 am
by Combuster
I have this nice gadget that is quite effective at making garbage values visible across all VMs/emulators. Just fill all the RAM you might touch with 0xCC bytes before you go on doing crazy stuff. Just make sure you don't overwrite anything the bootloader might have put there for you.
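```nasm
MOV ESP, kernelimageend+0x2000 ; make room for stack
AND ESP, 0xfffff000            ; align on page boundary (rounds down)
; fill from end-of-image up to 4M with predictable garbage
MOV EDI, ESP                   ; destination: start at the stack top
MOV EAX, 0xCCCCCCCC            ; fill pattern
MOV ECX, 0x00400000            ; end of the region to fill: 4 MiB
SUB ECX, ESP                   ; number of bytes to fill
SHR ECX, 2                     ; bytes -> dwords for STOSD
CLD                            ; make STOSD advance EDI upwards
REP STOSD                      ; store EAX at [EDI], ECX times
```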
Edit: Credit where credit's due, idea shamelessly copied from Brendan.
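For readers who prefer C, the address arithmetic in the assembly above works out as follows. This is a sketch with hypothetical function names. It reproduces the page-aligned start (`(end + 0x2000) & 0xfffff000`), the dword count for `REP STOSD`, and the fill itself, but it operates on an ordinary buffer so it can run outside a kernel. A side benefit of the 0xCC pattern is that 0xCC is also the opcode of INT3, so jumping into poisoned memory traps immediately instead of executing random bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_MASK  0xFFFFF000u
#define FILL_LIMIT 0x00400000u /* 4 MiB, as in the assembly */
#define POISON     0xCCCCCCCCu

/* MOV ESP, end+0x2000 / AND ESP, 0xfffff000: page-aligned start of the fill. */
uint32_t poison_start(uint32_t image_end) {
    return (image_end + 0x2000u) & PAGE_MASK;
}

/* MOV ECX, 0x400000 / SUB ECX, ESP / SHR ECX, 2: number of dwords to write. */
uint32_t poison_dword_count(uint32_t start) {
    return (FILL_LIMIT - start) >> 2;
}

/* CLD / REP STOSD: write count copies of the pattern at increasing addresses. */
void poison_fill(uint32_t *dst, size_t count) {
    for (size_t i = 0; i < count; i++)
        dst[i] = POISON;
}
```

For a kernel image ending at 0x00123456, the fill would start at 0x00125000 and cover 0xB6C00 dwords up to the 4 MiB mark.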
Re: VirtualBox causing random OpCode Exception
Posted: Tue Mar 19, 2013 9:59 am
by xenos
Nice one... Maybe I should fill all initially mapped and freshly mapped pages with garbage as well.
And I think there used to be a Bochs rewrite named Rebochs that offered a "nasty" mode, in which RAM is not cleared at startup but filled with garbage.
Re: VirtualBox causing random OpCode Exception
Posted: Tue Mar 19, 2013 10:34 am
by greyOne
With some further testing, I've concluded that whatever the issue is,
it's probably not in loading the ELF into memory. The random faults happen somewhere
between the call to the program entry point and the first line of the program (inclusive).
And whatever the issue is, it's also preventing the program from returning.
EDIT:
Suddenly I remembered I had some loose code in my crt0. Maybe that's the issue...
EDIT:
After fixing my crt0 and rebuilding newlib, I'm making some progress.
In all 3 programs I was testing with, I no longer get an invalid opcode,
but one of them still requests the bogus pages, and the return value is incorrect.
It also doesn't seem to run the entire program...
Still working on this.
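For reference, the return path a crt0 is responsible for looks roughly like this: call `main`, then hand its return value to `exit`, which in newlib runs `atexit` handlers and flushes stdio before calling `_exit`. If that hand-off is missing or wrong, both the exit status and any still-buffered output can be lost. The sketch below is hypothetical and is not the poster's actual crt0; the function-pointer parameters and the stand-in functions exist only so it can be tested outside a kernel.

```c
#include <stddef.h>

/* Hypothetical signatures for the user's main and the exit routine. */
typedef int (*user_main_fn)(int argc, char **argv, char **envp);
typedef void (*exit_fn)(int status);

/* Tail of a crt0 entry: call main, then pass its return value on to exit.
 * In a real port this would be _start calling main and exit directly,
 * and exit would never return. */
void crt0_start(int argc, char **argv, char **envp,
                user_main_fn user_main, exit_fn do_exit) {
    int status = user_main(argc, argv, envp);
    do_exit(status); /* a real exit() runs atexit handlers and flushes stdio here */
}

/* Stand-ins so the sketch can run outside a kernel. */
static int last_exit_status = -1;

static int sample_main(int argc, char **argv, char **envp) {
    (void)argv;
    (void)envp;
    return 40 + argc;
}

static void record_exit(int status) {
    last_exit_status = status;
}
```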
Re: VirtualBox causing random OpCode Exception
Posted: Tue Mar 19, 2013 4:09 pm
by greyOne
I still have no idea what the issue was, but switching stacks solved the problem.
I guess that's the end of that.
EDIT: Huh. It seems GRUB's default stack location has caused problems like this before.
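Switching to a stack the kernel owns is the usual remedy here. The Multiboot specification leaves ESP undefined on entry and says the OS image must create its own stack as soon as it needs one. The minimal sketch below assumes GCC or Clang (for `__attribute__((aligned(16)))`) and a hypothetical 16 KiB size. It reserves the stack as an uninitialized static array, which ends up in the kernel's `.bss`; the assembly entry stub would then load ESP with the returned top address before calling into C. Because x86 stacks grow downwards, the top is the end of the array, not its start.

```c
#include <stddef.h>
#include <stdint.h>

#define KERNEL_STACK_SIZE 16384 /* hypothetical size: 16 KiB */

/* Stack reserved inside the kernel image instead of wherever the bootloader left ESP. */
static uint8_t kernel_stack[KERNEL_STACK_SIZE] __attribute__((aligned(16)));

/* The address the entry stub should load into ESP: one past the end of the array. */
uintptr_t kernel_stack_top(void) {
    return (uintptr_t)(kernel_stack + KERNEL_STACK_SIZE);
}
```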