VirtualBox causing random OpCode Exception

greyOne
Member
Posts: 58
Joined: Sun Feb 03, 2013 10:38 pm
Location: Canada

VirtualBox causing random OpCode Exception

Post by greyOne »

I'm not even sure this is a question at this point;
I've run into the strangest problem while using VirtualBox and launching ELF files.

I basically set up a test case for myself: first run an ELF executable, then switch page directories,
and finally run the ELF again in the new directory.
The ELF file in question was compiled from C++ with my toolchain.

The whole case runs just fine in Bochs, just fine in VMware, and just fine on 2 real machines as well.
However, in VirtualBox, I get the strangest issues I've ever run into.

After mapping all the requested pages from the ELF file
(I'm sure of this; I traced the output, and I have a printf statement between loading and executing),
it requests anywhere from 1 to 7 absolutely bogus pages (random each time, and not sequential)
and dies with an Invalid OpCode Exception.

VirtualBox doesn't even provide me with sufficient tools to debug this (as far as I know).
I'm afraid a minor bug somewhere is causing this, and that it may come up again later.
Has anyone run into anything similar?
Cheers.
Kazinsal
Member
Posts: 559
Joined: Wed Jul 13, 2011 7:38 pm
Libera.chat IRC: Kazinsal
Location: Vancouver

Re: VirtualBox causing random OpCode Exception

Post by Kazinsal »

Before blaming the tools (99 times out of 100 it's not them), double-check your code. Load it up in Bochs' debugger and single-step through the ELF loader.

My crystal ball suggests you're smashing the stack somewhere.
greyOne
Member
Posts: 58
Joined: Sun Feb 03, 2013 10:38 pm
Location: Canada

Re: VirtualBox causing random OpCode Exception

Post by greyOne »

Blacklight wrote: Before blaming the tools (99 times out of 100 it's not them), double-check your code. Load it up in Bochs' debugger and single-step through the ELF loader.

My crystal ball suggests you're smashing the stack somewhere.

But it runs just fine in Bochs...
I'll give it a try.

EDIT: And my stack looks just fine. Doesn't look like that was the issue.
I've got some odd values in some of the registers though...
Trying to figure out how they get there.

EDIT: Nope, those are fine too.
xenos
Member
Posts: 1121
Joined: Thu Aug 11, 2005 11:00 pm
Libera.chat IRC: xenos1984
Location: Tartu, Estonia

Re: VirtualBox causing random OpCode Exception

Post by xenos »

greyOne wrote: But it runs just fine in bochs...

That doesn't mean anything. Some of the bugs in my kernel became visible only in AMD SimNow!, while the kernel was running fine in Bochs, QEMU and VirtualBox. And recently I had a problem that occurred only when the kernel was loaded with GRUB2, while it was running fine in all of those simulators when loaded with GRUB Legacy:

I got a page fault in kernel mode whenever a thread started using the FPU / SSE. It turned out that the "device not available" handler tried to reload the SSE state from some bogus address - but there was no saved SSE state yet. This was caused by an uninitialized pointer to the saved SSE state. The "device not available" handler checks this pointer to see whether there is a saved state to reload, or whether just to freshly initialize the FPU / SSE. When the kernel was loaded by GRUB Legacy, the pointer was NULL "by accident", because this part of memory was never used before (and set to 0 by Bochs). But GRUB2 used it and so my pointer contained some leftover garbage.

Conclusion: Even if your code works on n machines, it might crash on a different one. (And never use uninitialized pointers...)
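
The failure mode described above can be sketched in C. This is only an illustration under stated assumptions: `thread_t`, `fpu_state_t`, `thread_init` and `fpu_on_device_not_available` are hypothetical names, not taken from the kernel in question, and the privileged instructions a real handler would execute appear only in comments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; not taken from any real kernel. */
typedef struct {
    uint8_t bytes[512];            /* an FXSAVE area is 512 bytes */
} fpu_state_t;

typedef struct {
    fpu_state_t *saved_fpu;        /* NULL means "no saved state yet" */
} thread_t;

enum fpu_action { FPU_FRESH_INIT, FPU_RELOAD };

/* Explicit initialization: never rely on memory happening to be zero. */
static void thread_init(thread_t *t)
{
    assert(t != NULL);
    t->saved_fpu = NULL;
}

/* The decision a "device not available" handler has to make. */
static enum fpu_action fpu_on_device_not_available(const thread_t *t)
{
    assert(t != NULL);
    if (t->saved_fpu == NULL)
        return FPU_FRESH_INIT;     /* a real handler would run FNINIT here */
    return FPU_RELOAD;             /* a real handler would run FXRSTOR here */
}
```

If `thread_init` is skipped, the decision depends entirely on whatever the bootloader or emulator left in that memory, which is why the same kernel can behave differently under GRUB Legacy and GRUB2.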
Programmers' Hardware Database // GitHub user: xenos1984; OS project: NOS
HugeCode
Member
Posts: 112
Joined: Mon Dec 17, 2012 9:12 am

Re: VirtualBox causing random OpCode Exception

Post by HugeCode »

I'm just going to take a shot in the dark... Do you have a loop or halt at the end of your code? I had the same problem not long ago, and I spent about an hour before I found out that the only thing wrong was a missing halt instruction... There could also be a cli missing before the hlt.
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Re: VirtualBox causing random OpCode Exception

Post by Combuster »

I have this nice gadget that is quite effective at making garbage values visible across all VMs/emulators. Just fill all the RAM you might get to touch with 0xCCs before you continue doing crazy stuff. Just make sure you don't go overwriting things the bootloader might have added for you.

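                        MOV ESP, kernelimageend+0x2000  ; make room for stack
                        AND ESP, 0xfffff000             ; align on page boundary

                        ; fill from end-of-image up to 4M with predictable garbage
                        MOV EDI, ESP                    ; destination: start at the stack top
                        MOV EAX, 0xCCCCCCCC             ; fill pattern
                        MOV ECX, 0x00400000             ; end address (4M)
                        SUB ECX, ESP                    ; byte count = 4M - start
                        SHR ECX, 2                      ; byte count -> dword count
                        CLD                             ; make STOSD count upward
                        REP STOSD                       ; store EAX at ES:EDI, ECX times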
Edit: Credit where credit's due, idea shamelessly copied from Brendan.
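
For anyone who would rather do the fill from C, here is a minimal sketch of the same idea. The names `poison_fill`, `looks_poisoned` and `POISON_DWORD` are made up for this example, and it assumes both pointers are 4-byte aligned and that the range holds nothing the bootloader handed over. A side benefit of the 0xCC pattern on x86 is that 0xCC is also the INT3 opcode, so jumping into poisoned memory should raise a breakpoint exception rather than execute leftover bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POISON_DWORD 0xCCCCCCCCu   /* same pattern as the assembly above */

/* Fill [start, end) with the poison pattern, one 32-bit word at a time,
 * like REP STOSD. Returns the number of dwords written. */
static size_t poison_fill(uint32_t *start, uint32_t *end)
{
    assert(start <= end);
    size_t count = (size_t)(end - start);   /* dword count, not bytes */
    for (size_t i = 0; i < count; i++)
        start[i] = POISON_DWORD;
    return count;
}

/* True if a 32-bit value read back from memory still carries the pattern,
 * which suggests it was never initialized. */
static int looks_poisoned(uint32_t value)
{
    return value == POISON_DWORD;
}
```

A register or pointer that shows up as 0xCCCCCCCC in a crash dump then points straight at an uninitialized read, on every emulator and on real hardware alike.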
Last edited by Combuster on Tue Mar 19, 2013 10:14 am, edited 1 time in total.
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
xenos
Member
Posts: 1121
Joined: Thu Aug 11, 2005 11:00 pm
Libera.chat IRC: xenos1984
Location: Tartu, Estonia

Re: VirtualBox causing random OpCode Exception

Post by xenos »

Nice one... Maybe I should fill all initially and freshly mapped pages with garbage as well ;) And I think there used to be a Bochs rewrite named Rebochs that offers a "nasty" mode in which RAM is not cleared at startup, but filled with garbage.
greyOne
Member
Posts: 58
Joined: Sun Feb 03, 2013 10:38 pm
Location: Canada

Re: VirtualBox causing random OpCode Exception

Post by greyOne »

With some further testing, I've managed to conclude that whatever the issue is,
it's probably not with loading the ELF into memory. The random faults happen somewhere
between (inclusive) the call to the program entry point and the first line of the program.

And yet, whatever the issue is, it's causing the program to not return as well.

EDIT:
Suddenly I remembered I had some loose code in my crt0. Maybe that's the issue...

EDIT:
After fixing my crt0 and rebuilding newlib, I get some progress.
In all 3 programs I was testing with, I no longer get an invalid opcode,
but one of them still requests the bogus pages, and its return value is incorrect.
It also doesn't seem to run the entire program...

Still working on this.
greyOne
Member
Posts: 58
Joined: Sun Feb 03, 2013 10:38 pm
Location: Canada

Re: VirtualBox causing random OpCode Exception

Post by greyOne »

Still no idea what the issue was, but the problem was solved by switching stacks.
I guess that's the end of that.

EDIT: Huh. It seems GRUB's default stack location has caused such problems before.
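
One way to avoid depending on the stack the bootloader leaves behind is to reserve one inside the kernel image. This is only a sketch under assumptions: `kernel_stack`, `KERNEL_STACK_SIZE` and `kernel_stack_top` are hypothetical names, the 16 KiB size is arbitrary, and `__attribute__((aligned))` assumes a GCC or Clang toolchain. The actual switch still has to happen in the assembly entry stub, before any C code runs.

```c
#include <assert.h>
#include <stdint.h>

#define KERNEL_STACK_SIZE 16384u   /* assumed size; tune for the kernel */

/* Part of the kernel image (.bss), not wherever the bootloader left ESP. */
static uint8_t kernel_stack[KERNEL_STACK_SIZE] __attribute__((aligned(16)));

/* x86 stacks grow downward, so ESP should start at the highest address. */
static uintptr_t kernel_stack_top(void)
{
    uintptr_t top = (uintptr_t)(kernel_stack + KERNEL_STACK_SIZE);
    assert((top & 0xF) == 0);      /* 16-byte alignment is preserved */
    return top;
}

/* In the entry stub, the equivalent would be something like:
 *     mov esp, kernel_stack + KERNEL_STACK_SIZE
 * executed before calling into C. */
```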