GCC optimization causing an invalid opcode

Posted: Mon Jan 09, 2012 5:21 pm
by lackerty
I've been a long-time lurker on the forums, but I've hit a stumper that I cannot get past. Five hours of battle with this error and it is still winning.

After several false starts (albeit valuable learning experiences) I've decided to officially start my project.

However, GCC optimizations are causing invalid opcode exceptions whenever I call my second or third functions. I have my init() function (the entry point from the bootloader), print, printHex, and clearscreen, but calling either printHex or clearscreen causes the invalid opcode exception. Both functions call print: printHex builds a string holding the ASCII representation of the number to be printed and calls print with that string, while clearscreen calls print 25 times with a string of 80 space characters (a hackish method, yes, but these functions only need to work until I have a graphics driver).
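For illustration, the three helpers described above might look roughly like the following sketch. It is a hosted stand-in, not the poster's code: the signatures, the 80x25 `screen` buffer, and the `cursor` variable are assumptions, and a real kernel would write to video memory rather than an array.

```c
#include <stddef.h>
#include <stdint.h>

#define COLS 80
#define ROWS 25

/* Assumed stand-in for the text screen; a kernel would use video memory. */
char screen[ROWS * COLS];
size_t cursor = 0;

/* Hypothetical print: copies a NUL-terminated string to the cursor position. */
void print(const char *s)
{
    while (*s != '\0') {
        screen[cursor] = *s++;
        cursor = (cursor + 1) % (ROWS * COLS); /* wrap at the end of the screen */
    }
}

/* Builds the ASCII hex representation of value, then hands it to print. */
void printHex(uint64_t value)
{
    static const char digits[] = "0123456789ABCDEF";
    char buf[17]; /* 16 hex digits plus the terminator */

    for (int i = 15; i >= 0; i--) {
        buf[i] = digits[value & 0xF];
        value >>= 4;
    }
    buf[16] = '\0';
    print(buf);
}

/* Clears the screen by printing a row of 80 spaces 25 times. */
void clearscreen(void)
{
    char blank[COLS + 1];

    for (int i = 0; i < COLS; i++)
        blank[i] = ' ';
    blank[COLS] = '\0';

    cursor = 0;
    for (int row = 0; row < ROWS; row++)
        print(blank);
}
```

Nothing in code of this shape is inherently sensitive to optimization, which is consistent with the cause turning out to lie elsewhere.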

Any level of optimization causes this, so I know it is something activated with -O1, but I'm not sure what specifically. I thought it was -fauto-inc-dec, but when I turned off optimizations and enabled only that flag there was no error. I then went through the entire list of -O1 optimizations and enabled them one at a time, getting no invalid opcode errors. The only error I did get was a compile-time error when I enabled -fdelayed-branch, with GCC claiming that the target (x86_64-elf with -m64 and -mtune=corei7) does not support delayed branches. Could it be that when optimization is enabled GCC doesn't check for target support of each individual optimization? And if so, does this mean I must compile with this particular optimization turned off?

Or is this the mark of a source code error that only optimization brings out? Unfortunately, despite having all warnings turned on and treated as errors, the compiler builds the kernel without any error or warning, and Bochs simply fails whenever a call is made to the clearscreen or printHex functions. I can call print as much as I want, though.

Sorry if this is fairly obvious.

EDIT: Some non-readable characters appear in the first two positions on the screen. This happened to me before when I had an include file that contained code before the entry point: execution started in random code belonging to functions and their data, which were misaligned. Now, however, I'm using a linker script with my init() function set as the entry point. Since the script tells the linker to output a flat binary and that init() is the entry point, shouldn't it just create a jump/call to init() at the very beginning? Another reason I doubt this is the cause is that the two included files contain only typedefs, defines, and macros, nothing that actually gets compiled into code. The only other thing before init() in the actual source file is function declarations, which also don't get compiled into code because their definitions come after init(). So it doesn't seem like it can be a case of improper entry, but it sure is looking like one.

Re: GCC optimization causing an invalid opcode

Posted: Mon Jan 09, 2012 7:20 pm
by gerryg400
I have my init() function (the entry point from the bootloader), print, printHex, and clearscreen.
How do you get to your init function from your bootloader? Do you have some sort of asm code that initialises the C environment and calls init()?

Re: GCC optimization causing an invalid opcode

Posted: Mon Jan 09, 2012 9:44 pm
by lackerty
My bootloader is Pure64. I followed the instructions included with it, using my cross-compiler to compile the supplied skeleton kernel using the supplied linker script. The only change I made was renaming main to init (and changed the return type to void) and going into the linker's script and changing the entry point from main to init. After that I began to add in my own code. This has worked perfectly fine until now.

I switched it back to main and the problem persists.

As an attempt to diagnose the problem I commented out the calls to print() in clearScreen() and printHex(). The kernel no longer crashes with an invalid opcode exception unless it was compiled with -O1 optimization. -O3, -O2, and no optimization work fine now (albeit, the clearScreen() and printHex() do nothing without the calls to print()).

So I know the problem is with my code, and that the kernel's init() function is being called from the bootloader, since I am seeing the results of print() calls from main. It just appears that an issue occurs when print() is called from anywhere else, and also when -O1 is used. The -O1 failure might be a sign of a separate issue since it only crops up then, or it could be the same issue manifesting itself more strongly under -O1. I'll have to get it working under -O3, -O2, and no optimization first to find out.

EDIT: It's not the call to print() from clearScreen() or printHex() itself that causes the crash. I had the init() function print to the second line and the last two lines, none of which would be overwritten by the crash error message that appears when the Pure64 interrupt handler is called after the invalid opcode. However, they could not be seen after the resulting crash. So for some reason, including calls to print() in these two functions causes something else to change in the compiled code, and that change causes the crash. I'm going to compile the kernel with the calls and disassemble it, then compile it again without them and compare the two disassemblies to see what's so different. I'd think it would be just a couple of lines (moving the inputs for the print() function into registers or onto the stack, and then the calls to the function itself), but this suggests it is not quite so.

EDIT2: As I suspected upon seeing the lack of those printed lines, at least 20-30 instructions are moved between the two versions. With the calls, the binary starts with a compare, a test, a jump if equal, a dec, and then a load effective address. Without the calls in the two functions, this small block of code isn't seen until offset 0x80 into the file. 128 bytes of code precede it, and the last instruction before this block in the version without the calls is 0x7E: jmp short 0x7E, the while(1) loop in init() that guarantees that it never returns.
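A quick way to see where two builds start to diverge, before reading full disassemblies, is a raw byte comparison of the two flat binaries. The sketch below is an assumed helper, not what was used in this post (the comparison here was done on disassembly); the function name `first_difference` is hypothetical.

```c
#include <stddef.h>

/*
 * Returns the first offset at which the two images differ.
 * If one image is a prefix of the other, returns the shorter length.
 * Returns -1 when the images are identical.
 */
long first_difference(const unsigned char *a, size_t a_len,
                      const unsigned char *b, size_t b_len)
{
    size_t shared = a_len < b_len ? a_len : b_len;

    for (size_t i = 0; i < shared; i++) {
        if (a[i] != b[i])
            return (long)i;
    }
    return a_len == b_len ? -1 : (long)shared;
}
```

If the two builds differ at offset 0, the code that sits at the very start of the image has changed, which matters when the loader jumps straight to the start of the file.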

As a point of interest I decided to look at the disassembly of -O1 without the calls and it has the same 5 instructions as the version with the calls, the only difference being that the jump if equal is to a slightly nearer address (probably because -O3 uses speed optimizations that can make the resulting binary larger).

EDIT3: I looked for jmp shorts in the two versions that result in a crash and they appear very late in the file. It appears that part of the optimization process moves stuff around, probably to try to align the code better while still adding as few 0x00s as possible to keep the size down. So I guess the simple solution would be to use an asm stub from now on and make sure its end is aligned on a 16/32/64/128-byte boundary, or something to that effect, to maintain alignment of the binary.

EDIT4: I'm having a bit of trouble getting the assembly stub working as I'm not using elf so calling external functions is difficult but I'm pretty sure that this will solve this. I should've been using this from the beginning.

Re: GCC optimization causing an invalid opcode

Posted: Tue Jan 10, 2012 1:47 am
by gerryg400
I believe the Pure64 loader will load your kernel and jmp to 0x100000. That's fine if your C code begins there but it probably doesn't. Setting the ENTRY to main() or init() doesn't help because Pure64 doesn't know that you've done that.
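To make the point concrete: what matters is the address the entry function actually ends up at, not what ENTRY names. The sketch below assumes you read the symbol's address from the linker map or from nm; the helper names and the example address 0x100080 are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address the loader is said to jump to in the post above. */
#define LOAD_ADDRESS 0x100000ULL

/* Offset of a symbol from the start of the flat binary. */
uint64_t offset_in_image(uint64_t symbol_address)
{
    return symbol_address - LOAD_ADDRESS;
}

/*
 * True only when the symbol is the very first thing in the image,
 * which is what a jump to LOAD_ADDRESS will execute.
 */
bool loader_reaches(uint64_t symbol_address)
{
    return offset_in_image(symbol_address) == 0;
}
```

For example, loader_reaches(0x100080ULL) is false: a function placed 0x80 bytes into the image is not what the loader starts executing, whatever the linker script's ENTRY says.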

So, if you use the Pure64 loader you need an assembly stub whose text segment begins at 0x100000.

What issue are you having with the stub?

Re: GCC optimization causing an invalid opcode

Posted: Tue Jan 10, 2012 3:17 am
by lackerty
Yeah, you're right. Simply loading the compiled C code and jumping to it works, right up until optimizations are added and things start moving around.

I was having trouble because NASM's flat binary output doesn't support external references, but a moment ago I decided to just try an elf64 object file and see if it would work. The linker didn't complain, and everything now boots and runs as expected at all optimization levels.

An interesting lesson in debugging for me. If it looks like a duck and quacks like a duck then it is a duck. This applies to programming errors too apparently.