Kernel binary image broken on GCC (formerly using MinGW)
Posted: Fri Mar 15, 2024 2:44 pm
Hi all,
I'm currently facing an issue which is somewhat related to OS programming. Before I lose my hair, I'm gonna try to explain it without running in circles.
Context
I'm working on the kernel of my homemade x86 OS project. I have my own MBR and stage 2 bootloaders written in real mode assembly, and assembled into raw flat binary images. For the kernel, I use a combination of GCC and NASM to generate an ELF version of my kernel, which I then convert to a flat binary using OBJCOPY. I then put everything in an HDD image which I boot using QEMU. The kernel is loaded and stays at physical address 0x10000.
As a note: the kernel doesn't start immediately with C code, but with a short assembly file, which performs some other required initialization before eventually calling the kernel's main C function.
The Issue
As it turns out, using a big build.sh to build the entire kernel is not very handy. So with the rapidly growing number of source files, I decided to transition to Makefiles.
I am building under Windows, using WSL (Ubuntu 20.04). While writing my kernel's Makefile, I accidentally targeted GCC, LD and OBJCOPY from a MinGW install on Windows. The build was working nonetheless, and the kernel was working perfectly, but my goal here is to make building the OS portable (able to build on other systems seamlessly).
I went over and installed gcc-i686-linux-gnu, still on WSL, and when I went and rewrote the Makefile to use the newly installed compiler, things exploded... a lot.
What Happened
For starters, I was shocked to find that my binary kernel, which was only 38 KiB before, had inflated to almost 100 KiB. I figured after searching online that this was due to the line ". = 0x10000" in my LD script, which forced the code to be relocated 0x10000 bytes after the beginning of the file (that was the same script, but no error on MinGW???).
After working around this line and rebuilding again, my binary was still enormous, but the code was gone (what?). The text mode splash screen was still fine though.
So I went online and investigated the new problem. I made more modifications, had another brand new issue... Tried again with different options, same issue as before but slightly different... and you can see how this quickly devolves into insanity. In total, during my research, I have had:
- Sections popping in the wrong order in the final binary file (.rodata before .text)
- Nothing changing, even after playing with countless options like -ffreestanding, -nostdinc, -nostdlib, -fomit-frame-pointer, etc.
- Undefined references to literally any cross-referencing symbol (ASM to C, or C to ASM)
- Using the "-r" option (partial link) to circumvent undefined references, only to see that it just ignores the errors and doesn't actually solve them
- The assembly head referencing the main C function, with the reference resolving to address 0, even though READELF showed a non-zero offset? And no, I didn't forget the underscore preceding the function name
- Reverting everything (linker script, build.sh...) and instead updating my build.sh to use the new GCC compiler, which unsurprisingly leads to the same errors
- Trying everything above on a Linux Mint 20 setup (because I have a dual boot, so I might as well) and encountering the same issues over and over again.
Finally, the question(s)
So... how are MinGW utilities able to just compile and output a perfect binary, but other packages can't? Who is right and who is wrong? Is GNU's GCC just a gigantic nightmare of non-portability, or is it MinGW cutting a lot of corners to make everything work without frustration for the Windows devs out there who want to pretend they're on Linux?
I'm pretty sure I am getting some of Linux's ecosystem wrong here, in that GCC and OBJCOPY don't actually work the way I expect them to, because I'm now used to the MinGW versions working with minimal effort... but still. I can't be the only one experiencing this issue. No forum is talking about this!
I hope you can help me with this problem. I know I can stick to MinGW for the time being, but I really want to be able to build my OS on a native Linux distro.
Let me know if you need any other information / file / code that would be useful.
-----
Resources
ld.txt, used with MinGW:
OUTPUT_ARCH("i386")
SECTIONS
{
    . = 0x10000;
    .text . : {
        *(.text)
        . = ALIGN(4);
    }
    .data . : {
        *(.data)
        . = ALIGN(4);
    }
    .bss . : {
        *(.bss)
        . = ALIGN(4);
    }
    .rodata . : {
        *(.rodata)
        . = ALIGN(4);
    }
}