Kernel binary image Broken on GCC (formerly using MinGW)

Dave08
Posts: 8
Joined: Mon May 27, 2019 11:32 am
Location: France

Post by Dave08 »

Hi all,

I'm currently facing an issue which is somewhat related to OS programming. Before I lose my hair, I'm gonna try to explain it without running in circles.

Context
I'm working on the kernel of my homemade x86 OS project. I have my own MBR and stage 2 bootloaders written in real mode assembly and assembled into raw flat binary images. For the kernel, I use a combination of GCC and NASM to generate an ELF version of my kernel, which I then convert to a flat binary using OBJCOPY. I then put everything in an HDD image which I boot using QEMU. The kernel is loaded at physical address 0x10000 and stays there.
Note: the kernel doesn't start immediately with C code, but with a short assembly file, which performs some required initialization before eventually calling the kernel's main C function.

The Issue
However, as it turns out, using a big build.sh to build the entire kernel is not very handy. So with the rapidly growing amount of source files, I decided to transition to Makefiles.
I am building under Windows, using WSL (Ubuntu 20.04). While writing my kernel's makefile, I accidentally targeted GCC, LD and OBJCOPY from a MinGW install on Windows. The build was working nonetheless, and the kernel was working perfectly, but my goal here is to make building the OS portable (able to build on other systems seamlessly).
I went over and installed gcc-i686-linux-gnu, still on WSL, and when I went and rewrote the Makefile to use the newly installed compiler, things exploded... a lot.

What Happened
For starters, I was shocked to find that my binary kernel, which was only 38 KiB before, had inflated to almost 100 KiB. After searching online, I figured that this was due to the line ". = 0x10000" in my LD script, which forced the code to be placed 0x10000 bytes after the beginning of the file (that was the same script, but no error on MinGW???)
After working around this line and rebuilding again, my binary was still enormous, but the code was gone (what?). The text mode splash screen was still fine though :D

So I went online and investigated the new problem. I made more modifications, had another brand new issue... Tried again with different options, same issue as before but slightly different... and you can see how this quickly devolves into insanity. In total, during my research, I have had:
  • Sections popping in the wrong order in the final binary file (.rodata before .text)
  • Nothing changing, even after playing with countless options like -ffreestanding, -nostdinc, -nostdlib, -fomit-frame-pointer, etc...
  • Undefined references to literally any cross-referencing symbol (ASM to C, or C to ASM)
  • Using the "-r" option (partial link) to circumvent undefined references, only to see that it just ignores the errors and doesn't actually solve them
  • The assembly head referencing the main C function, translating it to address 0, even though READELF showed a non-zero offset? And no, I didn't forget the underscore preceding the function name
  • Reverting everything : linker script, build.sh... and instead update my build.sh to the new GCC compiler, which unsurprisingly leads to the same errors
  • Trying everything above on a Linux Mint 20 setup (because I have a dual boot, so I might as well) and encountering the same issues over and over again.
To be clear, the binary image I want (and had before I moved from MinGW to this new compiler) can just be loaded with a good old int 0x13 interrupt call and executed simply by jumping to the load address, implying that the startup code is the first thing present in the file. I have tested it countless times, and it never failed to load. The only times it crashed were because I either didn't load enough sectors or started loading from the wrong LBA address.

Finally, the question(s)
So... how are MinGW utilities able to just compile and output a perfect binary, but other packages can't? Who is right and who is wrong? Is GNU's GCC just a gigantic nightmare of non-portability, or is it MinGW cutting a lot of corners to make everything work without frustration for the Windows devs out there who wanna pretend to be on Linux?
I'm pretty sure I am getting some of Linux's ecosystem wrong here, in that GCC and OBJCOPY don't actually work the way I expect them to, because I'm now used to the MinGW versions working with minimal effort... but still. I can't be the only one experiencing this issue. No forum is talking about this!

I hope you can please help me on this problem. I know I can stick to MinGW for the time being, but I really want to be able to build my OS on a native Linux distro.
Let me know if you need any other information / file / code that would be useful.



-----



Resources

ld.txt, used with MinGW :


OUTPUT_ARCH("i386")

SECTIONS
{
   . = 0x10000;
   
   .text . : {
      *(.text)
      . = ALIGN(4);
   }
   
   .data . : {
      *(.data)
      . = ALIGN(4);
   }
   
   .bss . : {
      *(.bss)
      . = ALIGN(4);
   }
   
   .rodata . : {
      *(.rodata)
      . = ALIGN(4);
   }
}
Octocontrabass
Member
Posts: 5560
Joined: Mon Mar 25, 2013 7:01 pm

Re: Kernel binary image Broken on GCC (formerly using MinGW)

Post by Octocontrabass »

Dave08 wrote:my goal here is to make building the OS portable (able to build on other systems seamlessly).
I went over and installed gcc-i686-linux-gnu,
A compiler that targets bare metal is portable. A compiler that targets Linux is not portable.
Dave08 wrote:I figured after searching online that this was due to the line ". = 0x10000" in my LD script, which forced the code to be relocated 0x10000 bytes after the beginning of the file (that was the same script, but no error on MinGW???)
Your searches found the wrong answer. The actual problem is that your LD script doesn't tell the linker what to do with all the sections in your object files, so something ended up at address 0. You can use objdump or readelf to see the section names and compare them to what's in your LD script.
Dave08 wrote:Sections popping in the wrong order in the final binary file (.rodata before .text)
This also points to a problem with your LD script.
Dave08 wrote:Nothing changing, even after playing with countless options like -ffreestanding, -nostdinc, -nostdlib, -fomit-frame-pointer, etc...
Unless you've worked out some crazy workaround, make won't rebuild your object files if you're only changing the compiler options.
Dave08 wrote:Undefined references to literally any cross-referencing symbol (ASM to C, or C to ASM)
The 32-bit Windows ABI requires the C compiler to add an underscore prefix to all symbols, which means referring to those same symbols in assembly requires an underscore prefix that isn't present in C. The System V i386 psABI does not have this name mangling, which means there is no extra prefix added anywhere, and you'll need to remove the extra underscores from your assembly (or add them to your C) to get linking to work.
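If removing the underscores from every assembly file isn't practical right away, one possible stopgap is to alias the names in the LD script instead. This is only a sketch: "kmain" and "isr_stub" are hypothetical names standing in for whatever symbols actually cross the C/ASM boundary in the kernel.

```ld
/* Top level of the LD script, outside SECTIONS. Names are hypothetical. */

/* Assembly written for the Windows ABI says "call _kmain";
   the ELF compiler emits the C function as plain "kmain". */
PROVIDE(_kmain = kmain);

/* Other direction: assembly defines "_isr_stub",
   C declares and calls "isr_stub". */
PROVIDE(isr_stub = _isr_stub);
```

PROVIDE only defines the alias when something references it and no object file already defines it, so these lines become harmless once the assembly has been cleaned up.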
Dave08 wrote:The assembly head referencing the main C function, translating it to address 0, even though READELF showed a non-zero offset? And no, I didn't forget the underscore preceding the function name
What underscore? If there's an extra underscore in your assembly that isn't in your C, those are two different symbols.
Dave08 wrote:So... how are MinGW utilities able to just compile and output a perfect binary, but other packages can't?
Part of it is that your code isn't portable between the Windows ABI and the System V ABI, part of it is that you have bugs that coincidentally didn't cause any problems when you were using MinGW.
Dave08 wrote:Who is right and who is wrong?
Both are right.
Dave08 wrote:

      *(.text)
You need to use a wildcard pattern such as "*(.text*)" or "*(.text .text.*)" to capture all the sections your compiler might generate. This also applies to .data, .rodata, and .bss.
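As a rough sketch of what that could look like applied to the posted script (not a drop-in fix): ".text.entry" is a hypothetical section name that the assembly startup file would have to declare so it lands first in the flat binary, the COMMON line is an addition that the original script did not have, and .bss is moved to the end so it does not leave a zero-filled gap in the middle of the image.

```ld
OUTPUT_ARCH("i386")

SECTIONS
{
   . = 0x10000;

   .text : {
      *(.text.entry)         /* hypothetical: startup assembly, must come first */
      *(.text .text.*)       /* everything else the compiler emits as code */
      . = ALIGN(4);
   }

   .rodata : {
      *(.rodata .rodata.*)   /* string literals, const data */
      . = ALIGN(4);
   }

   .data : {
      *(.data .data.*)
      . = ALIGN(4);
   }

   .bss : {
      *(.bss .bss.*)
      *(COMMON)              /* tentative definitions, if the compiler emits any */
      . = ALIGN(4);
   }
}
```

An input section goes to the first pattern that matches it, so listing the entry section ahead of the general .text pattern is what keeps the startup code at the load address.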
Dave08 wrote:

      . = ALIGN(4);
You might need more than 4-byte alignment.

You probably need to add a /DISCARD/ section to your LD script.
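A minimal sketch of such a section, assuming the usual extras a Linux-targeting GCC tends to emit. The exact list is a guess and should be checked against the section names readelf or objdump report for your own object files.

```ld
SECTIONS
{
   /* ... the .text, .rodata, .data and .bss output sections go here ... */

   /* Input sections listed here are dropped from the output entirely,
      so they cannot end up at an unexpected address in the image. */
   /DISCARD/ : {
      *(.comment)        /* compiler version strings */
      *(.eh_frame)       /* unwind tables, unused without exception support */
      *(.note .note.*)   /* e.g. build-id or GNU property notes */
   }
}
```

Anything that is neither placed by a pattern nor discarded is still positioned by the linker on its own, which is how stray sections get into a flat binary.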
Dave08
Posts: 8
Joined: Mon May 27, 2019 11:32 am
Location: France

Re: Kernel binary image Broken on GCC (formerly using MinGW)

Post by Dave08 »

Hi again, thank you for the detailed answer. I have taken into account some of your remarks.
Octocontrabass wrote:A compiler that targets bare metal is portable. A compiler that targets Linux is not portable.
I actually installed gcc-i686-linux-gnu, naively thinking it could work because it was the closest one available to the recommended toolchain (bare elf wasn't available on apt). I actually apologize for being generally blind about the page you linked. I did read it before posting, but didn't notice the download links at the bottom. So I have downloaded i386-elf, now I just need to set it up with my makefile.
Octocontrabass wrote:Your searches found the wrong answer. The actual problem is that your LD script doesn't tell the linker what to do with all the sections in your object files, so something ended up at address 0. You can use objdump or readelf to see the section names and compare them to what's in your LD script.
[...]
This also points to a problem with your LD script.
[...]
You probably need to add a /DISCARD/ section to your LD script.
Regarding LD configuration, that was just me being negligent with the linker script.
Although oddly enough, MinGW did output a different binary using the same script and options for LD, which I still don't understand. I am going to go and look at a guide for LD scripts though.
Octocontrabass wrote:Unless you've worked out some crazy workaround, make won't rebuild your object files if you're only changing the compiler options.
I was making sure to clean every object and binary file before recompiling, so that wasn't the issue there. I didn't use any obscure workaround either.
Octocontrabass wrote:The 32-bit Windows ABI requires the C compiler to add an underscore prefix to all symbols, which means referring to those same symbols in assembly requires an underscore prefix that isn't present in C. The System V i386 psABI does not have this name mangling, which means there is no extra prefix added anywhere, and you'll need to remove the extra underscores from your assembly (or add them to your C) to get linking to work.
My mistake here is that I wasn't aware of the ABI differences. I thought the additional underscore for C linking was standard, but it only applies to Windows it seems.
Something funny also, if you go to elixir.bootlin.com and open very early versions (e.g. v0.01) of the Linux kernel, the assembly source files do use underscores for external symbols. Now granted, the thing is dated 1991, but I don't think Linus programmed on DOS.



Bottom line is that my problem arose from misconceptions about how compiler toolchains work.
For now, I actually found a MinGW cross-compiler for building Windows software on Linux (yes, that's a thing, available with "apt install"). I'm going to be using this compiler for two reasons. One is that I don't want to go through all of my assembly files and remove the underscores to conform to the System V ABI just yet. The second is that I have not finished setting up the elf compiler, and that I'd rather continue development for a bit before diving back in.

I'll probably reply under this post to update it when I get this compiler working, or not.