Strange behaviour when cross-compiling w/clang and gcc

cakehonolulu
Member
Posts: 37
Joined: Thu Jun 16, 2016 9:35 am
Libera.chat IRC: cakehonolulu

Strange behaviour when cross-compiling w/clang and gcc

Post by cakehonolulu »

Hello everyone!

I'm currently writing an MBR bootloader for x86 as a learning exercise.

It currently uses FAT16 and has the usual pieces: a BPB, root directory parsing, and loading a file (the second stage of the bootloader) into memory.

I recently upgraded the host binutils to (I think) 2.39, and all of a sudden a bunch of new warnings appeared.

One of them was 'missing .note.GNU-stack section implies executable stack', which I easily solved by passing the linker a no-exec-stack flag.

The other two are giving me a harder time:

warning: <file> has a LOAD segment with RWX permissions

and:

warning: relocation in read-only section `.text'
/usr/bin/ld: warning: creating DT_TEXTREL in a PIE

For the first one (based on https://www.redhat.com/zh/blog/linkers- ... d-segments), I added a 4K ALIGN between the sections of the linker script, and that seems to have fixed it.

For the second one, I added the usual no-pie and no-pic flags (CFLAGS -> -fno-pic -fno-pie ; LDFLAGS -> -no-pie -nostdlib -static). Under gcc, the warning no longer appears and the code works and boots correctly, but under clang it's a completely different story.

The code straight up doesn't work.

Under a debugger I can see this:

Code:

Breakpoint 1, 0x00007c00 in ?? ()
(gdb) s
Cannot find bounds of current function
(gdb) 
Cannot find bounds of current function

Which is strange because code seems to get compiled perfectly fine.

I'll leave the gcc debugger output for reference:

Code:

Breakpoint 1, 0x00007c00 in init0_fat16 ()
(gdb) s
Single stepping until exit from function init0_fat16,
which has no line number information.
53		xor %ax, %ax							# Xor'ing ax to ax, results in a 0, as xor'ing two registers with
(gdb) 
55		mov %ax, %ds							# Move 0x0 to the data segment register.
(gdb)

After further inspection, I found a few things.

If I remove both -no-pie and -static from the link flags, it works under clang.

If I only remove -static, I get the 'Cannot find bounds of current function' message in gdb again.

If I only remove -no-pie, the emulator (qemu in this case) constantly reboots (it's triple faulting).

I'm not really sure why this is happening and I'd love some guidance if possible. I've tried everything I could think of, but nothing seems to fix it.

I could go gcc-only, but I think I'd be missing some things I really appreciate from clang (and I also like having code that has few to no warnings).

Repository:
https://github.com/cakehonolulu/atom

Thanks for reading!
Last edited by cakehonolulu on Fri Sep 30, 2022 2:22 am, edited 1 time in total.
Octocontrabass
Member
Posts: 5563
Joined: Mon Mar 25, 2013 7:01 pm

Post by Octocontrabass »

cakehonolulu wrote: Under a debugger I can see this:
Breakpoint 1, 0x00007c00 in ?? ()
(gdb) s
Cannot find bounds of current function
(gdb) 
Cannot find bounds of current function
Which is strange because code seems to get compiled perfectly fine.

That means the debugging information is missing. Either the assembler isn't generating it in the first place, or the linker is stripping it from the resulting binary.

cakehonolulu wrote: I'm not really sure why this is happening and I'd love some guidance if possible, I've tried a bunch of things I've thought but it doesn't look like it fixes it.

I think we might need to see your build scripts. Better yet, a link to your online repository (if you have one).

Post by cakehonolulu »

First of all, thanks for reading and answering!

I'll leave the repository here and I'll edit the main post too just in case:
https://github.com/cakehonolulu/atom

Sorry for the code quality; I usually comment my code more, but I've been busy with other things and haven't been able to tidy it up.

I've discovered that disabling PIE under Clang makes it break right here (It might break further down the line too, but I can't test it because it doesn't even load my second stage binary):
https://github.com/cakehonolulu/atom/bl ... ot0.S#L105

It's a div instruction, so I was thinking maybe a divide by zero of some sort? But that's strange, because it would fault, right?

Maybe disabling PIE under Clang makes it so that 'variables' declared under .text somehow get messed up?

About the repository, what I'm testing currently rests in the bootloader/i386/fat and bootloader/i386/fat/stage2 directories, both have Makefiles w/the compilation steps.

As for building the actual FAT16 image:

Code:

dd if=/dev/zero of=hdd.img bs=1 count=0 seek=10M status=none
mkfs.fat -F 16 hdd.img
cp bootloader/$(ARCH)/fat/stage2/stage2.bin STAGE2.BIN
mcopy -i hdd.img STAGE2.BIN ::
dd conv=notrunc if=bootloader/$(ARCH)/fat/boot0.bin of=hdd.img bs=512 seek=$(HDD_MBR_SECTOR) status=none

Basically: create a blank 10 MiB file, format it as FAT16, copy the stage2 file onto the filesystem, and overwrite the first sector of the disk with the boot0 code (loaded at 0x7C00 by the BIOS).
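
A quick sanity check after the last dd can catch a boot sector that was written to the wrong place or truncated. The following is only a minimal sketch: it assumes boot0.bin is a full 512-byte sector ending in the usual 0x55 0xAA signature and that it was written to sector 0 of the image. The helper name has_boot_signature is hypothetical and not part of the repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of the atom repository): succeeds if the
# first sector of the image "$1" ends with the BIOS boot signature 0x55 0xAA.
has_boot_signature() {
    # Dump the two bytes at offsets 510 and 511 as lowercase hex, no spaces.
    sig=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "55aa" ]
}
```

It could be used as: has_boot_signature hdd.img || echo "boot signature missing"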

EDIT:

I also want to ask: is it okay if I try to fix those warnings, or should I ignore them?
Honestly, I'd love to get them sorted out. As I said in the main post, I like fixing warnings; it's one of the things uni taught me to always do. But considering their nature, I'm questioning whether I should fix them or not. (I've checked other people's solutions, even in popular repositories, and they tend to silence those warnings with a linker flag altogether; but if I did that I'd be ignoring the issue, or at least that's what I think.)

Thanks for reading!

Post by Octocontrabass »

cakehonolulu wrote: I'll leave the repository here and I'll edit the main post too just in case:

You're using Clang with a bare metal target, which means Clang is calling your host GCC. You may be able to convince Clang to call your cross-GCC instead, but the only way to prevent it from calling GCC is to use a different (hosted) target.

cakehonolulu wrote: I've discovered that disabling PIE under Clang makes it break right here (It might break further down the line too, but I can't test it because it doesn't even load my second stage binary):

Break how?

cakehonolulu wrote: It's a div instruction, I was thinking maybe a divide by zero of some sort? But it's strange because It'd fault. right?

If CX isn't greater than DX, it will cause a #DE exception. You're using source-level debugging (step) instead of instruction-level debugging (stepi) so your debugger might be confused by the sudden jump to the BIOS exception handler.

cakehonolulu wrote: I also want to ask, is it okay if I try to fix those warnings or should I ignore them?

That depends. Are you doing the things that cause those warnings intentionally, or do they indicate mistakes in your code?

The "<file> has a LOAD segment with RWX permissions" warning means you've created a binary that mixes code and writable data. In an ordinary application or an OS kernel, you might want to avoid that because it's a security risk. In a real mode bootloader, you might do it on purpose to make your binary smaller.

The "relocation in read-only section `.text'" and "creating DT_TEXTREL in a PIE" warnings are similar: you've written code that isn't position-independent, but the linker is trying to create a position-independent binary. I think this warning indicates a real problem, since you're not trying to write a position-independent bootloader.
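
One way to confirm whether a linked file really ended up with text relocations is to look for a TEXTREL marker in its dynamic section. This is a minimal sketch, assuming binutils readelf is available and that the build keeps an intermediate ELF file (the flat .bin carries no such information). The helper name has_textrel and the READELF override are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the ELF file "$1" carries a text-relocation
# marker in its dynamic section. READELF can be overridden (an assumption made
# for cross toolchains, e.g. READELF=i686-elf-readelf).
has_textrel() {
    # A statically linked, non-PIE file has no dynamic section at all,
    # so grep finds nothing and the function returns non-zero.
    "${READELF:-readelf}" -d "$1" 2>/dev/null | grep -q 'TEXTREL'
}
```

For example: has_textrel boot0.elf && echo "text relocations present" (boot0.elf is a placeholder name).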
MichaelPetch
Member
Posts: 797
Joined: Fri Aug 26, 2016 1:41 pm
Libera.chat IRC: mpetch

Post by MichaelPetch »

cakehonolulu wrote: It's a div instruction, I was thinking maybe a divide by zero of some sort? But it's strange because It'd fault. right?

DIV CX actually divides DX:AX by CX. You originally clear DX, but it is then clobbered. Since DX is used as part of the division, this may be causing a division overflow (same exception as division by zero). Before doing the DIV %CX, you probably meant to set DX to 0?
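
The overflow condition can be modelled with plain integer arithmetic. The sketch below is only an illustration of the 16-bit DIV rule (the function name div16 and the sample values are made up): the quotient has to fit in AX, which is the case exactly when DX is smaller than CX.

```shell
#!/bin/sh
# Model of the 16-bit "div %cx": divides the 32-bit value DX:AX by CX.
# Prints "quotient remainder" (the values that would land in AX and DX),
# or "#DE" when the CPU would raise a divide error.
div16() {
    dx=$1 ax=$2 cx=$3
    # The quotient must fit in 16 bits, which holds exactly when DX < CX.
    # CX = 0 is covered too, since no unsigned DX is less than 0.
    if [ "$dx" -ge "$cx" ]; then
        echo "#DE"
        return 1
    fi
    dividend=$(( (dx << 16) | ax ))
    echo "$(( dividend / cx )) $(( dividend % cx ))"
}
```

With DX cleared, div16 0 1000 63 prints "15 55"; with a stale value left in DX, div16 128 1000 63 prints "#DE".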

Debugging 16-bit code in GDB can be very problematic, since GDB doesn't understand segmentation (see the answers to this Stack Overflow question). To debug the real-mode portion of the bootloader, I'd consider using BOCHS, which has a built-in debugger that understands real mode and segment:offset addressing. The downside is that symbolic debugging is limited in BOCHS, but given that the code you are debugging runs this early, debugging it without symbols should be trivial.

Post by cakehonolulu »

Octocontrabass wrote: You may be able to convince Clang to call your cross-GCC instead, but the only way to prevent it from calling GCC is to use a different (hosted) target.

Which is strange, because everything under gcc works marvelously... Anyway, isn't specifying a target triplet and an arch enough to cross-compile w/clang?
I'm referring to this part of the wiki: https://wiki.osdev.org/LLVM_Cross-Compiler , more precisely:

"An example for compiling to a generic X86 ELF target would be:"

Code:

--target=i686-pc-none-elf -march=i686

Maybe I also need to specify the linker to use (ld.lld)?

Octocontrabass wrote: Break how?

It reboots each time it gets there (compiled under clang, not gcc). I can't really tell what sort of fault it is, since I can't use Bochs under Ubuntu for some reason (segmentation faults) and qemu isn't giving me much fault information other than CPU Reset (which is normal).

Octocontrabass wrote: If CX isn't greater than DX, it will cause a #DE exception. You're using source-level debugging (step) instead of instruction-level debugging (stepi) so your debugger might be confused by the sudden jump to the BIOS exception handler.

True, I'll test right away w/stepi instead and see what it gives me.

Octocontrabass wrote: That depends. Are you doing the things that cause those warnings intentionally, or do they indicate mistakes in your code?
The "<file> has a LOAD segment with RWX permissions" warning means you've created a binary that mixes code and writable data. In an ordinary application or an OS kernel, you might want to avoid that because it's a security risk. In a real mode bootloader, you might do it on purpose to make your binary smaller.

Actually, I don't really mind having code and writable data under the .text section (at least for this part of the initialization process), but it's really something to take into account when I'm developing things further down the line.

Octocontrabass wrote: The "relocation in read-only section `.text'" and "creating DT_TEXTREL in a PIE" warnings are similar: you've written code that isn't position-independent, but the linker is trying to create a position-independent binary. I think this warning indicates a real problem, since you're not trying to write a position-independent bootloader.

How so? Isn't it enough to specify an entry point in the linker script w/the starting offset for the binary? Also, where should I check for position-dependent code I've written?

Thanks for all your answers and your time!

Post by Octocontrabass »

cakehonolulu wrote: Anyway, specifying a target triplet and an arch isn't enough to cross-compile w/clang?

Not for bare metal targets. It works fine for hosted targets.

cakehonolulu wrote: Maybe I also need to specify the linker to use (ld.lld)?

I was never able to figure out how to get it to work. Clang just passes -fuse-ld to GCC.

cakehonolulu wrote: I can't really assure what sort of fault it is since I can't use Bochs under Ubuntu for some reason (Segmentation Faults) and qemu isn't giving me much fault information other than CPU Reset (Which is normal).

You need to turn off hardware acceleration to get better fault information out of QEMU. Try adding "-accel tcg" to your QEMU options.

cakehonolulu wrote:
Octocontrabass wrote: you're not trying to write a position-independent bootloader.
How so?

Your bootloader is always loaded to the same address.

cakehonolulu wrote: It's not enough to specify an entry point in the linker script w/the starting offset for the binary?

The entry point has nothing to do with whether your binary is position-independent. The linker's default is to produce a position-independent binary, so it gives you warnings when it finds things that are typically not allowed in position-independent code.

cakehonolulu wrote: Also, where should I check for position-dependent code I've written?

Which parts of your code would break if you loaded it at a different address without changing the linker script? Those are the position-dependent parts.

Post by MichaelPetch »

cakehonolulu wrote: It reboots each time it gets there (Compiled under clang, not gcc); I can't really assure what sort of fault it is since I can't use Bochs under Ubuntu for some reason (Segmentation Faults) and qemu isn't giving me much fault information other than CPU Reset (Which is normal).

Have you tried running QEMU with the options -no-reboot -no-shutdown -d int?
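
Put together with the earlier "-accel tcg" suggestion, the invocation might look like the sketch below. The binary name qemu-system-i386, the raw -drive syntax, and the wrapper name run_qemu_debug are assumptions about the setup and are not taken from the repository's Makefiles.

```shell
#!/bin/sh
# Hypothetical wrapper combining the options suggested in this thread.
# QEMU may be overridden; qemu-system-i386 and the raw-image -drive syntax
# are assumptions about the setup, not taken from the repository.
run_qemu_debug() {
    # -accel tcg    : software emulation instead of hardware acceleration
    # -d int        : log interrupts and exceptions
    # -no-reboot    : do not restart the guest after a reset (e.g. a triple fault)
    # -no-shutdown  : keep QEMU open instead of exiting when the guest stops
    "${QEMU:-qemu-system-i386}" -accel tcg -no-reboot -no-shutdown -d int \
        -drive format=raw,file="$1"
}
```

Usage would be run_qemu_debug hdd.img; by default the -d int log is written to stderr.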

Regarding BOCHS on Ubuntu: any chance that is 22.04? If so, have you tried building BOCHS from source? Others have experienced problems as well.

Post by cakehonolulu »

Octocontrabass wrote: Not for bare metal targets. It works fine for hosted targets.

Alright, good to know.

Octocontrabass wrote: I was never able to figure out how to get it to work. Clang just passes -fuse-ld to GCC.

Now that you mention it, I've just added -fuse-ld=lld when compiling w/clang (and also enabled all the pie-related flags again), and for some reason the bootloader doesn't fault anymore. (It prints R's non-stop because it's not able to load the second stage off the filesystem, but that needs some further debugging on my end.)

So, is it linking w/LLVM's linker now, or am I missing something? (Also, for the record, I'm not currently telling clang to use the i686 bare metal cross compiler I built, as I don't know how to do so.)

Octocontrabass wrote: You need to turn off hardware acceleration to get better fault information out of QEMU. Try adding "-accel tcg" to your QEMU options.

Alright, good to know too.

Octocontrabass wrote: Your bootloader is always loaded to the same address.

True.

Octocontrabass wrote: The entry point has nothing to do with whether your binary is position-independent. The linker's default is to produce a position-independent binary, so it gives you warnings when it finds things that are typically not allowed in position-independent code.

Makes sense. The stage 2 gets linked to be loaded at phys 0x7E00, so it has to be position-dependent by design; that's why I don't understand those warnings (it's also why I'm explicitly specifying all the no-pie flags).

I'm not sure if I explained it correctly or if it made sense; if not, I'll try to find a better way to explain it.

Octocontrabass wrote: Which parts of your code would break if you loaded it at a different address without changing the linker script? Those are the position-dependent parts.

True, I'll take a look.

Thanks again for your help!

Post by cakehonolulu »

MichaelPetch wrote: DIV CX actually divides DX:AX by CX. You originally clear DX and then it is clobbered. DX is being used as part of the division and may be causing a division overflow (same exception as division by zero). Before doing the DIV %CX you probably meant to set DX to 0?
Debugging 16-bit code in GDB can be very problematic since GDB doesn't understand segmentation (see the answers to this Stack Overflow question). To debug the real mode portion of the bootloader I'd consider using BOCHS which has a built-in debugger that understands real mode and segment:offset addressing. The downside is that the symbolic debugger is limited in BOCHS but given that this code you are debugging is early in your code debugging it without symbols would be trivial.

Noted, I should probably take a look into that...

I'll see if I can get bochs to work; currently downloading and compiling from source.

MichaelPetch wrote: Have you tried running QEMU with options -no-reboot -no-shutdown -d int

Indeed, but I dropped -no-reboot because I found a post on osdev's forums mentioning that it sometimes discarded fault information. (This has probably been fixed in more recent qemu versions, but I left it out just to be sure, as I haven't found a changelog mentioning the bugfix.)

MichaelPetch wrote: Regarding BOCHS on Ubuntu - any chance that is 22.04? If so have you tried building BOCHS from source? Others have experienced problems as well

22.10 actually, but it looks like the same problem. I'll try compiling it from source to see if it makes any difference.

Thanks for your time, greatly appreciate all the help!

EDIT:

After compiling from source and fixing my bochsrc, I can now use bochs again!

Post by cakehonolulu »

For the record, I was finally able to fix the unbootable clang build by using LLVM's linker (passing -fuse-ld=lld at link time) and by fixing my linker script (for some reason I had forgotten to specify .rodata, and that was throwing the whole thing off).
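
A minimal sketch of what such a setup might look like, for anyone hitting the same problem. It is not the repository's actual script: the entry symbol _start, the section layout, the file names, and both helper names are assumptions; only the 0x7E00 load address and the flags come from this thread. The point is that .rodata gets an explicit output section, so constants and string literals are placed deliberately rather than wherever the linker puts orphan sections.

```shell
#!/bin/sh
# Hypothetical helper: writes a minimal stage-2 linker script to "$1".
# 0x7E00 is the stage-2 load address mentioned in this thread; the entry
# symbol _start and the section layout are assumptions.
write_stage2_ld() {
    cat > "$1" <<'EOF'
ENTRY(_start)
SECTIONS
{
    . = 0x7E00;
    .text   : { *(.text .text.*) }
    .rodata : { *(.rodata .rodata.*) }  /* string literals, constants */
    .data   : { *(.data .data.*) }
    .bss    : { *(.bss .bss.*) *(COMMON) }
}
EOF
}

# Hypothetical link step: script "$1", output "$2", object file "$3".
# CC may be overridden; the flags mirror the ones discussed in the thread.
link_stage2() {
    "${CC:-clang}" -fuse-ld=lld -nostdlib -static -no-pie -T "$1" -o "$2" "$3"
}
```

For example: write_stage2_ld stage2.ld && link_stage2 stage2.ld stage2.elf stage2.o (all three file names are placeholders).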

A little screenshot showcasing the result:
Attachments
osdev-min.png