
Why so many custom-build toolchains?

Posted: Fri Jul 06, 2018 1:49 am
by anta40
I guess I've spent too much time lurking on the OSDev board.
Time to start studying and coding.

I looked at some hobby OSes on GitHub, and one thing they have in common is the need to 'build your own GCC'.
I don't understand it. Assuming you are only targeting x86, isn't the standard GCC enough?
Of course there's -ffreestanding, and you can always implement some features in assembly.

Perhaps I'm missing something here.

Thank you :)

Re: Why so many custom-build toolchains?

Posted: Fri Jul 06, 2018 1:58 am
by Korona
For freestanding code, using an existing GCC works as long as it is configured similarly. However, using its libgcc will almost never work (e.g. a Linux libgcc will break without glibc), because libgcc calls malloc()/free() if they were available at libgcc build time. Furthermore, keep in mind that not all code is freestanding code: if you are compiling user-space programs for your OS, you will almost surely need a new GCC. For example, an OS-specific GCC will define __youros__ instead of __linux__, and it will not try to use features that are not actually available in your OS (e.g. dynamic linking).
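
If you want to see which OS macros your current compiler predefines, you can dump them (a quick check; the exact output depends on how your gcc was built):

Code:

echo | gcc -dM -E - | grep -i linux
# a Linux-hosted gcc typically prints, among others:
#   #define __linux__ 1
#   #define __gnu_linux__ 1
# an OS-specific cross-gcc would define its own macro (e.g. __youros__) instead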

Re: Why so many custom-build toolchains?

Posted: Fri Jul 06, 2018 4:56 am
by Solar
Try this:

Code:

gcc -dumpmachine
It will give something like this:

Code:

x86_64-linux-gnu
Usually, only one of the three parts will match YourOS -- the x86_64 -- and the other two (the OS and the C library/ABI) are not just for show.
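
For comparison, a typical OSDev cross-compiler (assuming you built one with --target=x86_64-elf) reports a triplet with no OS and no C library at all:

Code:

x86_64-elf-gcc -dumpmachine
# x86_64-elf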

Re: Why so many custom-build toolchains?

Posted: Fri Jul 06, 2018 9:49 am
by simeonz
It never became perfectly clear to me either, to be honest. :) The wiki is very adamant about using a cross-compiler, but I am not sure what exactly varies between a hosted Linux target and a freestanding ELF one that cannot be controlled from the command line. I am curious more than anything, because making such a comparison can also provide insight into the toolchain and runtime. Here are the things I can think of (a rough command-line equivalent follows below the list):
  • The start files (i.e. crti.o, crtn.o, crtbegin.o, crtend.o) could have different contents (or be absent), but they are not relevant when using -nostartfiles or -nostdlib.
  • The language runtime in libgcc and libstdc++ may vary, but it is not relevant with -nostdlib.
  • The default search paths for system libraries and headers may have to be modified by using --sysroot, -nostdinc, -isystem.
  • Depending on the binutils configuration as well, certain aspects like the choice between .init or .init_array for hooking the initialization funclets will vary. If both are supported on the target or neither is used in the executable, this could be unimportant. Otherwise, I don't think that the choice can be controlled at compile-/link-time - it is burned-in when building binutils.
  • The target could also affect the built-in specs, which govern how the gcc driver treats the options and maps them to the auxiliary build tools, such as the assembler and linker. But the spec files primarily govern the defaults, which can be overridden. And besides, you don't have to link using the gcc driver.
  • The built-in macros will be determined by the target, which will change the behavior of certain headers. This can be worked around with -undef, -U, and -D.
  • The assumed behavior of the standard library routines may change (from undefined to defined), but this can be controlled with -ffreestanding and -fno-builtin.
I don't believe that the code generation will be impacted.
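
Putting the items above together, a rough sketch of what you would have to pass a hosted x86_64-linux-gnu gcc to approximate what an x86_64-elf cross-compiler does out of the box (the exact set is my guess at a baseline, not a recipe):

Code:

gcc -ffreestanding -fno-builtin -nostdlib -nostartfiles \
    -mno-red-zone -c kernel.c -o kernel.o
# add -nostdinc / -isystem if you want to replace the hosted headers
# with your own as well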

Edit: crti.o, crtn.o, crt1.o come from glibc, but they are suppressed with -nostartfiles all the same.
Edit2: Come to think of it, there are other things affected, like the list of linker emulations supported by binutils (which you pass to the linker with -m). The hosted environment should have more comprehensive support rather than a more restrictive one. There is also something called multilib mapping, which I think is burned in. Honestly, I am not sure how it works, but it sounds like it shouldn't be relevant here.
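
For reference, you can check which emulations a particular binutils build was configured with (a quick check; the list varies between distro and cross builds):

Code:

ld -V
# prints the version followed by a 'Supported emulations:' list,
# e.g. elf_x86_64, elf_i386, ...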

Re: Why so many custom-build toolchains?

Posted: Fri Jul 06, 2018 1:05 pm
by nielsd
There's something else that hasn't been mentioned yet.
If you have multiple people using different versions of the compiler, you might run into problems where the code generation differs.
A custom toolchain is also important when you want an OS-specific toolchain. For example, you don't want to have to modify Makefiles with a lot of compiler flags every time you port software and need to compile it for your OS on your host system.

Re: Why so many custom-build toolchains?

Posted: Fri Jul 06, 2018 1:47 pm
by simeonz
It also occurred to me that the Linux kernel is built with the standard toolchain from the distribution, and it runs in a freestanding environment. They do all the hacks necessary: adjusting the paths, removing the startup files, disabling vectorization, disabling the red zone, etc. So, although it may not be the most elegant solution, it is not bad enough to be deemed inadequate for kernel builds. Now, a hosted compiler for a custom OS is a different story.
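
For illustration only (not copied from the actual kernel Makefiles), these are the kinds of adjustments that build makes so a stock distro gcc behaves for kernel code -- use only the compiler's own headers and restrict code generation:

Code:

gcc -nostdinc -isystem "$(gcc -print-file-name=include)" \
    -fno-strict-aliasing -fno-pic \
    -mno-red-zone -mno-sse -mno-mmx -mcmodel=kernel \
    -c kernel.c -o kernel.o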

Re: Why so many custom-build toolchains?

Posted: Sat Jul 07, 2018 2:13 am
by Korona
The Linux guys do not link against libgcc, (AFAICS) basically because Torvalds does not trust the GCC developers not to **** up. Instead, they reimplement functions like long division. You can do this and get away with the Linux GCC, but it is more work and you have to know what you're doing. Also note that the Linux userland does use an OS-specific toolchain: you cannot build the Linux userland with e.g. a FreeBSD gcc, no matter what flags you pass to it.
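
To see the kind of libgcc helper in question, here is a quick demonstration (on a 64-bit host the -m32 part assumes 32-bit support, e.g. gcc-multilib, is installed):

Code:

echo 'unsigned long long f(unsigned long long a, unsigned long long b) { return a / b; }' \
  | gcc -m32 -O2 -S -x c - -o -
# the generated assembly contains 'call __udivdi3': 32-bit x86 has no single
# instruction for 64-bit division, so gcc calls into libgcc, and a kernel that
# does not link libgcc must provide such symbols itself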

Re: Why so many custom-build toolchains?

Posted: Sat Jul 07, 2018 2:56 pm
by simeonz
Korona wrote:The Linux guys do not link against libgcc, (AFAICS) basically because Torvalds does not trust the GCC developers not to **** up. Instead, they reimplement functions like long division. You can do this and get away with the Linux GCC, but it is more work and you have to know what you're doing.
Actually, I would like to ask: is there some kind of official position from the GCC developers (written or spoken) regarding the intended usage context of the ELF targets, like i686-elf and x86_64-elf? I mean, are they intended for environments without a red zone and with lazy, on-demand saving of the FPU and SSE context, or merely for targets without libc and dynamic loader support? The former would be a narrowing of the amd64 ABI, which seems counter-intuitive for a name like x86_64-elf. For i686-elf, libgcc is, I think, compiled without vectorization and without a red zone, but that seems to be implied by the i686 architecture and ABI, not by the target as such. I haven't checked x86_64-elf, because I don't have a build readily available, but in any case -- is there some kind of official statement on the assumptions that those targets make?

Re: Why so many custom-build toolchains?

Posted: Sat Jul 07, 2018 3:43 pm
by Korona
I don't think there is a written statement, but you can look at the source: *-elf libgcc is compiled with the default compiler flags IIRC. That means that the x86_64 libgcc does assume a red-zone. However, you should be able to change that by changing the default spec file definitions in the target headers (i.e. the gcc/config directory). I have not verified whether libgcc keeps working if you do that but it is on my TODO list.

Re: Why so many custom-build toolchains?

Posted: Sat Jul 07, 2018 10:09 pm
by simeonz
Korona wrote:I don't think there is a written statement, but you can look at the source: *-elf libgcc is compiled with the default compiler flags IIRC. That means that the x86_64 libgcc does assume a red-zone.
I see. I didn't find -mno-red-zone anywhere in the gcc and libgcc configuration files, but I wanted another opinion, in case the story is more convoluted (i.e. options implied by -fbuilding-libgcc, etc.). Anyway, if that is the case, then the ELF targets are designed for an unknown System V ABI compliant environment, and if anyone wants to use them in a kernel context, it is their responsibility to know their job. Fair enough. Thanks for clarifying.
Korona wrote:However, you should be able to change that by changing the default spec file definitions in the target headers (i.e. the gcc/config directory).
Actually, it appears that the wiki has a page about that. They even demonstrate multilib mapping.
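
As far as I can reconstruct the recipe from memory (treat the exact file name and variables as a sketch rather than gospel), it boils down to adding a small makefile fragment under gcc/config and wiring it into config.gcc:

Code:

cat > gcc/config/i386/t-x86_64-elf << 'EOF'
# extra libgcc multilib variant built with -mno-red-zone
MULTILIB_OPTIONS += mno-red-zone
MULTILIB_DIRNAMES += no-red-zone
EOF
# then reference this fragment from tmake_file in the
# x86_64-*-elf* case of gcc/config.gcc and rebuild gcc/libgcc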

Re: Why so many custom-build toolchains?

Posted: Mon Jul 09, 2018 12:56 am
by Solar
simeonz wrote:The wiki is very adamant about using a cross-compiler, but I am not sure what exactly varies between a hosted Linux target and a freestanding ELF one that cannot be controlled from the command line.
One, those command line options varied depending on host and target. The forum was a very busy place, with all the questions about "why is this not working" and "why is that not working", people being asked what their setup was, and explanations of why they had to set their command line up just so. (Remember, there are MinGW users out there, and Cygwin users as well...) Going cross-compiler leveled the playing field for everybody. It's kind of the OSDev variant of Stack Overflow's "Minimal, Complete, and Verifiable Example". It also massively reduced the number of people going some "custom" way with e.g. DJGPP; it reduced the "which toolchain is best" discussions significantly and made many wiki entries much simpler.

Two, another rather common issue was people going #include <stdio.h> and then asking why printf() wasn't working in their boot menu, or going #include <stdlib.h> and asking why malloc() did not work as expected, or building to a.out and trying to execute that as a bootloader... Following the rule "fail early, fail loudly", a cross-compiler setup slaps you for trying to work with what isn't there, much more so than a system compiler bent to your will.

Three, with a cross-compiler setup it is in many cases much easier to control which compiler, in which version, you are using than it is with your system compiler. Fiddling with the system compiler can render your whole system useless, while you can do anything to your cross-compiler without any risk to your "other" build chains.

Four, en route to making your system self-hosting, at some point you basically have to go into the fun that is compiler-building, for bootstrapping. Why not make the first step toward your native build tools right at the beginning?

I hope that clears things up a bit.

Re: Why so many custom-build toolchains?

Posted: Mon Jul 09, 2018 2:09 am
by simeonz
Solar wrote:I hope that clears things up a bit.
That's fine. I genuinely believed that there might have been technical arguments, and consequently became curious what they were and how someone learned about them. Whichever the case, I am not arguing against it being a clean solution. There is one obvious downside: you don't automatically update the ordinary compiler when you update the freestanding one. But that is not game-breaking.

It's clear now. Thanks.

Re: Why so many custom-build toolchains?

Posted: Mon Jul 09, 2018 2:54 am
by Solar
Not updating the compiler automatically can actually be a benefit, too. See, your system compiler gets updated only after the distro maintainers have (hopefully) checked that everything will still be shiny after the update... but of course they are only testing the distro, not your OS...

YourOS should have its own compiler update schedule. Imagine the pain when you're in the middle of some involved work, the system compiler gets updated, and you are sitting there with a broken build, having to figure out what is due to your changes and what is due to your compiler having been updated.

You also cannot easily fall back to a previous version with the system compiler, whereas setting up a newer cross-compiler version, testing it against YourOS, and then either updating or switching back to the old one is very easy.

While GCC is rather stable for now, there have been ABI-breaking changes in the past (anyone remember the fun we had during the 3.x releases?), and there are likely to be more in the future. I feel you'd need control rather than automatic updates...

Re: Why so many custom-build toolchains?

Posted: Mon Jul 09, 2018 5:38 am
by simeonz
Solar wrote:Not updating the compiler automatically can actually be a benefit, too. See, your system compiler gets updated only after the distro maintainers have (hopefully) checked that everything will still be shiny after the update... but of course they are only testing the distro, not your OS...
I didn't make myself very clear here. You are right. I meant that if a person wants the newer language features or some bug fix, whether they build the compiler themselves or use an unofficial repository, they would likely want both toolchains updated. It would be unlikely to desire C++17 or the -fno-plt option for just one of them. If the case is exceptional... no one is prevented from doing the right thing and building two separate compilers. Although you could still build both as hosted and provide the necessary switches later. It is a choice.

Re: Why so many custom-build toolchains?

Posted: Wed Jul 11, 2018 3:07 am
by Velko
What about LLVM? The wiki page is not very generous, but my impression is that there is no need to build it separately for each target.

I imagine that some effort is needed in order to obtain an OS-specific toolchain for userspace, but for pure kernel work -- aren't the pre-built, pre-packaged versions enough?
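
For pure kernel work, I imagine something along these lines would do (a sketch on my part, assuming a reasonably recent pre-packaged clang; the exact triple and flags are my guesses, and you still have to provide your own libgcc/compiler-rt style helpers and linker script):

Code:

clang --target=x86_64-unknown-elf -ffreestanding -mno-red-zone \
      -c kernel.c -o kernel.o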