
Languages for standalone

Posted: Fri Oct 09, 2020 9:28 am
by PeterX
Which programming languages can compile to a standalone program (= without needing a runtime library/object file)?

- C can.
- Forth can, I guess.
- C++ can't because of OOP initialization.
- Lisp/Scheme can probably if I use a compiler and don't use sophisticated list operations (like eval).
- Rust can't?
- Can Go?
- Any other language?

Greetings
Peter

Re: Languages for standalone

Posted: Fri Oct 09, 2020 12:41 pm
by bzt
I guess all compiled languages can if you statically link their runtime in. You can do that with C++, Rust, Pascal, Ada (gnat) etc. The big advantage of C here is that it was created for freestanding mode in the first place (libc and POSIX were added later). Other languages were designed in a way that they expect support from the OS to some extent.

C++ doesn't have a "freestanding" mode per se, but if your code avoids OOP (no exceptions, no new/delete calls, no streams etc.), you can get away with it. Rust is a bit trickier, because in theory you can use it in standard-library-free mode, but that's problematic TBH (much easier to get the stdlib statically compiled in than to write stdlib-free Rust code). With Ada there's no way to eliminate the runtime (the language has some constructs that require runtime support, like exceptions, generics, async functions and rendezvous points etc.).

Not sure if Assembly counts for you (as it's not really a language but rather a one-to-one translation of instructions), but if it does, then you can also count in many HLA-capable assemblers (each with its own dialect and macro sets).

Cheers,
bzt

Re: Languages for standalone

Posted: Sun Oct 11, 2020 12:34 pm
by nullplan
PeterX wrote:Which programming languages can compile to a standalone program (= without needing a runtime library/object file)?
All languages, absolutely all of them, require some kind of support. Some of that support can be implemented in the bootloader (that would be load-time support), but still some things need to be done at run time. And it depends on your compiler and version what exactly that entails. For example, GCC compiled C code can make calls to libgcc functions, and calls to memcpy, memmove, or memcmp. So something must implement those. And at least the mem* functions aren't going to fall from the sky.
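
To make that concrete: the GCC manual states that a freestanding environment has to provide memcpy, memmove, memset and memcmp itself. A minimal, unoptimized sketch of the four could look like this. The k_ prefix is an assumption made here so the sketch can be compiled and tested next to a hosted libc; a kernel would export the standard names instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical k_ prefix: lets this sketch coexist with a hosted libc.
   A kernel would export these under the standard names. */

void *k_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

void *k_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    if ((uintptr_t)d < (uintptr_t)s) {
        /* Destination below source: a forward copy is safe. */
        while (n--)
            *d++ = *s++;
    } else {
        /* Destination above source: copy backwards so overlap is safe. */
        while (n--)
            d[n] = s[n];
    }
    return dst;
}

void *k_memset(void *dst, int c, size_t n)
{
    unsigned char *d = dst;
    while (n--)
        *d++ = (unsigned char)c;
    return dst;
}

int k_memcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *x = a;
    const unsigned char *y = b;
    for (; n--; x++, y++)
        if (*x != *y)
            return *x - *y;   /* difference of the first mismatching bytes */
    return 0;
}
```

Byte-at-a-time loops like these are slow but have no dependencies of their own, which is the point in early kernel code.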

Depending on your choice of language, the run-time support becomes larger or smaller. C has a pretty small amount, especially on AMD64 if you avoid FPU stuff (which you should, in kernel mode), because in that case libgcc contains almost nothing.
PeterX wrote:- C++ can't because of OOP initialization.
A thousand issues with using C++ for a kernel, and this is the one you pick? Initialization can be implemented rather easily. You only need to figure out when a good time for the initialization would be (for instance, before setting up paging is probably not a good time), and then run all constructors in order.
bzt wrote:C++ doesn't have a "freestanding" mode per se,
False, it does too have one. Note that that one requires a lot more from you in terms of run-time support than C's freestanding mode: <new>, <typeinfo>, and <exception> are all part of the set of required headers for freestanding mode.
bzt wrote: but if your code avoids OOP (no exceptions, no new/delete calls, no streams etc.), you can get away with it.
All of these can be implemented. For exceptions, there is an ABI that says how to do it. new and delete can be overridden even by a hosted application, never mind a freestanding one, and streams are just a library you can implement. I personally don't see the point, but you can. If you are willing to stray outside the bounds of the standard, then you can use some GCC options to prevent the use of exceptions and RTTI, to reduce the amount needed for the run-time library.

Re: Languages for standalone

Posted: Mon Oct 12, 2020 1:41 am
by Solar
Kindly refer to the Languages page of the OSDev Wiki which elaborates on exactly this issue, and has links to various language-specific subpages.

Re: Languages for standalone

Posted: Mon Oct 12, 2020 12:53 pm
by bzt
nullplan wrote:All languages, absolutely all of them, require some kind of support.
Nope, C language doesn't need anything. All of its run-time is pushed into library functions. C language simply doesn't have any complex constructs (like strings, exceptions, streams etc). Comparing strings for example is just a function call like any other, has no language specific syntax. Allocating memory likewise, just a function call like any other, no language specific syntax.

The only true dependency a freestanding C code has is a stack and a zeroed-out bss, however those are also required by almost every executable no matter the language they were compiled from.
nullplan wrote:For example, GCC compiled C code can make calls to libgcc functions, and calls to memcpy, memmove, or memcmp. So something must implement those. And at least the mem* functions aren't going to fall from the sky.
In freestanding mode without optimization, no C compiler will make such calls (neither gcc, nor Clang, nor TCC, nor MSVC etc.). Btw, memcmp and friends appeared first in AT&T System V UNIX, and the C language is much older than that.

It is true that the gcc optimizer might emit mem* calls, but that's compiler specific. Furthermore gcc will automatically inject __builtin_mem* functions if you don't link with libgcc, so they literally "fall from the sky" with gcc. But this is again a compiler specific thing, CLang for example doesn't do that, nor does it have a libgcc library, meaning these aren't language features. (Not to mention that if you compile for UEFI for example, then the CLang optimizer will automatically generate CompareMem / ZeroMem calls and not memcmp / memset; so even the optimizer is environment and not language specific.)
nullplan wrote:
bzt wrote:C++ doesn't have a "freestanding" mode per se,
False, it does too have one.
The page you linked also says that C++ freestanding mode is implementation-specific. A certain compiler might or might not implement that. It might inject the required functions transparently, or it might require a statically linked run-time. The point is, unlike C, the C++ language has features that won't automatically work, unless the programmer provides run-time support. (E.g., memory allocation is part of the language.)
nullplan wrote:Note that that one requires a lot more from you in terms of run-time support than C's freestanding mode: <new>, <typeinfo>, and <exception> are all part of the set of required headers for freestanding mode.
Including headers will avoid syntax errors, but they won't give you run-time support. That's another beast to feed.
nullplan wrote:
bzt wrote: but if your code avoids OOP (no exceptions, no new/delete calls, no streams etc.), you can get away with it.
All of these can be implemented.
They CAN be, but they are typically not provided by the compiler. Of course everything can be done if you statically link or implement the run-time support into your executable.

Cheers,
bzt

Re: Languages for standalone

Posted: Mon Oct 12, 2020 3:01 pm
by sj95126
bzt wrote:The only true dependency a freestanding C code has is a stack and a zeroed-out bss
Technically, you don't even need a bss, if you have no uninitialized global data. For a long time, my (admittedly small) kernel had no .bss section in its ELF header.

To be *really* pedantic, you could conceivably compile simple C code into a standalone flat binary and never use a stack either. You'd have to implement a custom ABI but it's possible.

Re: Languages for standalone

Posted: Mon Oct 12, 2020 3:12 pm
by nexos
GCC programs require libgcc.

Re: Languages for standalone

Posted: Mon Oct 12, 2020 3:20 pm
by sj95126
nexos wrote:GCC programs require libgcc.
My kernel is built with gcc and doesn't use libgcc.

Re: Languages for standalone

Posted: Mon Oct 12, 2020 3:29 pm
by thewrongchristian
sj95126 wrote:
nexos wrote:GCC programs require libgcc.
My kernel is built with gcc and doesn't use libgcc.
https://wiki.osdev.org/Libgcc#I_link_wi ... changes.3F

Re: Languages for standalone

Posted: Mon Oct 12, 2020 4:01 pm
by sj95126
thewrongchristian wrote:
sj95126 wrote:
nexos wrote:GCC programs require libgcc.
My kernel is built with gcc and doesn't use libgcc.
https://wiki.osdev.org/Libgcc#I_link_wi ... changes.3F
Seeing as how I moved libgcc.a out of the cross-compiler directory structure, it'd be awfully hard for it to use it without me knowing.

Re: Languages for standalone

Posted: Mon Oct 12, 2020 4:25 pm
by Octocontrabass
bzt wrote:In freestanding mode without optimization, no C compiler will make such calls (neither gcc, nor Clang, nor TCC, nor MSVC etc.).
Here's GCC making a call to the libgcc function __divdi3() in freestanding mode without optimization.

Here's Clang making a call to memset() in freestanding mode without optimization.

Compiler Explorer doesn't seem to support TCC. I'm not familiar enough with MSVC to know if it automatically links against a library for the function calls it emits.
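
The Compiler Explorer examples linked from this post are not reproduced here. As a rough sketch of the kind of source that commonly produces such calls (an assumption, not the original examples): on a 32-bit x86 target GCC typically lowers 64-bit signed division to a call to __divdi3(), and zero-initializing a large aggregate is often lowered to a memset() call even at -O0. The function and struct names below are made up for illustration.

```c
#include <stdint.h>

/* On a 32-bit x86 target, GCC typically emits a call to __divdi3() here,
   because the hardware has no single instruction for 64-bit signed division. */
int64_t div64(int64_t a, int64_t b)
{
    return a / b;
}

struct big {
    char bytes[4096];
};

/* Zero-initializing a large aggregate is commonly lowered to a memset() call,
   even though the source never names memset(). */
void zero_big(struct big *out)
{
    struct big zeroed = {0};
    *out = zeroed;
}
```

Compiling with something like -m32 -ffreestanding -O0 -S and searching the assembly for call instructions should show what a given compiler version actually does with it.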
bzt wrote:Furthermore gcc will automatically inject __builtin_mem* functions if you don't link with libgcc, so they literally "fall from the sky" with gcc.
False. GCC will only emit inline code for the __builtin_mem*() functions in cases where the optimizer thinks doing so will be better than emitting a mem*() function call. This has nothing to do with linking against libgcc. You must provide implementations for the cases where the compiler chooses to emit a function call instead of inline code. (Try compiling with -Os, which tells the optimizer to prefer function calls since they're usually smaller than inline code.)

Clang does the same thing, but its optimizer will sometimes make different choices compared to GCC.
bzt wrote:But this is again a compiler specific thing, CLang for example doesn't do that, nor does it have a libgcc library, meaning these aren't language features.
You're correct that it's compiler-specific, but this is a bad example. Clang has inline implementations of the __builtin_mem*() functions, just like GCC, and emits calls to its support library, just like GCC. From what I understand, Clang typically uses compiler-rt instead of libgcc, and always links against it even in freestanding mode so you won't notice that it's a separate library unless it's missing.

Re: Languages for standalone

Posted: Tue Oct 13, 2020 10:19 am
by nexos
Overall, C itself appears to be a freestanding language with no dependencies. With GCC, you need libgcc, and probably a couple of other functions the optimizer will make calls to. BSS should also be cleared. But the loader will do that (hopefully, unless it's my first loader and I didn't think it was needed until a variable initialized to zero in a program contained garbage :? ).
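
A minimal sketch of that zeroing step, assuming the loader or startup code knows where .bss begins and ends. In a real kernel the bounds usually come from symbols defined in the linker script; here they are plain parameters (the name clear_bss is hypothetical) so the sketch can be exercised on an ordinary buffer.

```c
/* Zero every byte in [start, end).
   The volatile qualifier forces a plain store loop, so the compiler should not
   replace it with a call to memset(), which may not exist this early. */
void clear_bss(unsigned char *start, unsigned char *end)
{
    volatile unsigned char *p = start;
    while (p != end)
        *p++ = 0;
}
```

This has to run before any code that relies on zero-initialized globals, which is exactly the garbage-variable trap described above.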

Re: Languages for standalone

Posted: Tue Oct 13, 2020 12:56 pm
by Solar
Hosted C needs:
  • Setup of the standard input, standard output, and standard error file streams.
  • Setup of the "C" locale (for ctype.h and time.h functions).
  • Arrays of function pointers to be used for registering functions via atexit() and at_quick_exit().
  • Initialization of all objects with static duration with their respective init values. (Neither .bss nor .data nor .rodata is mentioned in the C standard.)
  • Setup of argc, argv in whatever way main() will expect those parameters on the platform. (Neither "stack" nor "heap" is mentioned in the C standard.)
  • Some way for getenv() to read environment variables. (POSIX handles this via a third parameter to main(), which is why I mention it separately from the library/kernel interfaces below.)
  • Jump to main().
  • On return from main, calling any functions registered via atexit(). (Functions registered via at_quick_exit() are called by quick_exit() instead.)
  • Flushing and closing of all open streams.
  • Delivering the return value of main() to the calling environment.
  • Of course, the backend functionality on which those library features rely and that cannot itself be implemented without external support. fopen(), fclose(), fputc(), fgetc(), rename(), remove(), fseek(), time(), system(), ...
The usual mechanic is for the loader to call _start(), which does the setup, calls main(), and handles the wind-down after main() returns.
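
The flow above can be sketched in plain C. This is only an illustration under stated assumptions: fake_start() and my_atexit() are hypothetical stand-ins for _start() and atexit(), a real _start() is platform-specific and usually begins in assembly, and the stream, locale and environment setup are left out entirely.

```c
#include <stddef.h>

#define MAX_EXIT_HANDLERS 32   /* the C standard requires room for at least 32 */

typedef void (*exit_handler)(void);

static exit_handler handlers[MAX_EXIT_HANDLERS];
static size_t handler_count;

/* Hypothetical stand-in for atexit(): 0 on success, nonzero on failure. */
int my_atexit(exit_handler fn)
{
    if (handler_count == MAX_EXIT_HANDLERS)
        return -1;
    handlers[handler_count++] = fn;
    return 0;
}

/* Hypothetical stand-in for _start(): call main, then wind down. */
int fake_start(int (*main_fn)(int, char **), int argc, char **argv)
{
    /* Static objects are assumed to be initialized already. */
    int status = main_fn(argc, argv);

    /* Registered handlers run in reverse order of registration. */
    while (handler_count > 0)
        handlers[--handler_count]();

    /* Flushing and closing streams would go here. */
    return status;   /* handed back to the calling environment */
}

/* Small demo program that records the order of events. */
static char trace[8];
static size_t trace_len;

static void log_a(void) { trace[trace_len++] = 'a'; }
static void log_b(void) { trace[trace_len++] = 'b'; }

static int demo_main(int argc, char **argv)
{
    (void)argv;
    my_atexit(log_a);
    my_atexit(log_b);
    trace[trace_len++] = 'm';
    return argc;
}
```

Calling fake_start(demo_main, ...) runs demo_main first and then the two handlers, last registered first.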

Freestanding C does not need to provide any library facilities beyond <float.h>, <iso646.h>, <limits.h>, <stdalign.h>, <stdarg.h>, <stdbool.h>, <stddef.h>, <stdint.h>, and <stdnoreturn.h> (which are, incidentally, those headers that only declare constants and macros, but no functions). That does away with most of the above requirements, unless of course your freestanding environment offers such support. Objects with static duration still need to be initialized. The function called at program startup is implementation-defined, as is the effect of program termination.

C++ requires some mechanism to call constructors of objects with static duration, which is pretty easily solved with a bit of linker script and two lines of plumbing in _start() (or by not having constructed objects with static duration). If you settle for a subset of C++ without exceptions and RTTI, you're done (and neither exceptions nor RTTI are of much use in kernel space anyway).
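
A sketch of what that plumbing tends to look like, written in C. With GCC-style ELF toolchains the compiler typically collects static constructors as function pointers in a section such as .init_array, and the linker script defines symbols marking its start and end. Those symbols are replaced here by an ordinary array (demo_table and the init_* functions are hypothetical) so the loop can be run anywhere.

```c
typedef void (*ctor_fn)(void);

/* Call every constructor in [begin, end), in table order. */
void run_constructors(ctor_fn *begin, ctor_fn *end)
{
    for (ctor_fn *p = begin; p != end; ++p)
        (*p)();
}

/* Demo table standing in for the linker-provided section. */
static int init_order[2];
static int init_count;

static void init_console(void) { init_order[init_count++] = 1; }
static void init_heap(void)    { init_order[init_count++] = 2; }

static ctor_fn demo_table[] = { init_console, init_heap };
```

In _start() the call would take the linker script's start and end symbols instead of demo_table, once the environment is far enough along for constructors to run safely.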

The rest (like those libgcc dependencies you are talking about) are an issue of GCC's implementation, not of the language.

Re: Languages for standalone

Posted: Tue Oct 13, 2020 3:03 pm
by bzt
Octocontrabass wrote:False. GCC will only emit inline code for the __builtin_mem*() functions in cases where the optimizer thinks doing so will be better than emitting a mem*() function call.
Octocontrabass wrote:Clang has inline implementations of the __builtin_mem*() functions, just like GCC
I have a different experience. Look, here's an example, where it wasn't the optimizer that emitted the memcmp call. If I've used memcmp() or __builtin_memcmp(), I could compile this code with gcc, but not with CLang (both in freestanding mode). The solution was to use __builtin_memcmp() with gcc, and implement memcmp() with CLang.

I admit, this was with older versions (about two years ago), both gcc and CLang could have changed since. I can imagine for example that __builtin_memcmp() was added to CLang since.
sj95126 wrote:My kernel is built with gcc and doesn't use libgcc.
Mine neither. I've deliberately eliminated all compiler-specific libraries (wasn't easy, but possible). Now I can compile my OS with both gcc and CLang as-is (and possibly with many other ANSI C compilers too).
sj95126 wrote:Technically, you don't even need a bss, if you have no uninitialized global data.
Yes, that's true. I was assuming a typical freestanding code will need a bss, but true, you can do without.
sj95126 wrote:To be *really* pedantic, you could conceivably compile simple C code into a standalone flat binary and never use a stack either.
On the other hand I don't think this is possible (not on all architectures that is). Regardless of the ABI, on x86 some CPU instructions need the stack (like "call" or "ret" for example), and I don't think you can convince a C compiler not to use such instructions.

Cheers,
bzt

Re: Languages for standalone

Posted: Tue Oct 13, 2020 3:30 pm
by Octocontrabass
bzt wrote:I have a different experience. Look, here's an example, where it wasn't the optimizer that emitted the memcmp call. If I've used memcmp() or __builtin_memcmp(), I could compile this code with gcc, but not with CLang (both in freestanding mode). The solution was to use __builtin_memcmp() with gcc, and implement memcmp() with CLang.
Replace -O2 with -Os and GCC emits calls to memcmp() too.
bzt wrote:Mine neither. I've deliberately eliminated all compiler-specific libraries (wasn't easy, but possible). Now I can compile my OS with both gcc and CLang as-is (and possibly with many other ANSI C compilers too).
How do you tell Clang to not link against libgcc or compiler-rt?