How to compile a flat position-independent binary with GCC?

onlyonemac
Member
Posts: 1146
Joined: Sat Mar 01, 2014 2:59 pm

Post by onlyonemac »

Just that, really. And do I need to build a cross-compiler for this or can I just pass some options to the compiler? I read the cross-compiler tutorial on the wiki but it didn't say anything about flat or position-independent binaries.
When you start writing an OS you do the minimum possible to get the x86 processor in a usable state, then you try to get as far away from it as possible.

Syntax checkup:
Wrong: OS's, IRQ's, zero'ing
Right: OSes, IRQs, zeroing
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Post by Combuster »

Look up -fPIC and -fPIE.
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
Post by onlyonemac »

I believe that -fPIE will give me position-independent code, but what about the flat binary part? (I want just the compiled machine code, not something with e.g. an ELF header or linking symbols or whatever.)
mariuszp
Member
Posts: 587
Joined: Sat Oct 16, 2010 3:38 pm

Post by mariuszp »

In your linker script:

OUTPUT_FORMAT("binary")
(pass the -Tfilename.ld option to the linker)
Post by onlyonemac »

mariuszp wrote: In your linker script:

OUTPUT_FORMAT("binary")
(pass the -Tfilename.ld option to the linker)
Linker script? What's that? Normally when I compile software I just run either this (if I'm compiling everything in one command):

gcc -o output_executable main.c other_file.c
or this (if I'm compiling each source file separately):

gcc -c main.c
gcc -c other_file.c
gcc -o output_executable main.o other_file.o
Where does the "linker" fit into this?
iansjack
Member
Posts: 4706
Joined: Sat Mar 31, 2012 3:07 am
Location: Chichester, UK

Post by iansjack »

The linker script tells the linker how you want the files combined. In this case it tells it that you want a binary format output file, not the default ELF format.

If you want to do non-default things then you need to learn how to use your toolkit. Use Google to discover the difference between a compiler and a linker, and how they work together to produce output files. "gcc" is a meta-tool (a compiler driver) that calls the preprocessor, the compiler, the assembler, the linker, or some combination of them in succession.
Schol-R-LEA
Member
Posts: 1925
Joined: Fri Oct 27, 2006 9:42 am
Location: Athens, GA, USA

Post by Schol-R-LEA »

onlyonemac: I realize I'm rather late to the party, but if you haven't done so already, look at the pages for Linkers and Object Files to get an overview of the linkage process. You can get more specific information from the pages for the individual linkers; since you are using GCC, you will want to look up GNU ld (the binutils linker) and ar (the library archiver), as well as the page for Linker scripts, though that's only a stub right now (and specific to ld at that). For OS dev, you definitely need to know what the linker is doing, at least in a general way.

If you don't mind me asking, why are you looking to produce a flat binary from what I presume is C (or some other HLL) code? While there are some scenarios where this would be appropriate, you don't want to commit to doing so if it is just a way of avoiding loading ELF files - eventually, you will want (or more likely, need) the details which ELF (or other non-flat executable formats) provide as a matter of course. Even in a boot loader, being able to load a standard executable is pretty much a necessity for anything beyond the second stage boot loader (and any existing bootloader such as GRUB will generally read at least one standard executable format). If you are rolling your own boot loader, you should seriously consider having a 'half' stage, written in assembly, that exists solely to a) switch to protected mode, b) find the kernel executable file in the file system, and c) parse the executable format enough to load the code from it (which with ELF probably means mapping that part of the file to virtual memory and then just paging it in, I think).
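A minimal sketch of the first thing such a 'half' stage (or any loader) has to do with an ELF file: confirm that the header really describes a 32-bit little-endian x86 executable, then pick out the entry point and the number of program headers. The struct mirrors the ELF32 file header layout from the System V ABI, but the names Elf32Header and elf32_check are hypothetical, the sketch assumes a little-endian host, and a real loader would go on to walk and bounds-check the program headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ELF32 file header, laid out as in the System V ABI (52 bytes, no
   padding). Defined locally so the sketch does not need <elf.h>. */
typedef struct {
    unsigned char e_ident[16]; /* magic, class, data encoding, version */
    uint16_t e_type;           /* 2 = ET_EXEC */
    uint16_t e_machine;        /* 3 = EM_386 */
    uint32_t e_version;
    uint32_t e_entry;          /* virtual address of the entry point */
    uint32_t e_phoff;          /* file offset of the program headers */
    uint32_t e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize;
    uint16_t e_phentsize;
    uint16_t e_phnum;          /* number of program headers */
    uint16_t e_shentsize;
    uint16_t e_shnum;
    uint16_t e_shstrndx;
} Elf32Header;

/* Returns 0 and fills *entry and *phnum if buf starts with the header of
   a 32-bit little-endian x86 executable; returns -1 otherwise.
   Assumes a little-endian host, so fields can be read directly. */
int elf32_check(const void *buf, size_t len, uint32_t *entry, uint16_t *phnum)
{
    Elf32Header h;
    if (len < sizeof h)
        return -1;
    memcpy(&h, buf, sizeof h);              /* buf may be unaligned */
    if (memcmp(h.e_ident, "\x7f" "ELF", 4) != 0)
        return -1;                          /* not an ELF file */
    if (h.e_ident[4] != 1 || h.e_ident[5] != 1)
        return -1;                          /* not ELFCLASS32 / ELFDATA2LSB */
    if (h.e_type != 2 || h.e_machine != 3)
        return -1;                          /* not an x86 executable */
    *entry = h.e_entry;
    *phnum = h.e_phnum;
    return 0;
}
```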
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
Post by onlyonemac »

Schol-R-LEA wrote: If you don't mind me asking, why are you looking to produce a flat binary from what I presume is C (or some other HLL) code? While there are some scenarios where this would be appropriate, you don't want to commit to doing so if it is just a way of avoiding loading ELF files - eventually, you will want (or more likely, need) the details which ELF (or other non-flat executable formats) provide as a matter of course.
My OS is rather unconventional in design. There are no "binaries" as such; everything that is executed consists of small "modules" of user-space code (which may call into kernel-space code) and those "modules" do not get "loaded" according to a header but are simply transferred to memory for execution however the kernel sees fit (i.e. they may be loaded at any address and execution begins at the start and ends at the end). This is why I also need position-independent code.

With regards to the original topic, I think I am figuring out how to do this now.
Post by iansjack »

onlyonemac wrote: My OS is rather unconventional in design. There are no "binaries" as such; everything that is executed consists of small "modules" of user-space code (which may call into kernel-space code) and those "modules" do not get "loaded" according to a header but are simply transferred to memory for execution however the kernel sees fit (i.e. they may be loaded at any address and execution begins at the start and ends at the end).
That's not unconventional; it is, in essence, how all operating systems work. The point about ELF format files (or any other object format) is that they store useful information about the executable code as well as the code itself. They may also contain relocation data, such as the location of variables, so that they can easily be loaded anywhere and can share information with other parts of the OS.
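To illustrate what relocation data buys you, here is a minimal sketch of applying a simple relocation list: each entry is the offset of a 32-bit word in the loaded image that was linked as if the image started at address 0, and the loader adds the real load address to it. This follows the idea behind ELF's R_386_RELATIVE relocations, but RelocTable and apply_relocations are hypothetical names and the format is deliberately simplified; it is not how ld or any particular loader stores relocations.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical relocation list: byte offsets, within the loaded image, of
   32-bit words holding addresses computed for a link base of 0. */
typedef struct {
    const uint32_t *offsets;
    size_t count;
} RelocTable;

/* Adds the actual load address to every listed word (base + addend, as
   with R_386_RELATIVE). Returns -1 if an offset falls outside the image. */
int apply_relocations(unsigned char *image, size_t image_len,
                      const RelocTable *r, uint32_t load_base)
{
    for (size_t i = 0; i < r->count; i++) {
        uint32_t off = r->offsets[i];
        uint32_t word;
        if (off > image_len || image_len - off < sizeof word)
            return -1;
        memcpy(&word, image + off, sizeof word);   /* may be unaligned */
        word += load_base;
        memcpy(image + off, &word, sizeof word);
    }
    return 0;
}
```

Position-independent code avoids this pass entirely, at the cost of computing addresses relative to the current instruction at run time.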

As a simple example, you need to store information about the lengths of the text and data segments somewhere so that you know how much code to load and where to load it.
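As a sketch of where such size information could live in a flat image, assume a hypothetical three-word header (ModuleHeader) placed in front of the code. The loader copies the text and data actually stored in the file and zero-fills the bss, which occupies memory but not file space; ELF program headers make the same distinction between a segment's file size and its memory size. The names ModuleHeader, module_footprint and module_load are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header placed in front of a flat module image. */
typedef struct {
    uint32_t text_size;  /* bytes of code (and read-only data) */
    uint32_t data_size;  /* bytes of initialised data */
    uint32_t bss_size;   /* bytes of zero-initialised data, not stored */
} ModuleHeader;

/* Bytes of memory the module occupies once loaded. */
size_t module_footprint(const ModuleHeader *h)
{
    return (size_t)h->text_size + h->data_size + h->bss_size;
}

/* Copies text and data (which follow the header in image) to dest and
   clears the bss area; dest must hold module_footprint() bytes.
   Returns the footprint, or 0 if image is shorter than the header claims. */
size_t module_load(void *dest, const unsigned char *image, size_t image_len)
{
    ModuleHeader h;
    if (image_len < sizeof h)
        return 0;
    memcpy(&h, image, sizeof h);
    size_t stored = (size_t)h.text_size + h.data_size;
    if (image_len - sizeof h < stored)
        return 0;
    memcpy(dest, image + sizeof h, stored);
    memset((unsigned char *)dest + stored, 0, h.bss_size);
    return module_footprint(&h);
}
```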
Post by onlyonemac »

iansjack wrote: As a simple example, you need to store information about the lengths of the text and data segments somewhere so that you know how much code to load and where to load it.
Yeah, I've got information like that, but it's provided by the object that supplies the binary code rather than being part of the code itself (the object stores how large each of its contained code modules is). As I regard position-independent code to be more elegant than relocation, I don't need any relocation data, leaving only size data which I have already addressed.
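A minimal sketch of the arrangement described here, assuming (hypothetically) that the containing object keeps a table of module sizes followed by the modules' code packed back to back. Finding a module is then just a matter of summing the sizes before it; the kernel can copy the returned bytes to any address only because the code is position-independent. ModuleContainer and container_module are illustrative names, not part of the actual OS.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container: a count, a table of module sizes, then the
   modules' machine code concatenated with no gaps. */
typedef struct {
    uint32_t count;
    const uint32_t *sizes;       /* sizes[i] = length of module i */
    const unsigned char *code;   /* all modules, back to back */
} ModuleContainer;

/* Points *start at module i and returns its length,
   or returns 0 (leaving *start untouched) if i is out of range. */
uint32_t container_module(const ModuleContainer *c, uint32_t i,
                          const unsigned char **start)
{
    if (i >= c->count)
        return 0;
    size_t offset = 0;
    for (uint32_t k = 0; k < i; k++)
        offset += c->sizes[k];   /* skip the modules before i */
    *start = c->code + offset;
    return c->sizes[i];
}
```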
Post by iansjack »

onlyonemac wrote: As I regard position-independent code to be more elegant than relocation, I don't need any relocation data, leaving only size data which I have already addressed.
An interesting viewpoint. What happens if one module needs to access a variable in another module (or a global module in the OS)? Do you have to hard code the addresses into every module?

I suspect that, eventually, you will find a lot more problems by trying to use flat binaries with no metadata. It can also lead to excessively large binary files. This sort of decision is normally predicated upon a reluctance to understand, and work with, the structure of an ELF file; it's often a mistake to try to avoid learning about these things.
Post by Schol-R-LEA »

I suspect that there's a communication gap here; while I definitely agree with iansjack given what you have said so far, I think further information might clarify what you are meaning to do. The impression I am getting is that there are three possible design approaches you might have in mind:
  • A persistent OO system, in which each object is held in memory indefinitely, with virtual memory swap space used for persistence rather than as temporary scratch space, and all objects sharing a common executable memory cache. However, this is a case where relocation information would be more important, not less, as you would have to be able to link the objects to their code at runtime.
  • A threading interpreter such as is often used in Forth systems (where the term 'threading' refers to the interpretation method, not to multi-threading in the usual sense). Here, the system would drill down through a series of interpretable words (Forth terminology) until it found a word that linked to a piece of executable code. However, as with the previous instance, retaining relocation information would still be a requirement, at least regarding each word's linkage to the code, as the address of the code will change each time the code is reloaded.
  • A system using the Actors model of concurrent OO, in which each object is a separate process and all object interactions are done through IPC. In this case, relocation can genuinely be avoided (though it would still be useful). Actors have some useful and appealing properties, but the message-passing overhead can lead to severe efficiency problems.
I don't know for certain if you have any of these in mind, either individually or in combination, but it might help both us and you if you could write up a detailed model of your design for us to comment on.
Post by onlyonemac »

iansjack wrote: An interesting viewpoint. What happens if one module needs to access a variable in another module (or a global module in the OS)? Do you have to hard code the addresses into every module?
There are no global variables. The only inter-module communication is through the attributes of objects.
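One way to picture communication purely through object attributes, as a minimal sketch: the supervisor hands each module a pointer to an object, and modules read and write named attributes on it instead of referring to each other's addresses. Object, obj_set, obj_get and the two example modules are hypothetical; attribute names are stored by pointer, so they are assumed to be string literals or otherwise to live as long as the object.

```c
#include <string.h>

#define MAX_ATTRS 8  /* hypothetical fixed capacity for the sketch */

/* Hypothetical object: a small table of named integer attributes. */
typedef struct {
    const char *names[MAX_ATTRS];  /* not copied: must outlive the object */
    long values[MAX_ATTRS];
    int count;
} Object;

/* Sets attribute `name`, adding it if absent. Returns 0, or -1 if full. */
int obj_set(Object *o, const char *name, long value)
{
    for (int i = 0; i < o->count; i++)
        if (strcmp(o->names[i], name) == 0) {
            o->values[i] = value;
            return 0;
        }
    if (o->count == MAX_ATTRS)
        return -1;
    o->names[o->count] = name;
    o->values[o->count++] = value;
    return 0;
}

/* Copies attribute `name` into *value. Returns 0, or -1 if absent. */
int obj_get(const Object *o, const char *name, long *value)
{
    for (int i = 0; i < o->count; i++)
        if (strcmp(o->names[i], name) == 0) {
            *value = o->values[i];
            return 0;
        }
    return -1;
}

/* Two example modules: neither knows the other's address, only the object. */
void producer_module(Object *o) { obj_set(o, "frequency", 440); }

long consumer_module(const Object *o)
{
    long f = 0;
    return obj_get(o, "frequency", &f) == 0 ? f : -1;
}
```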
Schol-R-LEA wrote:
  • A system using the Actors model of concurrent OO, in which each object is a separate process and all object interactions are done through IPC. In this case, relocation can genuinely be avoided (though it would still be useful). Actors have some useful and appealing properties, but the message-passing overhead can lead to severe efficiency problems.
It is most similar to that, although in a single-tasking variant. While there is currently no multi-tasking, I do intend to implement it at some stage (note that "single-tasking" here doesn't necessarily mean that only one "program" can be "open" at a time, but rather that there is only one thread and that thread is under the control of only one code module at a time). In other words, you can have your web browser and your text editor open at the same time, but to switch between them the web browser would have to pass control back to a supervisor thread, which would pass it to the text editor. Also, I'm going to see how well a "stateless" interface works, where the user does not interact with programs interactively (dialog boxes, buttons, text input fields, etc.) but instead receives and supplies information by viewing and modifying objects (although care will need to be taken not to modify objects that are in use). (The interfaces for viewing and modifying objects will be implemented as a kernel plugin.)
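The single-threaded arrangement described above can be sketched as a supervisor loop that calls one module at a time and regains control only when that module returns. ModuleEntry, supervisor_run, the MOD_YIELD/MOD_EXIT return codes and the countdown module are hypothetical names; note that if a module never returns, the supervisor never gets control back.

```c
#include <stddef.h>

#define MAX_MODULES 16  /* hypothetical limit for the sketch */

enum { MOD_YIELD = 0, MOD_EXIT = 1 };

/* Hypothetical module entry point: runs until it decides to return, then
   tells the supervisor whether it wants to be called again. */
typedef int (*ModuleEntry)(void *state);

/* Single-threaded supervisor: only one module has the CPU at a time, and
   control comes back here only when that module returns.
   Returns the number of module calls made, or -1 if n is too large. */
int supervisor_run(ModuleEntry *modules, void **states, size_t n)
{
    int done[MAX_MODULES] = {0};
    size_t remaining = n;
    int calls = 0;
    if (n > MAX_MODULES)
        return -1;
    while (remaining > 0) {
        for (size_t i = 0; i < n; i++) {
            if (done[i])
                continue;
            calls++;
            if (modules[i](states[i]) == MOD_EXIT) {
                done[i] = 1;
                remaining--;
            }
        }
    }
    return calls;
}

/* Example module: counts down and exits once its counter reaches zero. */
int countdown_module(void *state)
{
    int *left = state;
    return --*left <= 0 ? MOD_EXIT : MOD_YIELD;
}
```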
Post by Schol-R-LEA »

What you are describing regarding the process management sounds a lot like early versions of Macintosh Finder: what might be termed cooperative multitasking with blocking scheduling, where there can be several programs in memory but only one has the CPU at a given time and it blocks all other processes while it runs. It also sounds surprisingly similar to Oberon OS, though without any provision for ceding the processor even when waiting (the Oberon Language compiler also would insert scheduler calls into the code to prevent starvation and allow the kernel to poll the peripherals rather than having interrupts, but that's a separate issue). While it certainly is easy and usable, it doesn't use the CPU efficiently (especially if there is more than one), it makes no provision for backgrounding a task, and it can lead to kernel starvation if the process goes off the rails or goes into an unbounded loop.

Since you are using GCC (and seem to be using an x86 family processor), I assume you are running in pmode with a flat (per-process) memory model, or in lmode (which AFAICT doesn't use segmentation at all). How are you handling memory protection, and are you implementing (or planning to implement) paging?

Regarding loading code, while the behavioral model is closest to the third case I described, the actual use is closer to the other two, in that while you can avoid relocation editing, you would still need linkage editing to resolve references to external symbols. Avoiding that entirely would require either a) additional indirection in the form of an external call table (which would be populated at runtime, but which at least wouldn't require you to change the values in the executable image itself), b) a message-passing system for sending references between objects (a la Actor Model), or c) an all-in-one compilation method where every part of the code for a program is (re-)compiled to build a single executable, with no compiled libraries and no code sharing between programs. None of these approaches would play well with C or C++, and they would require changes to the compiler itself to be practical (while the second could be done with explicit message passing, it would be awkward at best).
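Option (a) can be sketched by hand in plain C, under the assumption that the kernel fills in a table of function pointers at run time and passes its address to the module's entry point. The module reaches kernel services only through that table, so its code needs no link-time address for any kernel function. KernelCalls, module_start and the host_ stand-ins are hypothetical names used only to make the example runnable on an ordinary host.

```c
/* Hypothetical table of kernel services. The kernel fills it in at run
   time; modules only need to know its layout. */
typedef struct {
    void (*put_char)(char c);
    int (*beep)(unsigned hz, unsigned ms);
} KernelCalls;

/* Module entry point: the table arrives as an argument, so this code
   refers to no kernel function by address. */
int module_start(const KernelCalls *k)
{
    k->put_char('!');
    return k->beep(440, 100);
}

/* Stand-in "kernel" services so the sketch can run on an ordinary host. */
static char last_char;
static void host_put_char(char c) { last_char = c; }
/* Returns hz + ms purely so the result can be checked. */
static int host_beep(unsigned hz, unsigned ms) { return (int)(hz + ms); }
```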

Given some of the other questions you've asked, I think you may have a weak mental model of how some things work, particularly how function calls and activation records work, and what the GCC driver program actually does. That's OK, everyone has that problem for some things when starting out (I certainly did regarding shared libraries and processor caches, for example), but we'd need to know how you think these things work in order to best explain what you are misunderstanding and correct your confusion. Can you tell us, in your own words, what a compiler and a linker do in general, and what GCC does specifically? Also, can you explain how a function call is implemented at the assembly code level (I'll address the issues of environments, scoping, and activation records later, in the other thread)?
Post by onlyonemac »

Schol-R-LEA wrote: Since you are using GCC (and seem to be using an x86 family processor), I assume you are running in pmode with a flat (per-process) memory model, or in lmode (which AFAICT doesn't use segmentation at all). How are you handling memory protection, and are you implementing (or planning to implement) paging?
I am using flat protected mode with no paging, which is why I am compiling position-independent code. At this stage there is no memory protection; however, due to the modular design of my OS, it would be easy enough to enable paging and memory protection at a later time and have most if not all existing code continue to work (certainly all existing userspace code would work due to the position-independence).
Schol-R-LEA wrote: Given some of the other questions you've asked, I think you may have a weak mental model of how some things work, particularly how function calls and activation records work, and what the GCC driver program actually does. That's OK, everyone has that problem for some things when starting out (I certainly did regarding shared libraries and processor caches, for example), but we'd need to know how you think these things work in order to best explain what you are misunderstanding and correct your confusion. Can you tell us, in your own words, what a compiler and a linker do in general, and what GCC does specifically? Also, can you explain how a function call is implemented at the assembly code level (I'll address the issues of environments, scoping, and activation records later, in the other thread)?
Yeah I'm not the clearest on how the different parts of the toolchain work, however I'm learning it over time. So:
  • a compiler "converts" high-level code such as C into machine code in an object file
  • a linker combines multiple object files together into a binary file
  • GCC provides a convenient way to perform numerous compiling and linking jobs
  • a function call at the assembly level depends on the ABI used, but for C on x86 it's something like pushing all the parameters onto the stack and then executing a "call" to the destination function (I'll read up more on this if I need it)
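The last point can be made concrete. The comment below shows how a call such as `r = add(2, 3);` is typically implemented under the 32-bit x86 cdecl convention: arguments pushed right to left, `call` pushing the return address, the result returned in EAX, and the caller removing the arguments afterwards. The instruction sequence is written by hand for illustration, not taken from GCC output; with optimisation or -fPIE, GCC's code will differ.

```c
/* Typical 32-bit cdecl sequence for `r = add(2, 3);`
   (Intel syntax, hand-written, no frame pointer):

     caller:  push 3               ; arguments pushed right to left
              push 2
              call add             ; pushes return address, jumps to add
              add  esp, 8          ; caller removes the two arguments
              mov  [r], eax        ; return value arrives in EAX

     add:     mov  eax, [esp+4]    ; first argument (a); [esp] = return address
              add  eax, [esp+8]    ; second argument (b)
              ret                  ; pops return address, jumps back
*/
int add(int a, int b)
{
    return a + b;
}
```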
Anyway I actually came here to ask this:

Here is my linker script:

OUTPUT_FORMAT("binary")
OUTPUT_ARCH(i386)
ENTRY(start)
phys = 0x00000000;
SECTIONS
{
  .text phys : AT(phys) {
    code = .;
    *(.text)
    *(.rodata)
    . = ALIGN(4096);
  }
  .data : AT(phys + (data - code))
  {
    data = .;
    *(.data)
    . = ALIGN(4096);
  }
  .bss : AT(phys + (bss - code))
  {
    bss = .;
    *(.bss)
    . = ALIGN(4096);
  }
  end = .;
}
Here is my compilation script:

#!/bin/sh

gcc -fPIE -ffreestanding -fno-builtin -nostdlib -nostdinc -m32 -Wl,-Bstatic -Wall -c $1.c -o $1.o
ld -o $1.bin -T linkscript.lds $1.o
And I get this error when compiling "beep.c" (I can provide the source for this file if required, however it's rather long and not exactly relevant to the error):

beep.o: In function `start':
beep.c:(.text+0xe): undefined reference to `_GLOBAL_OFFSET_TABLE_'
So what do I do now?