Sequence from bootloader to kernel in c

mihe
Member
Member
Posts: 38
Joined: Sun Oct 21, 2018 1:37 pm

Sequence from bootloader to kernel in c

Post by mihe »

Hello everybody,

after doing some research and testing, I think I am clear on the next steps I have to take to continue with my project. However, given the amount of uncertainty I am leaving by the side of the road on different subjects, because of some knowledge gaps, I would really appreciate it if you could tell me whether I am going the right way.

Current Goal: Concatenate a compiled Kernel in c to the bootloader, load the whole ELF in memory from disk, parse it and allocate the relevant content in memory, and jump to it to continue execution.

This is the scenario:
  • Rolling my own bootloader, which at this moment is able to:
      - Check and enable A20 while in real mode (Bochs comes with it enabled anyway)
      - Read sectors from floppy as a binary blob, no filesystem.
      - Print stuff for debugging, including converting decimal values to ASCII strings
      - Check memory (only Int 0x12, no E820 yet, because Bochs is a predictable platform, so it is good enough so far)
      - Prepare some basic segmentation: one segment for code, one for data, both covering the whole memory, overlapping.
      - Switch to Protected Mode
      - Print stuff in Protected Mode via VGA text mode, with carriage return and stuff...
  • On the kernel side:
      - I have prepared my cross-compiler, targeting i686-elf, 32 bits.
      - I am compiling using the following options (no linking yet):
          - -m32: To make sure it is 32 bits
          - -c: To compile but not link yet
          - -o: output file...
          - -std=gnu99: to use the C99 standard (with GNU extensions) and not earlier versions
          - -ffreestanding: Let GCC know we do not use any standard library (aside from libgcc).
          - -O0: No optimizations, to avoid running into problems at this stage, reduce complexity and make the learning curve not that steep.
          - -Wall and -Wextra: To make sure the compiler is extremely picky with our code, and to clear it of any warning that could be a potential error later on.
          - -masm=intel: Just personal preference to examine the code generated

The conceptual questions I have at this moment are:
  @ Shall I use -std=gnu99 or a more recent version of the C standard? Is there any showstopper to using C11 or C17?
  * For linking, I will use:
      - -ffreestanding (it is not clear from the GCC documentation whether this is a compiler or a linker parameter)
      - -nostdlib: Because we cannot use any standard library
      - -lgcc: To include the libgcc library
      - I will also use the linker to concatenate my asm bootloader (already in machine code via the NASM assembler) with the compiled code from the kernel in C, resulting in the final ELF32 file.
      - For the aforementioned process, I will use a linker script.
    @ Is this process conceptually correct? Remember that I am reading from a floppy without a filesystem, just a binary stream.
  * For loading and jumping I am thinking of:
      - Reading the sectors where the raw ELF file is
      - Parsing the ELF file using asm code to extract the code and data, and handling the relocation offsets
      - Carefully structuring the fetched code and data in memory
      - Jumping to the kernel.
    @ Am I planning this right, or is there any pitfall or limitation in this approach?
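The load-and-jump plan above can be sketched in C before being ported to assembly. This is only a host-side illustration under several assumptions: the kernel is a statically linked, non-relocatable ELF32 executable (so no relocation processing is attempted), the host is little-endian, and a plain byte array stands in for physical memory. The names `elf32_load`, `mem` and `mem_base` are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define PT_LOAD 1u   /* program header type: loadable segment */

typedef struct {     /* ELF32 file header, 52 bytes */
    unsigned char e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version, e_entry, e_phoff, e_shoff, e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx;
} Elf32_Ehdr;

typedef struct {     /* ELF32 program header, 32 bytes */
    uint32_t p_type, p_offset, p_vaddr, p_paddr;
    uint32_t p_filesz, p_memsz, p_flags, p_align;
} Elf32_Phdr;

/* Copy every PT_LOAD segment of `image` into `mem`, which stands in for
 * physical memory starting at address `mem_base`. Returns the entry
 * point from the header, or 0 on any format or bounds error. */
uint32_t elf32_load(const uint8_t *image, size_t image_len,
                    uint8_t *mem, size_t mem_len, uint32_t mem_base)
{
    Elf32_Ehdr eh;
    if (image_len < sizeof eh) return 0;
    memcpy(&eh, image, sizeof eh);
    if (memcmp(eh.e_ident, "\x7f" "ELF", 4) != 0) return 0;  /* magic */
    if (eh.e_phentsize != sizeof(Elf32_Phdr)) return 0;

    for (uint16_t i = 0; i < eh.e_phnum; i++) {
        Elf32_Phdr ph;
        size_t off = (size_t)eh.e_phoff + (size_t)i * sizeof ph;
        if (off + sizeof ph > image_len) return 0;
        memcpy(&ph, image + off, sizeof ph);

        if (ph.p_type != PT_LOAD) continue;       /* skip non-loadable */
        if (ph.p_filesz > ph.p_memsz) return 0;
        if ((size_t)ph.p_offset + ph.p_filesz > image_len) return 0;
        if (ph.p_paddr < mem_base) return 0;

        size_t dst = (size_t)(ph.p_paddr - mem_base);
        if (dst + ph.p_memsz > mem_len) return 0;

        /* File-backed part (code and initialised data). */
        memcpy(mem + dst, image + ph.p_offset, ph.p_filesz);
        /* Remainder (typically .bss) must be zero-filled. */
        memset(mem + dst + ph.p_filesz, 0, ph.p_memsz - ph.p_filesz);
    }
    return eh.e_entry;
}
```

The sketch places segments by `p_paddr`; without paging, `p_vaddr` and `p_paddr` are normally the same, so either would do. A real bootloader would jump to the returned entry address after the loop.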
Any comment, suggestion, hint or tip will be truly appreciated!!

Thanks in advance.
Octocontrabass
Member
Member
Posts: 5516
Joined: Mon Mar 25, 2013 7:01 pm

Re: Sequence from bootloader to kernel in c

Post by Octocontrabass »

mihe wrote:- -m32: To make sure it is 32bits
You don't need this.
mihe wrote:- -O0: No optimizations, to avoid running into problems at this stage, reduce complexity and make the learning curve not that steep.
I prefer to compile multiple times with different optimization levels. That way, if I have some undefined behavior in my code, I'm more likely to end up with a crash.
mihe wrote:- -masm=intel: Just personal preference to examine the code generated
This option also affects inline assembly, in case you use any.
mihe wrote:@ Shall I use std=gnu99 or a more recent version of the C standard? Is there any showstopper to use c11 or c17?
I'm not aware of any reasons to avoid using C17 instead of C99. There are only a handful of differences between the two, as far as a freestanding compiler is concerned, so you may not notice the difference if you don't try to use any of the new features.

C17 is (almost?) exclusively clarifications and fixes for C11, so there's no reason to use C11.
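As one illustration of a newer-standard feature that needs no library support, C11's `_Static_assert` lets a freestanding build check layout assumptions at compile time. The sketch below is hypothetical (the `vga_cell` type and `vga_make_cell` helper are not from this thread) and assumes the usual VGA text-mode layout of one character byte followed by one attribute byte.

```c
#include <stdint.h>

/* One VGA text-mode cell: character byte followed by attribute byte. */
struct vga_cell {
    uint8_t ch;
    uint8_t attr;   /* low nibble: foreground, high nibble: background */
};

/* C11 compile-time check; the build fails if the layout ever changes. */
_Static_assert(sizeof(struct vga_cell) == 2, "VGA cell must be 2 bytes");

/* Build a cell from a character and two 4-bit colour indices. */
struct vga_cell vga_make_cell(char c, uint8_t fg, uint8_t bg)
{
    struct vga_cell cell;
    cell.ch = (uint8_t)c;
    cell.attr = (uint8_t)(((bg & 0x0F) << 4) | (fg & 0x0F));
    return cell;
}
```

Code that avoids such features compiles the same way under -std=gnu99 and the later standards, which matches the point that the difference may go unnoticed.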
mihe wrote:- I will use also the linker to concatenate my asm bootloader (already in machine code via the NASM assembler) with the compiled code from the Kernel in c, resulting in the final ELF32 file.
- For the aforementioned process, I will use a linker script.
I'm not sure this will work the way you want. The bootloader is a boot sector, right? You can't put that part inside an ELF executable because then it won't be at the beginning.
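The constraint can be made concrete with a small check. The BIOS loads the first 512-byte sector and starts executing at its first byte, and most BIOSes also expect the bytes 0x55, 0xAA at offsets 510 and 511. An ELF file instead begins with the bytes 0x7F 'E' 'L' 'F'. The function names below are made up for this sketch.

```c
#include <stdint.h>
#include <stddef.h>

/* True if the first sector of `image` ends with the 0x55, 0xAA
 * boot signature at offsets 510 and 511. */
int is_boot_sector(const uint8_t *image, size_t len)
{
    return len >= 512 && image[510] == 0x55 && image[511] == 0xAA;
}

/* True if `image` begins with the ELF magic bytes 0x7F 'E' 'L' 'F'. */
int starts_with_elf_magic(const uint8_t *image, size_t len)
{
    return len >= 4 && image[0] == 0x7F && image[1] == 'E'
        && image[2] == 'L' && image[3] == 'F';
}
```

A disk image whose first bytes are the ELF magic would have the ELF header executed as code, which is why the boot sector has to come first and the ELF file after it.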
mihe
Member
Member
Posts: 38
Joined: Sun Oct 21, 2018 1:37 pm

Re: Sequence from bootloader to kernel in c

Post by mihe »

First of all, thanks for taking the time to answer.
Octocontrabass wrote:
mihe wrote:- -m32: To make sure it is 32bits
You don't need this.

I thought GCC was defaulting to 64 bits, especially since I used the one installed in 64-bit Ubuntu to build the cross-compiler.
mihe wrote:- -O0: No optimizations, to avoid running into problems at this stage, reduce complexity and make the learning curve not that steep.
I prefer to compile multiple times with different optimization levels. That way, if I have some undefined behavior in my code, I'm more likely to end up with a crash.

That is a very good tip. Better to do it from the beginning than to enable optimizations at the very end and face an overwhelming number of issues to fix.
mihe wrote:- -masm=intel: Just personal preference to examine the code generated
This option also affects inline assembly, in case you use any.

Yes, and I really want that, because I am more comfortable with Intel syntax.
mihe wrote:@ Shall I use std=gnu99 or a more recent version of the C standard? Is there any showstopper to use c11 or c17?
I'm not aware of any reasons to avoid using C17 instead of C99. There are only a handful of differences between the two, as far as a freestanding compiler is concerned, so you may not notice the difference if you don't try to use any of the new features.

C17 is (almost?) exclusively clarifications and fixes for C11, so there's no reason to use C11.

I will stick to C99 for the time being, unless I am in real need to use something of the new stuff
mihe wrote:- I will use also the linker to concatenate my asm bootloader (already in machine code via the NASM assembler) with the compiled code from the Kernel in c, resulting in the final ELF32 file.
- For the aforementioned process, I will use a linker script.
I'm not sure this will work the way you want. The bootloader is a boot sector, right? You can't put that part inside an ELF executable because then it won't be at the beginning.
You are right, that solution was wrong. I figured that out this evening while trying to make it work.

In the end, I have ended up doing the following (I put the process just in case it is helpful for anyone else):

  • Assembling the bootloader with NASM as a raw binary file
  • Compiling the kernel in C using "-m32 -std=gnu99 -ffreestanding -O0 -Wall -Wextra -masm=intel" (I will now remove -m32 if it is not required)
  • Linking the object file using LD with the flags "-o ./build/kernel/kernel.elf ./build/kernel/kernel.o -nostdlib -m elf_i386"
  • Concatenating the raw binary bootloader and the whole ELF file using "cat ./build/bootloader/stage1.bin ./build/kernel/kernel.elf > ./build/kernel_core.img"
  • Creating an empty floppy image with the correct size using "dd if=/dev/zero bs=512 count=2880 > ./build/kernel.img"
  • Injecting my concatenated file (bootloader + kernel ELF) at the beginning of the image using "dd conv=notrunc if=./build/kernel_core.img of=./build/kernel.img"
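The sizes involved in these steps can be sanity-checked with a little arithmetic. The sketch below assumes 512-byte sectors and the 2880-sector image from the dd command; the function names are hypothetical, and whether stage1.bin is exactly one sector is an assumption that the helper tests rather than takes for granted.

```c
#include <stdint.h>

#define SECTOR_SIZE    512u
#define FLOPPY_SECTORS 2880u   /* 2880 * 512 = 1474560 bytes (1.44 MB) */

/* Number of whole sectors needed to hold `bytes` bytes (rounds up). */
uint32_t sectors_for(uint32_t bytes)
{
    return bytes / SECTOR_SIZE + (bytes % SECTOR_SIZE != 0);
}

/* First LBA of the kernel ELF when it is appended directly after a
 * stage-1 image of `stage1_bytes` bytes. Returns UINT32_MAX if stage 1
 * is not a whole number of sectors, because the ELF would then not
 * start on a sector boundary. */
uint32_t kernel_start_lba(uint32_t stage1_bytes)
{
    if (stage1_bytes % SECTOR_SIZE != 0) return UINT32_MAX;
    return stage1_bytes / SECTOR_SIZE;
}

/* True if stage 1 plus the kernel ELF fit in the floppy image. */
int fits_on_floppy(uint32_t stage1_bytes, uint32_t kernel_bytes)
{
    return sectors_for(stage1_bytes) + sectors_for(kernel_bytes)
           <= FLOPPY_SECTORS;
}
```

The bootloader needs `sectors_for(kernel size)` as its read count, starting at the LBA returned by `kernel_start_lba`.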
So far, it is working!! However, there are a couple of things that I did not fully understand during the process:

    1) I think I am running into trouble because the -lgcc parameter fails while linking. I am using a cross-linker, but it fails with "cannot find -lgcc". I guess I have to let the compiler call the linker, instead of calling LD directly, as explained here https://wiki.osdev.org/Libgcc, so the line above is my working line at this moment. I would rather have CC and LD as separate processes, because I am interested in investigating the intermediate states (for learning), so I will try to make LD find the proper library.

    2) The second not-quite-right thing is that, when linking, I got this warning:

i686-elf-ld: warning: cannot find entry symbol _start; defaulting to 0000000008048074.

I guess it is related to not having a main() but rather a different symbol name, and to my assembly bootloader not providing any symbol either (raw binary). I guess this only affects the virtual memory addresses defined in the ELF file, so I could potentially cope with it while parsing the ELF, but I would prefer to figure out a cleaner solution. I am investigating this too before moving forward.



Again, thank you very much for your insights!! really helpful
Octocontrabass
Member
Member
Posts: 5516
Joined: Mon Mar 25, 2013 7:01 pm

Re: Sequence from bootloader to kernel in c

Post by Octocontrabass »

mihe wrote:I thought GCC was defaulting to 64 bits, especially since I used the one installed in 64-bit Ubuntu to build the cross-compiler.
The copy of GCC included with Ubuntu defaults to 64 bits. Your cross-compiler defaults to 32 bits.
mihe wrote:Linking the object file using LD with the flags "-o ./build/kernel/kernel.elf ./build/kernel/kernel.o -nostdlib -m elf_i386"
You don't need "-m elf_i386" either. Your cross-compiler's LD defaults to 32 bits as well.

However, you do need a linker script to specify the load address and entry point. And speaking of entry points...
mihe wrote:i686-elf-ld: warning: cannot find entry symbol _start; defaulting to 0000000008048074.
This error is because you haven't told LD what the entry point should be. You probably want to have an entry point in your ELF headers; otherwise it'll be hard for your bootloader to figure out where to jump after it's finished loading your kernel.
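For reference, the entry point is stored in the `e_entry` field of the ELF32 header, a 4-byte little-endian value at byte offset 24. The sketch below reads it using plain byte offsets, roughly the way an assembly bootloader would. The function names are made up for this illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Read a little-endian 32-bit value from a byte buffer. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Return the e_entry field of a 32-bit little-endian ELF image,
 * or 0 if the buffer does not look like one. */
uint32_t elf32_entry(const uint8_t *image, size_t len)
{
    if (len < 52) return 0;                 /* ELF32 header is 52 bytes */
    if (image[0] != 0x7F || image[1] != 'E' ||
        image[2] != 'L'  || image[3] != 'F') return 0;
    if (image[4] != 1) return 0;            /* EI_CLASS: 1 = 32-bit */
    if (image[5] != 1) return 0;            /* EI_DATA:  1 = little-endian */
    return read_le32(image + 24);           /* e_entry is at offset 24 */
}
```

For the kernel.elf that produced the warning above, this field would hold 0x08048074, the address LD defaulted to.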
MichaelPetch
Member
Member
Posts: 780
Joined: Fri Aug 26, 2016 1:41 pm
Libera.chat IRC: mpetch

Re: Sequence from bootloader to kernel in c

Post by MichaelPetch »

Octocontrabass wrote:However, you do need a linker script to specify the load address and entry point. And speaking of entry points...
I'm a proponent of using a linker script, but it is possible to use -e entrysymbol (where entrysymbol is your entry point's label) to override the default _start label, and you can use -Ttext=0xXXXXXXXX (where XXXXXXXX is the virtual memory address, or origin point, that the default linker script should start at). Of course, if you convert the ELF file to binary to be loaded by the bootloader then the entry point is discarded, but if he's writing an ELF loader it would be useful.
MichaelPetch
Member
Member
Posts: 780
Joined: Fri Aug 26, 2016 1:41 pm
Libera.chat IRC: mpetch

Re: Sequence from bootloader to kernel in c

Post by MichaelPetch »

mihe wrote:1) I think I am running into trouble because the parameter -lgcc fails while linking. I am using a cross-linker but it fails with a "cannot find -lgcc". I guess I have to let the compiler call the linker, instead of calling directly LD, as explained here https://wiki.osdev.org/Libgcc, so this is my working line at this moment. I would rather have CC and LD as separated processes, because I am interested on investigating the intermediate states (for learning), so I will try to make LD find the proper library.
The reason it isn't working is that you have to manually specify the path for the linker to look for the libgcc library. If you were to use GCC though it knows the path to the libgcc it relies on and passes that to LD for you. If you want to discover where libgcc is use the command i686-elf-gcc -xc -E -v -lgcc . Of course i686-elf-gcc is whatever the name for your cross compiler is. Under LIBRARY_PATH you'll find a list of paths to search for the libraries. The one you want is usually <crossinstallbase>/lib/gcc/<crossprefix>/<version>/ . <crossinstallbase> is the base directory where the cross compiler was installed, <crossprefix> is the cross compiler's prefix (like i686-elf, i386-elf etc) and <version> is the version. You can pass that library path to LD using the -L option.
mihe
Member
Member
Posts: 38
Joined: Sun Oct 21, 2018 1:37 pm

Re: Sequence from bootloader to kernel in c

Post by mihe »

MichaelPetch wrote:
mihe wrote:1) I think I am running into trouble because the parameter -lgcc fails while linking. I am using a cross-linker but it fails with a "cannot find -lgcc". I guess I have to let the compiler call the linker, instead of calling directly LD, as explained here https://wiki.osdev.org/Libgcc, so this is my working line at this moment. I would rather have CC and LD as separated processes, because I am interested on investigating the intermediate states (for learning), so I will try to make LD find the proper library.
The reason it isn't working is that you have to manually specify the path for the linker to look for the libgcc library. If you were to use GCC though it knows the path to the libgcc it relies on and passes that to LD for you. If you want to discover where libgcc is use the command i686-elf-gcc -xc -E -v -lgcc . Of course i686-elf-gcc is whatever the name for your cross compiler is. Under LIBRARY_PATH you'll find a list of paths to search for the libraries. The one you want is usually <crossinstallbase>/lib/gcc/<crossprefix>/<version>/ . <crossinstallbase> is the base directory where the cross compiler was installed, <crossprefix> is the cross compiler's prefix (like i686-elf, i386-elf etc) and <version> is the version. You can pass that library path to LD using the -L option.
Thanks for this ! The path on gcc was correct, it was using the cross-compiler path, but as it has been mentioned before, LD seems to be a bit shortsighted :-) . I used the -L parameter and now it finds the library.
mihe
Member
Member
Posts: 38
Joined: Sun Oct 21, 2018 1:37 pm

Re: Sequence from bootloader to kernel in c

Post by mihe »

Octocontrabass wrote:
mihe wrote:I thought GCC was defaulting to 64 bits, especially since I used the one installed in 64-bit Ubuntu to build the cross-compiler.
The copy of GCC included with Ubuntu defaults to 64 bits. Your cross-compiler defaults to 32 bits.
mihe wrote:Linking the object file using LD with the flags "-o ./build/kernel/kernel.elf ./build/kernel/kernel.o -nostdlib -m elf_i386"
You don't need "-m elf_i386" either. Your cross-compiler's LD defaults to 32 bits as well.

However, you do need a linker script to specify the load address and entry point. And speaking of entry points...
mihe wrote:i686-elf-ld: warning: cannot find entry symbol _start; defaulting to 0000000008048074.
This error is because you haven't told LD what the entry point should be. You probably want to have an entry point in your ELF headers; otherwise it'll be hard for your bootloader to figure out where to jump after it's finished loading your kernel.
I suppose this entry point is specified using a linker script, am I right? I have the linker scripts in the learning queue.

I am already working on parsing the ELF file, so in the process I will learn the linker scripts. Something tells me I will end up appreciating all work ld and linker scripts do for you while trying to parse the ELF file and relocate the sections manually in memory, in plain assembly... :-)

Thanks for your help.
MichaelPetch
Member
Member
Posts: 780
Joined: Fri Aug 26, 2016 1:41 pm
Libera.chat IRC: mpetch

Re: Sequence from bootloader to kernel in c

Post by MichaelPetch »

mihe wrote:1)Thanks for this ! The path on gcc was correct, it was using the cross-compiler path, but as it has been mentioned before, LD seems to be a bit shortsighted :-) . I used the -L parameter and now it finds the library.
LD is a linker and part of the binutils package, not part of the GCC project. libgcc is a GCC library (not a library associated with LD). It is up to GCC to tell the linker where to find it. Using GCC as a wrapper around LD is easiest, but as you say you want to do it yourself directly with LD, it is up to you to specify the library paths. LD isn't short-sighted: libgcc is one of possibly thousands of libraries in a multitude of directories that it has no knowledge of. It is up to the programmer to specify paths to the library directories.
Last edited by MichaelPetch on Sun Dec 09, 2018 1:02 pm, edited 1 time in total.
mihe
Member
Member
Posts: 38
Joined: Sun Oct 21, 2018 1:37 pm

Re: Sequence from bootloader to kernel in c

Post by mihe »

MichaelPetch wrote:
Octocontrabass wrote:However, you do need a linker script to specify the load address and entry point. And speaking of entry points...
I'm a proponent of using a linker script, but it is possible to use -e entrysymbol (where entrysymbol is your entry point's label) to override the default _start label, and you can use -Ttext=0xXXXXXXXX (where XXXXXXXX is the virtual memory address, or origin point, that the default linker script should start at). Of course, if you convert the ELF file to binary to be loaded by the bootloader then the entry point is discarded, but if he's writing an ELF loader it would be useful.
Thanks for the information. I am a bit confused still about the linker, linker scripts, ELF format, etc... but I am researching now on the ELF format so once I am more familiar with it I will revisit the linker and link scripts.

My goal of parsing ELF myself is more for the sake of learning, than trying to do the things properly (and easier). I foresee I will have a bumpy road, but that will eventually help me understand and connect the dots.

Thanks for your help!
MichaelPetch
Member
Member
Posts: 780
Joined: Fri Aug 26, 2016 1:41 pm
Libera.chat IRC: mpetch

Re: Sequence from bootloader to kernel in c

Post by MichaelPetch »

mihe wrote:I suppose this entry point is specified using a linker script, am I right? I have the linker scripts in the learning queue.
Yes the entry point symbol name can be defined in the linker script, as well as the origin point (VMA). When you don't specify a linker script, LD uses a default one. If you wish to see the default internal linker script you can use the command ld --verbose. Scroll down to the section that starts with using internal linker script: . It is longer and more complex than what you'll need for your kernel in most cases, but I'm letting you know how you can review the one LD uses internally when you don't specify one. You may not find it useful, it is just an FYI.
MichaelPetch
Member
Member
Posts: 780
Joined: Fri Aug 26, 2016 1:41 pm
Libera.chat IRC: mpetch

Re: Sequence from bootloader to kernel in c

Post by MichaelPetch »

mihe wrote:My goal of parsing ELF myself is more for the sake of learning, than trying to do the things properly (and easier). I foresee I will have a bumpy road, but that will eventually help me understand and connect the dots.
You could take the intermediate step of creating a non position independent (static) ELF executable (kernel) and then using objcopy to convert it to a flat binary file. You can get LD to output directly as binary format with --oformat binary as well. I recommend OBJCOPY because you can generate debug information into the ELF executable and then convert that to binary (the binary would not include the debug info). This is useful if debugging with QEMU with the GDB debugger. By using binary file, you wouldn't need the ELF parsing. Just need to load the kernel into memory at the appropriate place in memory that matches your origin point (virtual memory address). The only real trick is to ensure the main code entry point to your kernel is at the beginning of the binary. That can be done with a linker script. It can also be done by creating an object file with a single function (in a .text section) and making sure that object file is listed first on the linker command line. That would ensure it gets emitted to the resulting binary first.

Of course I'm just providing info, and you can skip this step and do ELF parsing, relocations, etc as you were discussing.

A more complex example of this, where the bootloader and the kernel are linked together and output as a binary image, can be found here: http://www.capp-sysware.com/misc/osdev/linkedboot/ . This small bootloader uses the linker script to determine the number of sectors the kernel needs, properly zeroes out the BSS section, and transfers control to it (after entering protected mode). It takes liberties by using the fast method to enable A20 (there are better and more universal ways to enable A20 that you can find on the OSDev wiki). The code also uses old CHS disk routines so it could be used on floppy images, and it is set up for 1.44MB. You could create a proper BIOS parameter block for your disk geometry and file system, but this is just a starting point. It can be modified to use extended disk reads, which simplifies the code as the LBA to CHS translation isn't needed.
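The translation that CHS disk routines need can be sketched as follows, assuming the standard 1.44MB floppy geometry of 18 sectors per track and 2 heads. This is not code from the linked example; the `chs` struct and `lba_to_chs` name are illustrative.

```c
#include <stdint.h>

#define SECTORS_PER_TRACK 18u   /* 1.44MB floppy geometry (assumed) */
#define HEADS             2u

struct chs {
    uint16_t cylinder;   /* 0-based */
    uint8_t  head;       /* 0-based */
    uint8_t  sector;     /* 1-based, as the BIOS expects */
};

/* Convert a 0-based logical block address into cylinder/head/sector. */
struct chs lba_to_chs(uint32_t lba)
{
    struct chs r;
    r.cylinder = (uint16_t)(lba / (SECTORS_PER_TRACK * HEADS));
    r.head     = (uint8_t)((lba / SECTORS_PER_TRACK) % HEADS);
    r.sector   = (uint8_t)(lba % SECTORS_PER_TRACK + 1);
    return r;
}
```

For example, LBA 0 (the boot sector) maps to cylinder 0, head 0, sector 1, and LBA 2879 (the last sector of the image) maps to cylinder 79, head 1, sector 18.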
mihe
Member
Member
Posts: 38
Joined: Sun Oct 21, 2018 1:37 pm

Re: Sequence from bootloader to kernel in c

Post by mihe »

MichaelPetch wrote:
mihe wrote:I suppose this entry point is specified using a linker script, am I right? I have the linker scripts in the learning queue.
Yes the entry point symbol name can be defined in the linker script, as well as the origin point (VMA). When you don't specify a linker script, LD uses a default one. If you wish to see the default internal linker script you can use the command ld --verbose. Scroll down to the section that starts with using internal linker script: . It is longer and more complex than what you'll need for your kernel in most cases, but I'm letting you know how you can review the one LD uses internally when you don't specify one. You may not find it useful, it is just an FYI.
Thanks Michael for that random nugget of wisdom. I checked the default script and it is quite cryptic, at least for now :-) The moment I finish with ELF I will continue with the linker. I think it is the natural progression, as I see both things as tightly coupled, and it is better to understand the foundations before moving forward to more complex topics.

By the way, I just tried to change my kmain() to _start() for the sake of testing, and, even though I am not getting any warning while linking, it still creates the final elf file with that enormous offset. Interesting...:

Image
MichaelPetch
Member
Member
Posts: 780
Joined: Fri Aug 26, 2016 1:41 pm
Libera.chat IRC: mpetch

Re: Sequence from bootloader to kernel in c

Post by MichaelPetch »

mihe wrote:By the way, I just tried to change my kmain() to _start() for the sake of testing, and, even though I am not getting any warning while linking, it still creates the final elf file with that enormous offset. Interesting...:
As I alluded to in another post you need to override the default VMA (origin point) that the internal linker script is using. If you are placing your kernel at say physical memory address 0x00008000 (just an example) then you can add this LD option -Ttext=0x00008000. The default your LD is using happens to be the default VMA (0x08048000) Linux happens to use for 32-bit code.
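The 0x08048074 reported in the earlier warning is consistent with that default base plus the size of the ELF headers at the start of the file. Here is a small worked check, assuming the linked file carried two program headers (an inference from the numbers, not something stated in this thread); the function name is hypothetical.

```c
#include <stdint.h>

#define ELF32_EHDR_SIZE 52u   /* size of the ELF32 file header */
#define ELF32_PHDR_SIZE 32u   /* size of one ELF32 program header */

/* Address of the first byte after the ELF headers when the file is
 * mapped at `base` and has `phnum` program headers. */
uint32_t first_address_after_headers(uint32_t base, uint32_t phnum)
{
    return base + ELF32_EHDR_SIZE + phnum * ELF32_PHDR_SIZE;
}
```

With a base of 0x08048000 and two program headers, this gives 0x08048000 + 52 + 64 = 0x08048074.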
mihe
Member
Member
Posts: 38
Joined: Sun Oct 21, 2018 1:37 pm

Re: Sequence from bootloader to kernel in c

Post by mihe »

MichaelPetch wrote:
mihe wrote:My goal of parsing ELF myself is more for the sake of learning, than trying to do the things properly (and easier). I foresee I will have a bumpy road, but that will eventually help me understand and connect the dots.
You could take the intermediate step of creating a non position independent (static) ELF executable (kernel) and then using objcopy to convert it to a flat binary file. You can get LD to output directly as binary format with --oformat binary as well. I recommend OBJCOPY because you can generate debug information into the ELF executable and then convert that to binary (the binary would not include the debug info). This is useful if debugging with QEMU with the GDB debugger. By using binary file, you wouldn't need the ELF parsing. Just need to load the kernel into memory at the appropriate place in memory that matches your origin point (virtual memory address). The only real trick is to ensure the main code entry point to your kernel is at the beginning of the binary. That can be done with a linker script. It can also be done by creating an object file with a single function (in a .text section) and making sure that object file is listed first on the linker command line. That would ensure it gets emitted to the resulting binary first.

Of course I'm just providing info, and you can skip this step and do ELF parsing, relocations, etc as you were discussing.

A more complex example of this, where the bootloader and the kernel are linked together and output as a binary image, can be found here: http://www.capp-sysware.com/misc/osdev/linkedboot/ . This small bootloader uses the linker script to determine the number of sectors the kernel needs, properly zeroes out the BSS section, and transfers control to it (after entering protected mode). It takes liberties by using the fast method to enable A20 (there are better and more universal ways to enable A20 that you can find on the OSDev wiki). The code also uses old CHS disk routines so it could be used on floppy images, and it is set up for 1.44MB. You could create a proper BIOS parameter block for your disk geometry and file system, but this is just a starting point. It can be modified to use extended disk reads, which simplifies the code as the LBA to CHS translation isn't needed.
Thanks for the link! I had been playing before with objdump to extract sections and glue them in fixed positions, with fixed offsets, hardcoded here and there, and that was my initial plan to have something quick up and running. Actually, that is the point at which I decided to start researching ELF more, because I guessed that kind of gluing only works when you have a clear picture of how ELF files are created and structured. I may even use that example as an alternative project, in case I want to get my hands dirty on the kernel itself, still using C.

Thanks again for sharing your knowledge!!