Sequence from bootloader to kernel in c

mihe · Post by **mihe** » Sun Dec 09, 2018 2:47 pm

MichaelPetch wrote:
mihe wrote:By the way, I just tried to change my kmain() to _start() for the sake of testing, and, even though I am not getting any warning while linking, it still creates the final elf file with that enormous offset. Interesting...:
As I alluded to in another post, you need to override the default VMA (origin point) that the internal linker script is using. If you are placing your kernel at, say, physical memory address 0x00008000 (just an example), then you can add the LD option -Ttext=0x00008000. Without it, your LD falls back to 0x08048000, which happens to be the default VMA that Linux uses for 32-bit code.

That's brilliant

I did not quite understand it the first time, but it was simpler than I thought.

Now I can set the ELF VMA to any value.

mihe · Post by **mihe** » Fri Dec 21, 2018 3:22 pm

Quick update to thank you again for your help and guidance: I finally managed to complete my goal! I made it overly complicated so I could learn in the process. In the end I am doing the following:

- Concatenate the full ELF file after the bootloader
- Load a bunch of sectors straight from floppy, starting at sector 2 (sector 1 is the stage1 loader)
- Comb the memory until the magic signature of the elf is found, and save the location
- Parse the ELF's program headers and store the segments marked as LOAD into an array, in a primitive heap
- Iterate over the array and copy the segments one by one to my desired kernel location, 0x10000 (for the time being)
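
For anyone curious what the combing step boils down to, here is a minimal sketch in C rather than asm. The name find_elf_magic and its step parameter are made up for illustration and are not the code used in this loader; the sketch also assumes a memcmp is available, which a freestanding loader would have to provide itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The four bytes every ELF file starts with: 0x7F 'E' 'L' 'F'. */
static const uint8_t ELF_MAGIC[4] = { 0x7F, 'E', 'L', 'F' };

/*
 * find_elf_magic (hypothetical helper): scan `len` bytes starting at `buf`,
 * advancing `step` bytes at a time, and return the offset of the first ELF
 * signature, or -1 if none is found.
 */
long find_elf_magic(const uint8_t *buf, size_t len, size_t step)
{
    if (step == 0)
        step = 1;                       /* avoid an endless loop */

    for (size_t off = 0; off + sizeof ELF_MAGIC <= len; off += step) {
        if (memcmp(buf + off, ELF_MAGIC, sizeof ELF_MAGIC) == 0)
            return (long)off;
    }
    return -1;
}
```

With a step of 1 it finds the signature at any byte offset; a step of 512 only makes sense once the kernel is padded to a sector boundary. One caveat: the scanned range should not cover the scanner's own copy of the four magic bytes, or it may find itself first.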

Octocontrabass · Post by **Octocontrabass** » Fri Dec 21, 2018 6:36 pm

mihe wrote:- Comb the memory until the magic signature of the elf is found, and save the location

Why do you need to search for it? It should always be in the same place.

mihe · Post by **mihe** » Sat Dec 22, 2018 2:22 am

Octocontrabass wrote:
mihe wrote:- Comb the memory until the magic signature of the elf is found, and save the location
Why do you need to search for it? It should always be in the same place.

The second stage keeps growing, and I just copy a fixed lump of sectors from the disk, containing the loader and kernel concatenated, so this way I find where the kernel is on every run. I guess I could align the loader and the kernel to a well-known position when concatenating the files, but I wanted to write that combing function to practice asm and learn a bit. Regards.

bzt · Post by **bzt** » Sat Dec 22, 2018 7:41 am

mihe wrote:The second stage keeps growing, and I just copy a fixed lump of sectors from the disk, containing the loader and kernel concatenated, so this way I find where the kernel is on every run. I guess I could align the loader and the kernel to a well-known position when concatenating the files, but I wanted to write that combing function to practice asm and learn a bit. Regards.

Or you could use the linker script to define an ABS label at the end of your loader (which will be the start for your kernel).

About ELF parsing, that's not that hard. If you want an example, take a look at my bootloader:
- the UEFI version (written in C) is straightforward, easy to read: https://gitlab.com/bztsrc/bootboot/blob ... ot.c#L1084
- the BIOS version (written in ASM) does the same, but it's a bit harder to read: https://gitlab.com/bztsrc/bootboot/blob ... .asm#L1582

Because I load the kernel dynamically, I have core.ptr (in C) and esi (in ASM) to point to the kernel. The steps required are as follows:
1. Check the magic bytes to see if it's a valid executable binary for the architecture (I also allow "OS/Z" as magic instead of the ELF magic; you won't need that part)
2. Iterate over the Program Headers looking for segments of the loadable type (PT_LOAD). Each such segment has to be loaded/mapped at its p_vaddr
3. There are at least two segments: text (for the program code) and data. The data segment has a different file size and memsize. The difference must be zeroed out by the ELF loader, as that's the BSS (for example, the data segment's memsize is 4096 but its file size is 256. That means only the first 256 bytes are initialized and stored in the ELF; the rest must be zeroed out).
4. The entry point is stored at a fixed offset in the ELF header (its value is not a file offset, but a memory address in the same address space as the text segment's p_vaddr)
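
The four steps above can be sketched in C along these lines. This is a minimal sketch and not the BOOTBOOT code: the struct and function names (elf32_ehdr, elf32_phdr, elf32_load) are made up for the example, the field layout follows the 32-bit ELF specification, and it assumes a little-endian i386 kernel, a little-endian machine running the loader, and source and destination buffers that do not overlap. The mem/mem_vaddr pair describes the window of memory the kernel may be loaded into.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ELF_PT_LOAD 1u   /* program header type: loadable segment */
#define ELF_EM_386  3u   /* e_machine value for 32-bit x86 */

struct elf32_ehdr {      /* ELF32 file header, 52 bytes */
    uint8_t  e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version, e_entry, e_phoff, e_shoff, e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum;
    uint16_t e_shentsize, e_shnum, e_shstrndx;
};

struct elf32_phdr {      /* ELF32 program header, 32 bytes */
    uint32_t p_type, p_offset, p_vaddr, p_paddr;
    uint32_t p_filesz, p_memsz, p_flags, p_align;
};

/* Load `image` into the window `mem`, which represents the addresses
 * [mem_vaddr, mem_vaddr + mem_len). Returns 0 and stores the entry
 * address in *entry on success, a negative value on error. */
int elf32_load(const uint8_t *image, size_t image_len,
               uint8_t *mem, uint32_t mem_vaddr, size_t mem_len,
               uint32_t *entry)
{
    struct elf32_ehdr eh;

    /* Step 1: magic, class (1 = 32-bit), byte order (1 = LSB), machine. */
    if (image_len < sizeof eh)
        return -1;
    memcpy(&eh, image, sizeof eh);
    if (memcmp(eh.e_ident, "\x7f" "ELF", 4) != 0 ||
        eh.e_ident[4] != 1 || eh.e_ident[5] != 1 ||
        eh.e_machine != ELF_EM_386 ||
        eh.e_phentsize < sizeof(struct elf32_phdr))
        return -1;

    /* Step 2: walk the program headers and pick the loadable ones. */
    for (uint16_t i = 0; i < eh.e_phnum; i++) {
        struct elf32_phdr ph;
        uint64_t off = (uint64_t)eh.e_phoff + (uint32_t)i * eh.e_phentsize;

        if (off + sizeof ph > image_len)
            return -2;
        memcpy(&ph, image + (size_t)off, sizeof ph);
        if (ph.p_type != ELF_PT_LOAD)
            continue;

        /* Reject segments outside the file or outside the target window. */
        if (ph.p_filesz > ph.p_memsz ||
            (uint64_t)ph.p_offset + ph.p_filesz > image_len ||
            ph.p_vaddr < mem_vaddr ||
            (uint64_t)(ph.p_vaddr - mem_vaddr) + ph.p_memsz > mem_len)
            return -3;

        /* Step 3: copy the initialised bytes, then zero the rest (BSS). */
        uint8_t *dst = mem + (ph.p_vaddr - mem_vaddr);
        memcpy(dst, image + ph.p_offset, ph.p_filesz);
        memset(dst + ph.p_filesz, 0, ph.p_memsz - ph.p_filesz);
    }

    /* Step 4: e_entry is a memory address, not a file offset. */
    *entry = eh.e_entry;
    return 0;
}
```

In a loader with identity-mapped memory and the kernel linked at 0x10000, the call would look something like elf32_load(image, len, (uint8_t *)0x10000, 0x10000, window_size, &entry), where window_size is however much memory is reserved for the kernel. Because every PT_LOAD segment gets its memsz minus filesz tail zeroed, it does not matter which segment ends up holding the BSS.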

What can be tricky is that the text segment by default does not contain the ELF header and the Program Headers. That means you have to copy the file contents from the ELF into their final position (due to alignment issues, the text offset is for example at 0xE8 in the file, but it is expected to start at a page-aligned address in memory). With a special linker script, you can include the ELF headers in the text segment, and then both the file offset and the memory address will share the same alignment. For example:

Code: Select all

PHDRS
{
  text PT_LOAD FILEHDR PHDRS;
}

Cheers,
bzt

mihe · Post by **mihe** » Sat Dec 22, 2018 8:42 am

Thanks for the info bzt.

I definitely have to start playing with LD scripts. I have never used them.

Regarding parsing the elf, I use a more rudimentary way to do it, compared with your code. My asm knowledge is still very basic, so it is a bit spaghetti, although correct. By the way, what is the meaning of "@@:"?

regards

MichaelPetch · Post by **MichaelPetch** » Sat Dec 22, 2018 3:34 pm

I wrote a small proof of concept bootloader (for floppy) for someone else last month. It can be simplified by removing the BPB and CHS if using extended disk reads on media that supports that. It is a proof of concept because it assumes fast A20 support (it should do it properly to make it compatible with most BIOSes) and it reads only one sector at a time (in a loop), which is inefficient on slow floppy media but does get the job done. It also relocates the bootloader to 0x600 and starts reading the kernel at 0x800. Both these locations can be easily modified in the linker script. It enters protected mode and transfers to 32-bit code written in C. The bootloader does use a linker script to determine where the BSS section is so that it can be zeroed out, and to determine the location and size of the kernel sectors. The code can be found here: http://www.capp-sysware.com/misc/osdev/linkedboot/ . You can ignore the code in the two sub-directories. Rather than transferring to a kernel you could transfer to your elf loader. I'm only posting this here because it shows how you can use a linker script to control things.

NASM has a useful incbin directive to include a binary file. You could build an ELF executable first and then in a NASM assembly file use incbin to include the ELF executable directly. You can use a couple of labels to denote the start and end memory locations. A real mode example of code that loads a second stage as a binary file (without a linker script) can be found here (note: the stage2 code in this example is not mine, but someone else's): http://www.capp-sysware.com/misc/osdev/boot_nasm_reloc/ . os.asm contains the bootloader plus uses incbin to include the second stage binary right after. That is done with code like this:

Code: Select all

; Pad boot sector to 510 bytes and add 2 byte boot signature for 512 total bytes

TIMES 510-($-$$) db  0
dw 0xaa55

section .stage2 vstart=STAGE2_ABS_ADDR align=16

NUM_STAGE2_SECTORS equ (stage2_end-stage2_start+511) / 512
                                ; Number of 512 byte sectors stage2 uses.

stage2_start:
    ; Insert stage2 binary here. It is done this way since we
    ; can determine the size(and number of sectors) to load since
    ;     Size = stage2_end-stage2_start
    incbin "stage2.bin"

; End of stage2. Make sure this label is LAST in this file!
stage2_end:

I forgot you are using GNU assembler. If you create a separate ELF executable but want to include the ELF executable as raw binary data inside an object that can be linked in with your code you can use objcopy to convert the ELF executable to binary and put it in an object file with something like:

Code: Select all

objcopy --input binary --output elf32-i386 --binary-architecture i386 myfile.elf myfile.o

Doing this may seem convoluted, but it can work. OBJCOPY will even go out of its way to create the object file with some very useful symbols. If you were to use objdump -x myfile.o you'd see that it generated these symbols:

SYMBOL TABLE:
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 g .data 0000000000000000 _binary_myfile_elf_start
000000000000002c g .data 0000000000000000 _binary_myfile_elf_end
000000000000002c g *ABS* 0000000000000000 _binary_myfile_elf_size

How is this useful? If you link myfile.o (which contains the ELF executable) into your code, you know the address where the ELF file starts, where it ends, and its size. You wouldn't need to scan memory for the beginning and the end, because you have labels for both. I wrote a Stack Overflow answer that takes a text file, puts it into binary form inside an object, and then prints out the string with some C code using the generated symbols.

bzt · Post by **bzt** » Sun Dec 23, 2018 11:13 am

@mihe: don't you worry, Assembly is spaghetti code no matter what you do

Yes, if you want to work on an ELF loader, you definitely should study linker scripts and how they influence the ELF output. Btw, "@@:" is an anonymous local label in fasm (similar to gas' numbered "1:" labels); you can jump to it like this:

Code: Select all

jmp @f   ; jump to the first local label forward
@@:
jmp @b   ; jump to the first local label backward

@MichaelPetch: nice job! Just for the record, you don't need objcopy, ld can do exactly the same: "ld -r -b binary -o out.o in.bin". One less dependency for your build environment.

Cheers,
bzt

mihe · Post by **mihe** » Thu Dec 27, 2018 12:34 pm

Thanks for the info you guys posted, it is helping me a lot.

I am still learning about the compilation process before I move forward to actually write any code for the kernel and I have stumbled upon a problem I cannot solve. Before learning Linker scripts, I was playing with optimizations and I have decided to create two compiled kernels every time I build the entire project. Everything looked fine with -O0, -O2 and -O3 until I tried -O1. With -O1, the result changed the entry point to one of the functions that is not the one I intended as beginning of the kernel.

I have tried both -e _start and -e "_start", but it still keeps moving a different function to the top.
I would like to note that I am compiling and assembling in one step (using -c parameter of gcc), and linking in a different step.

Some screenshots of the scenario. Right using -O1, left using -O0. With -O1 it moves the print_char function to the beginning of the .text section.

Excerpt of my makefile where I even tried to use -e _start without success:
@$(LD) -o ./build/kernel/kernel-O3.elf ./build/kernel/kernel-O3.o -nostdlib -m elf_i386 -L /root/opt/CROSS/lib/gcc/i686-elf/8.2.0/ -lgcc -Ttext=0x00010000 -e _start

In short, _start is not at the beginning of the code, and I do not know how to force it there. I assume that using a linker script would give the same result. According to the ld documentation, the -e parameter even takes priority over the ENTRY directive in a linker script.

This is the code I am compiling: (please do not judge the code, it is just some random test, and I am new to c as well)

Any suggestion or comment will be very welcomed!!!

Thanks in advance.

Octocontrabass · Post by **Octocontrabass** » Thu Dec 27, 2018 1:03 pm

mihe wrote:With -O1, the result changed the entry point to one of the functions that is not the one I intended as beginning of the kernel.

The entry point is not the start of the .text section! You need to parse the ELF headers to get the address of the entry point, then call (or jump to) that address to run your kernel.

mihe · Post by **mihe** » Thu Dec 27, 2018 1:18 pm

Octocontrabass wrote:
mihe wrote:With -O1, the result changed the entry point to one of the functions that is not the one I intended as beginning of the kernel.
The entry point is not the start of the .text section! You need to parse the ELF headers to get the address of the entry point, then call (or jump to) that address to run your kernel.

oh..... I completely overlooked it!! I do not know why I was assuming the entry point was the beginning of the .text section. Even more embarrassing given that I spent the whole of last weekend writing the ELF parser hahahaha

I have just checked and indeed, that elf is generated with:

Entry point address: 0x10024 (where _start is when I use -O1).

...and changing the entry with -e actually moves around the entry point correctly.

This is very important, because I was blindly jumping to 0x10000 from the loader, so now I have to jump to the actual entry point.
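
To make the difference concrete, here is a small C sketch of what the loader has to do instead of jumping to 0x10000. It assumes a 32-bit little-endian ELF, where e_entry sits 24 bytes into the header; the function names are hypothetical, and jump_to_kernel only makes sense inside the loader itself, after the LOAD segments have been copied into place.

```c
#include <stdint.h>

/* Offset of e_entry inside a 32-bit ELF header: it follows e_ident (16),
 * e_type (2), e_machine (2) and e_version (4). */
#define ELF32_E_ENTRY_OFFSET 24u

/* Read the little-endian 32-bit entry address from an ELF32 image. */
uint32_t elf32_entry(const uint8_t *image)
{
    const uint8_t *p = image + ELF32_E_ENTRY_OFFSET;

    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Transfer control to the kernel through the address stored in the ELF
 * header rather than through a hard-coded load address. */
void jump_to_kernel(const uint8_t *image)
{
    void (*kernel_entry)(void) =
        (void (*)(void))(uintptr_t)elf32_entry(image);

    kernel_entry();
}
```

For the -O1 build mentioned above, elf32_entry would return 0x10024 while the .text section still starts at 0x10000.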

Thanks Octocontrabass!!!

mihe · Post by **mihe** » Fri Dec 28, 2018 4:24 am

bzt wrote: 3. There are at least two segments: text (for the program code) and data. The data segment has a different file size and memsize. The difference must be zeroed out by the ELF loader, as that's the BSS (for example, the data segment's memsize is 4096 but its file size is 256. That means only the first 256 bytes are initialized and stored in the ELF; the rest must be zeroed out).

Hello bzt,

Thanks for the detailed explanation! Now that I am starting to get stuff into .bss, I am revisiting this point you mentioned.

I am going to implement the part that calculates the difference and zeroes that in memory, but this raises some questions if I want to make the ELF parser future proof.

Right now I am loading all LOAD segments, which happen to be only 2 at this moment, but it is coded to process more, in case if there are more.

Do I have to check every LOAD segment to see if they contain a .bss section or is .bss always in the last LOAD segment, last sector position inside the segment?

I have tried to read random ELF files in my linux installation and seems that the statement above is true for all of them, which will make coding this easier. Is this a safe assumption?

Thanks in advance!

bzt · Post by **bzt** » Sat Dec 29, 2018 8:05 pm

mihe wrote:Right now I am loading all LOAD segments, which happen to be only 2 at this moment, but it is coded to process more, in case if there are more.

Do I have to check every LOAD segment to see if they contain a .bss section or is .bss always in the last LOAD segment, last sector position inside the segment?

I have tried to read random ELF files in my linux installation and seems that the statement above is true for all of them, which will make coding this easier. Is this a safe assumption?

Thanks in advance!

Probably that's a pretty safe assumption, but you got this the wrong way. This is your OS, you are making the rules! You can say that in your OS every ELF must have exactly two loadable segments, and the second must be the data segment

Cheers,
bzt

mihe · Post by **mihe** » Tue Jan 01, 2019 1:50 pm

bzt wrote:
mihe wrote:Right now I am loading all LOAD segments, which happen to be only 2 at this moment, but it is coded to process more, in case if there are more.

Do I have to check every LOAD segment to see if they contain a .bss section or is .bss always in the last LOAD segment, last sector position inside the segment?

I have tried to read random ELF files in my linux installation and seems that the statement above is true for all of them, which will make coding this easier. Is this a safe assumption?

Thanks in advance!
Probably that's a pretty safe assumption, but you got this the wrong way. This is your OS, you are making the rules! You can say that in your OS every ELF must have exactly two loadable segments, and the second must be the data segment

Cheers,
bzt

I implemented it this way, and it works like a charm. Not only that, but the calculation you described also accounts for the total space used, including any padding that aligns .data and .bss to 4-byte boundaries, so I do not need to worry about alignment; it is all handled in one shot.

Although having my own executable format sounds idyllic, I guess I will just stick to conventional ELF to transfer the kernel, and most likely later on for user-space apps too.

Thanks again bzt !!

OSDev.org
