flat binary c++ kernel

8infy · Post by **8infy** » Sun Apr 05, 2020 2:30 pm

Hello everyone,

I'm super new to osdev so forgive me if this is a stupid question:

So I made a simple bootloader that is able to load a file from a disk and jump to it (after switching to 32-bit protected mode etc.).
Now I want to use this bootloader to load my kernel file and jump to its entry point. I've decided to go with C++ for my kernel. My plan is to use raw(flat) binary format, no headers.
I have read quite a few posts on how to link C++ object files into a raw binary, e.g. https://stackoverflow.com/questions/164 ... 6#23502466.
But i'm still confused about how this approach would handle bss (other sections?), which are supposed to be filled with zeroes by something like an elf-loader.
Would this mean that I can't use anything like static int something; in my kernel? And if I did would this point to an invalid memory address (outside of the binary file)?
Basically my question is how many things would break with this approach?
Also, I know that I would have to call global constructors myself and that there would be no exception support (possibly no virtual functions?) but is there anything else?
How would I know what offset to jump to from my bootloader to get to the entrypoint? Is it something I can control in the linker script? If so, how?

Thanks!

Octocontrabass · Post by **Octocontrabass** » Wed Apr 08, 2020 1:12 am

8infy wrote:My plan is to use raw(flat) binary format, no headers.

Why? This seems like a bad idea.

8infy wrote:But i'm still confused about how this approach would handle bss (other sections?), which are supposed to be filled with zeroes by something like an elf-loader.

It can be filled with zeroes by your linker, if you tell it to include that section in the output. It can be filled with zeroes by your startup code, if you add appropriate symbols so your startup code can find the start and end of the section.

8infy wrote:How would I know what offset to jump to from my bootloader to get to the entrypoint?

One option that doesn't involve adding headers to your flat binary is putting your startup code in its own section at the beginning of the binary, so your loader can simply jump to the first byte of the binary.

8infy · Post by **8infy** » Wed Apr 08, 2020 1:34 am

Thanks for the answers!

Octocontrabass wrote:
8infy wrote:My plan is to use raw(flat) binary format, no headers.
Why? This seems like a bad idea.

Because I don't wanna write a full elf loader in assembly

alexfru · Post by **alexfru** » Wed Apr 08, 2020 1:47 am

8infy wrote:
Octocontrabass wrote:
8infy wrote:My plan is to use raw(flat) binary format, no headers.
Why? This seems like a bad idea.
Because I don't wanna write a full elf loader in assembly

You can use C to write most of it. See Smaller C.

8infy · Post by **8infy** » Wed Apr 08, 2020 2:39 am

alexfru wrote: You can use C to write most of it. See Smaller C.

Interesting, thanks!

nullplan · Post by **nullplan** » Wed Apr 08, 2020 4:12 am

8infy wrote:But i'm still confused about how this approach would handle bss (other sections?), which are supposed to be filled with zeroes by something like an elf-loader.
Would this mean that I can't use anything like static int something; in my kernel? And if I did would this point to an invalid memory address (outside of the binary file)?
Basically my question is how many things would break with this approach?

Well, you don't have an ELF loader. The usual method is to memset the BSS section to zero as one of the first things in the kernel. GNU ld's default linker script provides symbols such as __bss_start and _end for this purpose (a custom script can define its own, e.g. _ebss), but you might want to add some alignment to the symbols surrounding your BSS section. You can still make use of the BSS section, with or without the memset(), except without it, you cannot be sure of the initial value. The BSS section is commonly used for all data objects with static storage duration (globals and anything declared static) that have an initial value of zero. So "static int i;" would be one such thing. "static int i = 0;" would be too, but "static int i = -1;" would not be using BSS.
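A minimal sketch of that startup step in C++, for illustration only. The helper name zero_range and the linker symbols __bss_start / __bss_end shown in the comment are assumptions; use whatever names your own linker script defines. It also assumes the stack in use does not live inside the range being cleared.

```cpp
// Fill [begin, end) with zero bytes without relying on a libc memset.
// Hypothetical helper; the name does not come from any library.
void zero_range(unsigned char* begin, unsigned char* end)
{
    for (unsigned char* p = begin; p != end; ++p) {
        *p = 0;
    }
}

// In the kernel's startup path this could be used as follows, assuming the
// linker script defines __bss_start and __bss_end (assumed symbol names):
//
//   extern "C" unsigned char __bss_start[], __bss_end[];
//   zero_range(__bss_start, __bss_end);
```

Note that an optimizing compiler may turn a loop like this into a call to memset, so a freestanding kernel usually ends up providing its own memset anyway.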

If you have your stack in the BSS section, though, like so many kernels do, then it gets interesting. Then you must zero out the BSS section as early as possible, possibly as the first instructions in the kernel, because at any later time, you might overwrite the stack, and then things will break really hard.

As for what other things would break: You mentioned using C++, so you have to figure out how your compiler emits calls to global constructors, because when the kernel is loaded this way nothing calls them for you, and you likely have to do it in the startup code. Many tutorials make a big deal of global destructors as well, but since a kernel never exits, I wouldn't know why they'd be important.

If you want to support paging, and want to have a higher-half kernel, you will have to figure out when to do all this. It is possible all of these things are only possible after enabling paging, so maybe you need to do that first. Then you need to know the limitations you are saddling yourself with. Personally I have a loader-kernel running entirely in unpaged mode, setting up the paging for the actual kernel, that is loaded alongside the loader. Makes it easier, since I have two completely different programs in the end. Yeah, there is some duplicate code, but that code is compiled by different compilers, and once the actual kernel is running, it will jettison all page maps in the lower half, so the loader will not take up memory at run time.

8infy wrote:Also, I know that I would have to call global constructors myself and that there would be no exception support (possibly no virtual functions?) but is there anything else?

Took the words right out of my mouth. You can add support for all of these. In the case of global constructors, recent versions of GCC emit the code that calls them in stub functions and put the addresses of those stubs in the ".init_array" section. Just iterate over the init_array section, calling every pointer you find.
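Roughly, that iteration could look like the sketch below. The helper name run_init_array is made up for illustration, and the symbols __init_array_start / __init_array_end in the comment are an assumption about what your linker script defines around the section.

```cpp
// Type of the stub functions whose addresses end up in .init_array.
using ctor_fn = void (*)();

// Call every function pointer in [begin, end), in order.
// Hypothetical helper name.
void run_init_array(const ctor_fn* begin, const ctor_fn* end)
{
    for (const ctor_fn* fn = begin; fn != end; ++fn) {
        (*fn)();
    }
}

// Kernel usage, assuming the linker script brackets the section with
// __init_array_start and __init_array_end (assumed symbol names):
//
//   extern "C" ctor_fn __init_array_start[], __init_array_end[];
//   run_init_array(__init_array_start, __init_array_end);
```

This has to run after the BSS has been cleared and before any code touches a global object that needs construction.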

As for exceptions, if you want to add support for them, you have to read the relevant ABI documents, where the compiler guys tell you about how throwing exceptions works. And you should always make sure not to throw exceptions through an assembly layer, i.e. out of a syscall or an interrupt. Put an all-catching trampoline at the outermost layers of the kernel if you want to do something like that. Because there is no ABI for something like that happening. It is always invalid to be sending an exception out of the outermost stack frame of the kernel.
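A sketch of such an all-catching trampoline, assuming the exception runtime has already been ported to the kernel. The names guarded_call and kUnhandledException are hypothetical; how the error is reported back to the assembly stub is up to you.

```cpp
// Value returned to the caller when an exception was swallowed (hypothetical).
constexpr int kUnhandledException = -1;

// Run `body` and turn any escaping C++ exception into an error code, so that
// nothing propagates into the assembly stub (syscall or interrupt entry)
// that called us.
template <typename Body>
int guarded_call(Body&& body) noexcept
{
    try {
        return body();
    } catch (...) {
        return kUnhandledException;
    }
}
```

A syscall entry point called from assembly would then wrap its real work, e.g. `return guarded_call([&] { return do_syscall(nr, args); });`, where do_syscall is a placeholder for your own dispatcher.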

Virtual functions should work, though, shouldn't they? With static linking, that is just a bunch of vtables in read-only memory, so at run time you only have to initialize the vtable pointer, which the constructor already does. So for automatic objects you have nothing to do, and for objects with static storage duration you have the init_array algorithm above.
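To illustrate, here is a small heap-free example. All class and function names are invented for the sketch. Whether the static object is initialized at compile time or through a constructor stub is up to the compiler, so the global constructors still need to have run before it is used.

```cpp
struct Device {
    virtual int id() const { return 0; }
};

struct SerialPort final : Device {
    int id() const override { return 1; }
};

// Object with static storage duration; no heap is involved.
static SerialPort serial_port;

// Dispatches through the vtable pointer stored in the object.
int device_id(const Device& dev)
{
    return dev.id();
}
```

One thing to watch for: with GCC and Clang, pure virtual functions reference the runtime symbol __cxa_pure_virtual, which a freestanding kernel has to supply itself.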

8infy wrote:How would I know what offset to jump to from my bootloader to get to the entrypoint? Is it something I can control in the linker script? If so, how?

Hey, you wanted a plain binary. So no header will tell you anything. So it depends on the bootloader. In most cases, what happens is that the bootloader jumps to the start of the file it loaded. Therefore you need to put the code for your entry point at the start of the file. Which is easier than it sounds, because you can just mark that with a special section and link that one in first. Or you only make that first "routine" a simple jmp instruction:

Code: Select all

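.section ".text.entry", "ax", @progbits   /* allocatable, executable section; link it first */
jmp _start                                /* the loader jumps to byte 0, which forwards to _start */
.previous                                 /* switch back to the previously active section */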

Now you can have _start anywhere in your code section. Of course, nothing prevents you from just putting your entire _start() function into this section.

bzt · Post by **bzt** » Wed Apr 08, 2020 5:02 am

Hi and welcome!

I agree with Octocontrabass, using a raw binary is a bad idea. There's no such thing as BSS in raw flat binaries, because they are just raw code (e.g. they don't store segment information). To know where the BSS is, you'll need an a.out kludge at a minimum.

8infy wrote:Because I don't wanna write a full elf loader in assembly

You don't have to. Assuming you're using a linker script that creates one loadable segment with combined code and data (same as with flat binaries), then you can directly read the bss size from a fixed location in the ELF header.

You'll need p_filesz (the size of the segment in the file) and p_memsz (its size in memory) from the first program header; the difference between p_memsz and p_filesz is the BSS part, which your loader has to zero out. You'll also need the entry point from the ELF header (the memory address to jump to in order to pass control). That's all. No other extra ELF parsing needed (provided you don't want to change your kernel's linker script dynamically).
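For readers who prefer C++ over assembly, the same lookup can be sketched as below. This assumes a 32-bit little-endian ELF file whose first program header is the single PT_LOAD segment, and it does no bounds checking. The struct and function names (Elf32Header, Elf32ProgramHeader, KernelImage, describe_kernel) are invented for the sketch; only the field names and layout come from the ELF specification.

```cpp
#include <cstdint>
#include <cstring>

// 32-bit ELF file header and program header, laid out as in the ELF spec.
struct Elf32Header {
    unsigned char e_ident[16];
    std::uint16_t e_type, e_machine;
    std::uint32_t e_version, e_entry, e_phoff, e_shoff, e_flags;
    std::uint16_t e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx;
};
struct Elf32ProgramHeader {
    std::uint32_t p_type, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_flags, p_align;
};
static_assert(sizeof(Elf32Header) == 52, "unexpected ELF header size");
static_assert(sizeof(Elf32ProgramHeader) == 32, "unexpected program header size");

constexpr std::uint32_t kPtLoad = 1;  // value of PT_LOAD in the ELF spec

// What the loader needs to know (hypothetical result type).
struct KernelImage {
    std::uint32_t entry;        // address to jump to (e_entry)
    std::uint32_t load_addr;    // where the segment wants to live (p_vaddr)
    std::uint32_t file_offset;  // where the segment starts in the file
    std::uint32_t file_size;    // bytes to copy from the file (p_filesz)
    std::uint32_t bss_size;     // bytes to zero afterwards (p_memsz - p_filesz)
    bool valid;
};

// Inspect an ELF image already loaded at `image`; look only at the first
// program header, as described above.
KernelImage describe_kernel(const unsigned char* image)
{
    KernelImage out{};
    Elf32Header eh;
    std::memcpy(&eh, image, sizeof eh);
    if (std::memcmp(eh.e_ident, "\x7f" "ELF", 4) != 0 || eh.e_phnum == 0) {
        return out;  // not an ELF file, or no program headers
    }
    Elf32ProgramHeader ph;
    std::memcpy(&ph, image + eh.e_phoff, sizeof ph);
    if (ph.p_type != kPtLoad || ph.p_memsz < ph.p_filesz) {
        return out;  // first header is not a sane loadable segment
    }
    out = { eh.e_entry, ph.p_vaddr, ph.p_offset, ph.p_filesz,
            ph.p_memsz - ph.p_filesz, true };
    return out;
}
```

The loader would then copy file_size bytes from image + file_offset to load_addr, zero the following bss_size bytes, and jump to entry.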

Here's a minimalistic linker script example:

Code: Select all

PHDRS
{
  boot PT_LOAD;                                /* one single loadable segment */
}
SECTIONS
{
    .text : {
        *(.text .text.*)                       /* code */
        *(.rodata .rodata.*)                   /* data */
        *(.data .data.*)
    } :boot
    .bss (NOLOAD) : {                          /* bss */
        *(.bss .bss.*)
    } :boot
}

And the code to load it (in fasm dialect). This code looks for the first loadable segment, but you can hardwire the offsets with the script above, and my code also checks if the segment is linked at higher half, because my loader loads kernels at -2M + 2 pages. No more than 20 SLoC in Assembly.

Cheers,
bzt

8infy · Post by **8infy** » Wed Apr 08, 2020 5:16 am

Thanks everyone for the answers!

OSDev.org
