Loading ELF64 kernel from UEFI

Naxaes · Post by **Naxaes** » Thu Jan 06, 2022 8:57 am

I've been trying to load my kernel (which is in elf64-x86-64 format) from a 64-bit UEFI bootloader for a week now, but I can't get it to work. I've read up on the ELF and UEFI specifications, read the OSDev wiki pages and watched some tutorials, but I think I'm missing something. I guess my most glaring question is:

Do I need to enable paging myself in my bootloader in order to load the kernel?

All tutorials and hobby OSes seem to do that in the kernel and not the bootloader, but (as I understand it) I have to put the ELF file at a specific virtual memory address for it to work (which in my case is 0x100000). But since UEFI enables paging with an identity mapping, I have no control over whether the memory at that address is free or not. If it's not, it won't work.

In this tutorial they did the following:

Code:

/* Allocate memory for the program segment */
UINTN Pages   = (ProgramHeader.p_memsz + 0x1000 - 1) / 0x1000;
UINTN Segment = ProgramHeader.p_paddr;
EFI_ASSERT(SystemTable->BootServices->AllocatePages(AllocateAddress, EfiLoaderData, Pages, &Segment));

/* Load the program segment into our allocated memory */
EFI_ASSERT(KernelFile->SetPosition(KernelFile, ProgramHeader.p_offset));
UINTN Size = ProgramHeader.p_filesz;
EFI_ASSERT(KernelFile->Read(KernelFile, &Size, (void*) Segment));

But this crashes with an "EFI_NOT_FOUND" when trying to allocate the pages, which as I understand it means that the memory at that address is not available. I am also a bit unsure whether it's correct to allocate the segment at `ProgramHeader.p_paddr`, as the OSDev Wiki and this forum post say to put it at the virtual memory address `ProgramHeader.p_vaddr` (although the forum posters say they're not sure). I've tried both, and I get either the same problem or a page fault.

So what do I need to do in order to load my ELF64 kernel from UEFI? Do I change how the kernel is linked, do I enable paging myself, or do I do something in the UEFI-bootloader?

nexos · Post by **nexos** » Thu Jan 06, 2022 3:05 pm

I would load the file into a buffer using UEFI's file services, and then after ExitBootServices(), set up your own page tables in the bootloader, and load the kernel into these.

That would be a lot simpler and more elegant than allocating memory at a static address with UEFI, when that address probably isn't available (as it wasn't in your case).

zaval · Post by **zaval** » Thu Jan 06, 2022 8:04 pm

You do not need paging to load your kernel file(s) into memory. You do need paging enabled at the end of the loader's work. If your kernel file(s) reside(s) on a FAT volume, you use EFI_SIMPLE_FILE_SYSTEM_PROTOCOL and EFI_FILE_PROTOCOL to read the contents of the file(s) into memory. Otherwise, you use the Block/Disk I/O protocols to get data at the block level and parse the filesystem in your loader to get the file contents.

Either way, when loading your ELF file you are not working in the address space that the kernel will run in, so you don't need to request any specific address for your ELF segments; as you have seen, such a request is easily denied. You just allocate with AllocateAnyPages and put your segments there, remembering their addresses and sizes. Then, while still in the loader and before ExitBootServices(), you build the page tables and map those ranges to whatever virtual addresses the segments were linked at. Finally, you call ExitBootServices() and switch to the address space you have just created the mapping for.
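As a rough sketch of the "remember their addresses and sizes" step: the helper below works out how many pages a segment needs when `p_vaddr` is not page aligned, which is the count one would then pass to `AllocatePages()` with `AllocateAnyPages`. The names `segment_span` and `span_for_segment` are made up for illustration, and the firmware call itself is left out so the arithmetic can be checked on a host machine.

```c
#include <stdint.h>

#define PAGE_SIZE 0x1000ULL

/* Hypothetical record: the page-aligned virtual range one segment occupies. */
struct segment_span {
    uint64_t first_page;  /* p_vaddr rounded down to a page boundary */
    uint64_t page_count;  /* number of 4 KiB pages to allocate and later map */
};

/* Compute the pages covering [p_vaddr, p_vaddr + p_memsz). */
static struct segment_span span_for_segment(uint64_t p_vaddr, uint64_t p_memsz)
{
    struct segment_span span;
    uint64_t end = (p_vaddr + p_memsz + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);

    span.first_page = p_vaddr & ~(PAGE_SIZE - 1);
    span.page_count = (end - span.first_page) / PAGE_SIZE;
    return span;
}
```

Note that a segment of only 0x20 bytes starting at 0x100ff0 straddles two pages, which `(p_memsz + 0x1000 - 1) / 0x1000` alone would undercount.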

Whether you need to enable paging yourself depends on the architecture, as does much of what the loader has to do. On x64, paging is already enabled under UEFI; on 32-bit x86 it is not; and on ARM64 you could well be running in the hypervisor exception level (EL2) address space. So the switching mechanics depend heavily on the architecture, but you always do the switch in the loader, after ExitBootServices().

For this switch to succeed, mapping the ELF segments of your kernel file(s) is not enough. Everything you pass to the kernel, and everything the kernel expects to find, has to be mapped correctly as well: the stack, the freshly created page tables themselves, the loader parameter block you hand to the kernel, and whatever else your design needs. The trick is that whatever mode/state/level your loader runs in, its original address space is mapped 1:1 to the system (physical) address space, so the addresses you see are numerically the physical ones, and you can put them straight into CR3/TTBRx, page table entries, etc.

Switching to the kernel's address space is done in some kind of trampoline code in the loader, at the very last phase; that code also does whatever additional CPU magic the architecture requires. Writing it is a real test of the developer's understanding of the architecture they are programming. Creating and filling in the page tables, on the other hand, can be done earlier, during normal loader operation: for example, while loading your ELF file you can add a mapping whenever a block of data, such as an ELF segment, has been processed. That part is up to the logic of your loader.

nullplan · Post by **nullplan** » Fri Jan 07, 2022 1:05 am

First of all, you don't use p_paddr for anything. That field is basically a joke. Different linkers do different things to it, and nobody really knows what to write in there or what to make of it.

What I would do is: Locate the kernel file, load the first part of the kernel file (enough for ELF header and program headers). Then, for each PT_LOAD segment, you allocate a memory block that is p_memsz long, then copy p_filesz bytes from the file starting at p_offset into the new block. The alignment of the memory block is given in p_align. It is possible for p_vaddr and p_offset to be misaligned, but they must be congruent modulo the alignment. You also must zero out the difference between p_memsz and p_filesz.
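A minimal sketch of that per-segment step, assuming the whole kernel file has already been read into a buffer (as nexos suggested) and that `dest` points at the byte in the freshly allocated block that corresponds to `p_vaddr`. The struct mirrors the ELF64 program header layout; `load_segment` is a made-up name, and in a freestanding loader `memcpy`/`memset` would be your own routines or the firmware's `CopyMem`/`SetMem`.

```c
#include <stdint.h>
#include <string.h>

#define PT_LOAD 1u

/* ELF64 program header, field order as in the ELF-64 object file format. */
struct elf64_phdr {
    uint32_t p_type;
    uint32_t p_flags;
    uint64_t p_offset;
    uint64_t p_vaddr;
    uint64_t p_paddr;   /* ignored, per the advice above */
    uint64_t p_filesz;
    uint64_t p_memsz;
    uint64_t p_align;
};

/* Copy one PT_LOAD segment out of the file image and zero its tail.
 * Returns 0 on success, -1 if the header is inconsistent. */
static int load_segment(const uint8_t *file, uint64_t file_size,
                        const struct elf64_phdr *ph, uint8_t *dest)
{
    if (ph->p_type != PT_LOAD)
        return -1;
    if (ph->p_filesz > ph->p_memsz)
        return -1;
    /* The file bytes must lie entirely inside the image. */
    if (ph->p_offset > file_size || ph->p_filesz > file_size - ph->p_offset)
        return -1;
    /* p_vaddr and p_offset must be congruent modulo p_align. */
    if (ph->p_align > 1 &&
        ph->p_vaddr % ph->p_align != ph->p_offset % ph->p_align)
        return -1;

    memcpy(dest, file + ph->p_offset, ph->p_filesz);
    /* Zero the difference between p_memsz and p_filesz (typically .bss). */
    memset(dest + ph->p_filesz, 0, ph->p_memsz - ph->p_filesz);
    return 0;
}
```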

Then you must prepare a new paging structure. So allocate a new page for the PML4, and set it to zero. In that new PML4, you must map all the PT_LOAD segments so they are mapped to p_vaddr. You must also map a stack for the kernel, and identity map the trampoline code (i.e. the code between loading CR3 and jumping to the kernel entry point). You will likely also need further mappings (e.g. I have a linear mapping of all physical memory starting at 0xffff800000000000, and that is very important for me), and you need to prepare an argument structure to the kernel so that it can know the layout of the memory (the kernel must know where the memory is, and also where all of the stuff you put in memory is, like the kernel stack, the kernel image, and the paging structures). Since you will have exited boot services by the time you call the kernel, you must get the memory map in the bootloader and provide it to the kernel. And while you are at it, you will likely also want to provide it with the frame buffer characteristics.
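For illustration, here is one way the "map into a new PML4" step might look for 4 KiB pages with 4-level paging. It is only a sketch: `alloc_table` and `map_page` are made-up names, and `alloc_table` uses `aligned_alloc` so the code runs on a host, whereas a real loader would take pages from the firmware. It leans on the identity mapping mentioned earlier in the thread, so that the physical address stored in an entry can be dereferenced directly.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE   4096ULL
#define PTE_PRESENT (1ULL << 0)
#define PTE_WRITE   (1ULL << 1)
#define PTE_ADDR    0x000FFFFFFFFFF000ULL  /* bits 12..51 hold the frame address */

/* Hypothetical allocator for one zeroed, page-aligned table.
 * Assumption: a UEFI loader would wrap AllocatePages() here instead. */
static uint64_t *alloc_table(void)
{
    uint64_t *table = aligned_alloc(PAGE_SIZE, PAGE_SIZE);
    if (table)
        memset(table, 0, PAGE_SIZE);
    return table;
}

/* Map one 4 KiB page at vaddr to paddr, creating lower tables on demand.
 * Returns 0 on success, -1 if a table could not be allocated. */
static int map_page(uint64_t *pml4, uint64_t vaddr, uint64_t paddr, uint64_t flags)
{
    uint64_t *table = pml4;

    /* Walk PML4 (shift 39), PDPT (30) and PD (21); 9 index bits per level. */
    for (int shift = 39; shift > 12; shift -= 9) {
        uint64_t *entry = &table[(vaddr >> shift) & 0x1FF];
        if (!(*entry & PTE_PRESENT)) {
            uint64_t *next = alloc_table();
            if (!next)
                return -1;
            *entry = (uint64_t)(uintptr_t)next | PTE_PRESENT | PTE_WRITE;
        }
        table = (uint64_t *)(uintptr_t)(*entry & PTE_ADDR);
    }

    /* Final level: the page table entry itself. */
    table[(vaddr >> 12) & 0x1FF] = (paddr & PTE_ADDR) | flags | PTE_PRESENT;
    return 0;
}
```

The intermediate entries are made writable unconditionally, which leaves the leaf entry to decide the effective permission; whether that suits your kernel is a design choice.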

Naxaes · Post by **Naxaes** » Fri Jan 07, 2022 12:14 pm

Okay, thank you all for confirming my thoughts! So to conclude: yes, you need to set up paging and map the memory before jumping to the kernel. I.e., the YouTube tutorial is wrong.

But there was one thing that confused me though:

nullplan wrote:You must also [...] identity map the trampoline code (i.e. the code between loading CR3 and jumping to the kernel entry point).

I'm on board about everything else, but is this trampoline code something that's necessary for x86_64 on a 64-bit machine? Don't think I've heard of it. I know that when I did a BIOS-bootloader it had to do a long jump to the kernel, but understood that this is not necessary for a 64-bit UEFI-bootloader. But maybe it's not the same? A quick Googling made me think that this is for multi-core programming.

nullplan · Post by **nullplan** » Fri Jan 07, 2022 3:22 pm

Naxaes wrote:I'm on board about everything else, but is this trampoline code something that's necessary for x86_64 on a 64-bit machine?

I meant the bit of code that actually does the transition to the kernel from the bootloader. "Trampoline code" is a general term meaning anything that ends in a jump instruction to something different. And sometimes not even a jump instruction. The bit of userspace code that runs the system call that will transition from a signal handler back to the main program has been called "signal trampoline", for example. And yes, the SMP trampoline is the code that gets a new CPU from 16-bit real mode all the way to 64-bit mode and executing the kernel.

In any case, what I meant was that you will need to load CR3 with the address of the new PML4 after exiting boot services and before jumping to the kernel, but loading CR3 is only OK if the "mov cr3" instruction has the same physical address in both the old and new address spaces (otherwise behavior is undefined). The address space for the UEFI environment is defined to be identity mapped, and so the code from the "mov cr3" to the jump to the kernel entry point has to be identity mapped.

I would probably do it something like this:

Code:

#include <stddef.h>
#include <stdint.h>
#include <stdnoreturn.h>

int mmap(void *vaddr, uint64_t paddr, size_t size, uint64_t flags);
extern noreturn void start_kernel(void *pml4, void *kstack, uint64_t kentry);
extern char trampoline_start[], trampoline_end[];
[...]

  /* identity map kernel start trampoline; paddr is an integer, so cast the pointer */
  mmap(trampoline_start, (uint64_t)(uintptr_t)trampoline_start,
       (size_t)(trampoline_end - trampoline_start), 0);

Code:

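/* kstart.S */
/* void start_kernel(void *pml4, void *kstack, uint64_t kentry)
 * System V AMD64 ABI: rdi = pml4, rsi = kstack, rdx = kentry.
 * A loader built with the Microsoft x64 convention would receive
 * these in rcx, rdx, r8 instead. */
.text
.global start_kernel
.global trampoline_start
.global trampoline_end
start_kernel:
  movq %rsi, %rsp        /* switch to the kernel stack; nothing is pushed before the jump */
trampoline_start:
  movq %rdi, %cr3        /* load the new PML4; this code must be identity mapped in it */
  jmpq *%rdx             /* jump to the kernel entry point */
trampoline_end: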

Then mmap() must of course take care of alignment. But I shall leave that function as an exercise for the reader.
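For readers who want a starting point for that exercise, here is one possible shape of the alignment handling. It is a sketch, not nullplan's implementation: the function is called `map_range` rather than `mmap` so it does not clash with the POSIX function when compiled on a host, it takes the virtual address as an integer for simplicity, and `map_page` here is only a stand-in that records each request so the block runs on its own. A real loader would write page table entries instead.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE    0x1000ULL
#define MAX_RECORDED 64

/* Stand-in for a real page-table writer (hypothetical): it only
 * records each request. */
static struct { uint64_t vaddr, paddr, flags; } recorded[MAX_RECORDED];
static size_t recorded_count;

static int map_page(uint64_t vaddr, uint64_t paddr, uint64_t flags)
{
    if (recorded_count == MAX_RECORDED)
        return -1;
    recorded[recorded_count].vaddr = vaddr;
    recorded[recorded_count].paddr = paddr;
    recorded[recorded_count].flags = flags;
    recorded_count++;
    return 0;
}

/* Map [vaddr, vaddr + size) to [paddr, paddr + size), widening the
 * range to whole pages. Returns 0 on success, -1 on error. */
static int map_range(uint64_t vaddr, uint64_t paddr, uint64_t size, uint64_t flags)
{
    uint64_t offset = vaddr & (PAGE_SIZE - 1);

    /* Both addresses must share the same offset within a page. */
    if (offset != (paddr & (PAGE_SIZE - 1)))
        return -1;
    if (size == 0)
        return 0;

    /* Round the start down and the end up to page boundaries. */
    uint64_t pages = (offset + size + PAGE_SIZE - 1) / PAGE_SIZE;
    vaddr -= offset;
    paddr -= offset;

    for (uint64_t i = 0; i < pages; i++) {
        if (map_page(vaddr + i * PAGE_SIZE, paddr + i * PAGE_SIZE, flags) != 0)
            return -1;
    }
    return 0;
}
```

With this shape, identity mapping the trampoline would be a call along the lines of `map_range(start, start, end - start, flags)` with `start` and `end` taken from `trampoline_start` and `trampoline_end`.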

Naxaes · Post by **Naxaes** » Fri Jan 07, 2022 3:41 pm

nullplan wrote:loading CR3 is only OK if the "mov cr3" instruction has the same physical address in both the old and new address spaces (otherwise behavior is undefined).

Ah, that makes sense! Thanks!

davmac314 · Post by **davmac314** » Sun Jan 09, 2022 7:34 pm

FWIW I've written UEFI code which loads an ELF64 kernel and hands over to it using the Stivale2 protocol (almost, the protocol implementation isn't fully complete, but it's enough to boot a toy kernel that I've been working on):

https://github.com/davmac314/tosaithe/b ... e2.cc#L506

(That's part of a general bootloader which also supports chainloading other UEFI programs). Happy to discuss anything there, if it's helpful to you. To answer your specific question, it does set up some paging before jumping into the kernel.

Edit: incidentally, this does load the kernel at a nominal physical address and will bail out if that memory space is not available. Definitely this isn't good practice and isn't really necessary. Consider it a rudimentary example rather than production-ready code.

OSDev.org
