Bootstrapping with dynamically loading libraries in a microkernel

AndrewAPrice · Post by **AndrewAPrice** » Mon Nov 11, 2024 4:32 pm

I'm exploring dynamically loading libraries in my microkernel.

Basically, I'm sick of everything linking against libc or my GUI framework (which links against Skia) blowing my binaries up to 10 MB.

When my OS is running, I use a service to parse the executable file, create an empty process, load the executable into memory, and kick off the first thread. For bootstrapping, though, I have a small ELF parser in my kernel for loading the initial set of services.

This works for statically linked binaries, but bootstrapping initial services that depend on dynamically linked libraries is more complicated, because the kernel has to resolve the dynamic linking for that initial set of services.

Do I build two implementations of the dynamic linker - one for userspace and one for bootstrapping? Can my userspace service (that does the dynamic linking and loading after the OS is running) share the dynamically loaded libraries at bootstrapping time?

Some possible approaches:

Build two implementations of the dynamic linker (one in the kernel for bootstrapping, one in userspace), and somehow let the userspace implementation learn about the libraries the kernel loaded dynamically during bootstrapping.
Statically link my userspace loader (which complicates my build system a little because I need to build two versions of several libraries - static and dynamic versions), and load that as my first service.

What do microkernels typically do?

Octocontrabass · Post by **Octocontrabass** » Wed Nov 13, 2024 10:51 pm

Is there some reason why the ld-linux.so approach wouldn't work for you?

nullplan · Post by **nullplan** » Thu Nov 14, 2024 12:31 pm

I don't know about microkernels, but on Linux, dynamic loaders have been implemented in only two ways:

Do what glibc, uclibc, and dietlibc are doing and have separate loader and libc. This necessitates duplicating some parts of libc inside of the loader. The benefit is, if you can pull it off, you have a really small loader and it can, in theory, load other libc implementations as needed. Or:
Do what musl is doing and put all of libc and the dynamic loader into the same binary. That way, there is no code duplication and you can update the libc atomically with a single rename() call.

Oh, and on the kernel level, the difference is also minuscule: A dynamic ELF file contains a PT_INTERP segment, naming the dynamic linker. If the kernel encounters it while loading the ELF file, it also loads the dynamic linker into the same address space and initially jumps to its entry point. It also sets a couple of aux vectors differently, namely AT_BASE is set to the interpreter base and AT_ENTRY is set to the main executable entry point.

Additionally, and basically orthogonally to the above, if the ELF type is ET_DYN, the loader can load the file anywhere in address space. The kernel can use this for address space layout randomization.
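
As a rough illustration of the PT_INTERP check described above, here is a minimal sketch in C. It assumes a 64-bit ELF image that is already fully in memory (for example a boot module); the struct names and find_interp() are made up for this example, and only the field layout and the PT_INTERP value come from the ELF specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal ELF64 definitions, laid out as in the ELF specification. */
#define PT_INTERP 3

struct elf64_ehdr {
    unsigned char e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version;
    uint64_t e_entry, e_phoff, e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum;
    uint16_t e_shentsize, e_shnum, e_shstrndx;
};

struct elf64_phdr {
    uint32_t p_type, p_flags;
    uint64_t p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align;
};

/* Hypothetical helper: returns the interpreter path named by PT_INTERP, or
 * NULL if the image has none or looks malformed. The returned pointer refers
 * into `image`. */
const char *find_interp(const uint8_t *image, size_t size)
{
    struct elf64_ehdr eh;

    assert(image != NULL);
    if (size < sizeof eh || memcmp(image, "\177ELF", 4) != 0 || image[4] != 2)
        return NULL;                       /* not a 64-bit ELF file */
    memcpy(&eh, image, sizeof eh);         /* copy to avoid unaligned access */

    if (eh.e_phentsize != sizeof(struct elf64_phdr) || eh.e_phoff > size ||
        (size - eh.e_phoff) / sizeof(struct elf64_phdr) < eh.e_phnum)
        return NULL;                       /* program headers out of bounds */

    for (uint16_t i = 0; i < eh.e_phnum; i++) {
        struct elf64_phdr ph;
        memcpy(&ph, image + eh.e_phoff + (size_t)i * sizeof ph, sizeof ph);
        if (ph.p_type != PT_INTERP)
            continue;
        /* The path must lie inside the file and be NUL-terminated. */
        if (ph.p_filesz == 0 || ph.p_offset > size ||
            ph.p_filesz > size - ph.p_offset ||
            image[ph.p_offset + ph.p_filesz - 1] != '\0')
            return NULL;
        return (const char *)(image + ph.p_offset);
    }
    return NULL;                           /* no PT_INTERP: static executable */
}
```

A loader built along these lines would map the main executable as usual, and if find_interp() returns a path, also map that file and start execution at its entry point instead.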

rdos · Post by **rdos** » Thu Nov 14, 2024 4:04 pm

I have a kernel side loader for user applications. Or, rather, several. In the past I have also supported DOS applications and 16-bit protected mode, although these are no longer actively used. All applications currently are PE format, but I have an experimental ELF loader too. I don't think loader code should be in the application.

For the kernel side, I don't support PE or ELF format for device drivers. They must use a special binary format with an RDOS header defining the code and data segments. This format is implemented in the OpenWatcom linker. I also don't build a huge kernel file, rather there is a mechanism for dynamic linking both for device drivers and user applications that is resolved by a kernel module. Therefore, each device driver can be linked as a "module" and then is loaded at boot time based on a configuration file.

Server modules (in the microkernel model) are just ordinary applications in PE format that are loaded in a bit of a special process.

I don't use DLLs a lot, and typically link applications with static libraries. It's not a big issue given that they are typically only a couple of MBs. The SSL server is the largest and is close to 5MB.

AndrewAPrice · Post by **AndrewAPrice** » Fri Nov 15, 2024 10:32 am

Octocontrabass wrote: ↑Wed Nov 13, 2024 10:51 pm Is there some reason why the ld-linux.so approach wouldn't work for you?

How does the ld-linux.so approach work in a microkernel?

nullplan · Post by **nullplan** » Fri Nov 15, 2024 11:39 am

AndrewAPrice wrote: ↑Fri Nov 15, 2024 10:32 am How does the ld-linux.so approach work in a microkernel?

I believe the biggest issue you have is your attempt to put the ELF support into a service. I don't think that works too well, since the service also has to be some kind of binary. I'd put it into the main kernel, since it is memory-management-adjacent.

Octocontrabass · Post by **Octocontrabass** » Fri Nov 15, 2024 4:44 pm

AndrewAPrice wrote: ↑Fri Nov 15, 2024 10:32 am How does the ld-linux.so approach work in a microkernel?

I don't think there's any difference. Processes do their own dynamic linking; the only support they get from the ELF loader is that it loads and executes ld-linux.so (or whatever) instead of attempting to load and execute the dynamic executable directly.

eekee · Post by **eekee** » Fri Dec 20, 2024 10:25 am

A funny thing was going on with ld-linux.so in the 00s. It was described as just like an interpreter, and I believe there was some code in the kernel to load other 'interpreters' instead. There was talk of having Linux run Windows binaries by invoking Wine as the interpreter. Java code might already have been implemented that way.

What do I mean by "just like an interpreter"? #! scripts in Unix were traditionally handled by the shell, you couldn't just launch them with exec(). Linux changed so the kernel can recognize the #! line and invoke the interpreter with the script's filename. It can also recognize dynamically-linked ELF files and invoke ld-linux.so much like an interpreter, except I guess it may pass pointers to the already-mapped file.

From a more meta perspective, I've always intended to make the GUI into a set of services as QNX did in the 90s, but it seems like a whole other lot of design work and I'm starting to get fed up with the number of things I need to design. I'm tempted to give up on Kaph and just use a native Forth, write ad-hoc code and develop guidelines ex post facto.

nullplan · Post by **nullplan** » Fri Dec 20, 2024 3:32 pm

eekee wrote: ↑Fri Dec 20, 2024 10:25 am There was talk of having Linux run Windows binaries by invoking Wine as the interpreter. Java code might already have been implemented that way.

That happened. It is now called binfmt_misc, and it allows you to add (actually prepend) new executable formats to the binary formats table. You give it a magic offset, length, and value, and the kernel now will run all binaries it is told to run that have the magic value in the magic place with the program given.

I have seen this used to run Windows programs with Wine or mono, and to run foreign-architecture ELF files with QEMU.

This of course opens the door to abuse, where a script file is executed, and its interpreter is a .NET EXE, and mono requires an interpreter. I think the stack can only go up to four layers before Linux just rejects it.
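
To make the "magic offset, length, and value" idea concrete, here is a small sketch of such a lookup in C. It is not Linux's implementation: struct binfmt_entry and lookup_interpreter() are hypothetical names, and the real binfmt_misc additionally supports a mask and extension-based matching, which are left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical registry entry: "files whose bytes at `offset` equal `magic`
 * are run by `interpreter`". */
struct binfmt_entry {
    size_t offset;             /* where in the file the magic lives */
    size_t length;             /* number of magic bytes */
    const uint8_t *magic;      /* the magic bytes themselves */
    const char *interpreter;   /* program to run the file with */
};

/* Returns the interpreter of the first entry whose magic matches the first
 * `head_len` bytes of the file, or NULL if no entry matches. */
const char *lookup_interpreter(const struct binfmt_entry *table, size_t count,
                               const uint8_t *head, size_t head_len)
{
    assert(table != NULL && head != NULL);
    for (size_t i = 0; i < count; i++) {
        const struct binfmt_entry *e = &table[i];
        /* Skip entries whose magic would extend past the bytes we have. */
        if (e->offset > head_len || e->length > head_len - e->offset)
            continue;
        if (memcmp(head + e->offset, e->magic, e->length) == 0)
            return e->interpreter;
    }
    return NULL;
}
```

Because the first match wins, putting newly registered formats at the front of the table gives them priority over older ones, which matches the "prepend" behaviour mentioned above.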

eekee wrote: ↑Fri Dec 20, 2024 10:25 am It can also recognize dynamically-linked ELF files and invoke ld-linux.so much like an interpreter, except I guess it may pass pointers to the already-mapped file.

Pretty much that. It recognizes ELF files in need of interpreting by the PT_INTERP program header. If found, it will continue to map the ELF file as it normally would, then map the interpreter, and set the aux vector entries AT_BASE to point to the base of the interpreter and AT_PHDR to point at the program header table of the main program.
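
A sketch of that aux vector setup in C, under some assumptions: struct mapped_image and build_auxv() are hypothetical, and only the AT_* tag values are the standard ones. A real kernel passes more entries (page size, random bytes, and so on) and writes them onto the new process's stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Aux vector tags with their standard values. */
#define AT_NULL  0
#define AT_PHDR  3
#define AT_PHENT 4
#define AT_PHNUM 5
#define AT_BASE  7
#define AT_ENTRY 9

struct auxv_pair { uint64_t a_type; uint64_t a_val; };

/* Hypothetical record of one ELF file after it has been mapped. */
struct mapped_image {
    uint64_t base;    /* run-time base address */
    uint64_t entry;   /* run-time entry point */
    uint64_t phdr;    /* run-time address of the program header table */
    uint16_t phent;   /* size of one program header */
    uint16_t phnum;   /* number of program headers */
};

/* Fills `out` for the main executable `exe` and optional interpreter
 * `interp` (NULL for a static executable). Stores the address to jump to in
 * `*start` and returns the number of entries written, or 0 if `cap` is too
 * small. */
size_t build_auxv(struct auxv_pair *out, size_t cap,
                  const struct mapped_image *exe,
                  const struct mapped_image *interp,
                  uint64_t *start)
{
    size_t n = 0;

    assert(out != NULL && exe != NULL && start != NULL);
    if (cap < 6)
        return 0;
    /* The interpreter finds the main program through these entries. */
    out[n++] = (struct auxv_pair){ AT_PHDR,  exe->phdr };
    out[n++] = (struct auxv_pair){ AT_PHENT, exe->phent };
    out[n++] = (struct auxv_pair){ AT_PHNUM, exe->phnum };
    out[n++] = (struct auxv_pair){ AT_BASE,  interp ? interp->base : 0 };
    out[n++] = (struct auxv_pair){ AT_ENTRY, exe->entry };
    out[n++] = (struct auxv_pair){ AT_NULL,  0 };
    /* Control goes to the interpreter first if there is one. */
    *start = interp ? interp->entry : exe->entry;
    return n;
}
```

The interpreter then reads AT_PHDR and AT_PHNUM to find the main program's PT_DYNAMIC segment, does the linking, and finally jumps to the address in AT_ENTRY.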

eekee wrote: ↑Fri Dec 20, 2024 10:25 am I'm tempted to give up on Kaph and just use a native Forth, write ad-hoc code and develop guidelines ex post facto.

That is the only sensible way to go. Only after you have experienced what works and what doesn't is it sensible to codify anything like guidelines.

eekee · Post by **eekee** » Sat Dec 21, 2024 6:21 am

binfmt_misc, of course! It's been so long.

Linux's 4-layers limit seems sensible. It might be configurable, not sure if it's a compile-time option or a sysctl, though my knowledge mostly predates sysctl so I guess it's compile-time.

nullplan wrote: ↑Fri Dec 20, 2024 3:32 pm
eekee wrote: ↑Fri Dec 20, 2024 10:25 am It can also recognize dynamically-linked ELF files and invoke ld-linux.so much like an interpreter, except I guess it may pass pointers to the already-mapped file.
Pretty much that. It recognizes ELF files in need of interpreting by the PT_INTERP program header. If found, it will continue to map the ELF file as it normally would, then map the interpreter, and set the aux vector entries AT_BASE to point to the base of the interpreter and AT_PHDR to point at the program header table of the main program.

As it maps both of these, does at least 1 need to be position-independent? Oh but the 'interpreter' could relocate the ELF file, couldn't it? I didn't understand relocatability back then, so it didn't get stored in my memory.

nullplan wrote: ↑Fri Dec 20, 2024 3:32 pm
eekee wrote: ↑Fri Dec 20, 2024 10:25 am I'm tempted to give up on Kaph and just use a native Forth, write ad-hoc code and develop guidelines ex post facto.
That is the only sensible way to go. Only after you have experienced what works and what doesn't is it sensible to codify anything like guidelines.

That does make sense, thanks.

nullplan · Post by **nullplan** » Sat Dec 21, 2024 10:37 am

eekee wrote: ↑Sat Dec 21, 2024 6:21 am As it maps both of these, does at least 1 need to be position-independent? Oh but the 'interpreter' could relocate the ELF file, couldn't it? I didn't understand relocatability back then, so it didn't get stored in my memory.

In theory, both the interpreter and the main executable might be position dependent, just linked to different addresses. We are setting up a new address space, after all, so the entire user half is fair game. Indeed, all libraries might be position dependent as well, if we're getting down to it. Position dependent only means that the loader tries to get the requested addresses, and it is an error if it can't get them (typically, the loader sets the first argument of mmap() to the requested address and doesn't set MAP_FIXED, and errors out if the ELF type is ET_EXEC and the return value of mmap() is unequal to the requested address).
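
The mmap() pattern in that parenthetical might look roughly like the following POSIX sketch. reserve_for_module() is a made-up name, and an anonymous PROT_NONE mapping stands in for the real file-backed segment mappings to keep the example self-contained.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on strict glibc builds */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Reserves `len` bytes for a module that was linked at address `want`.
 * The address is passed only as a hint (no MAP_FIXED), so an existing
 * mapping is never clobbered. For a position-dependent module (ET_EXEC)
 * any other address is an error; a position-independent module accepts
 * wherever it lands. Returns the mapped address, or NULL on failure. */
void *reserve_for_module(void *want, size_t len, bool position_dependent)
{
    assert(len > 0);
    void *got = mmap(want, len, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (got == MAP_FAILED)
        return NULL;
    if (position_dependent && got != want) {
        munmap(got, len);        /* wrong place: give the range back */
        return NULL;
    }
    return got;
}
```

Leaving out MAP_FIXED is the important design choice here: with MAP_FIXED the call would silently replace whatever was already mapped at the requested address instead of reporting the conflict.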

In practice of course, these days all modules are position independent. The advent of x86_64 has brought PC-relative addressing to the masses, which massively reduces the code overhead for PIC, and one of the security mitigations used to make successful exploits of the broken programs we all use on a daily basis less likely is ASLR, which works better if all modules are position independent.

This does mean that the interpreter is run with no relocations being processed, so it has to first process its own relocations. Having a position dependent interpreter would ease that pain, but obviously block off a section of address space just for ease of implementation, and I don't think that is a good tradeoff.
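
A sketch of the core of that self-relocation step on x86_64, in C. The function name relocate_self() is made up, and the sketch assumes the interpreter has already found its own load base and its RELA table (normally via AT_BASE and its PT_DYNAMIC segment) without touching any data that itself needs relocating, which is the tricky part in practice and is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define R_X86_64_RELATIVE 8      /* value from the x86_64 psABI */

struct elf64_rela {
    uint64_t r_offset;           /* where to patch, relative to the base */
    uint64_t r_info;             /* symbol index (high 32) and type (low 32) */
    int64_t  r_addend;           /* link-time value to rebase */
};

/* Applies R_X86_64_RELATIVE relocations: each patched word becomes
 * base + addend. Returns 0 on success, or -1 on a relocation type that
 * needs symbol lookup, which a self-relocating loader cannot do yet. */
int relocate_self(uintptr_t base, const struct elf64_rela *rela, size_t count)
{
    assert(rela != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if ((rela[i].r_info & 0xffffffffu) != R_X86_64_RELATIVE)
            return -1;
        uint64_t *slot = (uint64_t *)(base + (uintptr_t)rela[i].r_offset);
        *slot = (uint64_t)base + (uint64_t)rela[i].r_addend;
    }
    return 0;
}
```

Restricting the bootstrap pass to RELATIVE relocations keeps it free of symbol lookups; anything that needs a symbol can be resolved later by the ordinary relocation code once the interpreter's own data is usable.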

To illustrate the point about code size of PIC code: Let's take a simple C function:

extern int glob_var;
void set(int x) { glob_var = x; }

In i386 position-independent mode, this is compiled into something like

set:
  call 1f	# push run-time address to stack
1:
  popl %ecx # get run-time address into ECX
  addl $_GLOBAL_OFFSET_TABLE_+[.-1b], %ecx	# get ECX to run-time address of global offset table
  movl glob_var@GOT(%ecx), %ecx	# load pointer to glob_var into ECX
  movl 4(%esp), %eax	# load value to set into EAX
  movl %eax, (%ecx)	# actually set the glob_var
  retl

Whereas on x86_64, it is just

set:
  movq glob_var@GOTPCREL(%rip), %rax	# load pointer to glob_var into RAX
  movl %edi, (%rax)	# set the glob_var (a 32-bit int, so a 32-bit store)
  retq

Of course, in both cases it is even shorter in position dependent mode. But in that case, you pay with more data: The global variable then gets a COPY relocation in the main executable. So the main executable then increases the size of its own .bss section and interposes the global variable for all other modules.

AndrewAPrice · Post by **AndrewAPrice** » Sun Dec 29, 2024 11:17 pm

OP here.

I did a multistage approach:

I pass in ELF files via grub multiboot modules.
I kept a simple static ELF loader in the kernel. It ignores ELF files it can't handle (dynamically linked executables, shared libraries).
I exposed a system call that hands the unhandled multiboot modules to the caller (the first process to call it is the only process allowed to call it subsequently).
I statically link my user space ELF loader. Its binary is large, but it's the only statically linked binary (other than the kernel) in my OS.
I implemented shared library loading, symbol relocation, etc. only in my user space ELF loader. Its first job is to loop through the unhandled multiboot modules and load every ELF executable, along with any ELF libraries it depends on that were also passed as multiboot modules.
Read-only ELF memory is mapped into each process via shared memory, so multiple instances of the same executable or library share the same read-only pages.
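
For anyone wanting to try the same split, the kernel-side decision in the second step could be sketched like this in C. This is not my actual code, just an illustration: the names are hypothetical, it handles 64-bit ELF only, and it assumes anything of type ET_DYN (shared libraries and position-independent executables) is left for the user space loader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal ELF64 definitions, laid out as in the ELF specification. */
#define ET_EXEC    2
#define ET_DYN     3
#define PT_DYNAMIC 2
#define PT_INTERP  3

struct elf64_ehdr {
    unsigned char e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version;
    uint64_t e_entry, e_phoff, e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum;
    uint16_t e_shentsize, e_shnum, e_shstrndx;
};

struct elf64_phdr {
    uint32_t p_type, p_flags;
    uint64_t p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align;
};

enum module_kind {
    MODULE_NOT_LOADABLE,     /* not an ELF64 file the kernel understands */
    MODULE_STATIC_EXEC,      /* the kernel's simple loader can start it */
    MODULE_FOR_USER_LOADER   /* leave it for the user space ELF loader */
};

enum module_kind classify_module(const uint8_t *image, size_t size)
{
    struct elf64_ehdr eh;

    assert(image != NULL);
    if (size < sizeof eh || memcmp(image, "\177ELF", 4) != 0 || image[4] != 2)
        return MODULE_NOT_LOADABLE;
    memcpy(&eh, image, sizeof eh);

    if (eh.e_type == ET_DYN)           /* shared library or PIE */
        return MODULE_FOR_USER_LOADER;
    if (eh.e_type != ET_EXEC)
        return MODULE_NOT_LOADABLE;
    if (eh.e_phentsize != sizeof(struct elf64_phdr) || eh.e_phoff > size ||
        (size - eh.e_phoff) / sizeof(struct elf64_phdr) < eh.e_phnum)
        return MODULE_NOT_LOADABLE;

    /* A fixed-address executable is still dynamic if it names an
     * interpreter or carries a dynamic section. */
    for (uint16_t i = 0; i < eh.e_phnum; i++) {
        struct elf64_phdr ph;
        memcpy(&ph, image + eh.e_phoff + (size_t)i * sizeof ph, sizeof ph);
        if (ph.p_type == PT_INTERP || ph.p_type == PT_DYNAMIC)
            return MODULE_FOR_USER_LOADER;
    }
    return MODULE_STATIC_EXEC;
}
```

With a check like this, the kernel starts only the MODULE_STATIC_EXEC modules and keeps the rest on the list that the system call later hands over.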

My individual binary sizes are now smaller and memory usage is less.

OSDev.org
