Shared libraries without a MMU

Post by **ldp** » Wed Sep 04, 2024 12:16 pm

I am presently designing a small operating system for small computers. Since small computers don't have a lot of RAM, the system has to optimize for memory usage. One of the ways it does so is by sharing code common to different processes by means of shared libraries.

As I understand it, the immutable parts of a library, the .text segments, seem mostly straightforward to share: it is only a question of adjusting addresses depending on where the library is loaded. Sharing the mutable parts is complicated by the fact that each process has to have its own copy of them. Otherwise, the processes that import the same library could read and write the same global variable, and locks would have to be implemented to prevent race conditions, something I would rather not do. But if the mutable segments are at different places depending on the current process, a mechanism is needed that lets each library know where its mutable segments are for the current process. With CPUs that have a Memory Management Unit (MMU), this is not a big problem: particular mapping schemes can give the code in each library the illusion that its global variables are always in the same place, no matter which process is running. But since the computers I target don't all have an MMU, I cannot rely on it to implement shared libraries.

So my question is: how can a shared library be implemented so that it can find the instances of its global variables that belong to the current process, on computers that don't have an MMU?

For example: without the help of an MMU, how can the global variable errno have a different value for each process that uses libc, if libc is a shared library?

A possibly important note: I plan to use a dialect of the Forth programming language that uses indirect threading. This means that the code is essentially a list of addresses of subroutines, the majority of which will belong to shared libraries. For this reason, library calls should ideally not incur too high a penalty. Also, shared libraries will use other shared libraries recursively.

Post by **nullplan** » Wed Sep 04, 2024 1:25 pm

Short answer: ELF FDPIC!

Longer answer: Basic idea is that you enable sharing of non-writable sections of a file in the kernel. So you do have mmap(), but whenever the user wants PROT_WRITE without MAP_ANONYMOUS, you fail the request.

Next, for memory management purposes, each process mapping a shared lib maps the file's non-writable parts with mmap and allocates the writable parts. Since that means the writable section is no longer a fixed distance from the code section, you also need a platform ABI that designates one register as "pointer to data". And then everything is just loaded from there.

In such a system, the sharable parts of the library are loaded only once, and the non-sharable parts are loaded once for each process that uses them.

One detail: Function descriptors. You can no longer call functions just by the address of their first instruction. Instead, you need that address, and the value to set the data pointer reg to. So function pointers are now pointers to data structures containing two words, where the first is the address of the function and the second is the data pointer, and a function call consists of loading the data pointer correctly before jumping.
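The two-word descriptor can be sketched in plain C. This is only an illustration under stated assumptions: the names `lib_data`, `func_desc`, `bump`, and `call_desc` are hypothetical, and the data pointer is passed as an ordinary argument where a real ABI would load it into its reserved register.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical writable block of one library; one copy per process. */
struct lib_data {
    int counter;                        /* a "global variable" of the library */
};

/* Function descriptor: code address first, data pointer second. */
struct func_desc {
    void (*code)(struct lib_data *dp);  /* address of the first instruction */
    struct lib_data *dp;                /* value for the data pointer register */
};

/* Shared, read-only library code: it reaches its globals only through dp. */
void bump(struct lib_data *dp)
{
    dp->counter++;
}

/* A call through a descriptor: set up the data pointer, then jump. */
void call_desc(const struct func_desc *fd)
{
    assert(fd != NULL && fd->code != NULL);
    fd->code(fd->dp);
}
```

Two processes would each hold their own `struct lib_data` and their own descriptor for `bump`, both pointing at the same code, so a call through one descriptor never touches the other process's counter.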

Post by **ldp** » Wed Sep 04, 2024 3:14 pm

Thank you very much for your answer.

nullplan wrote: ↑Wed Sep 04, 2024 1:25 pm Longer answer: Basic idea is that you enable sharing of non-writable sections of a file in the kernel. So you do have mmap(), but whenever the user wants PROT_WRITE without MAP_ANONYMOUS, you fail the request.

However, I don't see how mmap() can be implemented without virtual memory and without duplicating a lot of pages in physical memory.

nullplan wrote: ↑Wed Sep 04, 2024 1:25 pm Next, for memory management purposes, each process mapping a shared lib maps the file's non-writable parts with mmap and allocates the writable parts. Since that means the writable section is no longer a fixed distance from the code section, you also need a platform ABI that designates one register as "pointer to data". And then everything is just loaded from there.

Also, I don't understand how, with this scheme, a library function can call functions of another library that use their own global variables. I hope my understanding is not too naive.

I will illustrate my point. Let's say there are two libraries. The first one has a function that increments a global variable; the second one has a function that increments a distinct global variable and calls the function of the first library. A program then calls the function from the second library and exits.

Library 1:

#include "lib1.h"

int global_var1 = 0;

void increment_var1(void)
{
	global_var1++;
}

Library 2:

#include "lib1.h"
#include "lib2.h"

int global_var2 = 0;

void increment_vars(void)
{
	global_var2++;
	increment_var1();
}

Program:

#include "lib2.h"

int main(void)
{
	increment_vars();
	return 0;
}

When the program launches, it reserves space for global_var1 and global_var2. It then sets the register that should contain the pointer to .data to the address of the beginning of the reserved space and calls increment_vars(). By inspecting the register, increment_vars() can increment global_var2. But how can lib2 know which value to assign to the register before the call to increment_var1()? Leaving the register untouched is not an option either, since chances are that global_var2 would then be incremented a second time. There is obviously something I have not understood.

Post by **nullplan** » Wed Sep 04, 2024 9:35 pm

ldp wrote: ↑Wed Sep 04, 2024 3:14 pm I don't get however how can mmap() be implemented without virtual memory and without duplicating a lot of pages in physical memory.

If it is a shared or read-only mapping, then you can just share the pages. Say /lib/libc.so is mapped into memory for the first time: you find that address 0x12345000 would work out best, reserve that memory, and read the file there. Since there is no MMU, you cannot fault the file in on demand, so it has to be read up front. The next process that tries to map /lib/libc.so just gets 0x12345000 returned. With no MMU, that address means the same in both processes, and since the mapping is read-only, this is safe in both cases. If it were shared and writable, it would also be safe, since then the processes want shared memory semantics.
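The bookkeeping described above can be sketched as a small table keyed by file path. This is a user-space simulation under stated assumptions, not kernel code: `map_readonly`, `unmap_readonly`, `load_image`, and the fixed-size table are hypothetical names, and `calloc` stands in for reserving memory and reading the file into it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_MAPPINGS 16

/* One entry per file whose read-only image is resident in memory. */
struct mapping {
    char   path[64];
    void  *addr;   /* same address in every process, since there is no MMU */
    size_t len;
    int    refs;   /* number of processes currently mapping the file */
};

struct mapping mappings[MAX_MAPPINGS];

/* Hypothetical stand-in for "reserve memory and read the file there". */
void *load_image(const char *path, size_t len)
{
    (void)path;
    return calloc(1, len);
}

/* Map a file read-only: reuse the resident image if there is one. */
void *map_readonly(const char *path, size_t len)
{
    struct mapping *slot = NULL;

    assert(path != NULL && len > 0);
    for (int i = 0; i < MAX_MAPPINGS; i++) {
        struct mapping *m = &mappings[i];
        if (m->refs > 0 && strcmp(m->path, path) == 0) {
            m->refs++;              /* already loaded: just share it */
            return m->addr;
        }
        if (m->refs == 0 && slot == NULL)
            slot = m;               /* remember the first free entry */
    }
    if (slot == NULL || strlen(path) >= sizeof slot->path)
        return NULL;

    void *image = load_image(path, len);
    if (image == NULL)
        return NULL;
    strcpy(slot->path, path);
    slot->addr = image;
    slot->len = len;
    slot->refs = 1;
    return image;
}

/* Drop one reference; free the image when the last user is gone.
   Returns the remaining reference count, or -1 if addr is not mapped. */
int unmap_readonly(const void *addr)
{
    for (int i = 0; i < MAX_MAPPINGS; i++) {
        struct mapping *m = &mappings[i];
        if (m->refs > 0 && m->addr == addr) {
            if (--m->refs == 0) {
                free(m->addr);
                m->addr = NULL;
            }
            return m->refs;
        }
    }
    return -1;
}
```

Mapping the same path twice returns the same address and only bumps the reference count, which is why no pages end up duplicated.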

ldp wrote: ↑Wed Sep 04, 2024 3:14 pm When the program launches, it reserves space for global_var1 and global_var2. It then sets the register that should contain the pointer to .data to the address of the beginning of the reserved space and calls increment_vars(). By inspecting the register, increment_vars() can increment global_var2. But how can lib2 know to which value the register should be assigned before the call to increment_var1()?

When the program is initialized, the dynamic linker allocates three data blocks: one for the main module, one for lib1, and one for lib2. It generates function descriptors so that everything in lib1 uses the data block for lib1, and everything in lib2 uses the data block for lib2. Calling a function then means stashing your current data pointer on the stack, loading the correct one from the function descriptor, and then jumping to the code address. On return, you just reload your data pointer from the stack.
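Applied to the lib1/lib2 example, the scheme might look like the following simulation. It is a sketch under assumptions: the data pointer register is modelled by the global `dp_reg`, the data block layouts and the names `link_process`, `run_main`, and `call_desc` are hypothetical, and each block stores a pointer to the descriptor it needs, roughly the way a GOT slot would.

```c
#include <assert.h>
#include <stddef.h>

/* Simulated data pointer register reserved by the ABI. */
void *dp_reg;

struct func_desc {
    void (*code)(void);  /* address of the shared code */
    void *dp;            /* data block that code must run with */
};

/* Writable blocks, one set per process. */
struct lib1_data { int global_var1; };
struct lib2_data { int global_var2; const struct func_desc *increment_var1; };
struct main_data { const struct func_desc *increment_vars; };

/* Cross-module call: save the caller's data pointer, switch, call, restore. */
void call_desc(const struct func_desc *fd)
{
    void *saved = dp_reg;
    assert(fd != NULL && fd->code != NULL);
    dp_reg = fd->dp;
    fd->code();
    dp_reg = saved;
}

/* lib1 code (shared by all processes). */
void increment_var1_code(void)
{
    struct lib1_data *d = dp_reg;
    d->global_var1++;
}

/* lib2 code (shared by all processes). */
void increment_vars_code(void)
{
    struct lib2_data *d = dp_reg;
    d->global_var2++;
    call_desc(d->increment_var1);  /* dp_reg points at lib1's block inside */
}

/* Main module code. */
void main_code(void)
{
    struct main_data *d = dp_reg;
    call_desc(d->increment_vars);
}

/* Everything the "dynamic linker" sets up for one process. */
struct process {
    struct lib1_data d1;
    struct lib2_data d2;
    struct main_data dm;
    struct func_desc fd_increment_var1;
    struct func_desc fd_increment_vars;
};

void link_process(struct process *p)
{
    p->d1.global_var1 = 0;
    p->d2.global_var2 = 0;
    p->fd_increment_var1 = (struct func_desc){ increment_var1_code, &p->d1 };
    p->fd_increment_vars = (struct func_desc){ increment_vars_code, &p->d2 };
    p->d2.increment_var1 = &p->fd_increment_var1;
    p->dm.increment_vars = &p->fd_increment_vars;
}

/* Start the program: point the register at main's block and run it. */
void run_main(struct process *p)
{
    dp_reg = &p->dm;
    main_code();
}
```

lib2 never computes lib1's data pointer itself: it only follows the descriptor the linker placed in its own block, so the same lib2 code works for every process.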

Depending on tooling, this can be optimized even further. E.g. you can put the data pointer spill and reload into the PLT stub, so if the function turns out to be local, the call can just be direct. E.g. let's imagine we used PowerPC64 with r2 as the data pointer and function descriptors. For a call to increment_vars, the compiler would generate

bl increment_vars
nop

If the linker detects that increment_vars is local, it simply binds the reference to the local entry point and is done. If it is in a different module, it generates a PLT stub:

std r2, 24(r1)
ld r12, increment_vars@got(r2)
ld r2, 8(r12)
ld r12, 0(r12)
mtctr r12
bctr

And replaces the nop with "ld r2, 24(r1)". The ABI says that every function that calls other functions must allocate at least 32 bytes of stack for various purposes. increment_vars@got points to the function descriptor, in which the first value is the actual code pointer and the second is the data pointer.

Oh, and accessing another module's global data goes through the GOT, as usual.

OSDev.org
