Dynamic linking rant

nullplan
Member
Posts: 1779
Joined: Wed Aug 30, 2017 8:24 am

Dynamic linking rant

Post by nullplan »

This is just something that's been rolling around in my brain for a while, and I thought I would finally verbalize it in one place, rather than let most of it only be present in that unhinged discussion I had with bzt that one time.

Not to bury the lead, my opinion should be well enough on display in the title alone, but let me make it plain: I don't like dynamic linking. But to give it its fair shake:

1. The good points

OK, yes, dynamic linking has its good points. Its presence theoretically allows for a reduced memory footprint (although this is hard to measure in any OS with shared file mappings).

It also solves a logistical problem, in that it allows a quicker rollout of a patched library, if the patch does not break ABI. With dynamic linking, you update the library file (with removal first!), then restart whatever processes are still using the old one, and you're done. With static linking, you have to update the library, and then re-link all the binaries using the file, then install all the binaries and restart all the processes running the binaries, which is significantly more work.

There is also dynamic loading, which is quite nice. It allows programs written for that to be extended with new plugins after they were written, and it requires dynamic linking to work.
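For concreteness, here is a minimal sketch of what that usually looks like through the POSIX dlopen()/dlsym() interface; the "plugin_init" entry point is just an invented convention for this example, and on older glibc you would link with -ldl:

Code:

#include <dlfcn.h>
#include <stdio.h>

/* Load one plugin and run its (hypothetical) entry point. */
int load_plugin(const char *path)
{
    void *h = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (!h) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }
    /* look up the symbol the host program expects every plugin to export */
    int (*init)(void) = (int (*)(void))dlsym(h, "plugin_init");
    if (!init) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        dlclose(h);
        return -1;
    }
    return init();   /* handle stays open so the plugin's code stays mapped */
}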

But from the intro, it should be clear that this is not where I will end the post, so on to the bad points.

2. Overburdening complexity

Dynamic libraries are a prototypical example of a simple idea that got away from people. The basic idea was to share library code among multiple processes. But the plumbing to make this all work ended up taking up more space than it saved. There's the need to use position-independent code, which at least on those architectures not using PC-relative addressing is still relatively expensive. There's the GOT and the PLT. There's the dynamic linker, which is no small piece of code. There are symbol tables, symbol visibility, symbol versions, init/fini ordering. There is the matter of the dynamic linker having to relocate itself on startup, which is no mean feat. And of course, dynamic TLS modules are quite complicated to support right. All of the relocation types are arch-specific, but also kinda not, since most archs recycle the same ideas. But finding the right abstractions is a chore.

It is also undefined whether dlclose() actually unloads the given module, with arguments for and against which I'm not going to get into. But with static linking, none of this is even a concern! (OK, with static PIE, the PIC and the relocation bootstrapping is, but that is tiny if done right.)
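You can even watch that ambiguity from inside the process. A small probe, assuming the glibc RTLD_NOLOAD extension and a placeholder library name:

Code:

#define _GNU_SOURCE              /* for RTLD_NOLOAD */
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    void *h = dlopen("libfoo.so", RTLD_NOW);
    if (!h) return 1;
    dlclose(h);                  /* may or may not actually unload it */

    /* RTLD_NOLOAD: return a handle only if the object is still resident */
    void *still = dlopen("libfoo.so", RTLD_NOLOAD | RTLD_NOW);
    printf("libfoo.so is %s resident\n", still ? "still" : "no longer");
    if (still) dlclose(still);   /* drop the extra reference from the probe */
    return 0;
}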

3. What space does it save?

The argument I've heard time and again for dynamic linking is that it saves space. Rather than having a copy of printf() and all the functions printf() calls in each process, you have one loaded in the system for all processes to share. While dynamic linking achieves this, it also does far more than this. First of all, static linking tricks with weak symbols don't work, so stuff you don't need gets linked in too. But also, stuff you never call is linked in just by being in the same library. A simple "hello world" program contains a copy of system() in its address space, just for sharing a library with puts().
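A sketch of the kind of weak-symbol trick meant here, with a hypothetical optional_logger hook: when linking statically against an archive, the weak reference stays NULL unless something else in the link drags the definition in, so you only pay for what you use; a shared library maps the code in regardless.

Code:

#include <stdio.h>

/* weak reference: resolves to NULL if no object in the link defines it */
extern void optional_logger(const char *msg) __attribute__((weak));

void report(const char *msg)
{
    if (optional_logger)         /* present only if something pulled it in */
        optional_logger(msg);
    else
        fputs(msg, stderr);      /* cheap fallback, no extra code linked */
}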

If you look at it closely, you see that dynamic linking only saves on disk space, since the library functions do not need to be in the program image file. But the space is still taken up in memory when the program is loaded. So it is a technique that saves on disk space at the expense of memory space, when disk space has always been the more abundant of the two on any system I have ever seen. Even my parents' 386 with a 100MB hard drive had more disk space than memory space.

On the topic of memory space: All processes running the same program share text space. Busybox (many tools packed into one binary, all sharing one text segment) solves the same problem faster than dynamic libraries and with less overhead!

4. Security problems everywhere

The list of security issues caused in the past is endless. Through dynamic lookup of library files, automatic execution of initializers, and examination of environment variables, the dynamic linker creates a massive attack surface for malicious users to exploit. These were often privilege escalation attacks by use of setuid executables. But also arbitrary code execution by manipulation of common variables by attackers has been a problem in the past (e.g. when an attacker can influence the environment of a CGI program).
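As an illustration of why the environment part alone is scary: any constructor in an LD_PRELOADed object runs before main(). This is a deliberately harmless sketch; the mitigation, secure-execution mode (AT_SECURE) for setuid binaries, is yet more machinery the dynamic linker has to get right.

Code:

/* sketch.c - build as a shared object, e.g.: cc -shared -fPIC sketch.c -o sketch.so
 * then run any dynamically linked program with LD_PRELOAD=./sketch.so */
#include <stdio.h>

__attribute__((constructor))
static void injected(void)
{
    /* executes automatically at load time, before the victim's main() */
    fprintf(stderr, "code from the preloaded object is running\n");
}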

5. Welcome to DLL hell.

You want a level of hell that has Doomguy giving up, look no further than attempting to solve a diamond problem among dynamic modules. Distributions have the problem relatively under control these days through the use of sophisticated package managers, but this is inventing new technology to get around the hurdles we put up ourselves. What is even the point? Why not take the hurdles down?

6. Compatibility? Gone.

With statically linked programs, there is a certain degree of compatibility. The only form of dynamic linking going on in there is the linkage to the kernel, and that interface is well specified. If you try a call that doesn't exist, you get -ENOSYS back. If you try a call that did exist, you get the same behavior you always got if that is an option, or else -ENOSYS. A statically linked program for Linux 0.x, compiled in the mid-'90s, can be executed without changes on the most modern laptop in the world, and work as well as it ever did, even if the exact same source code would get you a different program today.

This means you can implement graceful fallback to older algorithms. You tried statx() on an old kernel and it didn't work? OK, try fstatat() instead. But with dynamic linking? Well, since they introduced symbol versions, you cannot downgrade libraries. If your library doesn't have a symbol version, the program just fails. I recently had a program sent to me that was compiled for Ubuntu, and I'm running Debian, and the program couldn't run. I had to install an Ubuntu jail just for that one program. That is bloody well insane!
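The fallback pattern against the stable kernel ABI looks roughly like this (raw syscall on purpose, so an old kernel answers with ENOSYS; assumes glibc 2.28+ headers for struct statx):

Code:

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>        /* struct statx, STATX_MTIME, fstatat() */
#include <sys/syscall.h>
#include <unistd.h>

/* Fetch a file's mtime: try statx(), fall back to fstatat() on ENOSYS. */
int get_mtime(const char *path, struct timespec *out)
{
    struct statx stx;
    if (syscall(SYS_statx, AT_FDCWD, path, 0, STATX_MTIME, &stx) == 0) {
        out->tv_sec = stx.stx_mtime.tv_sec;
        out->tv_nsec = stx.stx_mtime.tv_nsec;
        return 0;
    }
    if (errno != ENOSYS)
        return -1;           /* real error, not a missing syscall */

    /* old kernel: statx does not exist, use the older interface */
    struct stat st;
    if (fstatat(AT_FDCWD, path, &st, 0) < 0)
        return -1;
    *out = st.st_mtim;
    return 0;
}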

7. It's all spiraling in on itself!

And the final end to this insanity, the reason why I'm writing this: There are now formats like Snap and Flatpak that attempt to make it possible to package binary programs for all distributions. Which is a laudable goal I simply would have solved with static linking (put everything into one big file and ship it). But how do they do it? File system images containing dynamically linked programs and all dependencies. So we have come full circle. Every program has its own printf(), only now we have the massive dynamic linking machinery hanging around, and all the libraries have a usage count of 1.

8. Conclusion

So why not just stay with static linking? There are workarounds for not getting the good parts (e.g. continuous integration can ease the logistical problem of library updates, the plugin thing can be solved statically if everything is open source) and you cut out so much fluff you don't need.
Carpe diem!
klange
Member
Posts: 679
Joined: Wed Mar 30, 2011 12:31 am
Libera.chat IRC: klange
Discord: klange

Re: Dynamic linking rant

Post by klange »

Some points in both directions I want to address:
nullplan wrote:There is also dynamic loading, which is quite nice. It allows programs written for that to be extended with new plugins after they were written, and it requires dynamic linking to work.
nullplan wrote:the plugin thing can be solved statically if everything is open source
You don't need dynamic linking, and you don't need things to be open-source - you can distribute plugins as object files and use initializers to make the rest of the program aware of them when they are linked in.
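A rough sketch of that scheme, with an invented registry and plugin: each plugin ships as an object file whose constructor registers it, so linking the .o in, even blindly, is all the integration needed. Note this relies on linking the object file directly; an unreferenced member of a static archive would not be pulled in.

Code:

#include <stdio.h>

struct plugin { const char *name; void (*run)(void); struct plugin *next; };

static struct plugin *plugin_list;            /* head of the registry */

void register_plugin(struct plugin *p)        /* provided by the host program */
{
    p->next = plugin_list;
    plugin_list = p;
}

/* ---- everything below would live in the plugin's own object file ---- */

static void frobnicate(void) { puts("frobnicator plugin running"); }
static struct plugin frob_plugin = { "frobnicator", frobnicate, NULL };

__attribute__((constructor))
static void frob_register(void)               /* runs before main() once linked in */
{
    register_plugin(&frob_plugin);
}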
nullplan wrote:Dynamic libraries are a prototypical example of a simple idea that got away from people. The basic idea was to share library code among multiple processes. But the plumbing to make this all work ended up taking up more space than it saved. There's the need to use position-independent code, which at least on those architectures not using PC-relative addressing is still relatively expensive. There's the GOT and the PLT. There's the dynamic linker, which is no small piece of code. There are symbol tables, symbol visibility, symbol versions, init/fini ordering. There is the matter of the dynamic linker having to relocate itself on startup, which is no mean feat. And of course, dynamic TLS modules are quite complicated to support right. All of the relocation types are arch-specific, but also kinda not, since most archs recycle the same ideas. But finding the right abstractions is a chore.
Technically, several of these things aren't necessary for dynamic linking. You can build shared objects that aren't PIC, you can make a dynamic loader that is fully static and loaded at a fixed offset so it doesn't need to relocate itself (I do this!)...
nullplan wrote:If you look at it closely, you see that dynamic linking only saves on disk space, since the library functions do not need to be in the program image file. But the space is still taken up in memory when the program is loaded.
This is wrong - or at least, it should be wrong in theory. First, when the library is mapped, it's not all loaded into memory (on a real OS, at least...): it gets loaded page by page as it is accessed by some process using it. Part of why GOTs and PLTs are a thing is to avoid touching as much of the code as possible by centralizing the places that need to be written to by relocations, minimizing the number of modified pages but also the number of accessed pages. You can even potentially save memory over static linking if you reference a large function (or tree of functions) but then never end up calling them based on runtime conditions. There's also lazy binding - symbol relocations that aren't actually done until needed.

This is the caveat for what I mentioned earlier, though: If you have non-PIC shared libraries, they'll contain relocations within the code, so suddenly you have to write to all of it, it's no longer shared between processes because CoW kicks in, and it really messes up the lazy binding benefits, so you probably brought in far more of the library than you would have liked...
nullplan wrote:With statically linked programs, there is a certain degree of compatibility. The only form of dynamic linking going on in there is the linkage to the kernel, and that interface is well specified. If you try a call that doesn't exist, you get -ENOSYS back. If you try a call that did exist, you get the same behavior you always got if that is an option, or else -ENOSYS. A statically linked program for Linux 0.x, compiled in the mid-'90s, can be executed without changes on the most modern laptop in the world, and work as well as it ever did, even if the exact same source code would get you a different program today.
This benefit of static linking relies on the kernel providing a backwards-compatible interface - something Linus has regularly gone on rampages to defend for Linux, but allow me to offer a counterpoint: I changed the ABI for my syscall interface recently, to support the syscall instruction. In order to do that, I had to reorder argument registers. I was able to do this without recompiling a single application in my package repository because they all made system calls through the dynamically-linked libc.

And that's not something limited to the libc: My windowing system uses client-side decorations. All applications that want title bars link with a library that provides them. A bit back, I added support for a minimize button: All I had to do was change that library - again, I didn't have to recompile anything in the package repository and suddenly all of my applications had this button. Around the same time, I added a hover highlight for those buttons, and again, I didn't have to recompile anything but the decorator library to support this in all of my applications.
nullplan wrote:This means you can implement graceful fallback to older algorithms. You tried statx() on an old kernel and it didn't work? OK, try fstatat() instead. But with dynamic linking? Well, since they introduced symbol versions, you cannot downgrade libraries. If your library doesn't have a symbol version, the program just fails. I recently had a program sent to me that was compiled for Ubuntu, and I'm running Debian, and the program couldn't run. I had to install an Ubuntu jail just for that one program. That is bloody well insane!
I feel like this is a failure of conventions (and a failure of glibc, in particular) more than it is one of dynamic linking in general - why do we require that symbols resolve? Why can't we have an interface to check if a function was successfully bound at runtime? Why is there no easy way to target an older version when I build so we don't run into this with symbol versioning? Bah!
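The closest thing that exists today is asking the dynamic linker by name at runtime; statx from glibc is just the example symbol here, and on pre-2.34 glibc you would also link with -ldl:

Code:

#define _GNU_SOURCE          /* for RTLD_DEFAULT */
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    /* non-NULL only if some already-loaded object actually defines statx */
    void *sym = dlsym(RTLD_DEFAULT, "statx");
    printf("statx %s available in this process\n", sym ? "is" : "is not");
    return 0;
}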
nullplan
Member
Posts: 1779
Joined: Wed Aug 30, 2017 8:24 am

Re: Dynamic linking rant

Post by nullplan »

klange wrote:You don't need dynamic linking, and you don't need things to be open-source - you can distribute plugins as object files and use initializers to make the rest of the program aware of them when they are linked in.
I was thinking of something like mupen64, which is where I did something like this the first time. mupen64 is an N64 emulator that has a plugin architecture, allowing it to dynamically load video plugins and such. It does that by searching a search path for DLLs (or SOs in the UNIX version) and loading them, then looking for specific symbols. Simply turning the plugins into static libraries and pushing them into the main executable would fail because of double definitions. But, since both mupen64 and the plugins were open-source, I could change them to use function pointer tables. So for static mode, I made all the erstwhile external functions in the plugin private (static) and listed them in the function pointer table, then listed the function pointer tables all in an array. This way, most of the architecture was kept intact without entwining the code blocks too much. And you could still choose the plugin to use with a config option.
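Roughly, the converted shape was something like this (the vtable layout and names are illustrative, not mupen64's real plugin interface):

Code:

struct gfx_plugin {
    const char *name;
    int  (*init)(void);
    void (*render_frame)(void);
};

/* in each plugin: the formerly exported functions become static... */
static int  glide_init(void)         { /* set up renderer */ return 0; }
static void glide_render_frame(void) { /* draw one frame  */ }

/* ...and are reachable only through the plugin's table */
static const struct gfx_plugin glide_plugin = {
    "glide64", glide_init, glide_render_frame,
};

/* the host picks one entry by name, e.g. from a config option */
static const struct gfx_plugin *const gfx_plugins[] = {
    &glide_plugin,                    /* further plugin tables listed here */
};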

Your approach would only work if I have the main program available as source code or object files.
klange wrote:when the library is mapped, it's not all loaded into memory (on a real OS, at least...): it gets loaded page by page as it is accessed by some process using it.
This isn't different from static linking. At least on Linux, the main executable is also faulted into address space, just the same as any library file. If there are rarely used functions on a page you never use, they will not be loaded from disk unless called.

No, the use case where shared libraries might make up for their wastefulness is when many processes (running different programs) are sharing the same library. But that is precisely what I am bemoaning here: With the increasing containerization I see going on, this sharing is made impossible. So the one thing that might have made the whole thing worth it is thrown out the window.

There are more things I haven't even talked about: All of this sharing doesn't come for free. Whenever a library is mapped, the kernel has to look up the path, then notice that the file is already open, then notice that parts of it are already mapped read-only, so the mappings can be shared. All of this is time simply not spent for statically linked programs.
klange wrote:This benefit of static linking relies on the kernel providing a backwards-compatible interface - something Linus has regularly gone on rampages to defend for Linux, but allow me to offer a counterpoint: I changed the ABI for my syscall interface recently, to support the syscall instruction. In order to do that, I had to reorder argument registers. I was able to do this without recompiling a single application in my package repository because they all made system calls through the dynamically-linked libc.
When Linux added support for the syscall instruction to i386, it did so by adding a new system call mechanism (published through the AT_SYSINFO auxiliary vector entry). The old one remained intact.
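For reference, this is the auxiliary-vector mechanism you can also inspect from user space; getauxval() is the glibc/musl accessor, and AT_SYSINFO_EHDR (the address of the vDSO ELF image) is the entry most ports expose today:

Code:

#include <stdio.h>
#include <sys/auxv.h>

int main(void)
{
    unsigned long vdso = getauxval(AT_SYSINFO_EHDR);
    printf("vDSO ELF image at %#lx\n", vdso);   /* 0 if the kernel provides none */
    return 0;
}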

But point taken. I have acknowledged the logistical issue of updating a library with static linking before. And I maintain that while this is a benefit of dynamic linking, it does not make up for all of its shortcomings.
klange wrote:Bah!
My thoughts exactly.
Carpe diem!
klange
Member
Posts: 679
Joined: Wed Mar 30, 2011 12:31 am
Libera.chat IRC: klange
Discord: klange

Re: Dynamic linking rant

Post by klange »

nullplan wrote:Your approach would only work if I have the main program available as source code or object files.
That's actually the whole idea: Distribute applications as their constituent object files. Then they can be relinked with updated versions of libraries. Lots of proprietary software for Linux, particularly in industrial environments, used to be distributed this way, but I think it's fallen out of favor. It's also a valid way to comply with the LGPL in a proprietary application while still using static linking.
nullplan wrote:When Linux added support for the syscall instruction to i386, it did so by adding a new system call mechanism (through the AT_SYSINFO aux header).
But the thing AT_SYSINFO points to is a dynamic library - albeit a "virtual" one, it still uses the normal ELF mechanisms.
nullplan
Member
Posts: 1779
Joined: Wed Aug 30, 2017 8:24 am

Re: Dynamic linking rant

Post by nullplan »

klange wrote:That's actually the whole idea: Distribute applications as their constituent object files. Then they can be relinked with updated versions of libraries. Lots of proprietary software for Linux, particularly in industrial environments, used to be distributed this way, but I think it's fallen out of favor. It's also a valid way to comply with the LGPL in a proprietary application while still using static linking.
Ah, wonderful. Yes, I once had the displeasure of analyzing what the LGPL would mean for us at work, and using dynamic linking, or allowing users to download object files, was actually discussed, since the LGPL requires that the user be allowed to update the library at their leisure. Of course, we could just not do that, since none of our customers would want to install anything on the system that wasn't approved by us. In the end we noticed we could do without LGPL libraries, and that certainly helps.
klange wrote:But the thing AT_SYSINFO points to is a dynamic library - albeit a "virtual" one, it still uses the normal ELF mechanisms.
This is one place where I approve. This isn't a dynamic library that could have been static but just wasn't, this is a dynamic way for the kernel to publish faster system calls to applications. This is very different from writing a version of stat() that only works on Ubuntu.

I know of only one other library that similarly is used to abstract the hardware of the machine - libGL.so. And its mere existence seriously throws a wrench in the works of the "just link statically" argument. Fundamentally, the issue here is that hardware differences get smoothed over on the library level, not the kernel level, as is the case for literally any other piece of hardware. Of course, if you have the object or source files, then you can just link libGL statically. Oh well. OpenGL has also fallen out of favor of late, so I'm hoping the problem will just go away if I ignore it hard enough. ;-)
Carpe diem!
rdos
Member
Posts: 3288
Joined: Wed Oct 01, 2008 1:55 pm

Re: Dynamic linking rant

Post by rdos »

I only use dynamic linking for isolation of code. I have a MID dll to isolate the measuring code and I have a fiscal dll to isolate the fiscal interface. I certainly do not link to a dynamic libc, as that would include everything in the runtime library, when I'm not likely to use much of it.

I have a particular issue with SSL, which is huge and would improve as a DLL. However, I plan to solve this by making SSL a feature of the kernel (a device driver). This means I don't need to link to the SSL library, which will draw in most of the functions, and I will not have the vulnerability of having SSL in user space, and potentially shared between processes as well. Having SSL in the kernel makes it like a shared library, but also isolates the internal data so it cannot be tampered with by user processes.

I've done the same with other APIs, like the font API and the graphics API.
linguofreak
Member
Posts: 510
Joined: Wed Mar 09, 2011 3:55 am

Re: Dynamic linking rant

Post by linguofreak »

klange wrote:And that's not something limited to the libc: My windowing system uses client-side decorations.
As a user, I am *really* not a fan of CSD. It enables crappy behavior by application devs (by making it easier to write software that doesn't respect the desktop theme) and breaks the UI convention of having a region of the window that talks only to the windowing server, so the user can tell the server what to do with the client without the client snooping on those interactions.
nullplan
Member
Posts: 1779
Joined: Wed Aug 30, 2017 8:24 am

Re: Dynamic linking rant

Post by nullplan »

rdos wrote:I have a particular issue with SSL, which is huge and would improve as a DLL. However, I plan to solve this by making SSL a feature of the kernel (a device driver). This means I don't need to link to the SSL library, which will draw in most of the functions, and I will not have the vulnerability of having SSL in user space, and potentially shared between processes as well. Having SSL in the kernel makes it like a shared library, but also isolates the internal data so it cannot be tampered with by user processes.
You know, the UNIX solution to this would be to have a TLS server that provides some sort of simpler protocol to establish TLS connections itself. I'm thinking of a server that accepts local connections and then has some simple thing front-loaded to get the config stuff out of the way (active or passive open, and the socket address), and then it just tunnels everything, encrypting in one direction and decrypting in the other. Then the TLS complexity would only be in one process and you wouldn't run the TLS code at elevated privileges. Just a thought if you have UNIX domain sockets or something similar. I suppose TCP via localhost might also suffice.
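A rough sketch of just the data-pumping half of such a helper, assuming OpenSSL, that the little config handshake and certificate setup already happened elsewhere, and that plain_fd is the accepted local connection and ssl the established TLS session (link with -lssl -lcrypto -lpthread):

Code:

#include <openssl/ssl.h>
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

struct pump { int plain_fd; SSL *ssl; };

static void *plain_to_tls(void *arg)           /* local bytes out, encrypted */
{
    struct pump *p = arg;
    char buf[4096];
    ssize_t n;
    while ((n = read(p->plain_fd, buf, sizeof buf)) > 0)
        SSL_write(p->ssl, buf, (int)n);
    SSL_shutdown(p->ssl);                      /* local side closed: wind down TLS */
    return NULL;
}

static void tunnel(int plain_fd, SSL *ssl)
{
    struct pump p = { plain_fd, ssl };
    pthread_t up;
    pthread_create(&up, NULL, plain_to_tls, &p);

    char buf[4096];                            /* TLS bytes in, decrypted */
    int n;
    while ((n = SSL_read(ssl, buf, sizeof buf)) > 0)
        write(plain_fd, buf, (size_t)n);

    shutdown(plain_fd, SHUT_WR);               /* tell the local client we're done */
    pthread_join(up, NULL);
}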
linguofreak wrote:As a user, I am *really* not a fan of CSD.
Now that I come to think about it, seconded. There is rarely a good reason for applications to get out of having decorations applied to them in the standard way.
Carpe diem!
klange
Member
Posts: 679
Joined: Wed Mar 30, 2011 12:31 am
Libera.chat IRC: klange
Discord: klange

Re: Dynamic linking rant

Post by klange »

linguofreak wrote:It enables crappy behavior by application devs (by making it easier to write software that doesn't respect the desktop theme)
I disagree with this take and, in fact, assert that it is server-side decoration that does this: With no ability to extend or integrate with the decorator, server-side decorated environments drive application developers to simply opt out and force them to implement their own decorations from scratch, and that results in far worse crimes against desktop themes.
nullplan wrote:Now that I come to think about it, seconded. There is rarely a good reason for applications to get out of having decorations applied to them in the standard way.
Server-side decorated environments like X will always offer an escape hatch, and as long as that escape hatch exists someone will abuse it to make Chrome.
thewrongchristian
Member
Posts: 425
Joined: Tue Apr 03, 2018 2:44 am

Re: Dynamic linking rant

Post by thewrongchristian »

nullplan wrote:
rdos wrote:I have a particular issue with SSL, which is huge and would improve as a DLL. However, I plan to solve this by making SSL a feature of the kernel (a device driver). This means I don't need to link to the SSL library, which will draw in most of the functions, and I will not have the vulnerability of having SSL in user space, and potentially shared between processes as well. Having SSL in the kernel makes it like a shared library, but also isolates the internal data so it cannot be tampered with by user processes.
Sounds a lot like SysV STREAMS. Open your connection to the remote server however STREAMS does that, PUSH a TLS driver onto the stack, let the driver negotiate the TLS connection, then encrypt anything written to the file descriptor, and decrypt anything read from the file descriptor.

I wonder how sockets won originally; STREAMS sounds like a much more UNIX-like solution (which is to be expected, as it came from V8 UNIX). Sockets won out on performance grounds, I understand, but I fail to see how STREAMS could be implemented so badly as to lose out to BSD sockets in performance. Both are in the kernel, and so should have had similar performance opportunities.
nullplan wrote: You know, the UNIX solution to this would be to have a TLS server that provides some sort of simpler protocol to establish TLS connections itself. I'm thinking of a server that accepts local connections and then has some simple thing front-loaded to get the config stuff out of the way (active or passive open, and the socket address), and then it just tunnels everything, encrypting in one direction and decrypting in the other. Then the TLS complexity would only be in one process and you wouldn't run the TLS code at elevated privileges. Just a thought if you have UNIX domain sockets or something similar. I suppose TCP via localhost might also suffice.
Sounds exactly like stunnel?
rdos
Member
Posts: 3288
Joined: Wed Oct 01, 2008 1:55 pm

Re: Dynamic linking rant

Post by rdos »

nullplan wrote:
rdos wrote:I have a particular issue with SSL, which is huge and would improve as a DLL. However, I plan to solve this by making SSL a feature of the kernel (a device driver). This means I don't need to link to the SSL library, which will draw in most of the functions, and I will not have the vulnerability of having SSL in user space, and potentially shared between processes as well. Having SSL in the kernel makes it like a shared library, but also isolates the internal data so it cannot be tampered with by user processes.
You know, the UNIX solution to this would be to have a TLS server that provides some sort of simpler protocol to establish TLS connections itself. I'm thinking of a server that accepts local connections and then has some simple thing front-loaded to get the config stuff out of the way (active or passive open, and the socket address), and then it just tunnels everything, encrypting in one direction and decrypting in the other. Then the TLS complexity would only be in one process and you wouldn't run the TLS code at elevated privileges. Just a thought if you have UNIX domain sockets or something similar. I suppose TCP via localhost might also suffice.
It's not a problem to run the TLS code with elevated privileges. It will run in one code selector, one data selector and one heap selector only. For testing, I'm even allocating every object as a 48-bit pointer, making sure OpenSSL doesn't have problems with overwrites.

It's of course possible to run it as a server process (at user level), but it would cost too much in performance. Extensive segmentation does cost a bit too, but not as much as having it as a server process.

While I probably would make it compatible with file descriptors, my first goal is to create its own API, which needs to include certificates, but possibly other things as well in the future.
rdos
Member
Posts: 3288
Joined: Wed Oct 01, 2008 1:55 pm

Re: Dynamic linking rant

Post by rdos »

On second thoughts, it might be better to run SSL as a server process. It turns out that I need both receive and transmit buffers between SSL and the application. I also need a server thread to handle SSL. Because of that, the additional overhead would be small.
AndrewAPrice
Member
Posts: 2299
Joined: Mon Jun 05, 2006 11:00 pm
Location: USA (and Australia)

Re: Dynamic linking rant

Post by AndrewAPrice »

klange wrote: You don't need dynamic linking, and you don't need things to be open-source - you can distribute plugins as object files and use initializers to make the rest of the program aware of them when they are linked in.
You could do AOT linking. Perhaps even in the package manager at install/update time (or even on the package manager server at upload time.)

With dynamic linking (even if done AOT as mentioned) you hope someone doesn't make a non-backwards-compatible change to a library you depend on. Even if the ABI is identical, I could imagine there are breaking changes. For example, a video game depends on a physics library, and one day the physics library fixes a bug in some calculation, and now the cannonballs can't hit the castle and the level is impassable, yet no ABI broke.

Likewise static linking isn't completely immune either, especially in a microkernel where there may not be a "fopen" system call but instead your program depends on talking to other services for everyday functionality and hopefully they are backwards compatible.
My OS is Perception.
nullplan
Member
Posts: 1779
Joined: Wed Aug 30, 2017 8:24 am

Re: Dynamic linking rant

Post by nullplan »

AndrewAPrice wrote:Likewise static linking isn't completely immune either, especially in a microkernel where there may not be a "fopen" system call but instead your program depends on talking to other services for everyday functionality and hopefully they are backwards compatible.
If you have a system call ABI that may or may not handle the "file open" syscall depending on what services are present, then you have a major problem that needs fixing. Not even OS-9 does that.
Carpe diem!
AndrewAPrice
Member
Posts: 2299
Joined: Mon Jun 05, 2006 11:00 pm
Location: USA (and Australia)

Re: Dynamic linking rant

Post by AndrewAPrice »

nullplan wrote:
AndrewAPrice wrote:Likewise static linking isn't completely immune either, especially in a microkernel where there may not be a "fopen" system call but instead your program depends on talking to other services for everyday functionality and hopefully they are backwards compatible.
If you have a system call ABI that may or may not handle the "file open" syscall depending on what services are present, then you have a major problem that needs fixing. Not even OS-9 does that.
Sorry, I meant to say there is no "file open" system call. My microkernel doesn't know about files and windows but provides system calls to discover services and perform RPCs.

So in this environment, even if the system calls are stable and backwards compatible, there's still the extra burden when working with microkernels to ensure that the ecosystem of services you depend on are backwards compatible.

Btw, I am in favor of static linking. I like the idea of whole-program optimization. However, a simple GUI calculator program is 9MB in my OS.

I compile using GCC with "-fdata-sections -ffunction-sections -g -O3 -fomit-frame-pointer -fverbose-asm -m64 -ffreestanding -nostdlib -nostdinc++ -mno-red-zone -c -std=c++20 -MD -MF" and link with "-Wl,--gc-sections -O3 -g -s -nostdlib -nodefaultlibs -nolibc -nostartfiles -z max-page-size=1 -T userland.ld -o <output file> -Wl,--start-group <input files> -Wl,--end-group -Wl,-lgcc" and this linker file. I'd appreciate it if anyone has any advice on how I can shrink my binaries.

I suspect it's because my UI library uses Skia for drawing, and by touching it for a few things (e.g. drawing some text and buttons) I end up touching a lot of unused but plausibly reachable code paths, and all of Skia's dependencies get linked in.
My OS is Perception.