Writing an ELF dynamic linker
- NickJohnson
- Member
- Posts: 1249
- Joined: Tue Mar 24, 2009 8:11 pm
- Location: Sunnyvale, California
Writing an ELF dynamic linker
I'm working on getting shared library loading working on my OS, but I've been getting confused going through the ELF documentation and various pages about Linux process execution, and I'd like some things cleared up. My setup is also a bit unique, so I'm not sure which pieces of documentation apply to me and which don't. This is essentially what I have:
- I'm working in protected mode x86 with all the usuals of paging
- I'm doing the loading entirely in userspace (don't question this: I've already discussed it in another thread)
- I'm writing my own libc from scratch
- I can load executable files into memory (only with the C library)
- I can request shared memory containing shared library files in ELF format (only with the C library)
- I can move pages around
- I can intercept page faults (only with the C library)
- I can check page permissions
Essentially, this is what I'm envisioning for process execution:
1. libc loads executable into memory
2. libc parses executable and makes a list of required shared libraries
3. libc requests shared libraries to be mapped into memory
4. libc requests dynamic linker to be mapped into memory (into a special region)
5. libc transfers control to dynamic linker, and gives it a list of the loaded libraries and their locations
6. linker moves executable/libraries into a special region
7. linker frees all but that special region (i.e. the old process, including the old libc)
8. linker moves executable and libraries to final positions
9. linker does its stuff (i.e. links; also not sure exactly how this works)
10. linker passes control to new executable
11. linker stays resident to do lazy binding? (not sure about this)
So, here are my questions:
- Does the above sequence make sense?
- How exactly does the PLT work? Is the dynamic linker still needed to service lazy binding?
- How do I do PIC in assembly, for both the linker and the soon-to-be shared libc?
- Where is the source for the Linux ld-linux.so? Any other good example code?
I also have the shared library requests serviced by a server (I have a microkernel), which needs to load the libraries off the disk/ramdisk before it can give access to them. Is there any way around statically linking all of the boot drivers, because this server is needed for using any shared libraries? How does Linux solve this sort of bootstrapping problem with shared libraries?
Thanks,
Nick Johnson
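Step 2 of the sequence above (making a list of required shared libraries) comes down to walking the executable's dynamic section for DT_NEEDED entries. Here is a minimal sketch in C, assuming the dynamic section and its string table have already been located in memory. The Dyn32 type and the list_needed helper are made-up names for illustration; the tag values are the ones from the ELF specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified 32-bit dynamic entry. The ELF spec's Elf32_Dyn uses a
 * union for the second field; a single value is enough for this sketch. */
typedef struct {
    int32_t  d_tag;
    uint32_t d_val;
} Dyn32;

enum { DT_NULL = 0, DT_NEEDED = 1 };

/* Collect up to max library names from DT_NEEDED entries.
 * dyn    : dynamic section, terminated by a DT_NULL entry
 * strtab : dynamic string table (assumed already located)
 * Returns the number of names stored in out. */
static size_t list_needed(const Dyn32 *dyn, const char *strtab,
                          const char **out, size_t max)
{
    size_t n = 0;

    assert(dyn != NULL && strtab != NULL && out != NULL);
    for (; dyn->d_tag != DT_NULL; dyn++) {
        /* d_val of a DT_NEEDED entry is an offset into the string table */
        if (dyn->d_tag == DT_NEEDED && n < max)
            out[n++] = strtab + dyn->d_val;
    }
    return n;
}
```

Each returned name is what would be handed to whatever service maps shared libraries into memory (step 3).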
Re: Writing an ELF dynamic linker
Nick, have you read this?
Re: Writing an ELF dynamic linker
Hi,
NickJohnson wrote: Does the above sequence make sense?
It does, but it's not how I would do it. If I understand it correctly, you're doing a fork/exec style of process creation, with the fork() done by the kernel and the exec() done by the libc? Normally both are done by the kernel; the kernel then loads the dynamic linker, and the dynamic linker does the rest itself (loading the executable and all shared libraries, which would include libc, then linking them).
The above is what I would recommend (see the PT_INTERP program header of ELF executables - they all point to ld.so); however, your technique will work and does make sense.
NickJohnson wrote: How exactly does the PLT work? Is the dynamic linker still needed to service lazy binding?
With a linked function "foo", a call in the executable will look something like this (i386 layout):
Code:
call foo@plt
...
foo@plt: jmp *GOT[foo_idx]   ; first time through, this lands on 1: below
1:       push foo_reloc      ; offset of foo's entry in the PLT relocation table
         jmp PLT[0]
PLT[0]:  push GOT[1]
         jmp *GOT[2]
GOT[1] = library/shared object identifier
GOT[2] = address of your dynamic link callback
GOT[foo_idx] = &1
The first instruction unconditionally branches to an address, which is initially set up to point to the instruction after it. It then calls your dynamic linker callback with two arguments - the relocation offset of the function to link (foo_reloc) and the value of GOT[1], a constant you store that by convention identifies the shared object/executable, so your dynamic linker can link correctly.
When the linker has found the address of "foo", it is expected to store this in GOT[foo_idx] then JMP to &foo. Then, the next time foo@plt is called, the "jmp *GOT[foo_idx]" jumps directly to &foo.
NickJohnson wrote: How do I do PIC in assembly, for both the linker and the soon-to-be shared libc?
Don't use any absolute addresses.
NickJohnson wrote: Where is the source for the Linux ld-linux.so? Any other good example code?
No idea, sorry!
Hope this helps,
James
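The lazy binding described in this post can be imitated in plain C with function pointers, which may make the control flow easier to follow. This is only a simulation under stated assumptions: got_foo stands in for foo's GOT slot, foo_lazy_stub for the part of the PLT entry that calls into the linker, and resolve for the dynamic linker's callback. All of these names, and the tiny symbol table, are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*func_t)(int);

/* Stand-in for a function exported by some shared library. */
static int foo(int x) { return x + 1; }

/* Hypothetical symbol table that the "dynamic linker" searches. */
static const struct { const char *name; func_t addr; } symtab[] = {
    { "foo", foo },
};

static int foo_lazy_stub(int x);        /* the "1:" half of foo@plt        */
static func_t got_foo = foo_lazy_stub;  /* GOT slot, initially the stub    */
static int resolve_count = 0;           /* how often the linker was called */

/* The linker callback: find the symbol and patch the GOT slot. */
static func_t resolve(const char *name, func_t *got_slot)
{
    for (size_t i = 0; i < sizeof symtab / sizeof symtab[0]; i++) {
        if (strcmp(symtab[i].name, name) == 0) {
            *got_slot = symtab[i].addr;
            resolve_count++;
            return symtab[i].addr;
        }
    }
    return NULL;
}

/* Reached only while the GOT slot has not been patched yet. */
static int foo_lazy_stub(int x)
{
    func_t target = resolve("foo", &got_foo);
    assert(target != NULL);
    return target(x);   /* a real stub jumps here instead of calling */
}

/* foo@plt: always an indirect jump through the GOT slot. */
static int foo_plt(int x) { return got_foo(x); }
```

The first foo_plt call goes through resolve and patches got_foo; every later call reaches foo directly, so resolve_count stays at 1. The resolver still has to be reachable at the time of that first call.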
- NickJohnson
Re: Writing an ELF dynamic linker
@Solar: The reason for my design is that I have already implemented the protocols for loading the executable/shared libraries in the libc, and it would be a lot of work to add these to the dynamic linker (and impossible to do it in the kernel), so I might as well make the libc do the loading and reduce the size of the linker.
So, where exactly would that stub you posted end up? Is that what the PLT contains? Is that stub part of the shared library, or the calling executable, or does the dynamic linker have to create it? How many PLTs are there?
@gerryg400: Thanks, that is much better than the ELF docs.
Re: Writing an ELF dynamic linker
Hi,
NickJohnson wrote: @Solar: The reason for my design is that I have already implemented the protocols for loading the executable/shared libraries in the libc, and it would be a lot of work to add these to the dynamic linker (and impossible to do it in the kernel), so I might as well make the libc do the loading and reduce the size of the linker.
So, where exactly would that stub you posted end up? Is that what the PLT contains? Is that stub part of the shared library, or the calling executable, or does the dynamic linker have to create it? How many PLTs are there?
@gerryg400: Thanks, that is much better than the ELF docs.
Firstly, thanks for mixing me up with Solar, I feel blessed.
Secondly, the stub is part of the PLT. The PLT contains one of these stubs for each function that is imported from another shared object/executable. The PLT is always for calling out of the current object, not for other objects to call in. So that PLT is part of the calling executable (also note that shared libraries can link to other shared libraries).
There are generally quite a few entries in the table, but one table per dynamic object (.so, executable, etc).
James
Re: Writing an ELF dynamic linker
berkus wrote: ld-linux.so.2 comes from glibc.
I'd guessed that, but it seemed @$$-backwards so I wasn't sure enough to say it.
- Brynet-Inc
- Member
- Posts: 2426
- Joined: Tue Oct 17, 2006 9:29 pm
- Libera.chat IRC: brynet
- Location: Canada
Re: Writing an ELF dynamic linker
This is where you can find OpenBSD's ld.so source, but I'm not sure if it'll be useful to you.
- NickJohnson
Re: Writing an ELF dynamic linker
Hi again,
I'm trying to figure out the best way of getting my dynamic linker into place. I'm going to need it for process execution during init, but my shared library loading needs a bunch of stuff to be running, so I have a bit of a chicken-and-egg problem. My idea is to statically link all of the dynamic linker into the C library, but in a separate ELF section, and make the whole C library PIC. Then I should be able to copy the linker into its memory region right out of the C library, by specifying limits on that ELF section. That will work, right, even without relocations? As long as I don't call anything in the C library from it once it has been moved? I just want to make sure I'm not being stupid.
Edit:
Basically, this is what I want to know: if I move a chunk of PIC code, which doesn't reference anything outside of itself, to a new virtual address, will it still run?
Re: Writing an ELF dynamic linker
Hi,
NickJohnson wrote: Hi again,
I'm trying to figure out the best way of getting my dynamic linker into place. I'm going to need it for process execution during init, but my shared library loading needs a bunch of stuff to be running, so I have a bit of a chicken-and-egg problem. My idea is to statically link all of the dynamic linker into the C library, but in a separate ELF section, and make the whole C library PIC. Then I should be able to copy the linker into its memory region right out of the C library, by specifying limits on that ELF section. That will work, right, even without relocations? As long as I don't call anything in the C library from it once it has been moved? I just want to make sure I'm not being stupid.
Edit:
Basically, this is what I want to know: if I move a chunk of PIC code, which doesn't reference anything outside of itself, to a new virtual address, will it still run?
You'll need to run the relocations again. Even if code is compiled PIC, it may still contain PC-relative branches or loads, which are resolved at load time (at which point the address is fixed so this is safe). So yes, you can move it again, but you'll need to re-dynamic-link it.
- NickJohnson
Re: Writing an ELF dynamic linker
So, wait: if you move PIC, you have to relocate it? Doesn't that defeat the purpose of PIC?
Re: Writing an ELF dynamic linker
NickJohnson wrote: So, wait: if you move PIC, you have to relocate it? Doesn't that defeat the purpose of PIC?
No, it doesn't - most code isn't made to be moved dynamically at runtime. PIC applies to load time - it can be loaded anywhere. You still have to apply some relocations at load time. If these were absolute relocations (i.e. jump absolute), you wouldn't have any bother. But in x86_64, many relocations are PC-relative, so if you change the value of PC (by relocating the code again) you invalidate that jump. So you need to relink it.
EDIT: For clarification, by "relink" I mean redo the dynamic relocations you had to do at load time.
- Combuster
- Member
- Posts: 9301
- Joined: Wed Oct 18, 2006 3:45 am
- Libera.chat IRC: [com]buster
- Location: On the balcony, where I can actually keep 1½m distance
Re: Writing an ELF dynamic linker
I see a lot of twisted truths that border on being just wrong here:
PIC means that it can be loaded and started at any address without any modification. That includes that all code and read-only sections can be present in physical memory once and used at various locations in different programs. By definition, the moment resources have to be patched in any way at load time, it ceases to be PIC and becomes relocatable code.
Absolute jumps are prohibited in PIC as they can not be moved and still point to where they were originally pointing. Relative jumps are valid however, since when you move code as a whole, the jump target changes as much as the jump source, making the difference zero. The same goes for any addressing modes that add the program counter in the effective address.
Moving at runtime is not a requirement however. And therefore, for efficiency, the code may translate a relative address to an absolute address when a labeled resource is referenced, and use that further on. If later you move the process in memory, those absolute addresses become invalid. Just like you can set up a funfair's merry-go-round on any square with the needed free space, it is nonsense to drive a merry-go-round around town when people are riding it. And this is probably the point JamesM was trying to make, applicable or not.
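The arithmetic behind the relative-versus-absolute argument above can be checked with a few lines of C. This is a sketch that only models addresses as 32-bit numbers; it executes no machine code. The function names are invented for the example, and the displacement is modelled with wrapping unsigned arithmetic, which matches the two's complement rel32 field of an x86 jump.

```c
#include <assert.h>
#include <stdint.h>

/* Displacement stored in an x86 "jmp rel32": the target minus the
 * address of the instruction that follows the jump. */
static uint32_t rel32(uint32_t next_insn, uint32_t target)
{
    return target - next_insn;
}

/* Does a relative jump, encoded before the move, still reach its target
 * after the code containing it has moved by delta bytes?
 * target_moves: nonzero if the target is part of the moved block. */
static int relative_still_valid(uint32_t next_insn, uint32_t target,
                                uint32_t delta, int target_moves)
{
    uint32_t disp       = rel32(next_insn, target);
    uint32_t new_target = target_moves ? target + delta : target;

    /* After the move the CPU adds disp to the NEW next-instruction address. */
    return (next_insn + delta) + disp == new_target;
}

/* Does a stored absolute address still point at its target after the move? */
static int absolute_still_valid(uint32_t target, uint32_t delta,
                                int target_moves)
{
    uint32_t new_target = target_moves ? target + delta : target;

    return target == new_target;
}
```

With a nonzero delta, a relative jump inside the moved block stays valid and a relative jump to something left behind does not; for stored absolute addresses it is exactly the other way round.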
- NickJohnson
Re: Writing an ELF dynamic linker
Okay, but I'm only running/accessing the dynamic linker sections after I've moved them. It's not like I'm trying to move code while it's running.
I've tried implementing what I thought would work, but it doesn't. The C library has a section ".dl" that contains all of the dynamic linker's stuff. I copy the segment containing it (this segment only contains that section) to 0xC0000000, and the functions work, but any data accesses (to data in .dl) don't. Do I also need to copy the GOT along with the data/code, even when the code doesn't reference outside of the dynamic linker? Do I have to do relocations in the GOT?
Re: Writing an ELF dynamic linker
Combuster wrote: Moving at runtime is not a requirement however. And therefore, for efficiency, the code may translate a relative address to an absolute address when a labeled resource is referenced, and use that further on. If later you move the process in memory, those absolute addresses become invalid. Just like you can set up a funfair's merry-go-round on any square with the needed free space, it is nonsense to drive a merry-go-round around town when people are riding it. And this is probably the point JamesM was trying to make, applicable or not.
Quite. And you'd bork the entire call stack, but I didn't mention this as I assumed the code wouldn't be running when he moved it.
The point is, if you patch a PC-relative jump to somewhere outside the current object then move that object, it is no longer valid. Now, this may be moot as all jumps outside an object *should* reside in the PLT and are thus accessed absolutely through the GOT. But I could've sworn I've seen proper relocations had to be made, for "extern" data and suchlike. Combuster?
- Combuster
Re: Writing an ELF dynamic linker
I don't use ELF PIC, but only relocatable code, so I have to look up the details myself too. As far as the story goes, EBX should point to the GOT, which is a list of addresses that should be external to the library, i.e. from one library to a dependency. My brief googling hasn't established yet if it's customary to have all internal references in the GOT as well instead of directly offsetting from the base register.
Position independence as defined becomes a tricky one there: if the data section is patched at load time (especially when dealing with references to itself), then it technically does not qualify as PIC. If a loop is executed at run time to load the symbols (so that fixing the GOT becomes the task of crt0.o), then it does qualify by that definition.
A probably interesting question here is: if you can omit the GOT and instead use relocatable code with some smart allocation scheme to maximize code sharing (akin to loading Windows DLLs), what is the practical difference in memory use (less for the GOT, but maybe more because of having several differently relocated versions of the same library)?
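A loop that fixes up the GOT, as mentioned above, might look roughly like the following for the simplest case: slots that hold link-time addresses of the object's own data and only need the load address added. This is a hedged sketch, not how any particular loader does it. It assumes i386-style REL entries with the addend stored in place, an image linked at address 0 so that r_offset is also an offset into the image, and a made-up helper name apply_relative. The relocation type value is R_386_RELATIVE from the i386 ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as Elf32_Rel in the ELF spec. */
typedef struct {
    uint32_t r_offset;  /* location to patch                   */
    uint32_t r_info;    /* (symbol index << 8) | relocation type */
} Rel32;

enum { R_386_RELATIVE = 8 };

/* Add the load bias to every word named by an R_386_RELATIVE entry.
 * image       : the object's memory, viewed as 32-bit words
 * image_bytes : size of that memory in bytes
 * bias        : load address minus link address
 * Returns the number of relocations applied; other relocation types
 * are skipped because they need a symbol lookup. */
static size_t apply_relative(uint32_t *image, size_t image_bytes,
                             const Rel32 *rel, size_t nrel, uint32_t bias)
{
    size_t applied = 0;

    for (size_t i = 0; i < nrel; i++) {
        if ((rel[i].r_info & 0xff) != R_386_RELATIVE)
            continue;
        assert(rel[i].r_offset % 4 == 0);
        assert((size_t)rel[i].r_offset + 4 <= image_bytes);
        image[rel[i].r_offset / 4] += bias;   /* B + A */
        applied++;
    }
    return applied;
}
```

If the copied linker reaches its data through slots like these, copying the bytes to a new address without rerunning such a loop would leave the slots pointing at the old location.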