Run function in a stripped library in Linux

logos
Posts: 1
Joined: Wed Oct 05, 2022 4:52 am

Run function in a stripped library in Linux

Post by logos »

If I dynamically load a library in Linux that has only one function as its entry point, how do I call that function without any header files?

For example, let's say somebody had compiled the following, just as an easy case to reason about:

int function(int a, int b){
return a + b ;
}

And it was compiled with...
gcc thefile.c -o thefile.so -fPIC -shared

Now you want to run that function on Linux, and all you know is that this ELF file can be loaded as a shared library.
The function name was altered and is unknown, or was stripped from the ELF file.

So you want to load it with dlopen and run that function, printing the result in the terminal.
How would you go about running this function from a C program on Linux?

And let's say you assume it returns an int and you want to print that value with, for example, printf:

printf("return value: %d",fn(3,7));

So first you load that library with dlopen.
And let's say that this function can have any name, or that the name has been stripped from the ELF file, as assumed above.

How would you go about loading it in a C program and running that function without altering the loaded ELF file?
Octocontrabass
Member
Posts: 5513
Joined: Mon Mar 25, 2013 7:01 pm

Re: Run function in a stripped library in Linux

Post by Octocontrabass »

Symbol resolution is normally done by name, so I'm not sure you can have a symbol without a name...

But, assuming it truly is the only symbol in the shared library, I think you could find it by using dl_iterate_phdr() to find the PT_DYNAMIC segment and parsing it for the only symbol. I have no idea what kind of pitfalls you may run into by attempting this.
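For illustration, here is a rough sketch of the dl_iterate_phdr() part of that idea. It only locates the PT_DYNAMIC segment of each loaded object and prints its address; it does not parse the segment. The names show_dynamic and scan_loaded_objects are made up for this example, and it assumes glibc or another libc that provides dl_iterate_phdr() in <link.h>.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* dl_iterate_phdr() is an extension, not POSIX */
#endif
#include <link.h>
#include <stdio.h>

/* Callback invoked once per loaded object (hypothetical name).
 * Prints the run-time address of the object's PT_DYNAMIC segment. */
static int show_dynamic(struct dl_phdr_info *info, size_t size, void *data)
{
    int *visited = data;
    (void)size;
    (*visited)++;
    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type == PT_DYNAMIC)
            printf("%s: PT_DYNAMIC at %p\n",
                   (info->dlpi_name && info->dlpi_name[0])
                       ? info->dlpi_name : "(main program)",
                   (void *)(info->dlpi_addr + ph->p_vaddr));
    }
    return 0;   /* returning 0 keeps the iteration going */
}

/* Walks every loaded object and returns how many were visited. */
int scan_loaded_objects(void)
{
    int visited = 0;
    dl_iterate_phdr(show_dynamic, &visited);
    return visited;
}
```

Calling scan_loaded_objects() after dlopen() should include the newly loaded library in the listing, identified by its path in dlpi_name.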
nullplan
Member
Posts: 1767
Joined: Wed Aug 30, 2017 8:24 am

Re: Run function in a stripped library in Linux

Post by nullplan »

logos wrote: And the function name was altered and is unknown or stripped from the elf
Well, it's impossible for the function name to be truly gone. However, it is possible for an obfuscator to change both the name of the symbol and the undefined references in programs that use the function. ELF only supports importing a function by name.
logos wrote: How would you go about to load it in a C program and run that function without altering the loaded elf?
You need the name of the function. To that end, you need to iterate over all dynamic symbols defined in the shared object. There are two possibilities: Either section headers still exist or they don't. Section headers are not required in shared objects, but sometimes they help. So you open the file normally as a binary file, and you read the ELF header from it. The ELF header tells you where the section headers are. In the section headers, you look for a section of type SHT_DYNSYM. You note down the section's link, offset, and size fields. The link is the index of the corresponding SHT_STRTAB section. You load the entire string table into memory, then iterate over the symbol table (which is easy, since you have offset and size). For each symbol, if its section index is not SHN_UNDEF, then it is a defined symbol and you can list its name. You can also list more information about the symbol if that will help you in your hunt for whatever White Whale you are chasing.
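The section-header walk described above could look roughly like the sketch below. The function names read_at and list_defined_dynsyms are invented for this example. It assumes the file has the same ELF class and byte order as the machine running the code (ElfW() picks Elf32 or Elf64 accordingly), that section headers are present, and that the file does not use extended section numbering.

```c
#include <link.h>   /* ElfW(); also pulls in <elf.h> */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads len bytes at file offset off into a fresh buffer; NULL on failure. */
static void *read_at(FILE *f, long off, size_t len)
{
    void *buf = malloc(len ? len : 1);
    if (buf && (fseek(f, off, SEEK_SET) != 0 || fread(buf, 1, len, f) != len)) {
        free(buf);
        buf = NULL;
    }
    return buf;
}

/* Prints every defined dynamic symbol name.
 * Returns the count, or -1 on error or if no SHT_DYNSYM section exists. */
int list_defined_dynsyms(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    int count = -1;
    ElfW(Ehdr) *eh = read_at(f, 0, sizeof *eh);
    ElfW(Shdr) *sh = NULL;
    if (eh && memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0 &&
        eh->e_shoff != 0 && eh->e_shentsize == sizeof *sh)
        sh = read_at(f, (long)eh->e_shoff, (size_t)eh->e_shnum * sizeof *sh);

    for (int i = 0; sh && i < eh->e_shnum; i++) {
        if (sh[i].sh_type != SHT_DYNSYM || sh[i].sh_link >= eh->e_shnum)
            continue;
        const ElfW(Shdr) *str = &sh[sh[i].sh_link];   /* linked string table */
        ElfW(Sym) *sym = read_at(f, (long)sh[i].sh_offset, sh[i].sh_size);
        char *names = read_at(f, (long)str->sh_offset, str->sh_size);
        size_t n = sh[i].sh_size / sizeof *sym;
        if (sym && names && str->sh_size > 0) {
            names[str->sh_size - 1] = '\0';           /* force termination */
            count = 0;
            for (size_t j = 0; j < n; j++) {
                if (sym[j].st_shndx == SHN_UNDEF)     /* import, not a definition */
                    continue;
                if (sym[j].st_name >= str->sh_size)   /* corrupt name offset */
                    continue;
                printf("%s\n", names + sym[j].st_name);
                count++;
            }
        }
        free(sym);
        free(names);
        break;
    }
    free(sh);
    free(eh);
    fclose(f);
    return count;
}
```

For the example library in the question, list_defined_dynsyms("./thefile.so") would be expected to print the exported function's name, possibly alongside a few symbols added by the toolchain.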

If section headers don't exist, you can iterate over the program headers. Those must exist, or else the file is invalid. The program headers will contain one segment of type PT_DYNAMIC. Note down its offset. They will also contain several segments of type PT_LOAD. Note down their offsets and virtual addresses. The dynamic segment is a list of machine words that always come in pairs: a tag followed by a value. Somewhere in there, you are going to find the tags DT_SYMTAB, DT_STRTAB, and DT_HASH or DT_GNU_HASH. Only one of the latter two is required. The word following each of them is a virtual address. Map it to a file offset by looking through your list of LOAD segments, finding which segment the address falls into, then subtracting the virtual address of the segment and adding the offset. So this way you will find the symbol table, the string table, and one of the hash tables. The only real problem is to find the number of symbols. If you found a DT_HASH table, you are done: The hash table is a list of 32-bit words, and the number of symbols is the second word in the list (nchain).
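As a sketch of the two calculations in that paragraph: the first function below maps a virtual address to a file offset through the PT_LOAD headers, and the second reads the symbol count out of a DT_HASH table. Both function names are invented for this example, and the code assumes the program headers and hash table have already been read into memory in native byte order.

```c
#include <link.h>   /* ElfW(); also pulls in <elf.h> */
#include <stddef.h>
#include <stdint.h>

/* Translates a virtual address taken from the dynamic section into a file
 * offset. Returns 0 on success, -1 if no PT_LOAD segment's file-backed part
 * contains the address. */
int vaddr_to_offset(const ElfW(Phdr) *ph, size_t n, ElfW(Addr) va, ElfW(Off) *off)
{
    for (size_t i = 0; i < n; i++) {
        if (ph[i].p_type != PT_LOAD)
            continue;
        if (va >= ph[i].p_vaddr && va - ph[i].p_vaddr < ph[i].p_filesz) {
            /* subtract the segment's address, add its file offset */
            *off = va - ph[i].p_vaddr + ph[i].p_offset;
            return 0;
        }
    }
    return -1;
}

/* DT_HASH layout: nbucket, nchain, bucket[nbucket], chain[nchain].
 * nchain equals the number of entries in the dynamic symbol table. */
uint32_t dt_hash_symbol_count(const uint32_t *hash)
{
    return hash[1];
}
```

With a segment at virtual address 0x2000 and file offset 0x1000, for instance, the address 0x2010 maps to offset 0x1010.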

If you are less lucky, you only have a DT_GNU_HASH table. That one does not list the number of symbols explicitly. The best choice you have there is to look for the highest number in the bucket array, then go to that symbol's entry in the chain array (the chain array starts at the table's symbol offset, so subtract that from the index), and follow the chain to its end, which is the first entry with its lowest bit set. Then you have the highest symbol index.
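A minimal sketch of that procedure follows, assuming the usual GNU hash layout: four 32-bit header words (nbuckets, symoffset, bloom_size, bloom_shift), then the Bloom filter words, then the buckets, then the chain. The function name and the bloom_word_bytes parameter are made up for this example, and the code does no bounds checking, so it trusts the table to be well formed.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the number of dynamic symbol table entries implied by a
 * DT_GNU_HASH table that has been loaded into memory.
 * bloom_word_bytes is 4 for ELF32 and 8 for ELF64. */
uint32_t gnu_hash_symbol_count(const uint32_t *table, size_t bloom_word_bytes)
{
    uint32_t nbuckets   = table[0];
    uint32_t symoffset  = table[1];   /* index of the first hashed symbol */
    uint32_t bloom_size = table[2];   /* table[3] is the bloom shift, unused */

    const uint32_t *buckets = table + 4 + bloom_size * (bloom_word_bytes / 4);
    const uint32_t *chain   = buckets + nbuckets;

    /* Find the highest symbol index that starts a chain. */
    uint32_t last = 0;
    for (uint32_t i = 0; i < nbuckets; i++)
        if (buckets[i] > last)
            last = buckets[i];

    if (last < symoffset)
        return symoffset;             /* every bucket is empty */

    /* chain[k] describes symbol symoffset + k; a set low bit ends the chain. */
    while ((chain[last - symoffset] & 1) == 0)
        last++;

    return last + 1;                  /* highest index plus one */
}
```

Symbols below symoffset (typically the undefined ones) are not in the hash table at all, which is why an all-empty bucket array still yields symoffset entries.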

However you do it, now you have a symbol table, a string table, and a number of symbols, and you can just list them out again. Once you have the name, you can just call dlopen() and dlsym() as normal.
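Once the name is known, the final dlopen() and dlsym() step might look like the sketch below. The helper name call_int_fn and the type binop_fn are invented for this example, and the int(int, int) signature is an assumption taken from the question, since nothing in the ELF file records the function's prototype.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Assumed signature of the target function (taken from the question). */
typedef int (*binop_fn)(int, int);

/* Loads the library at path, looks up name, and calls it with (a, b).
 * Stores the return value in *result. Returns 0 on success, -1 on failure. */
int call_int_fn(const char *path, const char *name, int a, int b, int *result)
{
    void *handle = dlopen(path, RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    dlerror();                                   /* clear any stale error */
    binop_fn fn = (binop_fn)dlsym(handle, name); /* void* to function pointer */
    const char *err = dlerror();
    if (err || !fn) {
        fprintf(stderr, "dlsym: %s\n", err ? err : "symbol is NULL");
        dlclose(handle);
        return -1;
    }

    *result = fn(a, b);
    dlclose(handle);
    return 0;
}
```

For the library from the question, a call such as call_int_fn("./thefile.so", "function", 3, 7, &r) followed by printf("return value: %d\n", r) would be the intended use. The "./" matters because dlopen() searches the library path rather than the current directory when the name contains no slash. Older glibc versions also need -ldl at link time.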
Octocontrabass wrote: But, assuming it truly is the only symbol in the shared library, I think you could find it by using dl_iterate_phdr() to find the PT_DYNAMIC segment and parsing it for the only symbol. I have no idea what kind of pitfalls you may run into by attempting this.
There are a few. For one thing, glibc and musl cannot agree on whether to add the base address of the library to the words in the dynamic section. Actually, glibc cannot even agree with itself, since it is inconsistent between architectures. Therefore I would suggest looking at a copy of the file retrieved by simple file I/O calls. And as above, iterating over the symbol table through the dynamic section is the more complicated way.
Carpe diem!