Re:N00b designing a dynamic linker

amee2000 · Post by **amee2000** » Sat Sep 03, 2005 6:14 pm

I'll reuse this thread 'cause I don't want to spam the forum.

Currently I'm hooking interrupts to provide an API, but I'm thinking about a dynamic linker. Unfortunately I found almost no info on how a dynamic linker actually works (in contrast to what it does), so I'll design my own. My concept is based on the following assumption: it does not make any sense to call something at offset address 0xFFFF in any segment.

The hex pattern of "CALL 0x1234:0xFFFF" in 16-bit x86 assembly is "9A FF FF 34 12" (0x9A is the far CALL opcode; a far JMP would be 0xEA). What I do is the following:
I scan over the entire binary and look for 0x9A.
On a match, I check whether the following word is 0xFFFF.
If it is, I look up the word following that 0xFFFF in a table and write the correct offset and segment address into the code, starting right after the 0x9A.
If I cannot locate the requested function, I can either a) warn and use the address of my "kernel panic" code or b) give up with a critical error message. The first method would make it possible to use an application at least partially even if not all dependencies are satisfied, whereas the second one follows the motto "all or nothing".
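
To make the steps above concrete, here is a minimal sketch in Python rather than real-mode assembly, just to model the byte patching. The names `link`, `symbols` and `panic` are made up for illustration, and the table layout (function ID mapped to a segment/offset pair) is an assumption, not a fixed design.

```python
import struct

FAR_CALL = 0x9A          # opcode of CALL ptr16:16
MARKER_OFFSET = 0xFFFF   # offset value that flags "this call needs linking"

def link(image: bytearray, symbols: dict, panic=None) -> int:
    """Patch every marked far call in place; return the number of patches.

    symbols maps a 16-bit function ID to a (segment, offset) pair.
    panic is an optional (segment, offset) used for unknown IDs (choice a);
    without it an unknown ID raises LookupError (choice b).
    """
    patched = 0
    i = 0
    while i + 5 <= len(image):
        if image[i] == FAR_CALL:
            # operand layout after the opcode: offset word, then segment word
            offset, ident = struct.unpack_from("<HH", image, i + 1)
            if offset == MARKER_OFFSET:
                target = symbols.get(ident, panic)
                if target is None:
                    raise LookupError(f"unresolved function id 0x{ident:04X}")
                seg, off = target
                struct.pack_into("<HH", image, i + 1, off, seg)
                patched += 1
                i += 5       # skip the instruction that was just patched
                continue
        i += 1
    return patched

# NOP, marked call to function ID 0x1234, RET
image = bytearray.fromhex("90 9A FF FF 34 12 C3")
link(image, {0x1234: (0x0800, 0x0010)})
print(image.hex())   # 909a10000008c3
```

Note that this sketch scans byte by byte and does not decode instructions, so it would also patch a 9A FF FF sequence that happens to sit inside another instruction or in data.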

Alternatively I could use a specific segment address to identify calls that need to be linked. However, I consider it more likely that someone wants to call something in a specific segment than that he wants to call the last addressable location of any segment. Checking for a specific offset address also seems slightly easier to me because the offset word sits closer to the opcode.

1) Which of the two choices above is better?
2) How safe is this concept?

AR · Post by AR » Sat Sep 03, 2005 6:29 pm

The dynamic linker depends on the executable format used. An ELF linker, for instance, will look up the symbol table in the binary, then find the GOT and set the addresses according to the directions and IDs provided in the headers.

It looks like you're still using real mode; this doesn't apply there, as DOS doesn't have libraries to begin with. Dynamic linkers are only used to bind programs to libraries, they don't bind to the kernel (on both Windows and Linux, kernel calls are made via interrupts or SYSENTER/SYSCALL).

amee2000 · Post by **amee2000** » Sun Sep 04, 2005 4:31 am

Yes, I'm still in real mode, but I don't want to write another DOS clone. And I'm aware that ELF uses a symbol table, but I think it's very unlikely that I'll use up all the 64k symbols possible, so the 16-bit ID should be sufficient for my needs. And I can add a per-program translation table that converts 16-bit IDs to, say, 32- or 64-bit identifiers later.

But what speaks against dynamic linking of kernel-provided functions? The kernel is my biggest 'library' right now, and for some strange reason I don't like interrupts much...

AR · Post by AR » Sun Sep 04, 2005 4:03 pm

Well, firstly it isn't secure to let programs tinker with the kernel, but in real mode that doesn't matter anyway. The more compelling reason is that the programs would be linked at specific offsets within the kernel, making it practically impossible to change the kernel in any way without recompiling all the apps as well. However, if you want to develop your own dynamic linking format, then there isn't really anything that will prevent you from doing so.

amee2000 · Post by **amee2000** » Sun Sep 04, 2005 5:51 pm

But they could get the address of the kernel from the IVT, if they wanted.

"practically impossible to change the kernel in any way without recompiling all the apps"
That's exactly what I want the dynamic linker for. My format would also have the 'advantage' that it isn't an actual format. Linking would take place transparently. The only thing that might confuse someone not aware of the linker is that the kernel seems to be spread across the whole memory, segment by segment.

But there's another thing that came to my mind: what if someone needs "9A FF FF" for some reason in the data segment? The chance is 1 in 0x1000000 at any given position, which isn't very likely, but linking the user's data is most likely not what he wants. I can't see any way to separate code from data without requiring the executables to provide this information.

Code:

entry:
        JMP  SHORT start    ; 2-byte short jump over the header
        NOP                 ; padding
        NOP

; header is at offset 0x04 (2-byte JMP + 2 NOPs) and consists only of
; one value that tells the linker where to stop linking
HEADER  DW  data

start:
        ; <code>

data:
        ; data
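
A rough sketch of how the loader could honour that header, again in Python for readability. `header_offset` is passed in as a parameter because the exact position depends on how the leading JMP is encoded; the function names are invented for this example.

```python
import struct

PATTERN = b"\x9a\xff\xff"   # far CALL opcode followed by the 0xFFFF marker

def code_limit(image: bytes, header_offset: int) -> int:
    """Read the 16-bit 'stop linking here' value stored in the header."""
    (limit,) = struct.unpack_from("<H", image, header_offset)
    return min(limit, len(image))

def call_sites(image: bytes, header_offset: int):
    """Yield offsets of marked far calls that lie entirely before the data."""
    limit = code_limit(image, header_offset)
    pos = image.find(PATTERN, 0, limit)
    while pos != -1 and pos + 5 <= limit:
        yield pos
        pos = image.find(PATTERN, pos + 5, limit)

# JMP SHORT +4, NOP, NOP, header = 0x000C, one marked call, RET,
# then data that happens to contain the same byte pattern
image = bytes.fromhex("EB 04 90 90 0C 00 9A FF FF 01 00 C3 9A FF FF 02 00")
print(list(call_sites(image, 4)))   # [6]
```

Only the call at offset 6 is reported; the identical bytes at offset 12 are past the limit and stay untouched.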

AR · Post by AR » Sun Sep 04, 2005 8:06 pm

The distinction between code and data is irrelevant in real mode; either can be used for both. You'll just have to declare that the last byte is reserved, and if they use it for something then it's their problem (one of the great benefits of using real mode for anything other than booting).

A cleaner method without linking directly against the kernel would be a thunk table: create a table of thunks that will indirectly call the kernel. That way the programs can still be statically linked (which is pretty much a requirement) but you can still change the kernel. (Definition: a "thunk" is a term used in PE executables for an indirect library call; it is basically just a jump instruction that jumps to the appropriate offset within a library.)

You just generate the thunks at boot time in a known area of memory, and the apps far call the thunk, which then far jumps into the kernel. When the kernel far returns, it will be as though the thunk was never there.
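
As an illustration only, the table could be generated like this (Python used to model the bytes; `build_thunks` and `thunk_address` are hypothetical names, and the fixed 5-byte far JMP per entry is an assumption about the layout).

```python
import struct

FAR_JMP = 0xEA      # opcode of JMP ptr16:16
THUNK_SIZE = 5      # opcode + 16-bit offset + 16-bit segment

def build_thunks(entry_points):
    """Return the thunk table bytes for a list of (segment, offset) targets."""
    table = bytearray()
    for seg, off in entry_points:
        table += struct.pack("<BHH", FAR_JMP, off, seg)
    return bytes(table)

def thunk_address(table_seg, table_off, index):
    """Far address an application calls to reach kernel function `index`."""
    return table_seg, table_off + index * THUNK_SIZE

# two kernel functions living at 1000:0200 and 1000:0340
table = build_thunks([(0x1000, 0x0200), (0x1000, 0x0340)])
print(table.hex())                      # ea00020010ea40030010
print(thunk_address(0x0050, 0x0000, 1)) # (80, 5)
```

If a later kernel build moves its functions, only `build_thunks` gets different input; the addresses returned by `thunk_address` stay the same, so the applications do not need to be rebuilt.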

Pype.Clicker · Post by **Pype.Clicker** » Mon Sep 05, 2005 1:52 am

amee2000 wrote: I'll reuse this thread 'cause I don't want to spam the forum.

Currently I'm hooking interrupts to provide an API, but I'm thinking about a dynamic linker. Unfortunately I found almost no info on how a dynamic linker actually works (in contrast to what it does), so I'll design my own. My concept is based on the following assumption: it does not make any sense to call something at offset address 0xFFFF in any segment.

You'll love the online book "Linkers and Loaders" ...

About the "scan for 9A FF FF and patch" idea, I'd rather suggest having a sort of symbol table, indeed. You might use the operand of the current "CALL xxxx:xxxx" instruction to encode the offset to the next "CALL xxxx:xxxx" to be patched with the same symbol, but really, you want to avoid that
"mov edx, 0xffff9a" (whose immediate is stored as 9A FF FF 00) gets patched as well ...

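One possible reading of that chained-fixup idea, sketched in Python. It assumes the symbol table stores the image offset of the first call site for each symbol, that the offset word of each unresolved call holds the image offset of the next site, and that 0xFFFF ends the chain; all of these are illustrative choices, not an established format.

```python
import struct

END_OF_CHAIN = 0xFFFF   # assumed terminator stored in the last call site

def apply_fixups(image: bytearray, first_site: int, seg: int, off: int) -> int:
    """Walk one symbol's chain of far calls and patch each; return the count."""
    count = 0
    site = first_site
    while site != END_OF_CHAIN:
        # the operand starts one byte after the opcode; its offset word
        # currently holds the link to the next site for the same symbol
        (next_site,) = struct.unpack_from("<H", image, site + 1)
        struct.pack_into("<HH", image, site + 1, off, seg)
        site = next_site
        count += 1
    return count

# call site at 0 links to the site at 6, which ends the chain;
# the trailing 9A FF FF 77 77 is data and is never visited
image = bytearray.fromhex("9A 06 00 00 00 90 9A FF FF 00 00 C3 9A FF FF 77 77")
print(apply_fixups(image, 0, 0x0800, 0x0010))   # 2
print(image.hex())   # 9a10000008909a10000008c39affff7777
```

Because the loader only follows the chain, it never has to guess whether a byte sequence is code or data. A malformed chain that loops back on itself would hang this sketch, so a real loader would want a bound on the walk.
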
Otherwise, you may use the above-mentioned "thunk" stuff (also called "trampolines" or another funny name in AmigaOS): have a table of "jmp xxxx:xxxx" at the head of your program, patch only this table, and then use "call trampoline_table+function_number*trampoline_size" to reach another piece of code.
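
A small sketch of the loader side of such a table, in Python. It assumes each unresolved slot is stored in the file as EA FF FF followed by a 16-bit function ID, and that the loader knows the table offset and slot count; `patch_trampolines` and these conventions are hypothetical.

```python
import struct

FAR_JMP = 0xEA
TRAMPOLINE_SIZE = 5   # EA + offset word + segment word

def patch_trampolines(image: bytearray, table_off: int, count: int,
                      symbols: dict) -> None:
    """Resolve `count` trampoline slots in place; nothing else is touched."""
    for n in range(count):
        slot = table_off + n * TRAMPOLINE_SIZE
        # assumed file layout: the function ID sits in the segment field
        (ident,) = struct.unpack_from("<H", image, slot + 3)
        seg, off = symbols[ident]   # KeyError if the ID is unknown
        struct.pack_into("<BHH", image, slot, FAR_JMP, off, seg)

# two slots (IDs 1 and 2) followed by program bytes that look like a marker
image = bytearray.fromhex("EA FF FF 01 00 EA FF FF 02 00 9A FF FF 01 00")
patch_trampolines(image, 0, 2, {1: (0x0800, 0x0010), 2: (0x0800, 0x0020)})
print(image.hex())   # ea10000008ea200000089affff0100
```

The bytes after the table are left alone, which is the point of the approach: the loader writes to a known region instead of pattern-matching the whole image.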