Hi,
SpyderTL wrote:1. Give every loadable module its own memory segment, and use a 0 base address for all memory address references. Use far calls for functions, and segment:offset addresses for memory access. Would be simple enough in Real Mode, but Protected Mode would require a significant amount of memory management. Would be slower than Position Dependent code, because segment registers would need to be loaded before memory could be accessed. This approach is pretty much required for CPU Virtual Memory support.
This approach would be extremely slow due to frequent segment register loads (and frequent protection checks). It also defeats an optimisation: some modern CPUs are smart enough to avoid adding the segment base address to offsets when they know the segment base address is zero, and segments with non-zero bases lose that benefit. In addition, you would need to modify GDT entries during task switches to ensure protection (otherwise one task can load a different task's segments and access another task's data).
SpyderTL wrote:2. Use BX as base address. This seems to be a standard of sorts, but it requires code to be modified to not trash the BX register. SI and DI are also options, but are often needed for other specific instructions.
3. Use BP as base address. I've seriously considered going this route, as I'm not currently using Stack Frames. Although losing Stack Frame support may be too high of a cost, in the end.
For both of these cases (2 and 3), everything that references memory becomes more complicated. For a simple example, "call foo" would become "lea eax,[foo + ebx]; call eax" (and now neither EBX nor EAX can be used for real work).
Also; when there aren't enough general purpose registers you end up with temporary values being "spilled" (stored on the stack), which makes code slower (increased number of references to memory). For 16-bit and 32-bit code, there are only 7 general purpose registers that code is free to use (including EBP), which creates too much "spilling" already. Consuming a register for a base address makes this even worse and will hurt performance by increasing "spilling" and increasing the number of references to memory.
Note: It doesn't matter much which register you use. If you waste EBX for a base address, then EBP can be used for frame pointer (if frame pointers are enabled) or can be free for normal use (if frame pointers are disabled); and if you waste EBP for a base address, then EBX can be used for frame pointer (if frame pointers are enabled) or can be free for normal use (if frame pointers are disabled). The main difference is the "default segment" (e.g. memory references using EBP use SS as the default segment, which makes EBP preferable as a frame pointer if SS isn't the same as DS as it can avoid segment override prefixes).
Finally; for this method, without paging you can't protect one task from another, and without paging it'd severely limit the amount of space processes can use (e.g. for a 32-bit OS; all processes will have to share the same ~3 GiB of space, and you couldn't have 10 processes using ~3 GiB each like all other OSs can). If you use paging to avoid these problems, then I can't see any advantage of using position independent code for normal executables in the first place.
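To put rough numbers on the address space point, here is a minimal sketch. The even split between processes is my assumption, purely for illustration (a real OS without paging could divide the space any way it likes), and the function name is made up for the example.

```python
USER_SPACE_MIB = 3 * 1024  # the ~3 GiB of user space mentioned above, in MiB


def space_per_process_mib(processes, paging):
    """Address space available to each process, in MiB."""
    if paging:
        # With paging, each process gets its own virtual address space.
        return USER_SPACE_MIB
    # Without paging, everything lives in one shared address space
    # (assumed here to be split evenly, purely for illustration).
    return USER_SPACE_MIB // processes


print(space_per_process_mib(10, paging=False))  # 307
print(space_per_process_mib(10, paging=True))   # 3072
```

Note that the figure with paging is virtual address space per process, not a promise that this much RAM exists.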
SpyderTL wrote:I know Windows and Linux use PE/ELF address tables, and actually modify the PD code as it is loaded to patch the addresses in the code with the correct address.
Modifying the code to fix addresses creates a different problem. For most OSs (using paging), if pages of code aren't modified then you can use "memory mapped files" to avoid wasting RAM, and (if/when the pages are in RAM) you can have one copy in physical RAM that is mapped into the file system's cache and also mapped into zero or more processes at the same time. For example; if an executable has 1234 KiB of code and there are 5 instances of the program running, then you might have 1000 KiB of RAM used by all 5 instances and the file system cache (with 234 KiB of it left on disk and not in RAM at all). If you have to modify the executable to fix up relocations then you can't do this, because each instance needs its own private copy of the modified pages. For example; if an executable has 1234 KiB of code and there are 5 instances of the program running, then you need to consume between 6170 KiB of RAM (if there's nothing in the file system cache) and 7404 KiB of RAM (if there's a full copy in the file system cache) instead of only 1000 KiB.
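The arithmetic behind those figures can be sketched as below. It assumes (as the example does) that relocations touch every page of code, so every instance ends up with a full private copy; the function names are hypothetical.

```python
def ram_shared_kib(resident_kib):
    """Unmodified code: one physical copy serves the file cache and every instance."""
    return resident_kib


def ram_relocated_kib(code_kib, instances, cached_kib):
    """Patched code: one private copy per instance, plus whatever the file cache holds."""
    return code_kib * instances + cached_kib


CODE_KIB = 1234
INSTANCES = 5

shared = ram_shared_kib(1000)                                 # 234 KiB still on disk
best = ram_relocated_kib(CODE_KIB, INSTANCES, 0)              # nothing in the file cache
worst = ram_relocated_kib(CODE_KIB, INSTANCES, CODE_KIB)      # full copy in the file cache

print(shared, best, worst)  # 1000 6170 7404
```

If relocations only touch some pages, the untouched pages can still be shared and the real cost lands somewhere between the shared figure and these ones.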
Note: This is why DLLs in Windows have a "default virtual address" (so that when the DLL is running at its default address no fix ups for relocation are needed and no RAM needs to be wasted). It's also why most *nix systems use a "global offset table" for shared libraries instead, so that only the GOT is modified and not all the code (and very little RAM is wasted regardless of what address the library is running at).
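A rough way to see why the GOT wastes so little RAM is to count how many pages get written to (and therefore can't stay shared) in each scheme. This is only a sketch: the 4 KiB page size, the 4-byte GOT entries, the page-aligned GOT and the made-up library with 300 references to 50 symbols are all assumptions for illustration.

```python
PAGE_SIZE = 4096   # assumed 4 KiB pages
ENTRY_SIZE = 4     # assumed 32-bit GOT entries


def pages_dirtied_in_place(reloc_offsets):
    """Patching code directly: every page containing a relocation is modified."""
    return len({offset // PAGE_SIZE for offset in reloc_offsets})


def pages_dirtied_with_got(num_symbols):
    """Using a GOT: only the pages holding the table are modified
    (assumes the table starts on a page boundary)."""
    got_bytes = num_symbols * ENTRY_SIZE
    return -(-got_bytes // PAGE_SIZE)  # ceiling division


# Hypothetical library: 300 references, one per page of code, to 50 distinct symbols.
reloc_offsets = [page * PAGE_SIZE + 16 for page in range(300)]

print(pages_dirtied_in_place(reloc_offsets))  # 300
print(pages_dirtied_with_got(50))             # 1
```

In this made-up case, patching in place makes 300 pages private to each process, while the GOT makes one page private and leaves all the code shared.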
SpyderTL wrote:Does anyone else have any ideas or suggestions?
All "position independent code" solutions must sacrifice performance in some way; and different methods just sacrifice performance in different ways. To avoid sacrificing performance, use paging, and use fixed addresses (instead of position independent code) wherever possible.
Note: Paging also sacrifices some performance (e.g. TLB misses, etc); but (unlike all the different methods of implementing position independent code) it is very powerful/flexible and (if it's used properly) it can avoid a huge amount of overhead for other things. Basically it's a small performance loss that's cancelled out by a huge performance gain. Often, beginners don't understand paging, don't see how it can improve performance and only see the overhead (e.g. TLB misses, etc); and end up making the mistake of avoiding paging in the hope of improving performance (without realising that their misguided attempt at improving performance will make performance worse).
Cheers,
Brendan