Total move towards position independent code

OSwhatever · Post by **OSwhatever** » Sat Jun 29, 2013 3:00 pm

Position independent code has advantages when it comes to positioning code in the virtual address space. Much of the code that runs is already position independent in some way, since many run-times are provided as dynamically linked modules. More and more programs take advantage of this modularity.

Historically, user program binaries have been linked to a fixed address for operating systems. This has worked well, but it is limiting if the base address ever needs to change.

My question is really why there hasn't been more lobbying for position independent code. There is only one ISA that has made that step, so that PIC doesn't harm performance and is the default addressing mode: x86-64. All other major architectures support PIC by explicitly using the PC register as an offset in instructions, which impacts the code. ARMv8, which is fairly new, has done nothing to improve this and uses a similar strategy to previous versions.

FallenAvatar · Post by **FallenAvatar** » Sat Jun 29, 2013 3:39 pm

Because PIC has been faked for years with relocation tables, and that already works. For PIC, compilers and linkers needed to be updated, and that code is not as well tested.

- Monk

P.S. I much prefer PIC even given the above.

Brendan · Post by **Brendan** » Sun Jun 30, 2013 9:39 am

Hi,

OSwhatever wrote:My question is really why there hasn't been more lobbying for position independent code. There is only one ISA that has made that step, so that PIC doesn't harm performance and is the default addressing mode: x86-64. All other major architectures support PIC by explicitly using the PC register as an offset in instructions, which impacts the code. ARMv8, which is fairly new, has done nothing to improve this and uses a similar strategy to previous versions.

For normal executables (excluding shared libs), I can only think of one advantage for using PIC - it would allow more flexible address layout randomisation. This is a "mostly negligible" advantage (you'd hope executables don't need to rely on something like address layout randomisation to make it harder to exploit security problems that shouldn't exist to begin with).

For 64-bit 80x86, there is some impact on performance. For a simple example, consider an array of function pointers that is created at compile time but may be modified at run-time. Worse; if you don't know where the code will be then you also don't know where the code's data will be (e.g. the ".data" section). For a simple example, consider an array of "pointer to char" that is created at compile time but may be modified at run-time.
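A minimal sketch of the kind of tables described above. The names `ops`, `names`, `apply`, `op_name` and `swap_ops` are made up for illustration and do not come from the thread. Each initialiser is an absolute address, so a position-independent build cannot store the final value in the file; the loader has to fill in every slot once the load address is known.

```c
#include <stddef.h>

static int add(int a, int b) { return a + b; }
static int sub(int a, int b) { return a - b; }

/* Built at compile time, writable at run time. Every entry holds an
   absolute address, so in a position-independent image each slot needs
   a load-time fix-up. */
static int (*ops[])(int, int) = { add, sub };
static const char *names[] = { "add", "sub" };

/* Call entry i of the function pointer table. */
int apply(size_t i, int a, int b) { return ops[i](a, b); }

/* Return the name stored alongside entry i. */
const char *op_name(size_t i) { return names[i]; }

/* Run-time modification: because the table can change, it cannot simply
   be placed in read-only memory. */
void swap_ops(void) {
    int (*tmp)(int, int) = ops[0];
    ops[0] = ops[1];
    ops[1] = tmp;
}
```

On a Linux system, building this into a position-independent executable and listing its relocations with `readelf -r` should typically show one relative relocation per table entry, though the exact output depends on the toolchain.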

So the question is; does the "mostly negligible" advantage outweigh the "mostly negligible" disadvantage? My guess is that 99.9% of people can't know if it's worthwhile or not; and would choose "position dependent" simply because it's easier and more portable.

Cheers,

Brendan

AbstractYouShudNow · Post by **AbstractYouShudNow** » Sun Jun 30, 2013 11:01 am

That's simply because an executable file is meant to be an incomplete representation of the program's address space, and some assembly programs make use of that. That is, they rely on their data being placed at some specific address. And because in assembly it is possible to address variables directly by address, the assembler can't generate relocation information that is 100% guaranteed to be exact and complete. And as Brendan pointed out, this would just be nonsense.

bluemoon · Post by **bluemoon** » Sun Jun 30, 2013 11:16 am

PIC would add overhead when exporting global objects across interfaces, where an absolute address is required.

For example:

```c
static FooClass global_foo;
int set_foo() {
  do_something( &global_foo );
  return 0;
}
```

While global objects are utterly ugly, and the above code looks like total nonsense, this pattern is still used in some cases, even in Apache plugins:

```c
typedef struct {
    int         enabled;      /* Enable or disable our module */
    const char *path;         /* Some path to...something */
    int         typeOfAction; /* 1 means action A, 2 means action B and so on */
} example_config;
static example_config config;
static void register_hooks(apr_pool_t *pool) {
    config.enabled = 1;
    config.path = "/foo/bar";
    config.typeOfAction = 0x00;
    ap_hook_handler(example_handler, NULL, NULL, APR_HOOK_LAST);
}
```

TL;DR: There is a slight overhead for resolving the address of a global PIC object at run time, compared to a load-time relocated object.

SpyderTL · Post by **SpyderTL** » Mon Jul 08, 2013 3:46 pm

I've been thinking about converting all of my code to be Position Independent code. Although I haven't tried implementing any of these yet, here are some ideas that I've come up with.

1. Give every loadable module its own memory segment, and use a 0 base address for all memory address references. Use far calls for functions, and segment:offset addresses for memory access. Would be simple enough in Real Mode, but Protected Mode would require a significant amount of memory management. Would be slower than Position Dependent code, because segment registers would need to be loaded before memory could be accessed. This approach is pretty much required for CPU Virtual Memory support.
2. Use BX as base address. This seems to be a standard of sorts, but it requires code to be modified to not trash the BX register. SI and DI are also options, but are often needed for other specific instructions.
3. Use BP as base address. I've seriously considered going this route, as I'm not currently using Stack Frames. Although losing Stack Frame support may be too high a cost, in the end.

I know Windows and Linux use PE/ELF address tables, and actually modify the PD code as it is loaded to patch the addresses in the code with the correct address.

Does anyone else have any ideas or suggestions?

Brendan · Post by **Brendan** » Tue Jul 09, 2013 12:01 am

Hi,

SpyderTL wrote:1. Give every loadable module its own memory segment, and use a 0 base address for all memory address references. Use far calls for functions, and segment:offset addresses for memory access. Would be simple enough in Real Mode, but Protected Mode would require a significant amount of memory management. Would be slower than Position Dependent code, because segment registers would need to be loaded before memory could be accessed. This approach is pretty much required for CPU Virtual Memory support.

This approach would be extremely slow due to frequent segment register loads (and frequent protection checks), and because it defeats an optimisation: some modern CPUs are smart enough to avoid adding the segment base address to offsets when they know the segment base address is zero, which doesn't help when every module has its own non-zero base. It also means that you need to modify GDT entries during task switches to ensure protection (otherwise one task can load a different task's segments and access another task's data).
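As a rough model of what each segmented access involves, here is a sketch in C. The `segment` struct and `seg_translate` are hypothetical names, and the model is heavily simplified: it covers only the limit check and base addition of an expand-up segment with byte granularity, and ignores privilege levels, access rights and descriptor loading.

```c
#include <stdint.h>

/* Simplified segment descriptor: where the segment starts and the
   highest valid offset inside it. */
typedef struct {
    uint32_t base;
    uint32_t limit;
} segment;

/* Translate segment:offset into a linear address.
   Returns 0 on success, -1 if the offset is outside the segment
   (where real hardware would raise a protection fault). */
int seg_translate(const segment *seg, uint32_t offset, uint32_t *linear) {
    if (offset > seg->limit)
        return -1;
    *linear = seg->base + offset;
    return 0;
}
```

With a module loaded at `0x20000` and a limit of `0xFFFF`, offset `0x10` translates to linear address `0x20010`, while offset `0x10000` is rejected. The module's code only ever deals in offsets, which is what makes it position independent in this scheme.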

SpyderTL wrote:2. Use BX as base address. This seems to be a standard of sorts, but it requires code to be modified to not trash the BX register. SI and DI are also options, but are often needed for other specific instructions.
3. Use BP as base address. I've seriously considered going this route, as I'm not currently using Stack Frames. Although losing Stack Frame support may be too high a cost, in the end.

For this case, everything that references memory will be complicated by it. For a simple example, "call foo" would become "lea eax,[foo + ebx]; call eax" (and now both EBX and EAX can't be used for real work).
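The same pattern can be sketched in C for data rather than calls. This is only an illustration under assumptions: `module_refs`, `bump_counter` and `module_message` are invented names, and the `base` pointer stands in for the reserved register. The point is that every single reference has to add the base before it can touch memory.

```c
#include <stdint.h>
#include <string.h>

/* Internal references are stored as offsets from the start of the
   module image rather than as absolute addresses. */
typedef struct {
    uint32_t counter_off;  /* offset of a 32-bit counter in the image */
    uint32_t message_off;  /* offset of a NUL-terminated string */
} module_refs;

/* The C equivalent of "mov eax,[ebx + offset]". */
static uint32_t load_u32(const unsigned char *base, uint32_t off) {
    uint32_t value;
    memcpy(&value, base + off, sizeof value);
    return value;
}

/* The C equivalent of "mov [ebx + offset],eax". */
static void store_u32(unsigned char *base, uint32_t off, uint32_t value) {
    memcpy(base + off, &value, sizeof value);
}

/* Increment the module's counter and return the new value. */
uint32_t bump_counter(unsigned char *base, const module_refs *refs) {
    uint32_t value = load_u32(base, refs->counter_off) + 1;
    store_u32(base, refs->counter_off, value);
    return value;
}

/* Return a pointer to the module's string. */
const char *module_message(const unsigned char *base, const module_refs *refs) {
    return (const char *)(base + refs->message_off);
}
```

Copying the image to a different buffer and passing the new `base` gives identical behaviour with no fix-ups, at the price of threading `base` through every function.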

Also; when there aren't enough general purpose registers you end up with temporary values being "spilled" (stored on the stack), which makes code slower (increased number of references to memory). For 16-bit and 32-bit code, there are only 7 general purpose registers that code is free to use (including EBP), which creates too much "spilling" already. Consuming a register for a base address makes this even worse and will hurt performance by increasing "spilling" and increasing the number of references to memory.

Note: It doesn't matter much which register you use. If you waste EBX for a base address, then EBP can be used for frame pointer (if frame pointers are enabled) or can be free for normal use (if frame pointers are disabled); and if you waste EBP for a base address, then EBX can be used for frame pointer (if frame pointers are enabled) or can be free for normal use (if frame pointers are disabled). The main difference is the "default segment" (e.g. memory references using EBP use SS as the default segment, which makes EBP preferable as a frame pointer if SS isn't the same as DS as it can avoid segment override prefixes).

Finally; for this method, without paging you can't protect one task from another, and without paging it'd severely limit the amount of space processes can use (e.g. for a 32-bit OS; all processes will have to share the same ~3 GiB of space, and you couldn't have 10 processes using ~3 GiB each like all other OSs can). If you use paging to avoid these problems, then I can't see any advantage of using position independent code for normal executables in the first place.

SpyderTL wrote:I know Windows and Linux use PE/ELF address tables, and actually modify the PD code as it is loaded to patch the addresses in the code with the correct address.

Modifying the code to fix addresses creates a different problem. For most OSs (using paging) if pages of code aren't modified, then you can use "memory mapped files" to avoid wasting RAM, and (if/when the pages are in RAM) you can have one copy in physical RAM that is mapped into the file system's cache and also mapped into zero or more processes at the same time. For example; if an executable has 1234 KiB of code and there are 5 instances of the program running, then you might have 1000 KiB of RAM used by all 5 instances and the file system cache (with 234 KiB of it left on disk and not in RAM at all). If you have to modify the executable to fix up relocations then you can't do this. For example; if an executable has 1234 KiB of code and there are 5 instances of the program running, then you need to consume between 6170 KiB of RAM (if there's nothing in the file system cache) and 7404 KiB of RAM (if there's a full copy in the file system cache) instead of only 1000 KiB.
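The arithmetic behind those figures, as a small sketch. The function name `patched_code_kib` is made up, and the model assumes the worst case implied above: every code page is touched by a fix-up, so each instance ends up with a full private copy.

```c
/* RAM used by code pages when load-time fix-ups force a private copy
   per instance. If the file system cache also holds an unmodified copy
   of the file, that copy is counted once on top. */
unsigned long patched_code_kib(unsigned long code_kib,
                               unsigned long instances,
                               int cached_copy) {
    unsigned long total = code_kib * instances;
    if (cached_copy)
        total += code_kib;
    return total;
}
```

For 1234 KiB of code and 5 instances this gives 5 × 1234 = 6170 KiB without a cached copy and 6170 + 1234 = 7404 KiB with one, compared with at most 1234 KiB (1000 KiB in the example) when unmodified pages are shared.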

Note: This is why DLLs in Windows have a "default virtual address" (so that when the DLL is running at its default address no fix ups for relocation are needed and no RAM needs to be wasted). It's also why most *nix systems use a "global offset table" for shared libraries instead, so that only the GOT is modified and not all the code (and very little RAM is wasted regardless of what address the library is running at).
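A toy model of the global offset table idea, assuming invented names (`got_t`, `got_fill`, `got_resolve`, `symbol_offset`) and made-up offsets. It is not how any real loader is written; it only shows that the per-process table is the one thing written at load time, while the "code" refers to symbols by slot and pays one extra load per access.

```c
#include <stddef.h>
#include <stdint.h>

enum { GOT_SLOTS = 3 };

/* Link-time knowledge: where each symbol sits relative to the start of
   the library. This part is identical in every process. */
static const uintptr_t symbol_offset[GOT_SLOTS] = { 0x1000, 0x1040, 0x2000 };

/* Per-process table of absolute addresses. */
typedef struct {
    uintptr_t slot[GOT_SLOTS];
} got_t;

/* What the loader does once it knows the load address: patch the table,
   and nothing else. */
void got_fill(got_t *got, uintptr_t load_base) {
    for (size_t i = 0; i < GOT_SLOTS; i++)
        got->slot[i] = load_base + symbol_offset[i];
}

/* What the code does on each access: fetch the address from its slot. */
uintptr_t got_resolve(const got_t *got, size_t index) {
    return got->slot[index];
}
```

Two processes that load the library at `0x400000` and `0x10000000` each fill their own small table; slot 1 resolves to `0x401040` in the first and `0x10001040` in the second, and the code pages themselves stay unmodified and shareable.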

SpyderTL wrote:Does anyone else have any ideas or suggestions?

All "position independent code" solutions must sacrifice performance in some way; and different methods just sacrifice performance in different ways. To avoid sacrificing performance, use fixed addresses (instead of position independent code) wherever possible, and paging.

Note: Paging also sacrifices some performance (e.g. TLB misses, etc); but (unlike all the different methods of implementing position independent code) it is very powerful/flexible and (if it's used properly) it can avoid a huge amount of overhead for other things. Basically it's a small performance loss that's cancelled out by a huge performance gain. Often, beginners don't understand paging and don't see how it can improve performance and only see the overhead (e.g. TLB misses, etc), and end up making the mistake of avoiding paging in the hope of improving performance (without realising that their misguided attempt at improving performance will make performance worse).

Cheers,

Brendan

OSDev.org
