IA-32 Thread-local storage for shared libraries

Posted: Sat Jan 18, 2020 7:00 pm
by max
Hey guys,

I have already implemented thread-local storage support for simple statically linked binaries, which works fine like this:
  • Create a copy of the TLS master for each thread
  • Put the address of the self-referencing user-thread-object, which resides right after the TLS content, into a GDT entry as its base
  • Load the selector for that GDT entry into the GS register
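As a rough illustration of the first two steps, here is a minimal sketch of building one thread's block as [TLS copy | user-thread-object]. All names (`tls_master_t`, `user_thread_t`, `tls_create_for_thread`, `TLS_ALIGN`) are hypothetical and not taken from the kernel discussed here. It assumes a fixed 16-byte alignment where a real loader would use the `p_align` of the PT_TLS segment, and it uses `calloc` as a stand-in for whatever allocator the kernel provides.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TLS_ALIGN 16u /* assumption: a real loader would use PT_TLS p_align */

/* Hypothetical user-thread-object. Its first field points to itself, so
   that "movl %gs:0, %eax" yields the address of the object. */
typedef struct user_thread {
    struct user_thread *self;
} user_thread_t;

/* Hypothetical description of the TLS master image (from PT_TLS). */
typedef struct {
    const void *image;  /* initialized data (.tdata) */
    size_t      filesz; /* number of bytes to copy from image */
    size_t      memsz;  /* total size including zero-filled .tbss */
} tls_master_t;

/* Size of the TLS area in front of the thread object, rounded up. */
static size_t tls_padded_size(const tls_master_t *m)
{
    return (m->memsz + TLS_ALIGN - 1) & ~(size_t)(TLS_ALIGN - 1);
}

/* Allocates [TLS copy | user_thread_t] and returns the thread object,
   whose address would become the base of the GDT entry. */
user_thread_t *tls_create_for_thread(const tls_master_t *m)
{
    assert(m->filesz <= m->memsz);
    size_t padded = tls_padded_size(m);
    uint8_t *block = calloc(1, padded + sizeof(user_thread_t)); /* zeroes .tbss */
    if (block == NULL)
        return NULL;
    if (m->filesz > 0)
        memcpy(block, m->image, m->filesz); /* copy .tdata from the master */
    user_thread_t *ut = (user_thread_t *)(block + padded);
    ut->self = ut;
    return ut;
}

/* Start of the thread's TLS copy, located right before the thread object. */
void *tls_block_start(user_thread_t *ut, const tls_master_t *m)
{
    return (uint8_t *)ut - tls_padded_size(m);
}
```

The padding sits between the end of the TLS data and the thread object, so the data starts at a negative, aligned offset from the thread pointer.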
But how does this work when my executable is dynamically linked against shared libraries? I have prepared the following test case:

test.bin

#include <errno.h>

void mySharedLibraryFunction(void);	/* defined in test.so */

__thread int foo = 12;

int main(int argc, char** argv)
{
	foo = 5;
	errno = 4;
	mySharedLibraryFunction();
	return 0;
}

test.so

#include <errno.h>

__thread int bar = 25;

void mySharedLibraryFunction(void)
{
	bar = 3;
	errno = 12;
}

I had some log output in the above code (simplified here) and this is what I found out from it:
  • For "foo" everything seems to work normally, reading and writing it.
  • The executable contains a relocation entry for "errno":

    08049660  0000010e R_386_TLS_TPOFF   00000000   errno
  • My "libc.so" also contains a relocation entry for "errno":

    000385e0  00004e23 R_386_TLS_DTPMOD3 00000000   errno
    000385e4  00004e24 R_386_TLS_DTPOFF3 00000000   errno
  • My shared library contains relocation entries for "errno" and "bar":

    000017e4  00000123 R_386_TLS_DTPMOD3 00000000   errno
    000017e8  00000124 R_386_TLS_DTPOFF3 00000000   errno
    000017ec  00000923 R_386_TLS_DTPMOD3 00000000   bar
    000017f0  00000924 R_386_TLS_DTPOFF3 00000000   bar
  • At runtime, when the shared library tries to write "errno" or "bar", a function called "___tls_get_addr" is called. I have created a stub for it, but have not yet figured out what it's supposed to do...
I have loaded the TLS segment of each ELF object into memory. For each thread I would then create a copy of all of those segments.

But what are the relocation entries supposed to be filled with? I can't just put some fixed address in there because it would have to change when switching threads...
Also I guess "__tls_get_addr" is called at runtime to get the TLS location for the current thread, but what exactly should it return?

Thanks in advance for any help!
Greets

Re: Thread-local storage for shared libraries

Posted: Sun Jan 19, 2020 8:01 am
by max
Got this link from doug16k, which could help a lot. I'll update this once I've worked through it: https://uclibc.org/docs/tls.pdf

Re: Thread-local storage for shared libraries

Posted: Mon Jan 20, 2020 4:22 am
by max
Okay, that documentation explained it well.

What I've done now is:
  • Load all TLS segments from each object and put them all in a "TLS master image". It has the structure [executable TLS|user thread object|shared lib TLS|shared lib TLS...]
  • Remember the offset of each object within this area. For example, the executable has offset 0x0, the first shared library has (size of executable TLS + size of user-thread-object), etc.
  • R_386_TLS_TPOFF relocation: insert the offset of the symbol relative to the user-thread-object. This relocation is enough to let the executable access a TLS variable that lives in a shared library.
  • R_386_TLS_DTPMOD32 relocation: I don't really use this, but it can be used to get the module ID passed to __tls_get_addr
  • R_386_TLS_DTPOFF32 relocation: also insert the offset of the symbol relative to the user-thread-object
Now for __tls_get_addr: I added a syscall (g_task_get_tls) which returns a pointer to the user-thread-object within the TLS of the current thread. When the function is called at runtime, I add the passed offset to the address from the syscall to calculate the address of the requested symbol.
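The offset arithmetic described above can be sketched as follows. This is only an illustration of the layout [executable TLS | user-thread-object | shared lib TLS...]. The names `tls_module_t`, `tls_assign_offsets`, `tls_reloc_value` and `tls_resolve` are made up for this example, alignment padding is ignored, and the thread-object pointer is passed in as a parameter instead of coming from g_task_get_tls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-object bookkeeping. Index 0 is the executable. */
typedef struct {
    size_t    tls_size;  /* size of the object's TLS segment */
    ptrdiff_t tp_offset; /* start of its TLS, relative to the thread object */
} tls_module_t;

/* The executable's TLS sits directly before the thread object (negative
   offset); shared-library TLS blocks follow after it (positive offsets). */
void tls_assign_offsets(tls_module_t *mods, size_t count,
                        size_t thread_object_size)
{
    if (count == 0)
        return;
    mods[0].tp_offset = -(ptrdiff_t)mods[0].tls_size;
    ptrdiff_t next = (ptrdiff_t)thread_object_size;
    for (size_t i = 1; i < count; i++) {
        mods[i].tp_offset = next;
        next += (ptrdiff_t)mods[i].tls_size;
    }
}

/* Value written for R_386_TLS_TPOFF and R_386_TLS_DTPOFF32 in this scheme:
   the symbol's offset relative to the thread object. */
ptrdiff_t tls_reloc_value(const tls_module_t *mod, size_t sym_value)
{
    assert(sym_value < mod->tls_size);
    return mod->tp_offset + (ptrdiff_t)sym_value;
}

/* What the runtime lookup boils down to: thread object + relocated offset. */
void *tls_resolve(void *thread_object, ptrdiff_t offset)
{
    return (uint8_t *)thread_object + offset;
}
```

Because every relocation already holds an offset relative to the thread object, the lookup needs no per-module table, which matches the observation that the module ID goes unused here.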

I'm not sure whether this is the recommended way to do it, but it works fine and keeps the per-thread TLS copy pretty slim. There is also no need to remember any additional information, as the right offset is always passed to __tls_get_addr already.

Re: IA-32 Thread-local storage for shared libraries

Posted: Mon Jan 20, 2020 12:12 pm
by Korona
You probably don't want to use a syscall for each __tls_get_addr. It gets called a lot: every TLS access from every shared library will use it (unless the library was linked in the static TLS model; this would prevent dlopen()ing it). Also note that dynamic TLS cannot be allocated/sized at load time. To further complicate the matter, there are libraries (Mesa) that use static TLS for dlopen()ed objects for performance's sake :roll:. C libraries for Linux handle this by overallocating the static TLS area.
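For illustration, a syscall-free lookup with lazily allocated dynamic TLS could be sketched like this. It loosely follows the module-ID plus offset scheme from the TLS document linked earlier in the thread. All names (`tls_index_t`, `thread_control_t`, `tls_modules`, `tls_get_addr_sketch`) are hypothetical. In a real implementation the control block would presumably be reached through the self-pointer at %gs:0 rather than passed as a parameter, and the IA-32 GNU entry point ___tls_get_addr receives its argument in a register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TLS_MAX_MODULES 8 /* arbitrary limit for this sketch */

/* Argument of the lookup; the two fields are what the R_386_TLS_DTPMOD32
   and R_386_TLS_DTPOFF32 relocations would fill in. */
typedef struct {
    unsigned long module; /* module ID, starting at 1 */
    unsigned long offset; /* offset of the symbol within that module's TLS */
} tls_index_t;

/* Hypothetical per-module TLS image description (from PT_TLS). */
typedef struct {
    const void *image;
    size_t      filesz;
    size_t      memsz;
} tls_module_info_t;

/* Hypothetical per-thread control block with a dynamic thread vector. */
typedef struct {
    void *dtv[TLS_MAX_MODULES + 1]; /* indexed by module ID, slot 0 unused */
} thread_control_t;

static tls_module_info_t tls_modules[TLS_MAX_MODULES + 1];

/* Returns the address of the variable for the given thread. The module's
   block is allocated on first access, so a dlopen()ed library costs a
   thread nothing until that thread touches its TLS. */
void *tls_get_addr_sketch(thread_control_t *tcb, const tls_index_t *ti)
{
    assert(ti->module >= 1 && ti->module <= TLS_MAX_MODULES);
    void *block = tcb->dtv[ti->module];
    if (block == NULL) {
        const tls_module_info_t *m = &tls_modules[ti->module];
        block = calloc(1, m->memsz ? m->memsz : 1); /* zero-fills .tbss */
        if (block == NULL)
            return NULL;
        if (m->filesz > 0)
            memcpy(block, m->image, m->filesz); /* copy .tdata */
        tcb->dtv[ti->module] = block;
    }
    return (uint8_t *)block + ti->offset;
}
```

After the first access from a thread, the lookup is one array read and one addition, with no kernel involvement.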

Re: IA-32 Thread-local storage for shared libraries

Posted: Mon Jan 20, 2020 12:59 pm
by max
Hey Korona,

Thanks for your reply. Ah, good to know, so I might want to speed that up somehow.

But static TLS should still work, as the main executable has static TLS, right? My references there are already fixed.
In which cases is dynamic TLS used?

Thanks!!

Re: IA-32 Thread-local storage for shared libraries

Posted: Mon Jan 20, 2020 1:10 pm
by Korona
Static TLS is used for the executable and all libraries that the executable is linked to at link time. Dynamic TLS is only required for dlopen()ed libraries.

Re: IA-32 Thread-local storage for shared libraries

Posted: Mon Jan 20, 2020 1:31 pm
by max
Ah, thanks! :)