Thread local storage
I need TLS (thread-local storage) because it makes multithreading easier. But while thinking about how to port, for example, a C library, I ran into a problem with TLS.
There are the functions tls_get_slot, tls_free_slot, tls_get_data and tls_set_data. The problem is that when I allocate a slot in one thread, I cannot assume that allocating a slot in another thread will return the same slot number. (I need this, for example, for a C library, so that I can save the slot number in a global variable and use that slot in every thread.)
So I thought it would be a good idea to have two more functions, tls_get_slot_proc and tls_free_slot_proc. These give you a slot that is marked as used in all threads of a process (including threads created after the call). To make this work, the thread-only slots grow up and the process slots grow down.
I would like to know what others think about this system, and whether 1024 slots (1 page) would be enough. I think so, because you could use one slot to store a memory address that points to all the rest of your thread-local storage.
One problem with this system is that it could hurt performance if I need to test, for every thread, whether a slot is free.
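To make the slot question concrete, here is a minimal sketch in C of the four functions named above, assuming slot numbers are allocated process-wide from a single bitmap while the values live in a per-thread table. The bitmap, the `TLS_NO_SLOT` constant and the use of `_Thread_local` as a stand-in for the per-thread page are assumptions for illustration, not part of any existing kernel API.

```c
#include <stddef.h>
#include <stdint.h>

#define TLS_SLOTS   1024   /* 1024 pointers = one 4 KiB page on a 32-bit target */
#define TLS_NO_SLOT (-1)   /* hypothetical "no free slot" result */

/* Process-wide allocation bitmap, one bit per slot, shared by all threads.
 * A real implementation would guard it with a lock or atomic operations. */
static uint32_t tls_used[TLS_SLOTS / 32];

/* Stand-in for the per-thread slot page. On a real OS this would be reached
 * through a per-thread segment base rather than _Thread_local. */
static _Thread_local void *tls_page[TLS_SLOTS];

/* Allocate the lowest free slot number; valid in every thread of the process. */
int tls_get_slot(void)
{
    for (int i = 0; i < TLS_SLOTS; i++) {
        uint32_t bit = UINT32_C(1) << (i % 32);
        if (!(tls_used[i / 32] & bit)) {
            tls_used[i / 32] |= bit;
            return i;
        }
    }
    return TLS_NO_SLOT;
}

/* Release a slot number. Values left behind in other threads are not cleared. */
void tls_free_slot(int slot)
{
    if (slot >= 0 && slot < TLS_SLOTS)
        tls_used[slot / 32] &= ~(UINT32_C(1) << (slot % 32));
}

/* Store a value in the calling thread's copy of the slot. Returns 0 or -1. */
int tls_set_data(int slot, void *data)
{
    if (slot < 0 || slot >= TLS_SLOTS)
        return -1;
    tls_page[slot] = data;
    return 0;
}

/* Read the calling thread's copy of the slot (NULL if never set). */
void *tls_get_data(int slot)
{
    if (slot < 0 || slot >= TLS_SLOTS)
        return NULL;
    return tls_page[slot];
}
```

With this layout, allocation touches only the shared bitmap, so there is no need to scan every thread to find out whether a slot is free.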
Re: Thread local storage
Why would there be thread-only slots? TLS slots are always global to a process. TLS slots used by only a single thread are pointless, since the point of TLS is that every thread uses the same slot to access its own data.
1024 slots are more than enough, since each module usually uses only 1 slot at most.
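For comparison, this is how the "one slot, many threads" idea looks with POSIX thread-specific data. It is only an illustration on a POSIX host, not the API of the OS being discussed; the names `module_key`, `worker` and `demo_shared_key` are made up for the example.

```c
#include <pthread.h>
#include <stddef.h>

/* One process-wide key, as a module would keep in a global variable. */
static pthread_key_t module_key;

/* Each thread stores its own pointer under the shared key and reads it back. */
static void *worker(void *arg)
{
    pthread_setspecific(module_key, arg);
    return pthread_getspecific(module_key);
}

/* Runs two threads; returns 1 if each thread saw only its own value and the
 * calling thread, which never set the key, still sees NULL. */
int demo_shared_key(void)
{
    pthread_t t1, t2;
    void *r1 = NULL, *r2 = NULL;
    int v1 = 1, v2 = 2;

    if (pthread_key_create(&module_key, NULL) != 0)
        return 0;
    if (pthread_create(&t1, NULL, worker, &v1) != 0)
        return 0;
    if (pthread_create(&t2, NULL, worker, &v2) != 0) {
        pthread_join(t1, NULL);
        return 0;
    }
    pthread_join(t1, &r1);
    pthread_join(t2, &r2);

    int caller_unset = (pthread_getspecific(module_key) == NULL);
    pthread_key_delete(module_key);
    return r1 == &v1 && r2 == &v2 && caller_unset;
}
```

The key is created once, and every thread then uses the same key to reach its own value, which is the behaviour described above.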
Re: Thread local storage
Gigasoft wrote: TLS slots used by only a single thread are pointless, since the point of TLS is that every thread uses the same slot to access their own data.
I haven't used TLS yet, so I don't know how it is used.
Gigasoft wrote: 1024 slots are more than enough, since each module usually uses only 1 slot at most.
The way I would use TLS, every shared library would use one slot, and I think this also solves my problem of returning 64-bit values from a syscall. I only need to specify that the return value is in two TLS slots reserved for that purpose (such as slots 0 and 1).
Thanks for the hint!
- Owen
- Member
- Posts: 1700
- Joined: Fri Jun 13, 2008 3:21 pm
- Location: Cambridge, United Kingdom
Re: Thread local storage
You'd be surprised how applications can churn through TLS keys. 1024 may not be enough.
Re: Thread local storage
For returning 64-bit values from a syscall, I'd just use two registers (usually eax and edx). To return large structures, the caller can pass a pointer; just make sure the pointer points to user space, and be prepared to handle exceptions that occur if the pointer is invalid.
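As a small sketch of the two-register convention, the following C helpers split a 64-bit value into the halves that would travel in eax (low) and edx (high) and join them again on the caller's side. The `sys_regs` struct and the function names are hypothetical; in a real syscall stub the halves would come from the registers themselves.

```c
#include <stdint.h>

/* Hypothetical view of the two return registers after a syscall. */
typedef struct {
    uint32_t eax;  /* low 32 bits of the result  */
    uint32_t edx;  /* high 32 bits of the result */
} sys_regs;

/* Kernel side: split a 64-bit result into the two register halves. */
static inline sys_regs split_u64(uint64_t value)
{
    sys_regs r;
    r.eax = (uint32_t)(value & 0xFFFFFFFFu);
    r.edx = (uint32_t)(value >> 32);
    return r;
}

/* User side: rebuild the 64-bit result from the two halves. */
static inline uint64_t join_u64(sys_regs r)
{
    return ((uint64_t)r.edx << 32) | r.eax;
}
```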
Re: Thread local storage
Owen wrote: You'd be surprised how applications can churn through TLS keys. 1024 may not be enough.
So maybe I should design it so that the user can request more memory.
Gigasoft wrote: For returning 64-bit values from a syscall, I'd just use two registers (usually, eax and edx). To return large structures, the caller can pass a pointer, just make sure the pointer points to user space and be prepared to handle exceptions that occur if the pointer is invalid.
But here is the problem: the fast ring-change path (sysenter/sysexit) needs edx, so I can't use it. With the TLS approach I could also say that if eax != 0 you got an error and the error code is in the TLS, so the return value is always in the TLS and I could give better errors back (and the error checking is easier).
- Owen
Re: Thread local storage
Then return in ebx or ecx,
or edx or esi,
or edi or ebp,
or even esp.
(OK, don't return in esp. That's awkward.)
Re: Thread local storage
According to the x86 ABI you are only allowed to clobber eax, ecx and edx. Sysexit uses ecx and edx (and sysret uses ecx), so only eax is left for returning a value. But if I say that in my OS the API puts "0" or an error code in eax and the return value in the TLS, then that is how it is, and programs have to live with it.
I know that in Haiku, if you want to return a 64-bit value, you have to set some flag, and then the kernel returns as if the syscall had been made with an int instruction. I don't know how big the difference between returning with sysexit/sysret and with iret is, but since you also need to build a stack frame for the iret instruction, it will not be negligible.
And since most of the time you will use some wrapper function to call a kernel service, you won't see a difference. Only assembly programmers need to know about it (but with them I have another problem anyway, as does every other OS that uses sysenter rather than syscall).
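A wrapper of the kind described here could look roughly like the following C sketch: the status comes back in what would be eax, and on success the 64-bit value is read from a reserved per-thread return area. Everything in it is an assumption for illustration: `tls_ret` stands in for the two reserved TLS slots, `fake_syscall` stands in for the real sysenter stub, and the syscall number and error code are invented.

```c
#include <stdint.h>

#define SYS_GET_TICKS 1u   /* hypothetical syscall number */
#define E_NOSYS       38u  /* hypothetical error code */

/* Stand-in for the two reserved TLS slots (slot 0 = low half, slot 1 = high). */
static _Thread_local uint32_t tls_ret[2];

/* Stand-in for the kernel: writes the result into the TLS return area and
 * returns 0, or returns an error code and leaves the area untouched. */
static uint32_t fake_syscall(uint32_t nr)
{
    if (nr != SYS_GET_TICKS)
        return E_NOSYS;
    tls_ret[0] = 0x00000002u;  /* low half of the result  */
    tls_ret[1] = 0x00000001u;  /* high half of the result */
    return 0;
}

/* User-space wrapper: hides the TLS convention from the caller.
 * Returns 0 on success and stores the value in *out, else the error code. */
int sys_call_u64(uint32_t nr, uint64_t *out)
{
    uint32_t status = fake_syscall(nr);
    if (status != 0)
        return (int)status;
    *out = ((uint64_t)tls_ret[1] << 32) | tls_ret[0];
    return 0;
}
```

Callers of `sys_call_u64` never see where the value travelled, which is the point made above about wrapper functions.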
- Owen
Re: Thread local storage
Syscalls don't follow the ABI anyway; what is the problem? If you're calling from (say) inline assembly anyway, you can correctly instruct the compiler to optimize around it; if calling from assembly itself, then you need to know the system call details.
The kernel should know nothing about TLS; it should be entirely managed in user space, except for perhaps asking the kernel to point %gs or %fs at it.
- Combuster
- Member
- Posts: 9301
- Joined: Wed Oct 18, 2006 3:45 am
- Libera.chat IRC: [com]buster
- Location: On the balcony, where I can actually keep 1½m distance
Re: Thread local storage
Owen wrote: The kernel should know nothing about TLS; (...) asking the kernel to point %gs or %fs at it.
There be a slight contradiction, my friend.
A kernel's design must know about TLS to the extent that it can support it. It's not like userspace can load GS or FS with something and expect it to magically work...?
Re: Thread local storage
A kernel could have a system call which sets LDT entries to anything specified by the caller (while making sure they don't contain anything funky). Then it's up to the caller to implement TLS on top of this.
- Owen
Re: Thread local storage
Owen wrote: The kernel should know nothing about TLS; (...) asking the kernel to point %gs or %fs at it.
Combuster wrote: There be a slight contradiction, my friend. A kernel's design must know about TLS to the extent it can support that, It's not like userspace can load GS or FS with something and expect it to magically work...?
For the kernel, they're just arbitrary extra pointer registers which userspace can't directly write. No reason they have to be TLS.
(You might use one to point to a pointer to the system call vector, for example, so apps just do a call (%fs:0) to get to the kernel.)
Re: Thread local storage
The problem with sysenter is that you need to save the user esp in user space, so you can't be sure that the value you believe is the saved user esp is actually correct (a security problem). I don't know how other OSes check that the stack handed to the syscall code is present and valid, but at the moment I have no such checks.
@back to topic
The kernel needs to know about TLS, because it has to create (or rather, change) a segment descriptor, and I find it easier to give the user a standard TLS area of 4 KB (1024 entries) and leave the rest, such as finding a free slot, to a user-space library. But actually a 4-byte value (a pointer) would be enough, because a thread only needs one pointer that points to its TLS. Then you won't need to preallocate 4 KB (in my OS), and everything is done in user space. (Maybe I'll take this approach, but I like the other one more.)
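The single-pointer variant might be sketched like this in C: the kernel only provides one per-thread pointer, and a user-space library allocates the slot table behind it on first use. Here `_Thread_local` stands in for that kernel-provided pointer, and `tls_block_get` is a hypothetical library function.

```c
#include <stdlib.h>

#define TLS_SLOTS 1024  /* same slot count as the preallocated variant */

/* Stand-in for the one per-thread pointer the kernel would expose,
 * for example through a segment register. */
static _Thread_local void **tls_block;

/* Return the calling thread's slot table, allocating it (zero-filled) on
 * first use. Returns NULL if the allocation fails. */
void **tls_block_get(void)
{
    if (tls_block == NULL)
        tls_block = calloc(TLS_SLOTS, sizeof *tls_block);
    return tls_block;
}
```

A thread that never touches TLS never pays for the table, which is the saving mentioned above; the cost is an extra check on every access.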
Re: Thread local storage
When copying the parameters for a system call, just check that the parameters are outside the virtual address range reserved for the kernel, and have an exception handling system so that you can catch accesses to a non-present stack and either return an error code or pass the exception to user mode. You must implement address checking and exception handling for every user buffer passed to a system call, too. Remember that a user buffer can become invalid at any time (if another thread frees it) unless you implement a memory locking mechanism to prevent that from happening.
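The address-range part of that check can be sketched in a few lines of C. The kernel base of 0xC0000000 is an assumption (a common 3 GiB / 1 GiB split on 32-bit x86), not something stated in this thread, and `user_range_ok` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: the kernel owns all addresses from here upward. */
#define KERNEL_BASE UINT32_C(0xC0000000)

/* True if the buffer [addr, addr + len) lies entirely below the kernel range.
 * Written so that addr + len is never computed and therefore cannot wrap. */
bool user_range_ok(uint32_t addr, uint32_t len)
{
    if (addr >= KERNEL_BASE)
        return false;
    return len <= KERNEL_BASE - addr;
}
```

This only rejects pointers into kernel space. A buffer that passes can still be unmapped, so the fault handling described above is needed as well.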
Re: Thread local storage
Gigasoft wrote: When copying the parameters for a system call, just check that the parameters are outside the virtual address range reserved for the kernel, and have an exception handling system so that you can catch accesses to a non-present stack and either return an error code or pass the exception to user mode. You must implement address checking and exception handling for every user buffer passed to a system call, too. Remember that an user buffer can become invalid at any time (if another thread frees it) unless you implement a memory locking mechanism to prevent that from happening.
I copy the parameters from the user stack to the kernel stack; the only problem is that the function which does this doesn't know anything about the parameters. So maybe I should change my syscall system to use wrapper functions that check the parameters and then call the kernel function. At the moment my kernel functions are called directly by the syscall interface.
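One way the wrapper idea could be arranged, as a rough C sketch: the syscall table holds per-call wrappers that know the meaning of each argument, validate it, and only then call the real kernel function. All names, the error codes and the 0xC0000000 kernel base are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define KERNEL_BASE UINT32_C(0xC0000000)  /* assumed start of kernel space */
#define E_OK    0
#define E_FAULT 1   /* hypothetical: bad user pointer */
#define E_NOSYS 2   /* hypothetical: unknown syscall number */

/* A wrapper receives the raw arguments already copied from the user stack. */
typedef int (*syscall_wrapper)(const uint32_t *args);

/* The real kernel function: trusts its arguments completely. */
static int k_write(uint32_t buf, uint32_t len)
{
    (void)buf;
    (void)len;
    return E_OK;
}

/* Wrapper for k_write: knows args[0] is a user buffer and args[1] its length. */
static int w_write(const uint32_t *args)
{
    uint32_t buf = args[0];
    uint32_t len = args[1];
    if (buf >= KERNEL_BASE || len > KERNEL_BASE - buf)
        return E_FAULT;
    return k_write(buf, len);
}

/* The table maps syscall numbers to wrappers, never to kernel functions. */
static const syscall_wrapper syscall_table[] = { w_write };

/* Generic entry point: needs no knowledge of any individual syscall. */
int syscall_dispatch(uint32_t nr, const uint32_t *args)
{
    if (nr >= sizeof syscall_table / sizeof syscall_table[0])
        return E_NOSYS;
    return syscall_table[nr](args);
}
```

The generic copy-and-dispatch code stays ignorant of the parameters, and each wrapper carries the checks for exactly one kernel function.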