Thread Local Storage


Poll: Should a kernel provide support for TLS?

Yes, using paging, by changing mappings for each thread: 4 votes (18%)
Yes, using segmentation, changing a segment register for each thread: 6 votes (27%)
Yes, using some other method (please specify): 1 vote (5%)
No, leave it up to user mode code: 11 votes (50%)
No, leave it up to a kernel extension/module: 0 votes

Total votes: 22

senaus
Member
Posts: 66
Joined: Sun Oct 22, 2006 5:31 am
Location: Oxford, UK

Thread Local Storage

Post by senaus »

In developing my kernel, there is one design policy which I simply cannot decide on; that is, whether or not to provide Thread Local Storage support natively in the kernel.

So I ask you, have you implemented TLS in your kernel or not? If not, do you intend to? How do you intend to do it?

Of course the easiest route (in the short term, possibly) would be to leave it out of the kernel completely, placing the burden on compilers/libraries in user mode. I'm not quite sure which is the best option, however!

Any input would be appreciated.

Thanks,
Sean
Candy
Member
Posts: 3882
Joined: Tue Oct 17, 2006 11:33 pm
Location: Eindhoven

Post by Candy »

I intend to implement TLS as a storage area, effectively at the top of the user stack, laid out so that the user can't address it directly but only through a specific register. I voted for the second option, which is only partly accurate: I intend to use the segment register purely as a base register, so that the same code reaches this area no matter which thread is running. There's also a process-local storage area for the kernel.
bluecode
Member
Posts: 202
Joined: Wed Nov 17, 2004 12:00 am
Location: Germany

Post by bluecode »

I would/will use one of fs/gs for thread local storage. At least in long mode, these two segment registers were designed for that purpose. The actual allocation of the memory will be up to the userspace application, though, so basically the library just tells the kernel where fs/gs should point. I voted for segmentation, but in long mode you actually just change a model-specific register (the FS or GS base MSR).
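
As a rough sketch of what "the library just tells the kernel where fs/gs should point" could look like in a long mode kernel: the MSR numbers below are the architectural ones, but `struct thread`, `sys_set_tls_base`, `switch_tls` and the `KERNEL_BUILD` macro are hypothetical names, and the canonical-address check assumes 48-bit virtual addresses. Outside a kernel build the MSR write is replaced by a stand-in variable so the logic can be exercised in user mode.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural MSR numbers for the FS and GS base addresses. */
#define MSR_FS_BASE 0xC0000100u
#define MSR_GS_BASE 0xC0000101u

#ifdef KERNEL_BUILD   /* hypothetical macro: defined only for ring-0 code */
static void write_msr(uint32_t msr, uint64_t value) {
    /* WRMSR takes the MSR number in ECX and the value in EDX:EAX. */
    __asm__ volatile("wrmsr"
                     :
                     : "c"(msr), "a"((uint32_t)value),
                       "d"((uint32_t)(value >> 32)));
}
#else
/* Hosted stand-in: remember the value instead of touching the CPU. */
static uint64_t fake_fs_base;
static void write_msr(uint32_t msr, uint64_t value) {
    if (msr == MSR_FS_BASE)
        fake_fs_base = value;
}
#endif

struct thread {          /* hypothetical per-thread kernel structure */
    uint64_t tls_base;   /* address chosen by the user-space library */
};

/* Hypothetical system call: user space reports where its TLS block lives.
 * Returns 0 on success, -1 if the address is not canonical. */
int sys_set_tls_base(struct thread *t, uint64_t base) {
    uint64_t top = base >> 47;   /* bits 63..47 must be all 0 or all 1 */
    assert(t != 0);
    if (top != 0 && top != 0x1FFFFu)
        return -1;               /* WRMSR would raise #GP on this value */
    t->tls_base = base;
    return 0;
}

/* Called from the thread switch path: only the base register changes,
 * the page mappings of the process stay as they are. */
void switch_tls(const struct thread *next) {
    write_msr(MSR_FS_BASE, next->tls_base);
}
```

A real kernel would probably also want to check that the address lies in the user half of the address space; that policy is left out here.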

But generally I would avoid using different page mappings within one process.

Post by senaus »

Hmm, top of the user stack... Would this expand upwards, or is it just static? Sounds simple enough to me. Do you set this register in the image on the kernel stack and then leave it?

I originally thought about making the whole user stack TLS, so that all user stacks sit at the same address, but then I realised how bad the performance would be due to the TLB flushing on every thread switch! It would make implementing thread migration a hell of a lot easier though...
durand
Member
Posts: 193
Joined: Wed Dec 21, 2005 12:00 am
Location: South Africa

Post by durand »

I used a kernel-based TLS approach in a previous kernel version which supported only a single processor. In that environment, where only one thread executes at any given time, it was really easy to implement.

However, when I added support for SMP, my previous method no longer worked and I had a lot of trouble working out an elegant and clean way to implement TLS in the new environment where paging techniques could no longer be used.

In the end I implemented userland TLS rather easily, and it turned out to be a far cleaner solution than anything I thought of in the kernel. I modelled it after the method used in Unix/Linux and, to an extent, Windows (pthread_key_create, etc.).
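
For reference, a minimal sketch of the key-based interface mentioned above, using the standard POSIX calls (`pthread_key_create`, `pthread_setspecific`, `pthread_getspecific`). This only shows how the interface is used, not how durand's library implements it; `slot`, `worker` and `tls_keys_demo` are made-up names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_key_t slot;   /* one key, a separate value per thread */
static pthread_once_t slot_once = PTHREAD_ONCE_INIT;
static int slot_status;      /* result of pthread_key_create */

static void create_slot(void) {
    slot_status = pthread_key_create(&slot, NULL);   /* no destructor */
}

/* Each worker stores its own pointer under the shared key and reads it back. */
static void *worker(void *arg) {
    assert(arg != NULL);     /* NULL is reserved here to signal failure */
    pthread_once(&slot_once, create_slot);
    if (slot_status != 0)
        return NULL;
    if (pthread_setspecific(slot, arg) != 0)
        return NULL;
    return pthread_getspecific(slot);
}

/* Returns 0 if both threads read back their own value, 1 if the values
 * were mixed up, and -1 if a thread could not be created. */
int tls_keys_demo(void) {
    int one = 1, two = 2;
    pthread_t a, b;
    void *seen_a = NULL, *seen_b = NULL;

    if (pthread_create(&a, NULL, worker, &one) != 0)
        return -1;
    if (pthread_create(&b, NULL, worker, &two) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, &seen_a);
    pthread_join(b, &seen_b);
    return (seen_a == &one && seen_b == &two) ? 0 : 1;
}
```

The same key gives each thread a different value, and nothing in the kernel has to know that the key exists.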

So, knowing this now, I reckon there's no real point to putting TLS in the kernel when you can just as easily do it in userland. Maybe you can get some sort of increased performance by putting the logic in the kernel. But my personal preference would be userland.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Thread Local Storage

Post by Brendan »

Hi,

I didn't vote, as I don't think there's a "correct" answer for all kernels...

I implement TLS using paging, with a different mapping per thread (a different address space for each thread, where "process space" is mapped into every thread's address space).
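
A minimal sketch of how such a per-thread address space might be assembled, assuming 32-bit non-PAE paging (a 1024-entry page directory, 4 MiB per entry). The layout constants and `build_thread_directory` are hypothetical and not taken from Brendan's kernel; the entries are treated as opaque values here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PD_ENTRIES   1024u  /* 32-bit non-PAE page directory         */
#define THREAD_FIRST 256u   /* hypothetical: entries 0..255 = process */
#define KERNEL_FIRST 768u   /* entries 256..767 = thread, rest kernel */

typedef uint32_t pde_t;     /* page directory entry (address | flags) */

/* Build one thread's page directory: the process and kernel ranges are
 * copied from the process-wide directory (so the page tables behind them
 * are shared), while the thread range gets this thread's private tables.
 * Returns 0 on success, -1 if too many private tables were supplied. */
int build_thread_directory(pde_t dir[PD_ENTRIES],
                           const pde_t process_dir[PD_ENTRIES],
                           const pde_t *thread_tables,
                           uint32_t thread_table_count) {
    assert(dir != NULL && process_dir != NULL);
    if (thread_table_count > KERNEL_FIRST - THREAD_FIRST)
        return -1;

    memset(dir, 0, PD_ENTRIES * sizeof(pde_t));

    /* Shared "process space". */
    memcpy(dir, process_dir, THREAD_FIRST * sizeof(pde_t));
    /* Shared kernel space. */
    memcpy(dir + KERNEL_FIRST, process_dir + KERNEL_FIRST,
           (PD_ENTRIES - KERNEL_FIRST) * sizeof(pde_t));
    /* Private "thread space": visible only through this directory. */
    for (uint32_t i = 0; i < thread_table_count; i++)
        dir[THREAD_FIRST + i] = thread_tables[i];
    return 0;
}
```

One consequence of copying entries like this is that adding a new page table to process space means updating every thread's directory for that process.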

My first reason is that (on SMP machines) no locking is required to change TLS pages, as only one thread (and therefore only one CPU at a time) can possibly access them. Because of this, I encourage programmers to use TLS for as much as possible, to improve scalability (e.g. to reduce lock contention on "many-CPU" systems).
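
To illustrate the locking argument only: the sketch below uses C11 `_Thread_local` and pthreads rather than per-thread page mappings, so it does not give the protection described in this post, but it shows per-thread data being updated without any lock and merged once at the end. `count_events` and `count_all_events` are made-up names.

```c
#include <assert.h>
#include <pthread.h>

#define WORKERS 4
#define EVENTS_PER_WORKER 100000

static _Thread_local unsigned long local_events;   /* one copy per thread */

static void *count_events(void *out) {
    for (int i = 0; i < EVENTS_PER_WORKER; i++)
        local_events++;          /* no lock: only this thread uses this copy */
    *(unsigned long *)out = local_events;   /* publish once, into own slot */
    return NULL;
}

/* Runs WORKERS threads and returns the sum of their private counters. */
unsigned long count_all_events(void) {
    pthread_t tid[WORKERS];
    unsigned long result[WORKERS] = {0};
    unsigned long total = 0;
    int started = 0;

    for (; started < WORKERS; started++)
        if (pthread_create(&tid[started], NULL, count_events,
                           &result[started]) != 0)
            break;

    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);   /* join makes the worker's write visible */
        total += result[i];
    }
    assert(started == WORKERS);
    return total;
}
```

The only synchronisation point is the join; the hot loop never touches shared data.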

It also protects data belonging to one thread from other threads: if one thread has a bug that corrupts data, it can't corrupt data in another thread's TLS. This could make debugging easier and makes the OS a little more secure.

My last reason is 32-bit systems, where it gives the process access to more linear address space. For example, if a process has 10 threads, with 1 GB of "process space" and 2 GB of "thread space" per thread, then the process as a whole has access to a maximum of 21 GB of linear address space (rather than 3 GB).

The main disadvantage is TLB flushing during thread switches, but this isn't necessarily a problem, depending on how often you switch between threads that belong to the same process. For example, if you've got 2 processes with 2 threads each (called p1t1, p1t2, p2t1 and p2t2), then if your scheduler switches from p1t1 -> p2t1 -> p1t2 -> p2t2 there are no extra thread switch costs, because each thread switch changes to a different process anyway. If your scheduler switches from p1t1 -> p1t2 -> p2t1 -> p2t2 then there is extra TLB flushing (for the p1t1 -> p1t2 and the p2t1 -> p2t2 thread switches), but this only matters for TLB entries that correspond to "process space" (as the thread local storage is meant to be local to the thread regardless of how it's implemented).
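
The two schedules above can be checked mechanically. This is just a sketch that counts the switches between different threads of the same process, i.e. the ones that cost an address space change only because of per-thread mappings; `struct slot` and `same_process_switches` are made-up names.

```c
#include <assert.h>
#include <stddef.h>

struct slot {      /* one entry per scheduled time slice */
    int process;
    int thread;
};

/* Count the switches that stay inside one process but change thread.
 * With one address space per process these would need no TLB flush;
 * with one address space per thread they do. */
size_t same_process_switches(const struct slot *order, size_t n) {
    size_t extra = 0;
    assert(order != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        if (order[i].process == order[i - 1].process &&
            order[i].thread != order[i - 1].thread)
            extra++;
    }
    return extra;
}
```

For p1t1 -> p2t1 -> p1t2 -> p2t2 this counts 0 such switches, and for p1t1 -> p1t2 -> p2t1 -> p2t2 it counts 2, matching the description above.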

For me, each thread has its own priority and the scheduler typically switches to the highest priority thread (regardless of which process it belongs to), so it's more likely that I'll be switching between processes instead of switching between threads that belong to the same process.

The other "disadvantage" is data on a thread's stack. For me each thread's stack is at the top of its TLS, and therefore can't be accessed by other threads. This could be a problem in some situations. For example:

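Code:

#include <stdio.h>

/* spawnThread() and exitThread() stand for the OS's own thread API. */
void spawnThread(void (*entry)(void));
void exitThread(void);
void thread2(void);

volatile char doneFlag = 'N';
volatile int *data;

int main(void) {
   int foo;

   foo = 1234;
   data = &foo;   /* publish the address of a variable on main's stack */
   spawnThread(thread2);
   while(doneFlag == 'N') { }
   return 0;
}

void thread2(void) {
   int bar;

   /* Same linear address, but in this thread's own TLS mapping, so not foo. */
   bar = *data;
   printf("The value %d has nothing to do with the variable \"foo\".\n", bar);
   doneFlag = 'Y';
   exitThread();
}
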
This looks easy to avoid for the example above, but if you're porting a large application it could cause problems that are hard to find.


Cheers,

Brendan
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Post by Combuster »

I have to agree with Brendan: there are many ways to do this. Since I'm building an exokernel, the freedom of choice is left to userspace. However, to make this work the kernel still needs to provide the basic support, so you end up with partial TLS support in the kernel and the rest of the TLS code in userspace.
But that's just my design philosophy...

Post by senaus »

Thanks for the interesting ideas! I think I'll leave TLS for now, until I start work on the next version of the memory manager. My nanokernel (which I'm working on now) will provide the support needed for both the user mode and paging methods, and I'll adopt one of these when the time comes.

Cheers,
Sean
