Thread Local Storage

senaus · Post by **senaus** » Wed Jan 17, 2007 4:19 pm

In developing my kernel, there is one design policy which I simply cannot decide on; that is, whether or not to provide Thread Local Storage support natively in the kernel.

So I ask you, have you implemented TLS in your kernel or not? If not, do you intend to? How do you intend to do it?

Of course the easiest route (in the short term, possibly) would be to leave it out of the kernel completely, placing the burden on compilers/libraries in user mode. I'm not quite sure which is the best option, however!

Any input would be appreciated.

Thanks,
Sean

Candy · Post by **Candy** » Wed Jan 17, 2007 4:59 pm

I intend to use TLS as a form of data storage, effectively at the top of the user stack, arranged so that the user can't access it directly but only through a specific register. I voted for the second option, which is only partly accurate: I intend to use the register purely as a base for accessing this area in a thread-independent way. There's also process-local storage for the kernel.

bluecode · Post by **bluecode** » Wed Jan 17, 2007 5:27 pm

I would/will use one of fs/gs for thread local storage. At least in long mode these two segment registers were designed for that purpose. But the actual allocation of the memory will be up to the userspace application. So basically the library just tells the kernel where fs/gs should point to. I voted for segmentation, but actually you just change a model-specific register (MSR).

But generally I would avoid using different page mappings within one process.
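
A minimal sketch of the split described above, where the library allocates the block and the kernel only records a base address per thread. Everything here is hypothetical and simulated in plain C: `struct thread`, `sys_set_tls_base`, `switch_to` and `cpu_fs_base` stand in for a real thread control block, a system call, the context switch and the fs/gs base register. It is not taken from any poster's kernel.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-thread kernel record; only the TLS base is stored. */
struct thread {
    uintptr_t tls_base;
};

/* Stand-in for the hardware fs/gs base register on the current CPU. */
static uintptr_t cpu_fs_base;
static struct thread *current;

/* Hypothetical system call: user space tells the kernel where TLS lives. */
static void sys_set_tls_base(uintptr_t base) {
    current->tls_base = base;
    cpu_fs_base = base;              /* takes effect immediately */
}

/* Hypothetical context switch: reload the base for the incoming thread. */
static void switch_to(struct thread *next) {
    current = next;
    cpu_fs_base = next->tls_base;
}

/* User-space side: address an int at an offset from the TLS base. */
static int *tls_int(size_t offset) {
    return (int *)(cpu_fs_base + offset);
}

/* Runs two simulated threads; returns 1 if each sees its own value. */
int tls_base_demo(void) {
    struct thread a = {0}, b = {0};
    void *block_a = calloc(1, 64);   /* the library allocates the memory */
    void *block_b = calloc(1, 64);
    int ok = 0;

    if (block_a != NULL && block_b != NULL) {
        switch_to(&a);
        sys_set_tls_base((uintptr_t)block_a);
        *tls_int(0) = 111;

        switch_to(&b);
        sys_set_tls_base((uintptr_t)block_b);
        *tls_int(0) = 222;

        /* Same offset, different base: each thread reads its own copy. */
        switch_to(&a);
        ok = (*tls_int(0) == 111);
        switch_to(&b);
        ok = ok && (*tls_int(0) == 222);
    }
    free(block_a);
    free(block_b);
    return ok;
}
```

The kernel side stays tiny in this arrangement: one saved word per thread and one register reload per switch. Layout and allocation of the block are left to user space.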

senaus · Post by **senaus** » Wed Jan 17, 2007 5:31 pm

Hmm, top of the user stack... Would this expand upwards, or is it just static? Sounds simple enough to me. Do you set this register in the image on the kernel stack and then leave it?

I originally thought about making the whole user stack TLS, so all user stacks are at the same address, then I realised how bad the performance would be due to the TLB flushing every time! It would make implementing thread migration a hell of a lot easier though...

durand · Post by **durand** » Wed Jan 17, 2007 5:41 pm

I used to use a kernel-based TLS approach in a previous kernel version which supported only a single processor. In this environment where there is 1 and only 1 thread executing at the same time, it was really easy to implement.

However, when I added support for SMP, my previous method no longer worked and I had a lot of trouble working out an elegant and clean way to implement TLS in the new environment where paging techniques could no longer be used.

In the end I implemented userland TLS rather easily and it turned out to be a far cleaner solution than anything I thought of in the kernel. I modelled it, to an extent, after the method used in Unix/Linux and Windows (pthread_key_create, etc.).

So, knowing this now, I reckon there's no real point to putting TLS in the kernel when you can just as easily do it in userland. Maybe you can get some sort of increased performance by putting the logic in the kernel. But my personal preference would be userland.
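
For reference, a short sketch of the POSIX key interface mentioned above (`pthread_key_create`, `pthread_setspecific`, `pthread_getspecific`). It shows how the standard API is used, not how the userland implementation in the post works internally. The names `slot`, `worker` and `key_demo` are made up for the example.

```c
#include <pthread.h>
#include <stdint.h>

static pthread_key_t slot;   /* one key, a separate value per thread */

static void *worker(void *arg) {
    /* Store this thread's own value under the shared key. */
    pthread_setspecific(slot, arg);
    /* Read it back: each thread sees only what it stored itself. */
    return pthread_getspecific(slot);
}

/* Returns 1 if both threads got back their own value, 0 otherwise. */
int key_demo(void) {
    pthread_t t1, t2;
    void *r1 = NULL, *r2 = NULL;
    int ok;

    if (pthread_key_create(&slot, NULL) != 0)
        return 0;
    if (pthread_create(&t1, NULL, worker, (void *)(intptr_t)1) != 0) {
        pthread_key_delete(slot);
        return 0;
    }
    if (pthread_create(&t2, NULL, worker, (void *)(intptr_t)2) != 0) {
        pthread_join(t1, NULL);
        pthread_key_delete(slot);
        return 0;
    }
    pthread_join(t1, &r1);
    pthread_join(t2, &r2);

    ok = ((intptr_t)r1 == 1) && ((intptr_t)r2 == 2);
    pthread_key_delete(slot);
    return ok;
}
```

Both threads use the same key, yet neither can see the other's value. That is all a userland TLS layer has to guarantee.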

Brendan · Post by **Brendan** » Thu Jan 18, 2007 12:48 am

Hi,

I didn't vote, as I don't think there's a "correct" answer for all kernels...

I implement TLS using paging, with a different mapping per thread (a different address space for each thread, where "process space" is mapped into every thread's address space).

My reasons for this are that (on SMP machines) no locking is required to change TLS pages (as only one thread/CPU can possibly access them). To improve scalability (e.g. lock contention on "many-CPU" systems) I encourage programmers to use TLS for as much as possible because of this.
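
A rough illustration of the "no locking" point, using C11 `_Thread_local` and pthreads as stand-ins for the paging-based TLS described here (an assumption; the mechanism differs but the programming pattern is the same). Each thread counts into its own private variable, and the totals are only combined once, at join time. `THREADS`, `ITERATIONS`, `count_worker` and `count_demo` are names invented for the sketch.

```c
#include <pthread.h>
#include <stdint.h>

#define THREADS 4
#define ITERATIONS 100000

/* C11 thread-local: every thread gets a private copy, so no lock. */
static _Thread_local long local_count;

static void *count_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++)
        local_count++;               /* private, so no contention */
    /* Hand the private total back through the join value. */
    return (void *)(intptr_t)local_count;
}

/* Returns the combined total from all threads, or -1 on failure. */
long count_demo(void) {
    pthread_t t[THREADS];
    long total = 0;
    int started = 0;

    for (int i = 0; i < THREADS; i++) {
        if (pthread_create(&t[i], NULL, count_worker, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        void *r = NULL;
        pthread_join(t[i], &r);
        total += (long)(intptr_t)r;
    }
    return started == THREADS ? total : -1;
}
```

The hot loop touches no shared memory at all. The only synchronisation is the join at the end.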

It also protects data belonging to one thread from other threads, so that if one thread has a bug that corrupts data, it can't corrupt data in another thread's TLS. This could make debugging easier and make the OS a little more secure.

My last reason is 32-bit systems, where it gives the process access to more linear address space. For example, if a process has 10 threads, with 1 GB of "process space" and 2 GB of "thread space" per thread, then the process as a whole has access to a maximum of 21 GB of linear address space (rather than 3 GB).

The main disadvantage is TLB flushing during thread switches, but this isn't necessarily a problem, depending on how often you switch between threads that belong to the same process. For example, if you've got 2 processes with 2 threads each (called p1t1, p1t2, p2t1 and p2t2), then if your scheduler switches from p1t1 -> p2t1 -> p1t2 -> p2t2 there are no extra thread switch costs, because each thread switch changes to a different process anyway. If your scheduler switches from p1t1 -> p1t2 -> p2t1 -> p2t2 then there is extra TLB flushing (for the p1t1 -> p1t2 and the p2t1 -> p2t2 thread switches), but this only matters for TLB entries that correspond to "process space" (as the thread local storage is meant to be local to the thread regardless of how it's implemented).
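
The comparison of the two orders can be checked with a small counting sketch. `struct sched_entry` and `extra_reloads` are hypothetical names; the function only counts switches that stay inside one process, which are the ones that cost an extra address space reload when every thread has its own address space.

```c
#include <stddef.h>

/* A scheduled thread, identified by its process and thread number. */
struct sched_entry {
    int process;
    int thread;
};

/*
 * Counts switches between different threads of the same process.
 * With one address space per process these need no reload; with one
 * address space per thread they are the "extra" reloads.
 */
size_t extra_reloads(const struct sched_entry *order, size_t n) {
    size_t extra = 0;

    for (size_t i = 1; i < n; i++) {
        if (order[i].process == order[i - 1].process &&
            order[i].thread != order[i - 1].thread)
            extra++;
    }
    return extra;
}
```

For the two orders in the post, p1t1 -> p2t1 -> p1t2 -> p2t2 gives 0 extra reloads and p1t1 -> p1t2 -> p2t1 -> p2t2 gives 2.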

For me, each thread has its own priority and the scheduler typically switches to the highest priority thread (regardless of which process it belongs to), so it's more likely that I'll be switching between processes instead of switching between threads that belong to the same process.

The other "disadvantage" is data on a thread's stack. For me each thread's stack is at the top of its TLS, and therefore can't be accessed by other threads. This could be a problem in some situations. For example:

#include <stdio.h>

volatile char doneFlag = 'N';
volatile int *data;

void thread2(void);

void main(void) {
   int foo;

   foo = 1234;
   data = &foo;   /* points at main's stack, which lives in main's TLS */
   spawnThread(thread2);
   while(doneFlag == 'N') { }
}

void thread2(void) {
   int bar;

   bar = *data;   /* same address, but resolved in thread2's own TLS */
   printf("The value %d has nothing to do with the variable \"foo\".\n", bar);
   doneFlag = 'Y';
   exitThread();
}

This looks easy to avoid for the example above, but if you're porting a large application it could cause problems that are hard to find.
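
One way to avoid the problem is sketched below, on the assumption that heap memory (like globals) sits in "process space" and is therefore visible to every thread. pthreads stands in for `spawnThread`/`exitThread`. On a conventional OS the stack-based version would also work, so this only illustrates the pattern. `shared`, `reader` and `shared_demo` are invented names.

```c
#include <pthread.h>
#include <stdlib.h>

/* shared[0] is the input, shared[1] is what the second thread read. */
static int *shared;

static void *reader(void *arg) {
    (void)arg;
    /* Valid in every thread: the data is not on anyone's stack. */
    shared[1] = shared[0];
    return NULL;
}

/* Returns the value the second thread read, or -1 on failure. */
int shared_demo(void) {
    pthread_t t;
    int seen;

    /* Allocate from the heap, assumed here to be process space. */
    shared = malloc(2 * sizeof *shared);
    if (shared == NULL)
        return -1;
    shared[0] = 1234;
    shared[1] = 0;

    if (pthread_create(&t, NULL, reader, NULL) != 0) {
        free(shared);
        return -1;
    }
    pthread_join(t, NULL);   /* join replaces the doneFlag busy-wait */

    seen = shared[1];
    free(shared);
    return seen;
}
```

The rule of thumb under this design would be that anything handed to another thread must come from process space, never from a local variable.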

Cheers,

Brendan

Combuster · Post by **Combuster** » Thu Jan 18, 2007 3:39 am

I have to agree with Brendan: there are many ways to do this. Since I'm building an exokernel, the freedom of choice is left to userspace. However, to make this work, the kernel needs basic support for it as well, so you'd end up with partial TLS support in the kernel, and the rest of the TLS code in userspace.
But that's just my design philosophy...

senaus · Post by **senaus** » Sat Jan 20, 2007 11:53 am

Thanks for the interesting ideas! I think I'll leave TLS for now, until I start work on the next version of the memory manager. My nanokernel (which I'm working on now) will provide the support for both the user mode and paging methods; I'll adopt one of these when the time comes.

Cheers,
Sean

OSDev.org

Should a kernel provide support for TLS?