thread friendly malloc

FlashBurn · Post by **FlashBurn** » Sat Jan 22, 2005 11:40 am

I just thought about how I will implement my malloc in the kernel, so that it is thread friendly. Now I want to know what you are thinking of my idea and that you maybe could solve a problem of this idea!

I will have a sort of slab allocator, but like the one in Sollaris (if I?me rembering right). This means you call a function which creates a pool of the size you give the function. With this function I get memory for my thread strucs. They are 568bytes in size, but maybe they become bigger.

Every thread struc will have 2 special fields. The start and the end of a free list. With that every thread has its own free list and it can, in theory, call malloc without the need of a check if some other thread is using malloc at the same time.

My problem is now that I don?t know how I get it managed that I can acces this 2 fields from userspace.

At the moment a userspace program can acces the memory from 0GiB to 3GiB. And a ring0 program can acces all memory.

Curufir · Post by **Curufir** » Sat Jan 22, 2005 11:47 am

Isn't malloc a userspace thing? From memory it's a book-keeping wrapper around the kernel primitive memory allocation routines.

durand · Post by **durand** » Sat Jan 22, 2005 12:43 pm

Typically, calls to the kernel are expensive and if you're going to go to the kernel for every memory allocation, it will slow down your system quite a bit. It's noticable.

Usually, the userland malloc requests large chunks of memory from the kernel on it's first call. Then it (userland) divides the large chunks per every malloc request. The userland malloc+free does it's own book-keeping and information about which parts of the chunks are free or how many chunks are being used.

When the large chunk has been totally used up, the malloc procedure requests more memory from the kernel for it to keep distributing.

And then, to make it thread-safe, just provide your userland malloc/free procedures with a spinlock (or whatever) for the critical sections.

So, it could reduce the kernel calls from a heckuva-lot per application run to a lot less. (1.. 2.. 3.. )?

FlashBurn · Post by **FlashBurn** » Sat Jan 22, 2005 12:54 pm

The solution with the spinlock isn?t thread friendly! First I thought of using the fs and gs register so that every thread can access its thread struc and so I wouldn?t need a kernel call. Also a malloc implementation needn?t to use these fields.

The only problem wiht this variant is that I don?t know how I could let the thread access its own thread struc without much overhead!

mystran · Post by **mystran** » Sat Jan 22, 2005 5:22 pm

I'd not use spinlocks in userspace but I agree that protecting your userspace malloc with a mutex is fine. Make your mutex fast instead. If you look at how Linux implements "futex" (it's explained somewhere in the web), you'll notice that you only need to call kernel when either (1) the lock you are trying to take is already taken or (2) you unlock a lock.

In fact, if I'm not mistaken you can optimize the second case to only go to kernel when someone is waiting for the lock you unlock. This way one'd only ever need to call kernel when two threads try to access the critical section at the same time. Anyway, I have to check that my logic about the second case is sound...

FlashBurn · Post by **FlashBurn** » Sun Jan 23, 2005 2:22 am

If I?m remembering right, you could acces your TCB in Windoze over the gs or fs register? If this is so, does anyone here knows how Windoze this does?

mystran · Post by **mystran** » Sun Jan 23, 2005 3:26 am

No idea how Windows does it, but here's a possible way:

In addition to normal GDT (or LDT) entries put an additional small (sized to TCB size or something) per-CPU data segment there, with an offset to where your TCB is. Set that segment as the fs (or gs) and change the segment offset in your scheduler (or somewhere) when you switch to a thread.

This way the TCB is still visible through normal means (if you use 4GB segments for cs, ds and/or ss) but fs (or gs) relative addresses are automatically adjusted for the current thread.

Alternatively, if you per-process thread-limit is small enough and the TCB's are always on the same area in each process, you could just make one GDT (or LDT) entry for each possible TCB location, in which case you avoid tweaking the offset in the fly.

FlashBurn · Post by **FlashBurn** » Sun Jan 23, 2005 5:11 am

Thanks for that solution. I don?t know why I didn?t come with this solution by myself?!

Now the fs register points to the thread struc and is read only and the gs register points to the 2 fields in the thread struc for the memory manager and is writeable.

With this working I can now start my malloc implementation.

OSDev.org

thread friendly malloc

thread friendly malloc

Re:thread friendly malloc

Re:thread friendly malloc

Re:thread friendly malloc

Re:thread friendly malloc

Re:thread friendly malloc

Re:thread friendly malloc

Re:thread friendly malloc