OSDev.org

Posted: **Sat Sep 05, 2009 5:21 pm**

Draft algorithm for handling "large" objects in the Slab allocator better - "Hidden pointer slabs"

Problem

Objects in the slab allocator have metadata associated with them (free/allocated). The slab itself also has metadata (next slab in chain).
With small object sizes, this is stored in a bitmap+footer at the end of the slab. With larger object sizes, this becomes more and more
wasteful as the footer+bitmap size becomes small compared to the object, resulting in wasted space.

I haven't got the original slab paper in front of me, but to my knowledge it suggests allocating another slab for metadata when object sizes become large. Linking this metadata slab with the original slab requires some sort of mapping, which is both time and space overhead.

Abstract

The following algorithm seeks to reduce (to zero) the amount of metadata required for blocks of any size (over a certain minimum), while
still maintaining fast access times.

The algorithm works by realising that the space inside a slab is not sacrosanct. Obviously, allocated objects' memory cannot be overwritten, but *free* objects' can.

Assumptions

Slab sizes within a cache are constant. Object sizes within a cache are constant.

Data structures

The cache maintains two pointers, termed the "free" and "partial" pointers.

The free pointer points to the start of a slab, all objects within which are free (or it points to zero and is invalid). In turn, the first address in that slab points to another fully free slab. To return space to the page allocator (virtual memory manager) only requires looking at this list - every slab can be freed.

Every free (non-allocated) object contains a structure, as described below:

Code: Select all

struct
{
    size_t magic;
    uintptr_t next, prev;
}

The partial pointer points to an object in a slab that is free - i.e. to an instance of the above structure. The magic number is optional and its use if used is described later, under "recovery". The next pointer points to the next free object. The free objects thus create a linked list, terminated by a "next" pointer with value 0. Similarly the prev pointer points to the previous object. This again is optional, and is described in "recovery". Note that it should not be possible to access any object in any slab in the free list by following the partial list. They are disjoint sets.

Allocation

Code: Select all

if Partial_Pointer != 0:
    N := Partial_Pointer
    Partial_Pointer := N.next
    N.magic = 0                 # Only if using magic number: overwrite with non-magic number when object allocated.
    return N

else:
    N := Free_Pointer
    Free_Pointer := N.next
    N.magic = 0
    for all objects except index 0 in N's slab, where M[i] is the i'th object:
        M[i].magic = MAGIC_NUMBER
        M[i].prev  = (i == 1) ? 0 : M[i-1]
        M[i].next  = (i == last_object) ? 0 : M[i+1]
    Partial_Pointer.prev := M[1]
    M[1].next := Partial_Pointer
    Partial_Pointer := M[1]
    return N

Freeing

Code: Select all

# N is the object to free.
N.next := Partial_Pointer
N.prev := 0
N.magic = MAGIC_NUMBER
Partial_Pointer.prev := N
Partial_Pointer := N

Recovery

This is where unused slabs can be given back to the virtual memory manager: i.e. moving slabs from the partial list to the free list.
This can be done at free() time, but can easily be batched until the processor is idle.

METHOD 1: Using the magic number

This spools through the partial list, and for each slab seen checks magic numbers in each object to determine if the object is allocated
or not. If they're all free, all objects are removed from the partial list using the doubly linked list (constant time) and the slab is
moved to the free list where it can be pruned.

This has the drawback of using a magic number where false positives can occur.

Code: Select all

N := Partial_Pointer
while N is valid:
    S := address of slab that N is in. Find this by rounding N's address down to a multiple of the slab size.
    for all objects in S, T[i]:
        if T[i].magic != MAGIC_NUMBER:
            N := N.next
            continue

    # Slab is free, move to free list. First though, we need to remove T[i] from the partial list
    for all objects in S, T[i]:
        T[i].next.prev = T[i].prev
        T[i].prev.next = T[i].next

    T[0].next = Free_Pointer
    Free_Pointer.prev = T[0]
    Free_Pointer = T[0]

METHOD 2: Without a magic number

This method is the same as above apart from instead of checking magic numbers, the entire partial list is checked for the presence of each
slab's member object. This is O(n^2) as opposed to O(n), but doesn't suffer the false positive of the above method, and doesn't require a
magic number field or previous field in the structure, as if the algorithm indexes through the list anyway, removal from a singly linked list can be done in the same time, so the structure only needs to be the size of one pointer.

If this is done in idle time progressively, the time order shouldn't be too much of a problem.

CONCLUSION

Presented is pseudocode that allows constant-time (O(1)) allocation and freeing of objects size >= N, where N = 2 * sizeof(pointer) + sizeof(magic number), or merely N = sizeof(pointer), if the second recovery algorithm is used.

Only two pointers are required in the cache, and no overhead at all is required, the linked list is maintained in already-allocated memory.

- James Molloy, 5 Sep 2009

Posted: **Sun Sep 06, 2009 4:17 pm**

On behalf of seemingly a lot of folks, (judging by the time this has been here and the proportionate number of replies), I'd like to forward the idea that the concept seems solid, but I haven't the slightest clue as to what it really is: I believe it's supposed to be a heap management algorithm of sorts (?), but I can't yet wrap my mind around it. But doubtlessly: if you've planned it this well, then it should be all you wanted and more.

As for me, I was never too happy with the 'block' thing for handling memory. I use a method of splitting up individual pages into 'allocation units'. To be honest, as long as your team at Pedigree is okay with it, I doubt there's anything us less experienced folk can really say that overrides the opinions of such a well organized collaboration.

P.S: thanks for the source. I browse your repo all the time.

Posted: **Sun Sep 06, 2009 4:46 pm**

Yes, it's an extension of the slab allocator, which is used in the heap.

As for me, I was never too happy with the 'block' thing for handling memory. I use a method of splitting up individual pages into 'allocation units'.

That's similar conceptually to a slab allocator.

Posted: **Sun Sep 06, 2009 4:58 pm**

^^^ Thanks.

When I referred to an algorithm that splits up each page given to an app into allocation units, I wasn't being particularly specific, though: I haven't seen any other algorithm like mine as far and wide as I've read.

In my kernel, every application has its OWN mini-memory manager given to it by the kernel's main memory manager. Each process' mini-manager (I call it a confluence, in alignment with the general river design I use) keeps track of the number of pages given to it (and therefore the process it is assigned to) by the Main Memory Manager.

The Main MMgr is responsible for keeping track of the individual pages in memory and it hands over the pages to the individual, independent MiniMgrs. The Mini Mgrs do per-process-auditing, and use a structure for auditing similar to the Page Table / Page Dir / PTE arrangement.

Each page given by the Main MemMgr creates a new entry in an auditing table that records the number of bytes used in the page given, and the number of bytes free for that page. In other words, any allocation that is more than 4KB is fulfilled by asking for the number of relevant pages from the kernel. All allocations less than 4KB are fulfilled by going through the audit records and looking for a record that tracks a page that is not fully used, and has enough space for that allocation.

I don't span small allocations across multiple pages. Only large allocations, or allocations that can't fit into any of the currently not-full pages require fresh pages. It's a bit more complex than that, but I'm keeping it short.

Posted: **Mon Sep 07, 2009 11:31 am**

@JamesM

If I understand you´re algo right then it might get problems under heavy load, when memory could be returned (and needed by other apps), but the partial list could not be parsed because of no free cpu time. One has to decide what is better, constant time under normal load (your algo) or faster under heavy load (normal slab allocator). But I have to say that this is only a assumption, I can´t proof it.

How well will it work under multithreading? I hope that I get soon to the stage where I will design (or better search for a good design

) for my memory manager. But I need one which is good for 1 CPU systems and one which is good for many CPU systems.

Posted: **Wed Sep 09, 2009 3:06 am**

FlashBurn wrote:@JamesM

If I understand you´re algo right then it might get problems under heavy load, when memory could be returned (and needed by other apps), but the partial list could not be parsed because of no free cpu time. One has to decide what is better, constant time under normal load (your algo) or faster under heavy load (normal slab allocator). But I have to say that this is only a assumption, I can´t proof it.

How well will it work under multithreading? I hope that I get soon to the stage where I will design (or better search for a good design ) for my memory manager. But I need one which is good for 1 CPU systems and one which is good for many CPU systems.

Hi,

I don't believe this problem will be run into. The allocate and free functions only add and remove from the head of the list. They don't need to traverse it. They're thus constant time no matter what the load.

Parsing through the list can be done in idle time, and is not mission-critical. Without it the heap won't shrink once it's grown, but that's the way dlmalloc works anyway!

I've extended the algorithm to scale to multiprocessor systems. I'll share it if you like: I've named it the "SLAM" allocator ("SLAB" à la James Molloy), and it's completely lockfree. The MP extension is still lockfree, and still allocates in constant time with the same leading constant. So it's pretty fast!

Cheers,

James

Posted: **Wed Sep 09, 2009 10:32 am**

I always like to have a look at code, but maybe you could only/also describe how this lockfree algo works and how you made it multithreading save. So that we don´t look at your code and make copy & paste, but have to think about your implementation to use it.

Posted: **Wed Sep 09, 2009 12:26 pm**

FlashBurn wrote: maybe you could only/also describe how this lockfree algo works and how you made it multithreading save.

There's several techniques to doing concurrent algorithms, we should probably put a wiki page together.

Posted: **Wed Oct 16, 2013 2:30 am**

I bring this from the dead for informational purposes only.
I've been rewriting SlamAllocator to Rust, for pcmattman's rustic project, and I realized something.
Even though recovery wasn't in the Pedigree C++ implementation, I will need it, and it should be safe and fast.

The possibility of false positives isn't good enough for Rust, the magic was wasting 4 bytes, and a performance degradation isn't appealing either.

If I throw out the magic field, a free node has next and prev pointers. An allocated node is made up of an AllocHeader and the user data. The header has only one pointer, to the Cache instance specialized for that allocation size, for free.

(this might be getting boring already)

The Node *next pointer of a free node overlaps the Cache *cache pointer in an allocated node.
A Node can't be a Cache and vice-versa.
Moreover, the Cache doing the recovery could just do do isAllocated = (*node == this) (or self in Rust).

We end up with no false positives, O(n) recovery, and a minimal non-wasteful allocation size of sizeof(void*), with the overhead of sizeof(void*). On top of the lockfree design and decent performance of SLAM.

OSDev.org

Peer review: "Hidden pointer slabs".

Peer review: "Hidden pointer slabs".

Re: Peer review: "Hidden pointer slabs".

Re: Peer review: "Hidden pointer slabs".

Re: Peer review: "Hidden pointer slabs".

Re: Peer review: "Hidden pointer slabs".

Re: Peer review: "Hidden pointer slabs".

Re: Peer review: "Hidden pointer slabs".

Re: Peer review: "Hidden pointer slabs".

Re: Peer review: "Hidden pointer instance slabs".