The algorithm I usually use is sort of like a slab allocator, but with caches for different-sized blocks instead of specific object types. It works in two layers: the lower layer hands out large, uniform chunks of memory, and the upper layer carves smaller blocks out of those chunks. In essence, it is one allocator made up of many smaller allocators.
The lower allocator has a fixed BLOCKSIZE, which must be a power of two (for reasons explained later) and works best as a multiple of the page size. The allocator as a whole cannot satisfy requests larger than BLOCKSIZE (a special-case allocator can be added to compensate for this). Large blocks must also be BLOCKSIZE-aligned: this is how a small block is mapped back to the large block that owns it. The lower allocator is usually a simple linked list of free large blocks, and it expands the heap when it runs out. Each large block has a header of this form:
Code: Select all
struct large_block {
    struct large_block *next;     /* next large block in the free/used list */
    struct small_block *contents; /* freelist of small blocks in this chunk */
    size_t small_blocksize;       /* size of this chunk's small blocks */
    size_t refcount;              /* small blocks currently allocated */
    /* optionally, a binary semaphore for thread-safety */
};
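The alignment requirement is what makes free cheap: because BLOCKSIZE is a power of two and every large block is BLOCKSIZE-aligned, the header that owns any small block can be recovered just by masking the pointer. A minimal sketch (the BLOCKSIZE value of 4096 is an assumption for illustration):

```c
#include <stdint.h>

#define BLOCKSIZE 4096u  /* assumed: a power of two, multiple of the page size */

struct large_block;  /* header as defined above */

/* Recover the owning large block of any small-block pointer by clearing
 * the low bits. This only works because large blocks are BLOCKSIZE-aligned
 * and BLOCKSIZE is a power of two, so (BLOCKSIZE - 1) is an all-ones mask. */
static inline struct large_block *owner_of(void *p)
{
    return (struct large_block *)((uintptr_t)p & ~(uintptr_t)(BLOCKSIZE - 1));
}
```

This is also why the power-of-two restriction exists: with any other BLOCKSIZE, finding the header would require a division instead of a single AND.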
When small blocks of size n are needed, a large block is "split" by threading its non-header memory into a linked list of small blocks of size n. small_blocksize is then set to n, and refcount to 0. The large block itself is removed from the lower allocator's free list and added to a used list.
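Splitting can be sketched like this (the struct is repeated so the snippet stands alone; the loop bound and the n >= sizeof(void *) requirement are my assumptions):

```c
#include <stddef.h>

#define BLOCKSIZE 4096u  /* assumed value; any power of two works */

struct small_block { struct small_block *next; };

struct large_block {              /* mirrors the header shown above */
    struct large_block *next;
    struct small_block *contents;
    size_t small_blocksize;
    size_t refcount;
};

/* Split the non-header memory of a large block into a freelist of
 * small blocks of size n. The list pointer lives inside each free
 * slot itself, so free small blocks need no extra metadata.
 * Assumes n >= sizeof(struct small_block). */
static void split_block(struct large_block *lb, size_t n)
{
    char *p   = (char *)lb + sizeof(struct large_block);
    char *end = (char *)lb + BLOCKSIZE;
    struct small_block *head = NULL;

    while (p + n <= end) {            /* thread each n-byte slot onto the list */
        struct small_block *sb = (struct small_block *)p;
        sb->next = head;
        head = sb;
        p += n;
    }

    lb->contents = head;
    lb->small_blocksize = n;
    lb->refcount = 0;
}
```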
The upper allocator keeps a series of linked lists, each corresponding to a different allocation size. The number and sizes of these lists are arbitrary; I usually use powers of two. Each list holds large blocks of the matching small_blocksize, and when a list is empty it is refilled with a freshly split large block from the lower allocator. When a small block is allocated, it is taken from the closest open large block (the binary semaphore in the large block's header can provide thread-safety here, and because each list holds multiple large blocks, a thread that finds one locked can simply move on to another), and the large block's refcount is incremented. When a small block is freed, the large block's refcount is decremented and the small block is returned to the large block's freelist; if the refcount reaches zero, the large block itself is freed back to the lower allocator.
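The per-large-block fast path is just freelist pushes and pops plus the refcount bookkeeping. A single-threaded sketch under the same assumed struct layout (returning an empty large block to the lower allocator, and the semaphore, are omitted):

```c
#include <stddef.h>

struct small_block { struct small_block *next; };

struct large_block {              /* header as shown earlier */
    struct large_block *next;
    struct small_block *contents;
    size_t small_blocksize;
    size_t refcount;
};

/* Pop a small block off the large block's freelist; refcount tracks
 * how many of its small blocks are currently in use. */
static void *small_alloc(struct large_block *lb)
{
    struct small_block *sb = lb->contents;
    if (sb == NULL)
        return NULL;              /* this large block is exhausted */
    lb->contents = sb->next;
    lb->refcount++;
    return sb;
}

/* Push a freed small block back onto the freelist; when refcount
 * drops to zero, the whole large block can be handed back to the
 * lower allocator (not shown here). */
static void small_free(struct large_block *lb, void *p)
{
    struct small_block *sb = p;
    sb->next = lb->contents;
    lb->contents = sb;
    lb->refcount--;
}
```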
This algorithm is a bit complicated, but it gets great performance for uniform allocations, and it uses less memory than pretty much any other allocator when allocating a lot of small blocks (allocated small blocks keep no metadata at all; free ones store the list pointer inside themselves). It can also be made quite contention-friendly as described above. I currently have an implementation in my libc that lacks the fine-grained locking but is otherwise complete; it's not well commented, though, and relies on a lot of OS-specific code.
- Graphical Representation
I also use a similar algorithm in my kernel (the one described above is in my libc), where the heap does not have to be multithreaded, never needs to return memory once it has been allocated (small blocks are recycled, large ones are not), and the size of a freed block is always known to the caller. A well-commented reference implementation of that simpler algorithm can be found in my kernel source.