Peer review: "Hidden pointer slabs".
Posted: Sat Sep 05, 2009 5:21 pm
Draft algorithm for handling "large" objects in the Slab allocator better - "Hidden pointer slabs"
Problem
Objects in the slab allocator have metadata associated with them (free/allocated). The slab itself also has metadata (next slab in chain).
With small object sizes, this is stored in a bitmap+footer at the end of the slab. With larger object sizes, this becomes more and more
wasteful as the footer+bitmap size becomes small compared to the object, resulting in wasted space.
I haven't got the original slab paper in front of me, but to my knowledge it suggests allocating another slab for metadata when object sizes become large. Linking this metadata slab with the original slab requires some sort of mapping, which is both time and space overhead.
Abstract
The following algorithm seeks to reduce (to zero) the amount of metadata required for blocks of any size (over a certain minimum), while
still maintaining fast access times.
The algorithm works by realising that the space inside a slab is not sacrosanct. Obviously, allocated objects' memory cannot be overwritten, but *free* objects' can.
Assumptions
Slab sizes within a cache are constant. Object sizes within a cache are constant.
Data structures
The cache maintains two pointers, termed the "free" and "partial" pointers.
The free pointer points to the start of a slab, all objects within which are free (or it points to zero and is invalid). In turn, the first address in that slab points to another fully free slab. To return space to the page allocator (virtual memory manager) only requires looking at this list - every slab can be freed.
Every free (non-allocated) object contains a structure, as described below:
The partial pointer points to an object in a slab that is free - i.e. to an instance of the above structure. The magic number is optional and its use if used is described later, under "recovery". The next pointer points to the next free object. The free objects thus create a linked list, terminated by a "next" pointer with value 0. Similarly the prev pointer points to the previous object. This again is optional, and is described in "recovery". Note that it should not be possible to access any object in any slab in the free list by following the partial list. They are disjoint sets.
Allocation
Freeing
Recovery
This is where unused slabs can be given back to the virtual memory manager: i.e. moving slabs from the partial list to the free list.
This can be done at free() time, but can easily be batched until the processor is idle.
METHOD 1: Using the magic number
This spools through the partial list, and for each slab seen checks magic numbers in each object to determine if the object is allocated
or not. If they're all free, all objects are removed from the partial list using the doubly linked list (constant time) and the slab is
moved to the free list where it can be pruned.
This has the drawback of using a magic number where false positives can occur.
METHOD 2: Without a magic number
This method is the same as above apart from instead of checking magic numbers, the entire partial list is checked for the presence of each
slab's member object. This is O(n^2) as opposed to O(n), but doesn't suffer the false positive of the above method, and doesn't require a
magic number field or previous field in the structure, as if the algorithm indexes through the list anyway, removal from a singly linked list can be done in the same time, so the structure only needs to be the size of one pointer.
If this is done in idle time progressively, the time order shouldn't be too much of a problem.
CONCLUSION
Presented is pseudocode that allows constant-time (O(1)) allocation and freeing of objects size >= N, where N = 2 * sizeof(pointer) + sizeof(magic number), or merely N = sizeof(pointer), if the second recovery algorithm is used.
Only two pointers are required in the cache, and no overhead at all is required, the linked list is maintained in already-allocated memory.
- James Molloy, 5 Sep 2009
Problem
Objects in the slab allocator have metadata associated with them (free/allocated). The slab itself also has metadata (next slab in chain).
With small object sizes, this is stored in a bitmap+footer at the end of the slab. With larger object sizes, this becomes more and more
wasteful as the footer+bitmap size becomes small compared to the object, resulting in wasted space.
I haven't got the original slab paper in front of me, but to my knowledge it suggests allocating another slab for metadata when object sizes become large. Linking this metadata slab with the original slab requires some sort of mapping, which is both time and space overhead.
Abstract
The following algorithm seeks to reduce (to zero) the amount of metadata required for blocks of any size (over a certain minimum), while
still maintaining fast access times.
The algorithm works by realising that the space inside a slab is not sacrosanct. Obviously, allocated objects' memory cannot be overwritten, but *free* objects' can.
Assumptions
Slab sizes within a cache are constant. Object sizes within a cache are constant.
Data structures
The cache maintains two pointers, termed the "free" and "partial" pointers.
The free pointer points to the start of a slab, all objects within which are free (or it points to zero and is invalid). In turn, the first address in that slab points to another fully free slab. To return space to the page allocator (virtual memory manager) only requires looking at this list - every slab can be freed.
Every free (non-allocated) object contains a structure, as described below:
Code: Select all
struct
{
size_t magic;
uintptr_t next, prev;
}
Allocation
Code: Select all
if Partial_Pointer != 0:
N := Partial_Pointer
Partial_Pointer := N.next
N.magic = 0 # Only if using magic number: overwrite with non-magic number when object allocated.
return N
else:
N := Free_Pointer
Free_Pointer := N.next
N.magic = 0
for all objects except index 0 in N's slab, where M[i] is the i'th object:
M[i].magic = MAGIC_NUMBER
M[i].prev = (i == 1) ? 0 : M[i-1]
M[i].next = (i == last_object) ? 0 : M[i+1]
Partial_Pointer.prev := M[1]
M[1].next := Partial_Pointer
Partial_Pointer := M[1]
return N
Code: Select all
# N is the object to free.
N.next := Partial_Pointer
N.prev := 0
N.magic = MAGIC_NUMBER
Partial_Pointer.prev := N
Partial_Pointer := N
This is where unused slabs can be given back to the virtual memory manager: i.e. moving slabs from the partial list to the free list.
This can be done at free() time, but can easily be batched until the processor is idle.
METHOD 1: Using the magic number
This spools through the partial list, and for each slab seen checks magic numbers in each object to determine if the object is allocated
or not. If they're all free, all objects are removed from the partial list using the doubly linked list (constant time) and the slab is
moved to the free list where it can be pruned.
This has the drawback of using a magic number where false positives can occur.
Code: Select all
N := Partial_Pointer
while N is valid:
S := address of slab that N is in. Find this by rounding N's address down to a multiple of the slab size.
for all objects in S, T[i]:
if T[i].magic != MAGIC_NUMBER:
N := N.next
continue
# Slab is free, move to free list. First though, we need to remove T[i] from the partial list
for all objects in S, T[i]:
T[i].next.prev = T[i].prev
T[i].prev.next = T[i].next
T[0].next = Free_Pointer
Free_Pointer.prev = T[0]
Free_Pointer = T[0]
This method is the same as above apart from instead of checking magic numbers, the entire partial list is checked for the presence of each
slab's member object. This is O(n^2) as opposed to O(n), but doesn't suffer the false positive of the above method, and doesn't require a
magic number field or previous field in the structure, as if the algorithm indexes through the list anyway, removal from a singly linked list can be done in the same time, so the structure only needs to be the size of one pointer.
If this is done in idle time progressively, the time order shouldn't be too much of a problem.
CONCLUSION
Presented is pseudocode that allows constant-time (O(1)) allocation and freeing of objects size >= N, where N = 2 * sizeof(pointer) + sizeof(magic number), or merely N = sizeof(pointer), if the second recovery algorithm is used.
Only two pointers are required in the cache, and no overhead at all is required, the linked list is maintained in already-allocated memory.
- James Molloy, 5 Sep 2009