How do you allocate?
Posted: Tue Mar 18, 2025 12:35 pm
One thing I have been struggling with lately is the best approach to allocators. I have been reading through how Linux does it, and am vaguely familiar with Darwin and their zone allocator. But "allocator structure aside" - i.e. whether or not you use object-sized zones or otherwise - the algorithms to support allocation are likely similar at some level.
The way I see it - there are two "domains" of allocators where you allocate either objects/chunks or address space. With a possible third more complex variant of address space allocation that is the physical memory allocator. However, if you generalize chunks of address space as the "objects" of the allocation and squint your eyes maybe the entire concept can be generalized to some extent.
I'm curious how you all approached and solved this problem.
1. Physical address space allocation
2. Virtual address space allocation
3. Heap allocation
(1) Is the problem of allocating potentially sparse physical pages of real memory. As I'm sure you all have seen, firmware is not particularly tidy with its usage of platform RAM. The resulting usable physical pages often ends up being 10 to 20 non-contiguous spans of RAM.
(2) Is the allocation of virtual address bits into the page table of a given task. The virtual address space is vast, many millions of times larger than the physical memory available on the system. Where do you begin placing valid mappings in such a large usable range? What do you do when you need to map in more RAM into your kernel heap because RAM usage is ballooning? What about when the kernel RAM usage draws back .. do you unmap the VA range that heap grew into? What happens when RAM usage grows again? Do you reuse the previous VA allocated or just grow upward indefinitely? (Both are in-the-wild approaches to VA allocators, by the way).
(3) Assumes that you solved both (1) and (2) and now have a heap configured with some size. At this point you begin servicing calls from the kernel runtime for e.g. malloc/calloc/kalloc or whatever you decide to call it. You have RAM mapped into the VA space of your kennel (in the simple case) and are serving a request to return some arbitrary N-byte sized chunk of RAM. The first time I implemented a heap I just set arbitrarily fixed boundaries in VA space and gave myself a fixed N-size heap implemented with dlmalloc (see also). This works okay to get started, but having a fixed size heap either limits your upward growth because it is not dynamic, or is inefficient because it allocates more physical memory than your kernel actually uses.
The way I see it - there are two "domains" of allocators where you allocate either objects/chunks or address space. With a possible third more complex variant of address space allocation that is the physical memory allocator. However, if you generalize chunks of address space as the "objects" of the allocation and squint your eyes maybe the entire concept can be generalized to some extent.
I'm curious how you all approached and solved this problem.
1. Physical address space allocation
2. Virtual address space allocation
3. Heap allocation
(1) Is the problem of allocating potentially sparse physical pages of real memory. As I'm sure you all have seen, firmware is not particularly tidy with its usage of platform RAM. The resulting usable physical pages often ends up being 10 to 20 non-contiguous spans of RAM.
(2) Is the allocation of virtual address bits into the page table of a given task. The virtual address space is vast, many millions of times larger than the physical memory available on the system. Where do you begin placing valid mappings in such a large usable range? What do you do when you need to map in more RAM into your kernel heap because RAM usage is ballooning? What about when the kernel RAM usage draws back .. do you unmap the VA range that heap grew into? What happens when RAM usage grows again? Do you reuse the previous VA allocated or just grow upward indefinitely? (Both are in-the-wild approaches to VA allocators, by the way).
(3) Assumes that you solved both (1) and (2) and now have a heap configured with some size. At this point you begin servicing calls from the kernel runtime for e.g. malloc/calloc/kalloc or whatever you decide to call it. You have RAM mapped into the VA space of your kennel (in the simple case) and are serving a request to return some arbitrary N-byte sized chunk of RAM. The first time I implemented a heap I just set arbitrarily fixed boundaries in VA space and gave myself a fixed N-size heap implemented with dlmalloc (see also). This works okay to get started, but having a fixed size heap either limits your upward growth because it is not dynamic, or is inefficient because it allocates more physical memory than your kernel actually uses.