
How do you allocate?

Posted: Tue Mar 18, 2025 12:35 pm
by pragmatic
One thing I have been struggling with lately is the best approach to allocators. I have been reading through how Linux does it, and am vaguely familiar with Darwin and its zone allocator. But "allocator structure" aside - i.e. whether you use object-sized zones or something else - the algorithms that support allocation are likely similar at some level.

The way I see it, there are two "domains" of allocators: those that allocate objects/chunks and those that allocate address space, with a possible third, more complex variant of address space allocation being the physical memory allocator. However, if you squint and treat chunks of address space as the "objects" being allocated, maybe the entire concept can be generalized to some extent.

I'm curious how you all approached and solved this problem.

1. Physical address space allocation
2. Virtual address space allocation
3. Heap allocation

(1) Is the problem of allocating potentially sparse physical pages of real memory. As I'm sure you all have seen, firmware is not particularly tidy with its usage of platform RAM, and the usable physical memory often ends up as 10 to 20 non-contiguous spans of RAM.

(2) Is the allocation of virtual address space in the page tables of a given task. The virtual address space is vast, many millions of times larger than the physical memory available on the system. Where do you begin placing valid mappings in such a large usable range? What do you do when you need to map more RAM into your kernel heap because RAM usage is ballooning? What about when kernel RAM usage draws back ... do you unmap the VA range the heap grew into? What happens when RAM usage grows again? Do you reuse the previously allocated VA range or just grow upward indefinitely? (Both are in-the-wild approaches to VA allocators, by the way.)

(3) Assumes that you solved both (1) and (2) and now have a heap configured with some size. At this point you begin servicing calls from the kernel runtime for e.g. malloc/calloc/kalloc or whatever you decide to call it. You have RAM mapped into the VA space of your kernel (in the simple case) and are serving a request to return some arbitrary N-byte sized chunk of RAM. The first time I implemented a heap I just set arbitrarily fixed boundaries in VA space and gave myself a fixed N-size heap implemented with dlmalloc. This works okay to get started, but a fixed-size heap either limits your upward growth because it is not dynamic, or is inefficient because it allocates more physical memory than your kernel actually uses.
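For illustration, here is roughly what that fixed-boundary approach looks like, reduced to a minimal sketch. The constants, the map_page()/phys_alloc_page() helpers, and the bump-pointer kmalloc are placeholders standing in for dlmalloc and the real paging code, not anyone's actual implementation:

    #include <stdint.h>
    #include <stddef.h>

    /* Illustrative only: a fixed-size kernel heap carved out of a hard-coded
     * VA window. The constants and helpers are placeholders, not real APIs. */
    #define PAGE_SIZE   4096ULL
    #define KHEAP_BASE  0xFFFFFFFF90000000ULL    /* arbitrary fixed VA */
    #define KHEAP_SIZE  (16ULL * 1024 * 1024)    /* fixed 16 MiB heap */

    extern uintptr_t phys_alloc_page(void);                      /* placeholder */
    extern void map_page(uintptr_t va, uintptr_t pa, int flags); /* placeholder */

    static uintptr_t kheap_cursor = KHEAP_BASE;

    void kheap_init(void)
    {
        /* Back the whole window with physical pages up front, whether the
         * kernel ever uses them or not: the inefficiency mentioned above. */
        for (uintptr_t va = KHEAP_BASE; va < KHEAP_BASE + KHEAP_SIZE; va += PAGE_SIZE)
            map_page(va, phys_alloc_page(), 0 /* kernel RW */);
    }

    void *kmalloc(size_t n)
    {
        /* dlmalloc reduced to a 16-byte-aligned bump pointer for the sketch */
        uintptr_t p = (kheap_cursor + 15) & ~(uintptr_t)15;
        if (n > KHEAP_BASE + KHEAP_SIZE - p)
            return NULL;                         /* fixed size: no growth */
        kheap_cursor = p + n;
        return (void *)p;
    }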

Re: How do you allocate?

Posted: Wed Mar 19, 2025 2:23 pm
by thewrongchristian
pragmatic wrote: Tue Mar 18, 2025 12:35 pm I'm curious how you all approached and solved this problem.

1. Physical address space allocation
For each physical memory range passed via the multiboot header, I create a bitmap based allocator.

Memory is partitioned into ISA DMA memory (which I don't use for any ISA DMA), 32-bit DMA memory (accessible directly from PCI on PC without an IOMMU) and then the rest (which I don't support, because I don't implement PAE yet).
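In outline, a per-range bitmap allocator looks something like this (a simplified sketch with made-up names, one bit per 4 KiB frame, not my actual code):

    #include <stdint.h>
    #include <stddef.h>

    /* One bitmap allocator per physical memory range; 1 bit per 4 KiB frame. */
    struct phys_range {
        uintptr_t base;          /* physical address of the first frame */
        size_t    nframes;       /* number of 4 KiB frames in the range */
        uint32_t *bitmap;        /* 1 = allocated, 0 = free */
    };

    static uintptr_t range_alloc_frame(struct phys_range *r)
    {
        for (size_t i = 0; i < r->nframes; i++) {
            if (!(r->bitmap[i / 32] & (1u << (i % 32)))) {
                r->bitmap[i / 32] |= 1u << (i % 32);
                return r->base + i * 4096;
            }
        }
        return 0;                /* range exhausted */
    }

    static void range_free_frame(struct phys_range *r, uintptr_t pa)
    {
        size_t i = (pa - r->base) / 4096;
        r->bitmap[i / 32] &= ~(1u << (i % 32));
    }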
pragmatic wrote: Tue Mar 18, 2025 12:35 pm 2. Virtual address space allocation
Virtual address space is described by binary search tree based segment maps, indexed by base address. A segment has a base address and size to describe the range of the memory segment, and the segment provides the required information to the memory mapping system to implement demand paging.

I implement the following segment types:
  • Kernel heap - Special segment with minimal overhead. Kernel heap pages are populated on demand, using a virtual heap whose size is fixed at compile time (currently 64MB). Even though the heap size is "fixed", memory is only allocated to the heap on demand, so the actual memory usage is usually lower.
  • Direct mapped segments - One to one mapping from virtual to physical memory. Used for mapping the kernel text and bss to the memory we're loaded into by the bootloader, as well as big fixed mappings like the framebuffer. Perfect candidate for mapping with huge pages, but I don't implement that yet.
  • File backed segment - Reads come from a vnode. For private mappings, writes go to an anonymous memory sink, but for shared mappings, writes go back to the mmap vnode.
  • Anonymous segment - Reads come from a single COW empty page. Writes go to anonymous memory sink.
Allocating a new segment walks the tree, looking for a gap between existing segments of sufficient size for the new mapping.

Segment allocation is rare enough that I haven't bothered optimizing this O(n) algorithm.
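A simplified sketch of that gap search, with a sorted list standing in for the in-order walk of the BST (field names are illustrative, not my actual code):

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical segment record; the real node carries more (type,
     * backing vnode, protection, ...). Segments are assumed sorted by
     * base and contained within [as_start, as_end). */
    struct segment {
        uintptr_t base;
        size_t    size;
        struct segment *next;    /* in-order successor */
    };

    /* Find a base address for a new mapping of 'size' bytes by scanning
     * the gaps between existing segments, lowest address first. */
    static uintptr_t find_gap(struct segment *first, size_t size,
                              uintptr_t as_start, uintptr_t as_end)
    {
        uintptr_t cursor = as_start;
        for (struct segment *s = first; s; s = s->next) {
            if (s->base - cursor >= size)
                return cursor;                        /* gap before this segment fits */
            cursor = s->base + s->size;
        }
        return (as_end - cursor >= size) ? cursor : 0; /* gap after the last segment */
    }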
pragmatic wrote: Tue Mar 18, 2025 12:35 pm 3. Heap allocation
Currently in flux.

My kernel uses a slab allocator. A slab is a single page, and contains objects of the same type.
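In sketch form, a one-page slab with a free list threaded through the unused slots looks something like this (simplified, field names are illustrative, and objects must be at least pointer-sized):

    #include <stddef.h>

    #define PAGE_SIZE 4096

    /* A slab occupies one page and holds objects of a single size. Free
     * slots are threaded onto a free list inside the page itself. */
    struct slab {
        struct slab *next;       /* next slab in this cache */
        size_t objsize;          /* must be >= sizeof(void *) */
        void  *freelist;         /* first free object in this page */
    };

    static void slab_init(struct slab *s, size_t objsize)
    {
        char *p   = (char *)s + sizeof(*s);
        char *end = (char *)s + PAGE_SIZE;
        s->objsize  = objsize;
        s->freelist = NULL;
        for (; p + objsize <= end; p += objsize) {   /* thread every slot */
            *(void **)p = s->freelist;
            s->freelist = p;
        }
    }

    static void *slab_alloc(struct slab *s)
    {
        void *obj = s->freelist;
        if (obj)
            s->freelist = *(void **)obj;
        return obj;
    }

    static void slab_free(struct slab *s, void *obj)
    {
        *(void **)obj = s->freelist;
        s->freelist = obj;
    }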

I'm experimenting with a replacement allocator, which separates all the heap meta-data from the actual data, so the meta-data is stored out of band and isolated from the data it describes.

The benefits are two-fold:
  • Buffer overflow is unlikely to corrupt the heap.
  • Data pages can be made read-only, while still allowing meta-data updates.
Both heaps implement conservative mark/sweep garbage collection, and the latter feature will be useful when I implement generational, concurrent and/or incremental garbage collection.

Re: How do you allocate?

Posted: Wed Mar 19, 2025 2:27 pm
by BenLunt
Hi,

I've done a fairly simple style allocator.

https://www.fysnet.net/bucket/index.php

I also have three "domains", as you call them. I will call them "Allocators". Each process/task may allocate a heap. This Heap contains one or more Buckets, with each Bucket containing one or more Pebbles. Each Bucket must be either physically addressable or virtually addressable. One or the other, not both.

When the Heap is created, or when a Bucket needs to be resized or added to, the Heap Allocator calls upon the Virtual Allocator. If the request is for virtual memory, this Allocator tries to allocate the requested memory, in part calling the physical memory Allocator, and then if successful, returns a contiguous block of virtual memory. If the request is for physical memory, the virtual Allocator passes the request to the physical Allocator, which if successful, returns a contiguous block of physical memory to the virtual Allocator which then passes it to the Heap Allocator.

Both types of allocation, physical or virtual, work almost exactly the same way, except for the fact that the physical Allocator will return a physical address and a count of physically contiguous blocks, while the virtual Allocator will return a virtual address and a count of blocks that are virtually contiguous and may or may not be physically contiguous.

The physical and virtual Allocators keep track of their own allocations, meaning all three Allocators are independent of one another. With this in mind, one can change how each Allocator allocates memory without affecting either of the other two.

It is designed to be simple and for the beginner (not implying that you are by any means). For example, if the beginner doesn't want to worry about the physical or virtual part, he/she can simply set aside a (large) block of physically contiguous memory and let the Bucket (Heap) Allocator do all the work. The downside to this is that there can only be one Heap used by all processes/tasks.

The specification for this Allocator does have some advantages. For example, if the caller needs physical memory below the 16Meg mark, a single flag can be used to ensure this happens. It also has room to place a character string (or other data) in each Pebble so that a "dump" of the Bucket can be used for debugging, showing what is still allocated, and by what process/task.
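As a rough picture only (the actual layout is defined in the specification at the link above; these field names are just for illustration, not the spec's):

    #include <stddef.h>

    struct pebble {                  /* one allocation inside a Bucket */
        size_t size;
        int    in_use;
        char   tag[32];              /* debug string: who owns this Pebble */
        struct pebble *next;
    };

    struct bucket {                  /* contiguous block, physical OR virtual */
        int    is_physical;          /* one or the other, never both */
        void  *base;
        size_t size;
        struct pebble *pebbles;
        struct bucket *next;
    };

    struct heap {                    /* one per process/task */
        struct bucket *buckets;
    };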

Just one way of doing it, maybe not the best way, but surely not the worst.

- Ben

Re: How do you allocate?

Posted: Sun Mar 23, 2025 10:11 am
by pragmatic
thewrongchristian wrote: Wed Mar 19, 2025 2:23 pm For each physical memory range passed via the multiboot header, I create a bitmap based allocator.
Simple. This is what I have done in the past. And, as I recall, it is in the same spirit as the physical allocator Linux uses (a buddy allocator).
thewrongchristian wrote: Wed Mar 19, 2025 2:23 pm Virtual address space is described by binary search tree based segment maps, indexed by base address.
I've seen others mention this, and it seems like a good approach. I hadn't considered a BST, but I think I will try something similar.

What I've been struggling with is primarily the order of events of this early bootstrap. It goes something like this (if using a buddy allocator):

1. Your kernel gets execution, disable MMU/paging if enabled
2. Create page tables with kernel mapped into some fixed VA + KASLR slide (if supported)
-- Kernel is now running with virtual addressing and caching enabled. Now we need to initialize the runtime allocators/heap --
3. Collect E820/EFI map from firmware to determine ranges of usable physical memory
4. Subtract the kernel physical load address range from the usable memory map (since it is of course unavailable)
5. Compute the number of bitmaps and total bits required to represent all free physical memory you learned about from (3)
6. Find a suitable location in ranges of physical RAM from (3) to use as the bitmaps for (5)
7. Map this physical RAM into VA space ... using what allocator? Do you just shove this into a fixed address in VA space?
8. Allocations from the kernel heap/zone allocator will sbrk (or similar), allocate physical memory, then allocate unused VA space, then map it in the page table. The heap implementation will then return a chunk from the newly mapped-in region

My struggle conceptually is around the steps 4, 5, 6, and 7. I can pick a suitable location in physical memory to use as my buddy bitmap allocator data structure - but where does that get mapped into VA space? At this point we don't have a heap configured to allocate objects to use in our BST that implements the virtual address allocator, of course.

If you want your kernel to be truly dynamic and support any platform configuration then you can't make any of the above statically configured. Namely, you can't use a fixed-size BST or preallocate your bitmaps in your data section. I suppose you technically could, but at least in my opinion it would be in poor form.

So what do you do for these early transitory phases where you need VA space to map in your PA allocator, but you need the PA allocator in place to run the VA space allocator? I can think of a few ugly ways of bootstrapping back and forth, but none are particularly 'clean'. In particular something like an early fixed-size VA allocator (in your data section) that eventually migrates into the real heap after it is running ... but it's all very ugly.
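For concreteness, the ugly version I have in mind looks something like this (all names hypothetical): a static pool of VA-segment nodes in .bss feeds the VA allocator until the real heap exists, after which nodes come from the heap instead.

    #include <stdint.h>
    #include <stddef.h>

    extern void *kmalloc(size_t);          /* the real heap, once it exists */

    #define EARLY_NODES 64

    static struct va_node {
        uintptr_t base;
        size_t    size;
        struct va_node *next;
    } early_pool[EARLY_NODES];             /* lives in .bss */

    static size_t early_used;
    static int heap_ready;                 /* set once the real heap is up */

    static struct va_node *va_node_alloc(void)
    {
        if (heap_ready)
            return kmalloc(sizeof(struct va_node));
        if (early_used < EARLY_NODES)
            return &early_pool[early_used++];
        return NULL;                       /* early pool exhausted */
    }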

So, what do you all do?

Re: How do you allocate?

Posted: Sun Mar 23, 2025 11:16 am
by nullplan
pragmatic wrote: Sun Mar 23, 2025 10:11 am 1. Your kernel gets execution, disable MMU/paging if enabled
2. Create page tables with kernel mapped into some fixed VA + KASLR slide (if supported)
-- Kernel is now running with virtual addressing and caching enabled. Now we need to initialize the runtime allocators/heap --
3. Collect E820/EFI map from firmware to determine ranges of usable physical memory
I think this is already going at it the wrong way. Why disable the MMU if you want to use it in the end, anyway? That's just extra steps. Here's how I do it:
  1. Pre-paging code and main kernel are in different binaries, because they will be linked for different environments.
  2. Bootloader (UEFI/GRUB) loads and executes pre-paging code.
  3. Pre-paging code loads main kernel (if it can't get the bootloader to do it)
  4. Pre-paging code initializes paging according to what the main kernel needs.
  5. Pre-paging code also passes on the memory info from the host environment, with the kernel location already marked as reserved.
  6. Pre-paging code ends by executing a trampoline that enables paging and jumps to main kernel entry point.
This also enables me to have multiple versions of the pre-paging code. I have a different one for GRUB and UEFI. The main kernel is a 64-bit kernel and requires all physical memory to be mapped at the start of kernel space. The pre-paging code just does that and then transfers control to the kernel. The pre-paging code can allocate memory by adding it to the list of reserved memory the main kernel receives. The main kernel can then take control of paging when it starts executing.

The main problem you appear to have is the same everyone struggles with at the start: How to access physical memory for the purposes of paging. And the answer is the same in most cases: Map all physical memory.
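In outline, the idea is just this (a simplified sketch, not my actual code; the helper and the memory-map structure are stand-ins):

    #include <stdint.h>
    #include <stddef.h>

    /* The pre-paging stage builds a direct map of all physical memory at
     * the bottom of kernel space, so the kernel can reach any frame at
     * DIRECT_MAP_BASE + phys. */
    #define DIRECT_MAP_BASE 0xFFFF800000000000ULL   /* start of kernel space */
    #define HUGE_PAGE       0x200000ULL             /* 2 MiB pages */

    struct memmap_entry { uint64_t base, length; }; /* from E820/UEFI, simplified */

    extern void map_huge_page(uint64_t va, uint64_t pa);  /* placeholder */

    void build_direct_map(const struct memmap_entry *map, size_t entries)
    {
        for (size_t i = 0; i < entries; i++) {
            uint64_t pa  = map[i].base & ~(HUGE_PAGE - 1);
            uint64_t end = (map[i].base + map[i].length + HUGE_PAGE - 1)
                               & ~(HUGE_PAGE - 1);
            for (; pa < end; pa += HUGE_PAGE)
                map_huge_page(DIRECT_MAP_BASE + pa, pa);
        }
    }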

Re: How do you allocate?

Posted: Sun Mar 23, 2025 5:14 pm
by pragmatic
nullplan wrote: Sun Mar 23, 2025 11:16 am Why disable the MMU if you want to use it in the end, anyway? That's just extra steps.
The idea is to move the system to a well-known state. By disabling the MMU you remove any potentially weird cache or map configurations. It's probably a safe bet to assume the system is identity mapped if the MMU is enabled, but I just went the extra step here and assumed the system may be in an unknown state. Linux, in fact, does this too. I've seen it more than once in other code also.

Ultimately you might save some wall-clock time by leaving the MMU enabled and just swapping the root page directory, I suppose. But I'd rather take the safer route. I do see your point, though.
nullplan wrote: Sun Mar 23, 2025 11:16 am Here's how I do it:
  1. Pre-paging code and main kernel are in different binaries, because they will be linked for different environments.
  2. Bootloader (UEFI/GRUB) loads and executes pre-paging code.
  3. Pre-paging code loads main kernel (if it can't get the bootloader to do it)
  4. Pre-paging code initializes paging according to what the main kernel needs.
  5. Pre-paging code also passes on the memory info from the host environment, with the kernel location already marked as reserved.
  6. Pre-paging code ends by executing a trampoline that enables paging and jumps to main kernel entry point.
I more or less follow a similar flow, except I have a monolithic kernel design with respect to early boot and main kernel. On entry the head of the kernel does some early core and architecture-specific initialization. Eventually it calls into the main kernel entry point of the core kernel running with virtual memory enabled and relocations applied.

But, right now, there is no heap at that point, and this is the problem I am describing. Maybe I should just ensure that by the time the core kernel is called there is some initial heap support. The MMU is enabled immediately before the main kernel entry is called, so I could just initialize the heap with the MMU disabled ... which is essentially the same thing as doing it with an identity map after the MMU is enabled (albeit perhaps a tiny bit slower in wall-clock time).
nullplan wrote: Sun Mar 23, 2025 11:16 am This also enables me to have multiple versions of the pre-paging code. I have a different one for GRUB and UEFI.
I see the benefit. In my design, if my EFI entry point runs then I know it's EFI. Otherwise it's a legacy boot where the bootloader just begins executing at the head of the image. So in that way I just support both and indicate to my code how we are booting based on how the loader decides to start my kernel's execution.
nullplan wrote: Sun Mar 23, 2025 11:16 am The main problem you appear to have is the same everyone struggles with at the start: How to access physical memory for the purposes of paging. And the answer is the same in most cases: Map all physical memory.
To the extent possible I don't want to flat/identity map the entire RAM of the platform. That would certainly be simpler. But at some level it is riskier to maintain such a map with respect to memory corruption. It could happen without the identity map, sure, but having that map just widens the attack surface.

Re: How do you allocate?

Posted: Mon Mar 24, 2025 2:36 pm
by thewrongchristian
pragmatic wrote: Sun Mar 23, 2025 10:11 am What I've been struggling with is primarily the order of events of this early bootstrap. It goes something like this (if using a buddy allocator):

1. Your kernel gets execution, disable MMU/paging if enabled
2. Create page tables with kernel mapped into some fixed VA + KASLR slide (if supported)
-- Kernel is now running with virtual addressing and caching enabled. Now we need to initialize the runtime allocators/heap --
3. Collect E820/EFI map from firmware to determine ranges of usable physical memory
4. Subtract the kernel physical load address range from the usable memory map (since it is of course unavailable)
5. Compute the number of bitmaps and total bits required to represent all free physical memory you learned about from (3)
6. Find a suitable location in ranges of physical RAM from (3) to use as the bitmaps for (5)
7. Map this physical RAM into VA space ... using what allocator? Do you just shove this into a fixed address in VA space?
8. Allocations from the kernel heap/zone allocator will sbrk (or similar), allocate physical memory, then allocate unused VA space, then map it in the page table. The heap implementation will then return a chunk from the newly mapped-in region

My struggle conceptually is around the steps 4, 5, 6, and 7. I can pick a suitable location in physical memory to use as my buddy bitmap allocator data structure - but where does that get mapped into VA space? At this point we don't have a heap configured to allocate objects to use in our BST that implements the virtual address allocator, of course.

If you want your kernel to be truly dynamic and support any platform configuration then you can't make any of the above statically configured. Namely, you can't use a fixed-size BST or preallocate your bitmaps in your data section. I suppose you technically could, but at least in my opinion it would be in poor form.

So what do you do for these early transitory phases where you need VA space to map in your PA allocator, but you need the PA allocator in place to run the VA space allocator? I can think of a few ugly ways of bootstrapping back and forth, but none are particularly 'clean'. In particular something like an early fixed-size VA allocator (in your data section) that eventually migrates into the real heap after it is running ... but it's all very ugly.

So, what do you all do?
I map the kernel one-to-one into high memory (GRUB loads the kernel at 1MB, and I offset this to 0xf0100000), and assume the space after my .bss section will become the heap; this forms the basis of my bootstrap allocator. My bootstrap allocator is a very simple bump allocator, and allocations from early in the boot process come from this bump allocator.

Once the heap proper is initialized, the current bootstrap bump allocator pointer is rounded to the next page boundary, and that then becomes the base pointer of the heap.
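In simplified form the bump allocator and the handoff look something like this (names are illustrative, not the actual code):

    #include <stdint.h>
    #include <stddef.h>

    #define PAGE_SIZE 4096UL

    extern char end[];                       /* end of .bss, from the linker script */

    static uintptr_t bump;

    void bootstrap_init(void)
    {
        bump = (uintptr_t)end;               /* bootstrap area starts just past .bss */
    }

    void *bootstrap_alloc(size_t n, size_t align)
    {
        bump = (bump + align - 1) & ~(uintptr_t)(align - 1);
        void *p = (void *)bump;
        bump += n;
        return p;                            /* never freed */
    }

    /* Handoff: once the heap proper is ready, round the bump pointer up to
     * a page boundary; that address becomes the base of the real heap. */
    uintptr_t bootstrap_handoff(void)
    {
        return (bump + PAGE_SIZE - 1) & ~(uintptr_t)(PAGE_SIZE - 1);
    }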

I make the assumption that there is sufficient memory to allocate my bootstrap data structures, which basically boils down to the physical memory bitmaps and the heap's per-page backing structure (which I could statically allocate, as it's fixed size). It is a reasonable assumption: if I haven't got enough memory to allocate bootstrap structures, I have bigger problems.

My bootstrap allocator is described in this comment. It is mostly still relevant:

viewtopic.php?p=305656#p305656

Re: How do you allocate?

Posted: Mon Mar 24, 2025 9:32 pm
by pragmatic
thewrongchristian wrote: Mon Mar 24, 2025 2:36 pm My bootstrap allocator is a very simple bump allocator, and allocations from early in the boot process come from this bump allocator.
Huh. So do you leave these allocations in place after you initialize the final runtime heap? In other words, you have some "fixed-size" heap initially with a cheap allocator, and likely no support for freeing those allocations. You use this to bootstrap the more complex runtime heap allocator and just leave this early heap in place?

That's not a terrible idea. Simple, and not too wasteful if you know ahead of time approximately what your early bootstrap allocation sizes will be.

Re: How do you allocate?

Posted: Wed Mar 26, 2025 3:41 pm
by thewrongchristian
pragmatic wrote: Mon Mar 24, 2025 9:32 pm
thewrongchristian wrote: Mon Mar 24, 2025 2:36 pm My bootstrap allocator is a very simple bump allocator, and allocations from early in the boot process come from this bump allocator.
Huh. So do you leave these allocations in place after you initialize the final runtime heap? In other words, you have some "fixed-size" heap initially with a cheap allocator, and likely no support for freeing those allocations. You use this to bootstrap the more complex runtime heap allocator and just leave this early heap in place?
Precisely.

The data structures remain in use, so they're never required to be freed and therefore not at all wasted.

None of the structures allocated are fixed size, so the area is similarly not fixed size. It just grows until no more is needed and is then capped at that point.