OSDev.org

Posted: **Sat Dec 01, 2012 8:41 pm**

I've started considering what I should use for a memory allocator in my OS project, and I've come up with a design that I think might work. I doubt that it's as good as more sophisticated designs, especially under multithreaded operation, but it looks like it will be plenty efficient for development.

Each memory block would contain three metadata items: its allocation status, its length, and the length of the previous block. There would probably be some mechanism to pack those values down to limit overhead; I think that I can get the memory overhead for a small block down to a single word. The free blocks would be kept as a set of linked lists, with each list holding blocks of a single size or of a small range of sizes. The list nodes would be stored in the free blocks with one node per block. Allocation would be simple: find the list with the smallest block size that will still hold the desired number of bytes, pop a block from the list, and, if the block is larger than the allocation size by at least two words, split the extra space into a new block that is added to the free lists. Freeing a block would involve pushing it onto the free list and checking its neighbors to see if it can be coalesced with any adjacent blocks. If the lists are doubly-linked, coalescing the blocks should take constant time.

I'm unsure as to how original this method is (probably not very original at all) or if it would be competitive with other systems, but I can't think of any major flaws in it except for the possibility of heap fragmentation and potential concurrency issues. My OS design would probably emphasize thread-local heaps, so concurrency won't be as great of an issue as it would be in other systems, but I don't want a design that will leave threads fighting over locks all the time.

So, what do you think? Is this a decent allocator design, or should I just go with some other design?

Posted: **Sun Dec 02, 2012 2:05 am**

That seems like a pretty typical list-based allocator design. It should work fine, although you're right about it probably doing poorly in multithreaded environments without some modification. The nice thing about allocators is that you usually don't assume much about how they work internally, so you can upgrade the algorithm if necessary later.

One of the more common allocators of that type is dlmalloc, a variant of which is used in glibc, if you want some reference.

Posted: **Mon Dec 03, 2012 4:16 am**

I might take a look at dlmalloc's source; it sounds like it would be a good place to start.

I'll need to figure out how to handle paging; that's supposed to be one of dlmalloc's main weaknesses, so I think I'll focus on a virtual page allocator first. I could do it with a linked list, but I'd like to be able to free pages in the middle of a page block, which doesn't seem to work well with a linked list when I want to be able to coalesce blocks, so I'm thinking about using something like a sorted list or a tree to store free page blocks. I'm expecting the number of free page blocks to be fairly low under normal circumstances, so it probably won't have to be as efficient as the finer-grained allocators, but I'd like it to be able to hold up under worst-case scenarios.

The physical page allocation should be easy; I'll probably just combine a watermark allocator and a linked-list system. I just need to worry about virtual pages...

Posted: **Tue Dec 04, 2012 7:48 am**

As noted above your system is a farily standard implementation of a typical malloc() (not to be confused with a virtual or physical memory allocator). I would make one suggestion: if you have to tag each memory block then when allocating fairly small blocks (e.g. 8/16 bytes), which are probably going to be the most common ones you allocate, you will use up a significant amount of space for your tags in comparison with the memory block they describe. Some allocators get around this by using a bitmap or buddy for small allocations. For example you split a 4096 byte page into blocks of 8 bytes, with the first 4096/8 = 512 bits = 64 bytes being a bitmap describing the used/free status of each chunk. Then finding a free block is relatively cheap: given 8 byte comparisons you need at least 8 compares to find an 8 byte block with a free bit set, then 1 32 bit compare, 1 16 bit compare, 1 8 bit compare, 1 4 bit comapare, 1 2 bit compare and 1 1 bit compare = 14 comparisons. If you store a value saying which bit number was last allocated/freed you can eliminate comparison for a large chunk of the bitmap.

For most virtual memory allocators you only need to tag some parts of the address space as used (e.g. the kernel, the running process, the page tables, stacks and the heap). Most malloc implementations expect the heap to be contiguous and only require support to extend or shrink it (by moving the end-of-heap marker). I suggest you implement this as some sort of list of tags stored entirely within kernel space: for each tag you'd need something like start address, length, type, read/write/supervisor/execute flags etc +/- prev/next members if using a linked list.

For (additional) reading about malloc implementations I'd suggest jemalloc and the hoard allocator.

Regards,
John.

Posted: **Tue Dec 04, 2012 11:15 am**

For the small allocations, I'll probably use a more compact list of free blocks with just pointers and no metadata because I won't need to coalesce those blocks unless an entire chunk of them is freed. That should be optimally compact (as long as I store the pointers in the blocks themselves) while still having close to constant-time performance. To reduce initialization costs, I'll probably combine the list with a watermark allocator and only use the list for free blocks that are below the watermark.

A contiguous heap might end up being a little more manageable, so I might start with that; I just want to be able to handle edge cases where a program frees up a whole lot of memory and ends up with just two words allocated 2GB apart. That's a lot of memory to waste...

I'm thinking about using an AVL tree as the acceleration structure for page allocation; that seems to be what Linux itself uses (if I read the wiki correctly). I'm not as worried about the kernel's structures just yet, since the first version of the allocator will be for the kernel itself, but I'll have to give it some thought.

OSDev.org

Memory allocation

Memory allocation

Re: Memory allocation

Re: Memory allocation

Re: Memory allocation

Re: Memory allocation