The algorithm I usually use is sort of like a slab allocator, but with caches for different-sized blocks instead of specific object types. It works in two layers: the lower layer hands out large, uniform chunks of memory, and the upper layer carves smaller blocks out of those chunks. In essence, it is one allocator made up of many smaller allocators.
The lower allocator has a fixed BLOCKSIZE, which must be a power of two (for reasons explained later) and works best as a multiple of the page size. The allocator as a whole cannot satisfy requests larger than BLOCKSIZE (a special-case allocator can be added to compensate for this). Large blocks must also be BLOCKSIZE-aligned: this is how a small block is mapped back to the large block that owns it. The lower allocator is usually a simple linked list of free large blocks, and it expands the heap when it runs out. Each large block has a header of this form:
Code: Select all
struct large_block {
    struct large_block *next;     /* next large block in the free/used list */
    struct small_block *contents; /* freelist of small blocks in this chunk */
    size_t small_blocksize;       /* size of this chunk's small blocks */
    size_t refcount;              /* small blocks currently allocated */
    /* optionally, a binary semaphore for thread-safety */
};
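The alignment requirement is what makes free cheap: because BLOCKSIZE is a power of two and every large block is BLOCKSIZE-aligned, the header that owns any small block can be recovered just by masking the pointer. A minimal sketch (the BLOCKSIZE value of 4096 is an assumption for illustration):

```c
#include <stdint.h>

#define BLOCKSIZE 4096u  /* assumed: a power of two, multiple of the page size */

struct large_block;  /* header as defined above */

/* Recover the owning large block of any small-block pointer by clearing
 * the low bits. This only works because large blocks are BLOCKSIZE-aligned
 * and BLOCKSIZE is a power of two, so (BLOCKSIZE - 1) is an all-ones mask. */
static inline struct large_block *owner_of(void *p)
{
    return (struct large_block *)((uintptr_t)p & ~(uintptr_t)(BLOCKSIZE - 1));
}
```

This is also why the power-of-two restriction exists: with any other BLOCKSIZE, finding the header would require a division instead of a single AND.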
When small blocks of size n are needed, a large block is "split" by threading its non-header memory into a linked list of small blocks of size n. small_blocksize is then set to n, and refcount to 0. The large block itself is removed from the lower allocator's free list and added to a used list.
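Splitting can be sketched like this (the struct is repeated so the snippet stands alone; the loop bound and the n >= sizeof(void *) requirement are my assumptions):

```c
#include <stddef.h>

#define BLOCKSIZE 4096u  /* assumed value; any power of two works */

struct small_block { struct small_block *next; };

struct large_block {              /* mirrors the header shown above */
    struct large_block *next;
    struct small_block *contents;
    size_t small_blocksize;
    size_t refcount;
};

/* Split the non-header memory of a large block into a freelist of
 * small blocks of size n. The list pointer lives inside each free
 * slot itself, so free small blocks need no extra metadata.
 * Assumes n >= sizeof(struct small_block). */
static void split_block(struct large_block *lb, size_t n)
{
    char *p   = (char *)lb + sizeof(struct large_block);
    char *end = (char *)lb + BLOCKSIZE;
    struct small_block *head = NULL;

    while (p + n <= end) {            /* thread each n-byte slot onto the list */
        struct small_block *sb = (struct small_block *)p;
        sb->next = head;
        head = sb;
        p += n;
    }

    lb->contents = head;
    lb->small_blocksize = n;
    lb->refcount = 0;
}
```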
The upper allocator keeps a series of linked lists, each corresponding to a different allocation size. The number and sizes of these lists are arbitrary; I usually use powers of two. Each list holds large blocks of the matching small_blocksize, and when a list is empty it is refilled with a freshly split large block from the lower allocator. When a small block is allocated, it is taken from the closest open large block (the binary semaphore in the large block's header can provide thread-safety here, and because each list holds multiple large blocks, a thread that finds one locked can simply move on to another), and the large block's refcount is incremented. When a small block is freed, the large block's refcount is decremented and the small block is returned to the large block's freelist; if the refcount reaches zero, the large block itself is freed back to the lower allocator.
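The per-large-block fast path is just freelist pushes and pops plus the refcount bookkeeping. A single-threaded sketch under the same assumed struct layout (returning an empty large block to the lower allocator, and the semaphore, are omitted):

```c
#include <stddef.h>

struct small_block { struct small_block *next; };

struct large_block {              /* header as shown earlier */
    struct large_block *next;
    struct small_block *contents;
    size_t small_blocksize;
    size_t refcount;
};

/* Pop a small block off the large block's freelist; refcount tracks
 * how many of its small blocks are currently in use. */
static void *small_alloc(struct large_block *lb)
{
    struct small_block *sb = lb->contents;
    if (sb == NULL)
        return NULL;              /* this large block is exhausted */
    lb->contents = sb->next;
    lb->refcount++;
    return sb;
}

/* Push a freed small block back onto the freelist; when refcount
 * drops to zero, the whole large block can be handed back to the
 * lower allocator (not shown here). */
static void small_free(struct large_block *lb, void *p)
{
    struct small_block *sb = p;
    sb->next = lb->contents;
    lb->contents = sb;
    lb->refcount--;
}
```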
This algorithm is a bit complicated, but it gets great performance for uniform allocations, and it uses less memory than pretty much any other allocator when allocating a lot of small blocks (allocated small blocks keep no metadata at all; free ones store the list pointer inside themselves). It can also be made quite contention-friendly as described above. I currently have an implementation in my libc that lacks the fine-grained locking but is otherwise complete; it's not well commented, though, and relies on a lot of OS-specific code.
- Graphical Representation
I also use a similar algorithm in my kernel (the one described above is in my libc), where the heap does not have to be multithreaded, never needs to return memory once it has been allocated (small blocks are recycled, large ones are not), and the size of a freed block is always known to the caller. A well-commented reference implementation of that simpler algorithm can be found in my kernel source.