Design buddy allocator that supports reserved memory?

stevefan1999
Posts: 3
Joined: Sun Jan 15, 2023 4:02 am

Design buddy allocator that supports reserved memory?

Post by stevefan1999 »

Suppose I have the following memory map (in [start, end) format):
0x0 - 0x10000
0x10000 - 0x40000
0x50000 - 0x70000

Assume the addresses not listed are reserved memory. As you can see, there is a hole between [0x40000, 0x50000). If my buddy allocator allocates in this range and I access it, I will immediately get a general protection fault.

I'm using a buddy allocator to implement physical memory management; that is, I want to implement kmalloc() (rather than vmalloc(), which I can already do to some extent).

kmalloc() allocates physically contiguous memory, so a buddy allocator should be the best-fitting algorithm here. But as far as I have researched, most of the buddy allocator implementations out there do not account for reserved/disabled memory ranges like this; they just expect to be passed one big blob of memory (i.e. I would have to pass in the whole [0x0, 0x70000) and could not mark invalid ranges like [0x40000, 0x50000)), which is infeasible for physical memory management (for example, some memory regions such as [0x40000, 0x50000) are reserved for the BIOS and DMA).
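For concreteness, here is how I would describe the usable ranges as data that an allocator could consume (a minimal sketch in C; the struct and names are just illustrative):

Code: Select all

#include <stdint.h>

/* Usable [start, end) ranges from the map above; the reserved hole
 * [0x40000, 0x50000) simply never appears in the list. */
struct mem_region {
    uint64_t start; /* inclusive */
    uint64_t end;   /* exclusive */
};

static const struct mem_region usable_map[] = {
    { 0x00000, 0x10000 },
    { 0x10000, 0x40000 },
    { 0x50000, 0x70000 },
};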

Also, should vmalloc() allocate memory lazily instead? My current implementation of vmalloc() takes all the available physical memory pages, carves out a specific contiguous virtual address range, and instructs the page table to map the memory contiguously. I use 0xffffc90000000000 as the base, following the Linux kernel as an example.

The algorithm then takes all the pages and maps them at consecutive virtual addresses:
0xffffc90000000000 -> 0x0
0xffffc90000001000 -> 0x1000
0xffffc90000002000 -> 0x2000
0xffffc90000003000 -> 0x3000
...
0xffffc90000040000 -> 0x40000
0xffffc90000041000 -> 0x50000
0xffffc90000042000 -> 0x51000


So, as you can see, 0xffffc90000041000 can skip over 0x41000 and map 0x50000 instead. But not only is this a wrong implementation of vmalloc() (it means kmalloc() will not have any physical memory pages left for allocation), it is also pretty inefficient, as I have to walk all the 4K pages of physical memory, which could be a lot (assume I will run this on a virtual machine with more than 64 GB of memory, which means a long wait for vmalloc() to initialize).
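For reference, here is a sketch of the eager approach I described above (map_page() is a hypothetical helper that installs a single 4K mapping; struct mem_region is as in the earlier sketch):

Code: Select all

#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE    0x1000ULL
#define VMALLOC_BASE 0xffffc90000000000ULL

struct mem_region { uint64_t start, end; }; /* [start, end) */

/* Hypothetical helper: installs one 4K mapping in the page tables. */
void map_page(uint64_t virt, uint64_t phys);

/* Eagerly map every usable physical page at consecutive virtual
 * addresses, skipping reserved holes -- O(total pages) work at boot. */
static void vmalloc_init_eager(const struct mem_region *map, size_t n)
{
    uint64_t virt = VMALLOC_BASE;
    for (size_t i = 0; i < n; i++)
        for (uint64_t phys = map[i].start; phys < map[i].end;
             phys += PAGE_SIZE, virt += PAGE_SIZE)
            map_page(virt, phys);
}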

All of these problems surfaced during my kernel development, but I cannot find any definitive answers. I think Linux managed to solve this using memblock, but there is little documentation about it, and I still have no idea how to read through the crappy coding structure of the Linux kernel...
thewrongchristian
Member
Posts: 426
Joined: Tue Apr 03, 2018 2:44 am

Re: Design buddy allocator that supports reserved memory?

Post by thewrongchristian »

stevefan1999 wrote:Suppose I have the following memory map (in [start, end) format):
0x0 - 0x10000
0x10000 - 0x40000
0x50000 - 0x70000

Assume the addresses not listed are reserved memory. As you can see, there is a hole between [0x40000, 0x50000). If my buddy allocator allocates in this range and I access it, I will immediately get a general protection fault.

I'm using a buddy allocator to implement physical memory management; that is, I want to implement kmalloc() (rather than vmalloc(), which I can already do to some extent).

kmalloc() allocates physically contiguous memory, so a buddy allocator should be the best-fitting algorithm here. But as far as I have researched, most of the buddy allocator implementations out there do not account for reserved/disabled memory ranges like this; they just expect to be passed one big blob of memory (i.e. I would have to pass in the whole [0x0, 0x70000) and could not mark invalid ranges like [0x40000, 0x50000)), which is infeasible for physical memory management (for example, some memory regions such as [0x40000, 0x50000) are reserved for the BIOS and DMA).
Does your buddy allocator keep track of allocated memory? Or is it incumbent on the client code to track the physical memory that is allocated and hand it back, including the allocated size?

Because if it is the latter, then at bootstrap the memory that should be free can be handed to your buddy allocator piecemeal, perhaps even page by page, and your buddy allocator's release code can consolidate the freed memory into larger and larger regions, as per the buddy algorithm's merging of adjacent regions into a single larger region.

Once you've freed all the physical memory you have available, your buddy allocator will have a consolidated list of physical address ranges that it considers free, and it should be in a form it can then use to satisfy allocations. It probably won't be a single region, but it will be correct as long as your buddy allocator's free code works correctly.
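A minimal sketch of that bootstrap, assuming a page-granular buddy_free() that coalesces buddies internally (all names are hypothetical):

Code: Select all

#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000ULL

/* Usable [start, end) ranges, as in the sketch in the first post. */
struct mem_region { uint64_t start, end; };

/* Hypothetical buddy-allocator entry point: returns one block to the
 * free lists and merges it with its buddy where possible. */
void buddy_free(uint64_t phys, size_t size);

/* Seed the allocator from the firmware memory map: only usable ranges
 * are ever handed to buddy_free(), so reserved holes such as
 * [0x40000, 0x50000) are never allocatable. */
static void buddy_bootstrap(const struct mem_region *map, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (uint64_t p = map[i].start; p < map[i].end; p += PAGE_SIZE)
            buddy_free(p, PAGE_SIZE);
}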
stevefan1999 wrote: Also, should vmalloc() allocate memory lazily instead? My current implementation of vmalloc() takes all the available physical memory pages, carves out a specific contiguous virtual address range, and instructs the page table to map the memory contiguously. I use 0xffffc90000000000 as the base, following the Linux kernel as an example.

The algorithm then takes all the pages and maps them at consecutive virtual addresses:
0xffffc90000000000 -> 0x0
0xffffc90000001000 -> 0x1000
0xffffc90000002000 -> 0x2000
0xffffc90000003000 -> 0x3000
...
0xffffc90000040000 -> 0x40000
0xffffc90000041000 -> 0x50000
0xffffc90000042000 -> 0x51000


So, as you can see, 0xffffc90000041000 can skip over 0x41000 and map 0x50000 instead. But not only is this a wrong implementation of vmalloc() (it means kmalloc() will not have any physical memory pages left for allocation), it is also pretty inefficient, as I have to walk all the 4K pages of physical memory, which could be a lot (assume I will run this on a virtual machine with more than 64 GB of memory, which means a long wait for vmalloc() to initialize).

All of these problems surfaced during my kernel development, but I cannot find any definitive answers. I think Linux managed to solve this using memblock, but there is little documentation about it, and I still have no idea how to read through the crappy coding structure of the Linux kernel...
Not sure how others do it, but I allocate a single contiguous VM region for my kernel heap that is lazily backed by physical memory on demand. The benefit of this is that I don't need to map any heap memory 1:1 like Linux does. My heap consists entirely of virtually mapped pages, which need not be backed by physically contiguous pages. The lazy allocation is done using the page fault handler and an anonymous zero-filled virtual memory region.
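A rough sketch of what that fault path could look like, assuming hypothetical pmm_alloc_page()/map_page()/zero_page() helpers and an illustrative heap window:

Code: Select all

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical heap window and helpers; names are illustrative only. */
#define HEAP_BASE 0xffffc90000000000ULL
#define HEAP_END  0xffffd00000000000ULL

uint64_t pmm_alloc_page(void);                /* returns 0 on failure   */
void map_page(uint64_t virt, uint64_t phys);  /* install one 4K mapping */
void zero_page(uint64_t virt);                /* zero-fill the new page */

/* Called from the page fault handler: back a faulting heap address
 * with a fresh zero-filled physical page on first touch. */
bool heap_demand_fault(uint64_t fault_addr)
{
    if (fault_addr < HEAP_BASE || fault_addr >= HEAP_END)
        return false;                 /* not a heap fault               */

    uint64_t phys = pmm_alloc_page();
    if (phys == 0)
        return false;                 /* PMM exhausted: caller must
                                         reclaim or fail the fault      */

    uint64_t page = fault_addr & ~0xFFFULL;
    map_page(page, phys);
    zero_page(page);                  /* anonymous zero-filled region   */
    return true;
}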

It does create the problem, though, that the kernel heap now has to compete for physical memory with all other consumers of physical memory. Extending the heap can fail and may require memory to be stolen from other consumers, which is typically cache or user-level memory. If your page fault handler itself requires heap allocations, then you can deadlock, so you'd need to refactor the page handling code to not use the heap recursively, or to be able to handle this failure. In my case, when a kernel-heap-based PMM allocation fails, it triggers a garbage collector to free up physical pages, as well as freeing existing unused heap items.