proxy wrote:I've been thinking about how to best represent process memory maps and am curious about what others have come up with. Right now, I'm just doing something simple with having an array of "regions", where each region has a start,end,permissions,etc... which is "ok", but definitely is inefficient for certain operations.
I believe the requirements generally are:
* entries need to be non-overlapping, so detecting overlaps should be made easy
* at least two entries (heap and stack) should be extendable, so detecting the bounds of the nearest region in a given direction should be made easy
* ideally, finding holes of a given size should also be easy, so things like mmap can be efficiently implemented.
So, what have people done? Do they just use a sorted array and brute force it knowing that there won't be THAT many regions per process? or is there a clever data structure I'm not thinking of?
I map virtual addresses to segment descriptors (with base, size and backing details) using binary search tree. There is one tree for the kernel address space shared between all processes, and one tree per process for the process user address space.
A sorted data structure here is important, to make lookups O(log2), as you'll be doing look ups quite often (each page fault, any system call argument that validates a user space provided address,) hence why I use a BST, but any sorted structure would do (sorted array, b+tree). In fact, an array based structure would be more optimal for lookup, as it'll be more cache friendly.
For holes to satisfy mmap calls, you're likely to have far fewer entries to track, as your space is likely to be in a small number of actual spaces. I don't have a mmap yet, so when exec'ing, I just allocate segments based on PT_LOAD segment entries from the ELF file, keeping track of the highest and lowest addresses used, then setting the break address to the highest address from which the heap is expanded.
But I do track holes in kernel space, and for that, I use a sorted map of the holes, again by base address. Given the small number of entries you'll probably have (segments are likely concentrated into contiguous regions), a sorted array will probably be optimal, but I currently use a BST here as well. I walk the available holes in address space order, using the first fit hole to allocate the desired new segment. "Removing" the new segment from the hole is as simple as advancing the hole base address by the size of the allocated segment, and reducing the size accordingly, so you don't even have to re-sort or resize your sorted map, unless the hole found was exactly the right size and can be removed.
For your specific requirements:
* entries need to be non-overlapping, so detecting overlaps should be made easy
* at least two entries (heap and stack) should be extendable, so detecting the bounds of the nearest region in a given direction should be made easy
Using a sorted map, you should be able to easily find entries that are greater than or less than your address key. My abstract map interface defines lookups using <, <=, ==, >= and > operators, so you can easily find the start of the next segment in either direction that will provide the bounds to which you can extend a segment. so both of these should be easy and efficient.
* ideally, finding holes of a given size should also be easy, so things like mmap can be efficiently implemented.
As noted above, a map of holes will likely be small, so using first-fit, you can linearly walk the map of holes by base address, for an O(N) lookup (where N is the number of holes, and will be small.) If your desired address has any alignment requirements, this can fragment your holes, but I don't find that a problem. Running up my kernel, running a fork bomb test case and mounting a filesystem, splits my kernel address space into 3 holes.
Using a sorted map by base address of the hole makes it easy to find adjacent holes when freeing segments, allowing you to easily merge adjacent holes. Certainly easier than if your holes are indexed by size. Plus, indexing by size would require a compound key of [size,base], to keep keys unique when faced with multiple holes of the same size.
TL; DR
The goto interface in my kernel is the sorted map, against which I have a number of implementations (BST, sorted array)
For address space segments, I add segments to the map sorted by segment base address.
For address space holes, I add holes to the map sort by hole base address, with a linear first fit search to find a suitable hole to satisfy an address space allocation.
Basically, use a sorted map by address for both is my advice.