Well, if we are talking about PMM's, take a look at mine
It's a bitmap-like structure, something I call
bittree.
So, instead of trying to invent a better stack, I went ahead to invent (or re-invent, I'm not sure
) a better bitmap. A bitmap, to which I added a several indexing levels.
Characteristics:
- Looking up for a free page: O(log_32(n))
- Allocating: O(log_32(n))
- Freeing: O(log_32(n))
- Checking if page is allocated or not: O(1) - a feature that really helps to catch bugs.
- Memory spent: O(n)
I don't see why spending 128KiB per 4GiB (or 0.0031%) qualifies as "too much". And if you are willing to spend slightly over 4KiB more (precisely + 4KiB + 128 + 4, frankly last 4 bytes could be omitted, but it makes implementation much clearer), you can get overall performance of O(log_32(n)). Also it can be promoted to O(log_64(n)) easely, but it really doesn't make a big difference until 64 or 128 GiB (5 levels, 6 levels, do not care), and I prefer to keep my background libraries common for 32- and 64-bit systems.
I've posted basic idea (when it was in it's infancy)
here, it somewhat resembles
Cottontail (mentioned before by
Legendmythe) or
Brendan's idea about
array of "number of free 4 KiB pages" entries for each 2 MiB page, in this thread.
Here's a simplified illustration for 4-bit indexing groups: (in reality I'm using 32-bit groups, but it wouldn't be practical to draw that)
To make it clear: 1 means free page, 0 - allocated. Bit set in a upper level means that there's at least one bit set in corresponding group at lower level.
Basic algorithms
- Look up for a free page: start at a top, find first nonzero bit at that level (preferably using BSF instruction or GCC's __builtin_ctz()). Descend to next level, find a nonzero bit there, and so on until you hit the bottom.
- Allocating: clear a bit in lower level, then if a whole group becomes 0, update a bit in upper level, and so on until the top (or stop).
- Freeing: set a bit in lower level, if whole group was 0 before, update a bit in upper level and so on...
- Checking if page is allocated: just check the bit in lower level.
Start and end ranges can be applied to search easily - if something doesn't like addresses above 4 GiB, just add a limit. Something do no fear those addresses - just start searching from there first.
Support for 4 MiB pages also could be added - it just needs additional, parallel upper levels (slightly different, checking for 0xFFFFFFFF, not 0). 2 MiB pages would be trickier, as it involves 16-bit groups somewhere between 4KiB and 2 MiB, but are doable anyway.
Currently I'm using 4KiB-only version, and it works great
And the whole thing is implemented in ~110 lines of C code (could be less, I'm a \n-happy guy).