Thpertic wrote: Thank you.
However this implementation of the stack is going to slow down the popping of one page, right? There is no problem with this because of the other advantages but, as I understood it, I should pop the pair (address + nPages), decrement it, re-push it, and return the address?
Well, that's one way to do it. But it's much faster to manipulate the entry on the top in place, and only pop it when its nPages reaches zero. Then you'll have no slowdown. That's what my allocator does. Note that although in theory it could scan for a suitable entry for nPages, in practice (because it's always called to allocate one page only) it always picks the top entry right away.
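To make that concrete, here's a minimal sketch of the idea (names and the fixed-size array are mine, not taken from any particular allocator): each stack entry describes a run of free pages, allocation takes the first page of the top run in place, and the entry is only popped when its page count drops to zero.

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096
#define MAX_ENTRIES 1024

/* One RLE entry: a run of contiguous free pages. */
typedef struct { uint64_t addr; uint64_t npages; } run_t;

static run_t rle_stack[MAX_ENTRIES];
static size_t top = 0;              /* number of entries on the stack */

/* Push a run of free pages (used at init and on free). */
void push_run(uint64_t addr, uint64_t npages) {
    rle_stack[top].addr = addr;
    rle_stack[top].npages = npages;
    top++;
}

/* Allocate one page in O(1): shrink the top run in place,
 * and only pop the entry when the run is exhausted. */
uint64_t alloc_page(void) {
    if (top == 0) return 0;         /* out of memory */
    run_t *t = &rle_stack[top - 1];
    uint64_t page = t->addr;        /* hand out the run's first page */
    t->addr += PAGE_SIZE;
    if (--t->npages == 0)
        top--;                      /* run exhausted: pop it */
    return page;
}
```

Note that no entry is re-pushed: the decrement happens directly on the top entry, which is why there's no slowdown compared to a plain stack.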
Thpertic wrote: Another thing that I'm not sure of is whether I should consider the maximum stack length as reserved, or whether I should pop the addresses off it when they are used.
Not sure if I understand your question. With a simple stack, you calculate the maximum size by dividing the size of RAM by the page size. With an RLE stack, you estimate the number of fragments, and you can dynamically resize your stack if needed. In the worst-case scenario, when memory is totally fragmented (that is, every second page is free and every other is allocated), the RLE stack requires exactly the same amount of memory as the simple stack allocator does with all memory free. But I would like to point out that reaching the worst case is so extreme that you'll never face it in a million years. It's safe to say you'll need no more than 1/4th of the simple stack's starting memory requirement.
Octacone wrote: All of this can be solved by simply switching to a bitmap based allocator.
I would advise against it. Here are the reasons:
Asymptotic behaviour
Bitmap allocation is extremely expensive. You have to scan for a set bit, which (even with hardware-backed ffs()) is a very slow O(n) algorithm, where n is the size of RAM. An application can't continue until the memory is allocated, so this is a bottleneck.
Freeing is cheap, true, O(1).
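To illustrate the asymmetry, here's a minimal bitmap allocator sketch (a hypothetical layout of my own, one bit per page with 1 meaning free): allocation must walk the map word by word in the worst case, while freeing just sets one bit back.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical bitmap: one bit per 4K page, 1 = free.
 * 1024 64-bit words cover 65536 pages (256M of RAM). */
#define BITMAP_WORDS 1024
static uint64_t bitmap[BITMAP_WORDS];

/* O(n) allocation: linear scan for a free bit, even though ffs()
 * on each word is hardware-backed. Returns page index, or -1. */
long bitmap_alloc(void) {
    for (size_t i = 0; i < BITMAP_WORDS; i++) {
        if (bitmap[i]) {
            int bit = __builtin_ffsll((long long)bitmap[i]) - 1;
            bitmap[i] &= ~(1ULL << bit);    /* mark as allocated */
            return (long)(i * 64 + bit);
        }
    }
    return -1;
}

/* O(1) freeing: set the page's bit back. */
void bitmap_free(long page) {
    bitmap[page / 64] |= 1ULL << (page % 64);
}
```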
With stack, allocation is very fast, O(1). You just pop an entry.
Freeing is cheap too, O(1), a simple push.
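For comparison, the whole simple stack allocator fits in a few lines (again a sketch with made-up names, assuming 4K pages and 32-bit entries, which is where the 4M figure below comes from):

```c
#include <stdint.h>
#include <stddef.h>

/* Simple page stack for 4G of RAM: 1M 32-bit entries = 4M of
 * bookkeeping when all memory is free. */
#define MAX_PAGES (1024 * 1024)
static uint32_t free_pages[MAX_PAGES];
static size_t sp = 0;               /* number of free pages on the stack */

/* O(1) free: a simple push. */
void page_free(uint32_t addr) { free_pages[sp++] = addr; }

/* O(1) alloc: pop the top entry (0 means out of memory here). */
uint32_t page_alloc(void) { return sp ? free_pages[--sp] : 0; }
```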
With the RLE stack, allocation is still fast, O(1): you just reach for the top entry.
Freeing is a bit more expensive, O(f), where f is the number of fragments in free memory (f is smaller than n by orders of magnitude; in the worst case f = n / page size / 2). This is because you have to scan for an entry to merge the freed page into. If you keep the list sorted using bsearch() and insert in place, then this can be O(log2(f)). Also, you can do this "in the background": the application doesn't have to wait for freeing to finish as it does with allocation.
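A naive O(f) version of that freeing path might look like this sketch (my own simplified take: an unsorted array, one merge direction checked at a time, and no joining of two runs that become adjacent; the sorted bsearch() variant would cut the lookup to O(log2 f)):

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096
#define MAX_RUNS 1024

typedef struct { uint64_t addr; uint64_t npages; } run_t;

static run_t runs[MAX_RUNS];
static size_t nruns = 0;

/* Free one page: scan the f runs for a neighbour to merge into.
 * O(f); only a new entry is created when no neighbour exists. */
void free_page(uint64_t addr) {
    for (size_t i = 0; i < nruns; i++) {
        /* page sits right after this run: extend it upward */
        if (runs[i].addr + runs[i].npages * PAGE_SIZE == addr) {
            runs[i].npages++;
            return;
        }
        /* page sits right before this run: extend it downward */
        if (addr + PAGE_SIZE == runs[i].addr) {
            runs[i].addr = addr;
            runs[i].npages++;
            return;
        }
    }
    runs[nruns].addr = addr;        /* no neighbour: new fragment */
    runs[nruns].npages = 1;
    nruns++;
}
```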
Now for the memory requirements
Bitmap needs constant 128K to describe 4G RAM.
Stack allocator needs 4M on start when all memory is free, which is the maximum.
RLE stack needs only 8 bytes on start when memory is not fragmented, and 4M in worst case scenario (when fragmentation is at maximum).
If we assume you allocate the same amount of memory for the RLE stack as for the bitmap (128K, at 8 bytes per entry), then you can have 16384 fragments, which is not that much, but should be enough in most cases.
Things get a lot worse with 64 bit. Assume 16T of RAM, which is quite a lot. Stack-based allocation does not need 64-bit entries, because the least significant 12 bits of a page address are always zero, and if you don't store them, a 32-bit entry can describe 32+12 bits of RAM, which is 16T.
Bitmap allocation there needs 32M.
Stack allocator needs 16G on start when all memory is free, which is the maximum.
RLE stack needs only 8 bytes on start when memory is not fragmented, and 16G in worst case scenario.
As you can see, with 32M you can have 4194304 (4 million) fragments, which is quite a lot and hard to imagine exceeding. This means the average memory consumption is best for the RLE stack, and even the worst-case scenario requires no more than 1/1024th of the RAM.
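The 32-bit-entries-for-16T trick above is just a pair of shifts; a sketch (helper names are mine):

```c
#include <stdint.h>

/* Pages are 4K-aligned, so the low 12 bits of a page address are
 * always zero. Dropping them lets a 32-bit entry encode a 44-bit
 * (32+12) physical address, i.e. up to 16T of RAM. */
static inline uint32_t pack_page(uint64_t addr)  { return (uint32_t)(addr >> 12); }
static inline uint64_t unpack_page(uint32_t ent) { return (uint64_t)ent << 12; }
```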
Hope this answers your question.
Korona wrote: With all stack-based allocators, it's worth keeping in mind that they cannot (efficiently) support multi-page allocations that are necessary for some I/O devices.
True, but you need that so rarely that the loss in efficiency does not really count. You allocate special I/O memory only once at boot, and maybe when a user attaches a new device if you support hot-plug. That's nothing compared to how many times applications call sbrk().
Cheers,
bzt