@JamesM, Breandan on the junior/senior topics: your arguments are valid, I've changed my mind, you're right. Seniors won't read juniors' topics; that's a really good point.
JamesM wrote:but here's my analysis if Turdus cares
I do care about intelligent arguing, indeed.
JamesM wrote:
I think the idea is sound. The aim of the allocator is really to reduce memory usage, and it appears to do a good job of that, so could be used in embedded systems. However:
* The fragmentation is a real issue. You initially stated that fragmentation wouldn't occur, then abated on that argument and stated that it still in worst case doesn't use as much as a stack. OK, that's true, but I think you missed the point in that a stack is not O(n) on deallocation - yours is. So fragmentation matters for your allocator.
With respect, you missed the key part of my allocator. I do know a stack is not O(n) on deallocation, but neither is mine. O(n) would mean it scales with the number of frames, n; my allocator scales with the number of fragments, m, and O(m) is considerably less than O(n). n is in the range of millions, while m is around 1000 in the worst case (but let's assume a system with a really large number of small, randomly exiting tasks, say 100,000). I also included a guarantee in my freeing code that there will be space for a new fragment every time the allocator is called:
Code: Select all
if top().base != address then
    // instead of reporting the error you can call the defragmenter here
    // to make some space in the FIFO by merging blocks
    return ERROR
return OK
So fragmentation is not an issue.
JamesM wrote: * The idea of amortizing this by having a background task is good, however you're going to end up in a situation where you have to move physical pages around the place in order to defragment your allocator, which is not a situation you want to be in (note: I make this assumption because without mutation of the physical layout, you quickly get into a scenario where your datastructure is fragmented and can't be fixed further, when there are many holes of used frames).
As I said, you missed the key part. To defragment my allocator you first move around only pointer+size pairs; then, if needed, you expand the fragment area; and only when even that is not possible do you move a few physical pages. In that worst case, you're right, it could be time-consuming, which is why I have a background task doing it while the computer is otherwise idle. Even then, the worst case is several times faster than waiting for the disk during swapping, for example. You don't move the whole memory around, only the few frames described by the smaller of the two pointer+size pairs.
Examples to make this clear (let's say you have space for 3 fragments):
(0,2),(3,1),(2,1) -> (0,2),(2,1),(3,1) -> (0,4): no need to move frames
(0,2),(3,5),(10,3) -> (1,7),(10,3): you have to move only 2 frames
JamesM wrote: * Notwithstanding the amount of copying that has to go on, this will involve a lot of locking. You'll have to stop mutation of a frame while you're copying it, which will mean your defragmentation thread will need to interact with the scheduler and even without being slow, it's a massive layering violation.
That's true, you'll need a lock, but not a lot of locking: only one lock to protect the fragment area. Since physical memory allocators usually have at least one lock, I don't see that as bad. Allocating a frame requires only a few CPU cycles more than a stack allocator, and so does freeing. If there's no more space for fragments then yes, locking could block some threads for a long time, but isn't that the case with swapping too? While you're waiting for the disk to write out a frame and make space, further threads trying to allocate physical memory will be blocked.
As for the defragmenter and scheduler interaction, I can avoid that completely. I have several priority queues, and the lowest-priority queue has only one thread, the idle thread. The priority level just above it is called "system housekeeping", and its threads are never scheduled while any thread is waiting in a higher queue. Furthermore, this queue uses cooperative multitasking: once a thread is chosen from it, preemption won't happen until it blocks. So the scheduler can handle this on its own; there's no need for it to interact with the defragmenter. You may say there could always be threads waiting in higher queues, but that's not a problem because:
1. Most systems are idle 90% of the time (or even more; on desktops 99% is not rare).
2. If the defragmenter never gets scheduled, the free function will force defragmentation before it releases the lock, so any time the allocator is called it can be sure there's space for one more fragment (though most likely it will reuse an existing pointer+size pair and won't need a new one anyway).
JamesM wrote:Now, my understanding of your algorithm may have got muddled with all the replies I read, but those are my major observations. In embedded systems where memory footprint is paramount, perhaps your system could have real benefit.
You say you have it working in situ in your kernel. I therefore wonder how you got around the "shunting memory around to defragment your allocator" problem I mentioned above in practice. I'd be interested to know.
I hope I've answered your question.
Your arguments were interesting, though they slightly misinterpreted my design. I had already thought of everything you wrote (if I understood it correctly; tell me if not), but the points were clever. It gave me a headache to solve them!