
Run-Length Encoded Stack

Posted: Mon Aug 12, 2019 6:09 am
by Thpertic
I decided to hold all my free addresses in a run-length encoded stack, as was suggested to me in an old thread.

It is formatted as:

Code:

typedef struct free_mem {
    uint32_t *addr;
    uint32_t nContiguousPages;
} free_mem_t; 
There will be a code path to push/pop a single page, and another to push/pop a whole struct.

I'm not sure whether some of the checks I have in mind are worth it, though. While pushing an entire struct, I'm thinking of checking whether it (or at least part of it) is already in the stack; I think this is necessary. Another is to merge the struct with a contiguous one, to avoid fragmentation (you can imagine some loss in performance, though).
The same kind of thing could apply when pushing single pages: just increment nContiguousPages if the page extends an existing run, which means checking every entry of the stack.
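In rough C, that "check every entry" push could look like this (an untested sketch: a fixed-size array and plain uint32_t addresses are assumed for simplicity, and push_page is a made-up name):

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096u

typedef struct free_mem {
    uint32_t addr;               /* base address of the run (plain integer here) */
    uint32_t nContiguousPages;
} free_mem_t;

static free_mem_t stack[1024];   /* fixed capacity just for the sketch */
static size_t top = 0;

/* Push one free page, extending an adjacent run if one exists. */
void push_page(uint32_t page)
{
    for (size_t i = 0; i < top; i++) {
        /* The page immediately follows this run: extend it upward. */
        if (stack[i].addr + stack[i].nContiguousPages * PAGE_SIZE == page) {
            stack[i].nContiguousPages++;
            return;
        }
        /* The page immediately precedes this run: extend it downward. */
        if (page + PAGE_SIZE == stack[i].addr) {
            stack[i].addr = page;
            stack[i].nContiguousPages++;
            return;
        }
    }
    /* No adjacent run found: push a new one-page entry. */
    stack[top].addr = page;
    stack[top].nContiguousPages = 1;
    top++;
}
```

Note that extending a run can make it touch another run without merging the two; collapsing those is the defragmentation problem.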

The last problem I'm having conceptually is pushing the addresses that come after the kernel + stack area. As the stack grows, should I check whether an address is now inside the stack itself, or should I reserve a region (say, 2MB after the kernel) so the stack can grow without overlapping anything?

Can you help me? Thanks in advance.

Re: Run-Length Encoded Stack

Posted: Mon Aug 12, 2019 12:02 pm
by nullplan
Will you ever defragment the list? If not, it is entirely possible that all of physical memory, minus the kernel, ends up in the stack as single-page fragments. That is the worst case. Your struct takes eight bytes per entry, and a 4GB system has about a million pages, so in the worst case the stack can grow to eight megabytes.
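Spelled out (the function name is made up for illustration):

```c
#include <stdint.h>

/* Worst case: every free page is its own single-page entry,
 * so the stack holds one entry per page of RAM. */
uint64_t worst_case_stack_bytes(uint64_t ram_bytes,
                                uint64_t page_size,
                                uint64_t entry_size)
{
    return (ram_bytes / page_size) * entry_size;
}
```

For 4 GiB of RAM with 4 KiB pages and 8-byte entries: (4 GiB / 4 KiB) * 8 = 8 MiB.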

If you were to defragment the stack, try to find an efficient system in the literature. The only algorithm I can think of requires quadratic runtime (comparing every element with every other element).

Re: Run-Length Encoded Stack

Posted: Mon Aug 12, 2019 2:48 pm
by Thpertic
Pushing the frame/struct into the right stack slot would be too slow to do every time, right? So how often should I defragment the stack?

Maybe I leave, say, 4MB (half of the worst-case maximum for the stack) for the stack to grow, and when it gets there it's time to defragment?

Re: Run-Length Encoded Stack

Posted: Mon Aug 12, 2019 3:59 pm
by Octocontrabass
What's the purpose of compressing the stack? I'm having trouble seeing why you want to do this.

For example, if you're trying to reduce the physical memory used by the stack, you can dynamically allocate and free memory for the stack. (But, if you really need that extra 4MB when you have 4GB of RAM to work with, you might have more important things to optimize.)

Re: Run-Length Encoded Stack

Posted: Mon Aug 12, 2019 4:07 pm
by bzt
Hi,

I think there are three choices for defragmentation:
1. when you call free() (when you push to the free stack)
2. when the stack size reaches its limit (4MB sounds reasonable on a 4GB machine)
3. whenever you schedule the idle task

Each of these has benefits and disadvantages.

The first is the simplest, but the application has to wait (a performance loss) unless you dedicate a kernel thread to freeing memory.

The second is simple too, and fast, because most of the time you can just push a record onto the top of the stack. But you'll experience an occasional "freeze" when the defragmenter gets called. This can be acceptable, because it happens extremely rarely, and in return the design stays simple.

The third option is very similar to the second, except you call the defragmenter whenever there's nothing else to do. This way the user won't experience any performance loss, since you use the CPU's idle time, and the free stack is kept constantly ordered, so each defragmenter call is fast (little work to do at once).

Regardless, you'll definitely need a way to merge records, and for that you'll have to check whether the new record aligns with the beginning or the end of an existing record. There's no way around this. To speed things up, you can keep the free memory stack ordered, so the check only needs an O(log(n)) binary search. The downside is that if you can't merge records, you'll have to insert, and for that you need a memmove() call to make room for the new record.
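Sketched in C, an ordered-stack free path could look like this (untested; a fixed-size array and plain integer addresses are assumed, and the names lower_bound/free_run are made up):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096u

typedef struct free_mem {
    uint32_t addr;
    uint32_t nContiguousPages;
} free_mem_t;

static free_mem_t stk[1024];   /* kept sorted by addr */
static size_t n = 0;

/* Binary search: index of the first entry with addr >= a. O(log n). */
static size_t lower_bound(uint32_t a)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (stk[mid].addr < a) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}

/* Free a run of pages, merging with neighbours when they align. */
void free_run(uint32_t addr, uint32_t pages)
{
    size_t i = lower_bound(addr);
    /* Merge with the predecessor if its end touches our start. */
    if (i > 0 && stk[i-1].addr + stk[i-1].nContiguousPages * PAGE_SIZE == addr) {
        stk[i-1].nContiguousPages += pages;
        /* Did the extension also close the gap to the successor? */
        if (i < n && stk[i-1].addr + stk[i-1].nContiguousPages * PAGE_SIZE == stk[i].addr) {
            stk[i-1].nContiguousPages += stk[i].nContiguousPages;
            memmove(&stk[i], &stk[i+1], (n - i - 1) * sizeof *stk);
            n--;
        }
        return;
    }
    /* Merge with the successor if our end touches its start. */
    if (i < n && addr + pages * PAGE_SIZE == stk[i].addr) {
        stk[i].addr = addr;
        stk[i].nContiguousPages += pages;
        return;
    }
    /* No merge possible: memmove to open a slot, keeping the order. */
    memmove(&stk[i+1], &stk[i], (n - i) * sizeof *stk);
    stk[i].addr = addr;
    stk[i].nContiguousPages = pages;
    n++;
}
```

The memmove on the insert path is the cost of staying sorted; the two merge branches are the payoff.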

Cheers,
bzt

Re: Run-Length Encoded Stack

Posted: Mon Aug 12, 2019 4:11 pm
by bzt
Octocontrabass wrote:you can dynamically allocate and free memory for the stack.
How? We are talking about the free memory management that dynamic memory is built on. You can't allocate a new stack when the free memory stack is full, because malloc would fail for exactly that reason :-) A typical chicken-and-egg scenario.

Cheers,
bzt

Re: Run-Length Encoded Stack

Posted: Mon Aug 12, 2019 4:26 pm
by Thpertic
Octocontrabass wrote:What's the purpose of compressing the stack? I'm having trouble seeing why you want to do this.
As bzt suggested to me some time ago:
A slightly better approach than stack is run-length encoded stack. There you push two entries at once: a starting address and the number of pages. This has several advantages:
1. it requires considerably less memory
2. it's trivial to fill up the initial stack from E820 data as it uses the same format :-)
3. you can search for entries with more pages if you really want to allocate contiguous physical pages
bzt wrote:I think there are three choices for defragmentation:
1. when you call free() (when you push to the free stack)
2. when the stack size reaches its limit (4Mb sounds reasonable on a 4G machine)
3. whenever you schedule the idle task
I'm more comfortable with memory management than with scheduling, so I'm trying the second choice. I was thinking that, too.
I don't know about keeping the stack ordered, though. Yes, lookups are faster, but pushing an address would slow the process down a little. So the question is: is the slowdown worth it?

Re: Run-Length Encoded Stack

Posted: Mon Aug 12, 2019 4:50 pm
by Octocontrabass
bzt wrote:
Octocontrabass wrote:you can dynamically allocate and free memory for the stack.
How?
When you're freeing memory, the stack sometimes needs some of that free memory. When that happens, instead of pushing the freed page onto the stack, allocate that page for the stack.

When you're allocating memory, the stack sometimes has more memory than it needs. When that happens, instead of popping a free page from the stack, free the unnecessary page being used by the stack and allow it to be allocated by the caller.

Since the stack can't grow beyond about 0.1% of the total memory (for 32-bit, or 0.2% for 64-bit), you can even skip the part where you free pages from the stack and it's still a pretty good deal.
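Ignoring the run-length encoding for a moment, the idea can be sketched with a plain page stack that feeds itself from the frames it manages. This is an illustrative sketch only: physical memory is simulated with an array so it is self-contained, frame_ptr stands in for mapping a frame into the kernel's address space, and all the names are made up.

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096u
#define SLOTS_PER_PAGE (PAGE_SIZE / sizeof(uint32_t))

/* Simulated physical memory, 64 frames. */
static uint8_t phys[64 * PAGE_SIZE];

static uint32_t stack_frame = UINT32_MAX;  /* frame currently backing the stack */
static size_t top = 0;                     /* used slots; slot 0 links to the previous stack page */

static uint32_t *frame_ptr(uint32_t frame) { return (uint32_t *)&phys[frame * PAGE_SIZE]; }

/* Free a frame. When the current stack page is full (or absent),
 * the freed frame itself becomes the new stack page. */
void free_frame(uint32_t frame)
{
    if (stack_frame == UINT32_MAX || top == SLOTS_PER_PAGE) {
        uint32_t *p = frame_ptr(frame);
        p[0] = stack_frame;        /* remember the previous stack page */
        stack_frame = frame;
        top = 1;                   /* slot 0 holds the back-link */
        return;                    /* the frame was consumed by the stack itself */
    }
    frame_ptr(stack_frame)[top++] = frame;
}

/* Allocate a frame. When the stack page empties, the page itself
 * is handed out as the free frame. */
uint32_t alloc_frame(void)
{
    if (stack_frame == UINT32_MAX)
        return UINT32_MAX;         /* out of memory */
    if (top == 1) {                /* only the back-link left: recycle the stack page */
        uint32_t f = stack_frame;
        stack_frame = frame_ptr(f)[0];
        top = (stack_frame == UINT32_MAX) ? 0 : SLOTS_PER_PAGE;
        return f;
    }
    return frame_ptr(stack_frame)[--top];
}
```

The stack never needs an external allocator: it grows by eating a freed frame and shrinks by handing its own page back out.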

Re: Run-Length Encoded Stack

Posted: Mon Aug 12, 2019 6:30 pm
by bzt
Octocontrabass wrote:When you're freeing memory, the stack sometimes needs some of that free memory. When that happens, instead of pushing the freed page onto the stack, allocate that page for the stack.
This only works if there's always enough space for the new stack in the address space. In other words, realloc(freestack, x) must always return the same pointer. That is not difficult to achieve, and it is a clever optimization indeed. I was thinking of allocating a totally new stack (the case where realloc() returns a new pointer), which would require a big contiguous region that you don't have, since the free stack being full means free memory is fragmented. I like your idea, nice!
Thpertic wrote:Now the question is: is the slowing down worth it?
Well, considering that it slows things down only when you free a block that cannot be merged, but speeds up all the other, mergeable frees, I would say probably yes. But to be sure, you should measure both, because this depends heavily on your allocation patterns. For example, if you allocate pages and free them in the same order, freed records will merge with existing ones, so frees are fast. On the other hand, if you free them in random order, there's a good chance most of them can't be merged, and you'll need to insert over and over, so it will be slower.

To eliminate the slowdown, you can always make it asynchronous by using a small circular buffer. There, free() adds the freed block in O(1) time, and the kernel pops records from the queue and merges them into the free stack when it has time (the user process doesn't have to wait for that to finish). In that case it doesn't matter how long an insert takes. It's a bit of a cheat, but user processes won't notice any performance loss (unless the circular buffer is full, in which case they wait exactly as long as they would have without the async buffer). The only downside is that you need an independent kernel thread to consume records from the queue, which brings scheduling back into the picture.
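A single-threaded sketch of that handoff (names and size are made up; a real version would need atomics or a lock between the producer and the consumer thread):

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

#define RING_SIZE 64   /* power of two, chosen arbitrarily for the sketch */

typedef struct { uint32_t addr; uint32_t pages; } freed_t;

static freed_t ring[RING_SIZE];
static size_t head = 0, tail = 0;   /* head: producer (free), tail: consumer thread */

/* Called by free(): O(1) unless the ring is full. */
bool enqueue_free(uint32_t addr, uint32_t pages)
{
    if (head - tail == RING_SIZE)
        return false;               /* full: caller must take the slow path */
    ring[head % RING_SIZE] = (freed_t){ addr, pages };
    head++;
    return true;
}

/* Called by a kernel thread when it has time; the record it returns
 * is then merged into the free stack. */
bool dequeue_free(freed_t *out)
{
    if (head == tail)
        return false;               /* nothing pending */
    *out = ring[tail % RING_SIZE];
    tail++;
    return true;
}
```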

My advice is: don't worry about performance at first. Insert the freed record in a way that keeps the free stack sorted and the records merged. If you run into performance issues, you can replace that with an O(1) algorithm and implement a defragmenter, but not sooner. IMHO chances are good that you'll be perfectly happy with a free() that uses a logarithmic search like this one (of course you never return NULL, but rather the nearest neighbour, because that's the record you have to check for merging, and the exact position where you have to insert if it can't be merged).

Cheers,
bzt

Re: Run-Length Encoded Stack

Posted: Tue Aug 13, 2019 12:26 am
by linguofreak
Thpertic wrote:
The last problem I'm having conceptually is pushing the addresses that come after the kernel + stack area. As the stack grows, should I check whether an address is now inside the stack itself, or should I reserve a region (say, 2MB after the kernel) so the stack can grow without overlapping anything?
Keep in mind that the pages on the free stack are, by definition, free, so you can actually store the free stack as a linked list within the free pages themselves since there isn't anything else they're being used for.

So each element of your free list would look like this:

Code:

typedef struct free_mem {
    uint32_t magic_number; //Optional, but a good idea to verify a page really holds a free-list node
    struct free_mem *next;
    struct free_mem *prev; //If a doubly linked list is desired
    uint32_t nContiguousPages;
    //Additional data as desired
} free_mem_t;
Each free_mem struct would then be located at the beginning of the first page in the block of pages it described, and would have a pointer to the next (and possibly previous) block of pages in the list.
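Pushing a block onto such an intrusive list could look like this (a sketch: the magic value, the fake pages, and push_block are made up, and the pointer members are typed struct free_mem * so they can be followed):

```c
#include <stdint.h>
#include <stddef.h>

#define FREE_MAGIC 0xF4EEB10Cu    /* arbitrary marker value */

typedef struct free_mem {
    uint32_t magic_number;        /* sanity check that this page really is a list node */
    struct free_mem *next;
    struct free_mem *prev;
    uint32_t nContiguousPages;
} free_mem_t;

static free_mem_t *free_head = NULL;

/* Two fake 4 KiB "pages" so the sketch is self-contained; a real kernel
 * would write the node into the first page of the freed block itself. */
static union { free_mem_t node; uint8_t raw[4096]; } page_a, page_b;

/* Push a block of free pages by turning its first page into a list node. */
void push_block(void *first_page, uint32_t pages)
{
    free_mem_t *node = first_page;
    node->magic_number = FREE_MAGIC;
    node->nContiguousPages = pages;
    node->prev = NULL;
    node->next = free_head;
    if (free_head)
        free_head->prev = node;
    free_head = node;
}
```

The bookkeeping costs nothing beyond the free pages themselves, which is the whole appeal of this layout.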