OSDev.org

Posted: **Tue Aug 16, 2011 4:37 am**

turdus wrote:Correct, but you compare the best case (free pages drops below 1024) with the worst case (fragments exceed 512).

The "worst case" of the stack-based allocator is given when no memory is allocated, all pages are free and pushed on the stack. But if there is plenty of free memory, it really does not matter how much space the stack consumes.

And the difference is even bigger if it comes to 64 bit. The stack would consume 256G for full RAM (2^48, 256Tb), while fifo can handle all of it in 4M easily. Scalability matters!

Of course, these numbers are correct. But on a more realistic side, large servers have ~ 64 GB RAM, which can be managed with a 64 MB stack. That's really cheap.

Do not forget that the free frame stack (as well as fifo) has to be filled in at boot time when almost every frame is free!

Again, that's right. But using my server example a second time, filling a 64 MB stack requires almost no time. Indeed, for larger amounts of memory require more time, but I prefer to spend this time once at boot time, instead of having a garbage collector run all the time to move around pages and merge free blocks.

Imagine you have such a giant server with several TB RAM, and you allocate memory in blocks of several 100 MB - 1 GB. As time goes by, your memory gets fragmented and your allocator needs to move frames in order to merge them. It will take ages to move this large amounts of data, and all this has to be done when the OS is up and running.

Posted: **Tue Aug 16, 2011 5:22 am**

davidv1992 wrote: Once again, in the situation of high load (which is where it would most matter) these pages probably won't be in the TLB anyway (and they definnitely will not be in the processors cache), so it will not matter

As you like.

When that memory is used for storage cache it isn't free, and thus shouldn't be in the structure that manages free memory anyway.

Depends on. My storage cache driver uses free memory, and if the allocator requests a new frame, the disk cache is consulted. In my model if storage cache would not be free to other processes, you won't be able to allocate memory as soon as storage subsystem initialized, since it eats up all.

Fragmentation will happen, and the potential ammount of fragmentation is WAY bigger than what you are allowing for, even on systems with relatively little memory.

No it's not, if you do the housekeeping, defragmentation will happen too.

The moment you need to run a defragmenter you will be taking quite a bit of the memory bandwith of the system, which will seriously hurt performance.

Wait a minute. Slowing down under some rare circumstances and "crash"/"fail" quite different!!! Not to mention that you could do the defragmentation when the cpu is idle, so the user won't notice. If you're right, you shouldn't implement swapping, as it will seriously hurt performance too.

More space for the FIFO won't solve the problem unless one reserves a lot of memory for it.

You drop words "more space" and "lot of memory" too easily, without comparing the real memory requirement with stack's. I say 4M is a lot of memory. 128k isn't, and it allows you to handle more than 10,000 fragments. The exact number of fragments depends on how your OS work, you have to calibrate it. For me 512 is enough. Period.

If any of the points I made point to ignorance on my part I apologize, but I do believe that I have a reasonable understanding of your system, enough to state those things I see as flaws.

Basically you're right, you just wrong about the volumes, and your reasonable understanding lacks some key part that makes the thing work.

XenOS wrote:pushing pages of the stack isn't that time consuming I doubt it matters much

It does. Think it over again. With stack you have to do 2^32 pushes, whilst fifo needs 2 or 3 (one for each free entry in E820). Significant difference. Again, you're wrong about the volume, you assume that both methods require same number of pushes, but it's simply not true, they have different amount of input.

Posted: **Tue Aug 16, 2011 5:28 am**

cyr1x wrote: I doubt it. A stack will require no more than a pointer of 4-bytes or 8-bytes in size and on allocation/deallocation a single "virtual" page to map the top of the stack. A stack actually *IS* the most effective way to store free frames.

If you're reasoning with dynamic allocation, forget it, since it applies to fifo buffer as well. No difference on that.
If you think stack is the most effective way, go ahead. You don't have to use this.

Posted: **Tue Aug 16, 2011 5:34 am**

Turdus, you are seriously taking that last sentence out of it's context. I was talking about boot time, when the machine is busy doing boot and not doing any user work so latency's don't matter that much. I also didn't claim that stack was faster than your method, only that the fact that your method is quite a bit faster doesn't matter that much.

I'm currently busy writing a simulation tool to simulate all three methods and compare them under realistic loads. If you are interested in writing an implementation of your method give me a message and I'll give you the (current) specs. Otherwise I'll try my best to get an optimal performance of your system, but can't guarantee anything. I will publish source and results when done.

Posted: **Tue Aug 16, 2011 5:48 am**

FlashBurn wrote:First, don´t be so aggressive, calm down! And then, when did I say that I use the classic stack approach? My pmm can handle 4kb and 2/4mb pages (and I´m using stacks for this).

You wrote "That´s the way I´m doing it. I have one stack per 4mb page and for 4kb pages". And sorry if I was aggressive. I hate explaining obvious things.

What I don´t get is, how is your algorithm faster when it gets to merging than others (like my stacks)?

Since it does not scale on amount of RAM. It scales on number of E820 entries.

In your pseudo code, does "return OK" mean the function returns and you wont scan more buffer elements? Also if you can merge 2 blocks and the page to free to one block than you need to delete one block out of your fifo and this also is very slow.

The "return OK" means that the frame is freed successfully. If you don't understand this, I really don't know why am I arguing at all.
And, occasionally moving 4k is not very slow. Moving 4M is slow.

The next thing is, if we take the worst case, 1 page free, 1 page used, 1 page free, ..., your approach consumes more memory than a stack, because your approach needs 8bytes per page and a normal stack needs 4bytes per page (assuming 32bit adresses).

You really don't get it. You will never reach that 1 free 1 used situation. And, if we spoke of address sizes: for 32 bit fifo has 8 bytes entries, stack 4. But for 64 bits, stack would require 8 bytes (double the size), while fifo still uses the same amount, 8 bytes. (It's simply because nowdays ram bus is 48 bits, which is 2^32*4096, so frame number still fits in 4 bytes. So let's assume if we could reach 1 free 1 used on a 64 bit system, fifo would take exactly as much memory as stack would in it's worst case). Furthermore, you cannot say generally it consumes more memory: fifo *DOES NOT* scale on amount of RAM.

Posted: **Tue Aug 16, 2011 5:55 am**

You made my day, man!!!

gerryg400 wrote:A stack uses only 1 pointer. That's it.

ROTFL. PLease show me a stack implementation that uses *ONLY* a pointer. No further allocation allowed.

De-allocation is faster than O(1).

LOL, LOL, LOL. You're killing me with laught, stop it. Faster than O(1)...

I found that I tended to allocate pages 1 (or a few) at a time, but de-allocate in big chunks when processes are killed or a file is closed. If all the pages in a process are linked together, you can return the entire list with 4 pointer assignments.

Let's see: deallocate a big chunk, you have to walk next pointers as many times as big chunk is multiple of 4k. That's f*cking not faster than O(1), but O(n)!!!

Posted: **Tue Aug 16, 2011 6:02 am**

turdus wrote:You made my day, man!!!

Happy to oblige.

Posted: **Tue Aug 16, 2011 6:02 am**

XenOS, I quite agree with you. There's just one thing I want to say about. If you map/unmap stack's pages at same time when you cross stack's page boundaries (my old technique) it could be worse than to map/unmap stack's pages only if the stack is fully empty (my current technique).

For legacy hardware I use separeted memory pool managed with bitmap technique. It's located at the base memory region.

Posted: **Tue Aug 16, 2011 6:08 am**

davidv1992 wrote:Turdus, you are seriously taking that last sentence out of it's context. I was talking about boot time, when the machine is busy doing boot and not doing any user work so latency's don't matter that much. I also didn't claim that stack was faster than your method, only that the fact that your method is quite a bit faster doesn't matter that much.

Sorry, but boot latency does matter to me. My OS boots within 1 sec, and I want to keep it that way, even with 64G RAM. I was not arguing the speed (well, not intensionally) but the size of required memory.

I'm currently busy writing a simulation tool to simulate all three methods and compare them under realistic loads. If you are interested in writing an implementation of your method give me a message and I'll give you the (current) specs. Otherwise I'll try my best to get an optimal performance of your system, but can't guarantee anything. I will publish source and results when done.

Really great! Thank you for doing that. In order to get correct results, please:
1. implement a reasonable defragmenter (or grow fifo buffer dynamically)
2. try different fifo buffer sizes (4k, 8k, 64k etc)
3. do not compare preallocated fifo buffer with dynamic stack buffer.

Posted: **Tue Aug 16, 2011 6:10 am**

turdus wrote: The "return OK" means that the frame is freed successfully.

Ok and this will give you many fragments because your algorithm is not working optimal! Because what this algorithm does not is merging 2 fragments and 1 frame to 1 fragment! If you would do this you would need to remove 1 fragment out of the fifo and this is slower than my stacks (and O(n)) and also iterating over the whole fifo is slower than my stacks (and O(n)). Maybe I´m wrong, but this also means the "optimal" algorithm (with merging 2 fragments and 1 frame) would be O(n²) for freeing?

So your algorithm is slower, but has the advantage that it needs less memory when the most memory is free. I don´t think that this is really an advantage.

turdus wrote: And, occasionally moving 4k is not very slow. Moving 4M is slow.

Ok, why do I need to move 4M?

turdus wrote: You will never reach that 1 free 1 used situation.

Why not? If I know how your pmm algo works, it is possible. I just need to allocate all memory with my program and then free every 2nd page.

turdus wrote: It's simply because nowdays ram bus is 48 bits

Ok, but as 1 page is 2^12 means 36bit left and you don´t get 36bit into 32bit w/o loosing information. So you will need 16bytes per fragment!

Edit::

Yep, I´m wrong O(n)+O(n) gives you O(2*n) which is equal to O(n).

Posted: **Tue Aug 16, 2011 6:28 am**

XenOS wrote:As time goes by, your memory gets fragmented and your allocator needs to move frames in order to merge them. It will take ages to move this large amounts of data, and all this has to be done when the OS is up and running.

Yes, it will take ages, but you can expand the buffer temporarily and do the move when cpu is idle. Believe me, it's not harder or slower than swapping a process out, and here you don't have to wait for a really slow device.

The point is, fragmentation won't grow endlessly, it will stop after a while (k allocation can't cause more fragments than k+1, and don't forget every time you consume the last frame in a fragment there'll be one less entry in fifo). You should use a fifo buffer big enough to minimize the defragmenter calls, but it's up to you which suits best for your OS.

Posted: **Tue Aug 16, 2011 7:17 am**

turdus wrote:It does. Think it over again. With stack you have to do 2^32 pushes, whilst fifo needs 2 or 3 (one for each free entry in E820).

That's wrong. If you have 2^32 bytes of memory, that's 2^20 pages and thus only 2^20 pushes. Filling 4 MB takes much less than a second, once at boot time.
And even if you have to fill 64 MB for a 64 GB RAM server, it's still a matter of less than a few seconds.

Posted: **Tue Aug 16, 2011 7:52 am**

ROTFL. PLease show me a stack implementation that uses *ONLY* a pointer. No further allocation allowed.

There are plenty of them. Just browse this site.

LOL, LOL, LOL. You're killing me with laught, stop it. Faster than O(1)...

You buy every lie, don't you? He means it's faster than the stack version.

Let's see: deallocate a big chunk, you have to walk next pointers as many times as big chunk is multiple of 4k. That's f*cking not faster than O(1), but O(n)!!!

You obviously never heard of merging two linked lists.

Posted: **Tue Aug 16, 2011 8:34 am**

Ok, It has been shown again that religion is more important than fact, due to the fact it's blatantly obvious that turdus thinks his implementation is flawless and the world disagrees. Shall we wait until someone comes with scientific proof?

Posted: **Tue Aug 16, 2011 8:37 am**

XenOS wrote:
turdus wrote:It does. Think it over again. With stack you have to do 2^32 pushes, whilst fifo needs 2 or 3 (one for each free entry in E820).
That's wrong. If you have 2^32 bytes of memory, that's 2^20 pages and thus only 2^20 pushes. Filling 4 MB takes much less than a second, once at boot time.
And even if you have to fill 64 MB for a 64 GB RAM server, it's still a matter of less than a few seconds.

1 second in boot times can be very long in embedded systems where boot time requirements is usually very short (boot time requirement of one second is common). This is where that multilevel page allocator comes handy. If you can push sizes of let's say 512MB, the initialization just got much faster. 4GB -> 8 pushes.

OSDev.org

Article on physical memory management

Re: Article on physical memory management

Re: Article on physical memory management

Re: Article on physical memory management

Re: Article on physical memory management

Re: Article on physical memory management

Re: Article on physical memory management

Re: Article on physical memory management

Re: Article on physical memory management

Re: Article on physical memory management

Re: Article on physical memory management

Re: Article on physical memory management

Re: Article on physical memory management

Re: Article on physical memory management

Re: Article on physical memory management

Re: Article on physical memory management