Hello all,
I've got an idea for a physical memory manager, which I will briefly introduce here. I wonder if there are any real downsides to this method...
Basically, the data structure in the information page is a tree with a fixed depth and size. The parent/child relationships are always the same, so they never need to be stored. Each node in the tree is associated with a page index (the actual page's address should probably be calculated as the information page's index, plus one, plus the node's index).
Each node has n child nodes, except for the leaf nodes, of course.
Now each node contains n+1 bits (n being the number of child nodes). Bit 0 indicates whether the page this node is associated with is available or not. Bits 1 through n are a bit mask indicating whether the corresponding child of the current node, or any of its descendants, has a free page.
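For n = 4 children, the per-node byte could be encoded like this (a sketch; the macro names are mine, not part of the original idea):

```c
/* One tree node in a single byte, for n = 4 children:
   bit 0     = the page associated with this node is free,
   bits 1..4 = child subtree i (i = 0..3) contains at least one free page. */
#define SELF_FREE      0x01u
#define CHILD_FREE(i)  (1u << ((i) + 1))

/* A node byte of 0 means neither its own page nor any descendant is free. */
static inline int node_has_free(unsigned char node)
{
    return node != 0;
}
```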
Finding a page for allocation would be done as follows:
1. If the first byte in the information page is 0, all pages are allocated and we move on to the next information page. (We could construct a similar second-order information page over these, so we don't have to loop through all of them serially, but I won't cover that for now.)
2. Otherwise, set index = 0.
3. If info[index] indicates that this page is available, we're done: allocate the page.
4. If not, find the first child that has an available page (either the page itself or its descendants), and set index to the index of this child.
5. Continue with step 3.
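The search in steps 1-5 might look like the following sketch. The byte layout (bit 0 = own page free, bits 1-4 = child subtree has a free page) and the implicit array layout (child c of node i at i*4 + c + 1) are my assumptions, not the poster's exact code, and the masks are assumed to be kept consistent:

```c
/* Sketch of the search in steps 1-5 (assumed layout, see lead-in). */
#define SELF_FREE 0x01u
#define CHILDREN  4u

/* Returns the index of a node whose page can be allocated, or -1. */
int find_free_node(const unsigned char *info)
{
    if (info[0] == 0)                  /* step 1: nothing free here */
        return -1;
    unsigned index = 0;                /* step 2 */
    for (;;) {
        if (info[index] & SELF_FREE)   /* step 3: this node's page is free */
            return (int)index;
        /* step 4: descend into the first child subtree with a free page */
        for (unsigned c = 0; c < CHILDREN; c++) {
            if (info[index] & (1u << (c + 1))) {
                index = index * CHILDREN + c + 1;
                break;
            }
        }
    }                                  /* step 5: repeat from step 3 */
}
```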
Actually allocating the page involves updating the allocated node's ancestors, clearing the child bit of any subtree that no longer has free pages. We're done as soon as we reach a node that still has free pages other than the one just allocated (worst case, the number of steps equals the depth of the tree).
Deallocating the page is simply the opposite of allocating it.
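Hedged sketch of that bookkeeping, under the same assumed layout (one byte per node, bit 0 = own page free, bits 1-4 = child subtree free, child c of node i at i*4 + c + 1):

```c
#define SELF_FREE 0x01u
#define CHILDREN  4u

/* Allocate: clear the node's own bit, then walk up clearing the "subtree
   has free pages" bit while the subtree has become empty.
   Worst case: one step per tree level, as noted above. */
void mark_allocated(unsigned char *info, unsigned index)
{
    info[index] &= (unsigned char)~SELF_FREE;
    while (index != 0 && info[index] == 0) {
        unsigned parent = (index - 1) / CHILDREN;
        unsigned slot   = (index - 1) % CHILDREN;  /* which child we are */
        info[parent] &= (unsigned char)~(1u << (slot + 1));
        index = parent;
    }
}

/* Free: the opposite walk, stopping once an ancestor already knows
   its subtree has free pages. */
void mark_free(unsigned char *info, unsigned index)
{
    info[index] |= SELF_FREE;
    while (index != 0) {
        unsigned parent = (index - 1) / CHILDREN;
        unsigned char bit = (unsigned char)(1u << ((index - 1) % CHILDREN + 1));
        if (info[parent] & bit)
            break;                     /* ancestors already up to date */
        info[parent] |= bit;
        index = parent;
    }
}
```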
Allocating a page at a specific address is easy: find the proper index and test if available. If not, you can even test its siblings to see if they are available.
All in all, it sounds like a fairly easy scheme, and very efficient. The parameters can be tweaked, but for example one can take:
- 4 children per node.
- 1 byte per node (4 + 1 bits, rounded up to a byte for speed).
- 4096 bytes per page, so 4096 pages per information page.
- Tree depth of 6 (4^6 = 4096)
This would involve one page every 4096 pages (0.02% overhead). Allocating or deallocating a page would have a worst-case scenario of 6 memory reads/writes.
My questions:
- Are there any problems with this technique I have to be careful about?
- Is this technique already known? (Unless it has a flaw, I can't imagine it not being known.)
- Would you implement this scheme in an OS? Or would you favor, for instance, stack-based memory management?
Re: Idea for physical memory management
berkus wrote:Have you considered concurrent access/locking?
Hmmm, I believe most implementations require it to be a critical section, right? Especially the stack-based implementation (as Windows does it)... As it's part of the kernel, I can't imagine that being too unexpected, but I might be wrong here.
Re: Idea for physical memory management
Hi,
evoex wrote:I've got an idea for a physical memory manager, which I will briefly introduce here. I wonder if there are any real downsides to this method...
It didn't make sense to me. Because each node's children are represented by one bit and not one pointer, I fail to see how you're going to find a node's children. The only way that makes sense would be to have a set of bitmaps (explained below), but in that case the "1 byte per node. (4 + 1, rounded up for optimization)" part doesn't make any sense.
For a set of bitmaps (with 6 levels), you might have:
- One bitmap where each bit represents one page (cost is 32 KiB per GiB of RAM)
- One bitmap where each bit represents a group of 4 pages (cost is 8 KiB per GiB of RAM)
- One bitmap where each bit represents a group of 4 groups of 4 pages, or 16 pages (cost is 2 KiB per GiB of RAM)
- One bitmap where each bit represents a group of 4 groups of 16 pages, or 64 pages (cost is 512 bytes per GiB of RAM)
- One bitmap where each bit represents a group of 4 groups of 64 pages, or 256 pages (cost is 128 bytes per GiB of RAM)
- One bitmap where each bit represents a group of 4 groups of 256 pages, or 1024 pages (cost is 32 bytes per GiB of RAM)
This could be improved by increasing the "nodes per group" and reducing the number of bitmaps. For example:
- One bitmap where each bit represents one page (cost is 32 KiB per GiB of RAM)
- One bitmap where each bit represents a group of 32 pages (cost is 1 KiB per GiB of RAM)
- One bitmap where each bit represents a group of 32 groups of 32 pages, or 1024 pages (cost is 32 bytes per GiB of RAM)
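A lookup over this two-level "set of bitmaps" could be sketched as follows. This assumes 4096 pages, 32-bit words, a consistent pair of bitmaps (a set top-level bit implies the group word below is non-zero), and uses the GCC/Clang `__builtin_ctz` builtin to find the lowest set bit; all names are mine:

```c
#include <stdint.h>

#define PAGES  4096u
#define GROUPS (PAGES / 32u)           /* 128 groups of 32 pages */

/* free_group: one bit per group of 32 pages (4 words total);
   free_page:  one bit per page (128 words total).
   Returns a free page number, or -1 if none. */
int find_free_page(const uint32_t *free_group, const uint32_t *free_page)
{
    /* Scan the small top-level bitmap one 32-bit word at a time. */
    for (unsigned w = 0; w < GROUPS / 32u; w++) {
        if (free_group[w] == 0)
            continue;                  /* 1024 pages skipped with one test */
        unsigned g = w * 32u + (unsigned)__builtin_ctz(free_group[w]);
        return (int)(g * 32u + (unsigned)__builtin_ctz(free_page[g]));
    }
    return -1;                         /* no free pages at all */
}
```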
For concurrency/locking, the best I can come up with (without thinking too hard) is having an extra "lock bit" for each bit in the highest level. For example:
- One bitmap where each bit represents one page (cost is 32 KiB per GiB of RAM)
- One bitmap where each bit represents a group of 32 pages (cost is 1 KiB per GiB of RAM)
- One bitmap where each even bit represents a group of 32 groups of 32 pages, or 1024 pages; and each odd bit is a lock that protects the corresponding even bit in this bitmap and all of the corresponding bits in the lower level bitmaps (cost is 64 bytes per GiB of RAM)
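The even-bit/odd-bit scheme could acquire a region's lock with an atomic fetch-OR, e.g. using C11 atomics (a sketch of the locking part only; the function names and the "one 32-bit word, regions 0..15" framing are my assumptions):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Each 1024-page region owns two adjacent bits in one 32-bit word of the
   top-level bitmap: the even bit means "region has free pages", the odd
   bit is the lock. `region` is 0..15 within this word. */
static void lock_region(_Atomic uint32_t *top, unsigned region)
{
    uint32_t lock_bit = 1u << (region * 2u + 1u);
    /* atomic fetch-OR returns the previous value: if the lock bit was
       already set, someone else holds the lock, so spin and retry. */
    while (atomic_fetch_or(top, lock_bit) & lock_bit)
        ;
}

static void unlock_region(_Atomic uint32_t *top, unsigned region)
{
    atomic_fetch_and(top, ~(1u << (region * 2u + 1u)));
}
```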
evoex wrote:- Would you implement this scheme in an OS? Or would you favor for instance stack based memory management.
For me, almost all allocations/de-allocations are a single page and nobody cares what the physical address is; there, stack-based memory management is faster, has less overhead, and is easier for locking (e.g. one lock per stack).
For the rare cases where you need to allocate physically contiguous pages or need to care what the physical address is, something like this probably isn't worth the hassle - a single bitmap that only covers the first 16 MiB of the physical address space is enough (especially if you keep track of lowest/highest "possibly free page" to reduce search times).
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: Idea for physical memory management
Hi,
berkus wrote:Have you considered concurrent access/locking?
evoex wrote:Hmmm I believe most implementations require it to be a critical section, right? Especially the stack based implementation (as windows does it)... As it's part of the kernel, I can't imagine that to be too unexpected, but I might be wrong here.
One lock means all the CPUs potentially fighting for the same lock and getting no useful work done. For "many CPUs" you need many locks or no locks (lockless algorithms).
I tried to think of a lockless way of doing the "set of bitmaps" thing, and it is possible, but only if you never allocate/free more than one page at a time.
Cheers,
Brendan
Re: Idea for physical memory management
So much for von Neumann; how about a Harvard example?
I personally use a flat memory model, so I'm going to have trouble on the x86 with its memory-mapped I/O in the 1 MB range.
Re: Idea for physical memory management
Thanks guys!
Yeah Brendan, your method actually seems to be better in every way ... Mine probably has a slightly better average case number of reads/writes, but that's not going to be worth it one bit (no pun intended).
Edit: come to think of it, even the average case isn't better in my idea... So I guess it's completely useless.
Re: Idea for physical memory management
Brendan wrote:Because each node's children are represented by one bit and not one pointer, I fail to see how you're going to find a node's children.
Ever heard of the heap stored as an array: left child = 2 * current + 1, right child = 2 * current + 2, parent = (current - 1) >> 1.
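For reference, here is a sketch of that array embedding, generalized from 2 to 4 children per node (the quoted binary formulas are the K = 2 case; the function names are mine):

```c
/* Implicit K-ary heap indexing: no pointers, pure arithmetic. */
#define K 4u

static inline unsigned kchild(unsigned node, unsigned c)  /* c in 0..K-1 */
{
    return node * K + c + 1u;
}

static inline unsigned kparent(unsigned node)             /* node > 0 */
{
    return (node - 1u) / K;
}
```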
Re: Idea for physical memory management
Hi,
Brendan wrote:Because each node's children are represented by one bit and not one pointer, I fail to see how you're going to find a node's children.
Combuster wrote:Ever heard of the heap stored as an array: left child = 2 * current + 1, right child = 2 * current + 2, parent = (current - 1) >> 1.
Doh - you're right. That would work (and can be extended to 4 children per node), and would have slightly better cache locality than the "sets of bitmaps" approach. I'm not sure if it's worth the extra calculations or the extra complexity (maintenance hassle) though.
Cheers,
Brendan
Re: Idea for physical memory management
Brendan wrote:One lock means all the CPUs potentially fighting for the same lock and getting no useful work done. For "many CPUs" you need many locks or no locks (lockless algorithms).
I tried to think of a lockless way of doing the "set of bitmaps" thing, and it is possible, but only if you never allocate/free more than one page at a time.
What I'm doing is each CPU gets a private stack it can use to satisfy page alloc/free requests; the only constraint is that the scheduler must be disabled for access. When the local stack fills/empties, it then goes to the common, locked stack. Assuming there's a mix of allocations and frees on an individual processor, it hits the lock infrequently. Of course, this means you can actually run out of memory while valid free memory is held invisibly, but you can use IPIs in that case to force flushes.
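The per-CPU stack idea above could be sketched like this. The capacity and names are illustrative, and the interesting parts (refilling from / flushing to the shared locked stack, disabling the scheduler) are left out:

```c
#include <stddef.h>
#include <stdint.h>

#define LOCAL_CAP 64  /* illustrative capacity of one CPU's private stack */

struct percpu_stack {
    uintptr_t page[LOCAL_CAP];
    size_t    top;    /* number of free pages currently held */
};

/* Pop a free page; returns 0 if the local stack is empty and the caller
   must refill from the common, locked stack. */
uintptr_t local_alloc(struct percpu_stack *s)
{
    if (s->top == 0)
        return 0;
    return s->page[--s->top];
}

/* Push a freed page; returns -1 if full, meaning the caller should flush
   some pages to the common, locked stack. */
int local_free(struct percpu_stack *s, uintptr_t page)
{
    if (s->top == LOCAL_CAP)
        return -1;
    s->page[s->top++] = page;
    return 0;
}
```

Because only the owning CPU touches its private stack (with the scheduler disabled), these paths need no lock at all; the shared stack's lock is only hit on refill/flush.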