Physical page allocation

evoex · Post by **evoex** » Wed Aug 22, 2012 2:23 pm

Hi all,

I'm trying to consider algorithms for allocating physical page allocation. However, I have a few questions/concerns regarding this...
1. How likely is it that later in my operating system I need to test whether a specific frame is free? It seems that this requirement severely limits my options for algorithms, so before I pick I would like to understand why I'd even need to do that.
2. I would like to use PAE, for no particular reason (I always over-complicate everything, and I want to make everything perfect first try, which is a curse in OS development

. Let's say I were to use a bitmap for this: there could be (assuming 48 effective bits for an address, though I believe processors may even support more than that) 2^48/2^12 pages available. One bit per page, the bitmap would take up 2^48/2^12/8/2^12 = 2^21 pages. That's double the number of pages on a 32-bits computer, and so not all bitmap pages can even be mapped at the same time!
What are some good methods to overcome this issue? I guess I could change the page tables every time a page is freed, to get access to the proper page for this, but then I'd also need to invalidate the page cache, right? I have some concerns of doing this on such a regular basis, but that may be because of me overcomplicating things. Is it reasonable to do so?

I'll probably have more questions regarding this after some answers, as I have more concerns, but they mainly depend on the first question. If it is perfectly reasonable not to be able to tell whether a page is allocated given its address, I could imagine a few reasonable algorithms...

Thanks in advance

bluemoon · Post by **bluemoon** » Wed Aug 22, 2012 2:49 pm

evoex wrote:Hi all,
I'm trying to consider algorithms for allocating physical page allocation. However, I have a few questions/concerns regarding this...
1. How likely is it that later in my operating system I need to test whether a specific frame is free? It seems that this requirement severely limits my options for algorithms, so before I pick I would like to understand why I'd even need to do that.

This can be minimized by some design. However a more useful information is reference count(and owners) of a page when you do CoW, and this is usually done with some kind of bookkeeping (but not necessary integrated into the allocator).

evoex wrote: 2. I would like to use PAE, for no particular reason (I always over-complicate everything, and I want to make everything perfect first try, which is a curse in OS development . Let's say I were to use a bitmap for this: there could be (assuming 48 effective bits for an address, though I believe processors may even support more than that) 2^48/2^12 pages available. One bit per page, the bitmap would take up 2^48/2^12/8/2^12 = 2^21 pages. That's double the number of pages on a 32-bits computer, and so not all bitmap pages can even be mapped at the same time!

Your bitmap only need to keep track of available memory, not the whole address space.
So for a machine with 32GB of RAM, your bitmap would be 1MB (if your support memory holes, you can implement some chunks of bitmap).

What are some good methods to overcome this issue?

Check the Page_Frame_Allocation

gerryg400 · Post by **gerryg400** » Wed Aug 22, 2012 4:23 pm

evoex wrote:1. How likely is it that later in my operating system I need to test whether a specific frame is free? It seems that this requirement severely limits my options for algorithms, so before I pick I would like to understand why I'd even need to do that.

You might need some sort of reverse lookup to implement COW. If you write to a COW page (and thus copy it) you might need change the state of the other page that points to the shared physical frame. Some swapping algorithms too require reverse lookup. It all depends on your implementation.

My advice is to not think too far ahead. If you do you might never get started. Try to set realistic goals for your first memory manager and use either a bitmap or stack for allocation. Be prepared to throw the whole thing away before you implement your final system. Note that neither of those methods is ideal to implement a fully-featured desktop OS.

bewing · Post by **bewing** » Wed Aug 22, 2012 7:31 pm

You also do not have to limit yourself to allocating merely 1 page at a time. You could have the number of pages that you allocate in a chunk be proportional to the total number of pages on the system. So each of your bitmap bits may correspond to 16 or 32 contiguous pages or something, on a system with lots of RAM. You can even have it vary. Your first n bitmap pages are size 1, and the rest are size 16. Or something even trickier than that.

evoex · Post by **evoex** » Thu Aug 23, 2012 10:43 am

Thanks for your help guys!

@bluemoon: I know that I only have to keep track of available memory, but I'm also nearly obsessively trying to make everything work for even the most unlikely events. Imagine someone wants to run my OS in 25 years time on a 32-bit computer with 256 TB of RAM. Okay, quite unlikely, I guess I should just limit the amount of RAM, but it goes against my nature as a programmer

.

I guess I'll settle for some bad algorithm for now. I'll get back to it later.

Thanks again!

rdos · Post by **rdos** » Mon Sep 10, 2012 12:28 pm

I'm also looking for better alternatives for physical page allocation. I use a stack, which is bad because it requires locks, and it also takes way too much memory (4 bytes per 4K page).

A special issue is that I must somehow be able to keep multiple GB of physical memory in the lower 4G since the physical memory manager will run in 32-bit protected mode. Therefore, the stack method is more or less out of the question since it uses 1MB per GB.

A better alternative is bitmaps. It's pretty easy to provide lockless access which is good when the pagefault handler can request physical memory at any point. The problem with bitmaps is that they are real slow when memory becomes exhausted, or when searches are done in areas which contain little free memory. I have this idea where the basics for allocation is a 4K page used as a bitmap. It can handle 128MB of physical memory. The entire 4G address space only require 32 such pages. Each bitmap can then have statistics on number of free pages, which is updated (without locks) as memory is freed or allocated. The allocator will regularily go through all bitmap pages, and pick the page with the largest number of free pages as the current allocation pool. This can somehow solve the issue with speed related to number of free pages. Additionally, there can be a current index within the page where the search starts. As the end of the page is reached, a new pool is searched for (the one with the most available entries).

Another issue is that memory must be partitioned into 32-bit and 64-bit. This is because some devices need memory that must be below 4G. This could be solved by having current pools for both 32-bit and 64-bit allocations, and letting the 64-bit version prefer above 4G addresses, but still resorting to lower addresses when few high addresses are available.

bluemoon · Post by **bluemoon** » Mon Sep 10, 2012 12:54 pm

rdos wrote:I'm also looking for better alternatives for physical page allocation. I use a stack, which is bad because it requires locks, and it also takes way too much memory (4 bytes per 4K page).

I also use stack, but I only allocate 4K for the whole system to start. Upon I push more page into the stack it generate #PF to enlarge the stack; upon pop it release the extra stack space.
So, nomatter how many memory, the total amount of useful memory is constant and about the total amount of ram - kernel -4K.
(If doing it more aggressively you can use 4 byte or even zero byte since the stack is empty when run out of memory)

to reduce stress on lock, you may split the initial resource into zones(per core), when a zone is low on fuel, you trigger the fuel transfer mechanism - which can be as quick as a remap of page referring to stack content, or link list pointer assignment. The zone idea also resolve device memory problem.

rdos · Post by **rdos** » Mon Sep 10, 2012 1:14 pm

The bitmap approach has another advantage: It can easily detect double-frees. The linked list allocation will "flip-out" when an already free page is freed again. Allocating several adjacent entries is also much easier with a bitmap (although possible with a list as well). Some devices require multiple pages.

rdos · Post by **rdos** » Mon Sep 10, 2012 1:30 pm

bluemoon wrote:to reduce stress on lock, you may split the initial resource into zones(per core), when a zone is low on fuel, you trigger the fuel transfer mechanism - which can be as quick as a remap of page referring to stack content, or link list pointer assignment. The zone idea also resolve device memory problem.

This can done with the "paged-bitmap" method as well. You can keep pointers to current page-bitmap per 32/64 bit, and even per core if that is desired (although I doubt this is useful as then the core will not use the allocation history of other cores). A "control-block " array that covers the entire physical memory will be used (if an 8 byte control-block is required for a 4K page bitmap, it means 8 bytes per 128MB, which translates to 256 bytes per 4G). This also makes it fast to free, as then the code traverse the control-block array and page-bitmaps in a two-level process. The page bitmaps could potentially be swapped out of the 32-bit linear address space, and mapped-on-demand.

Owen · Post by **Owen** » Mon Sep 10, 2012 4:27 pm

I have to question how you're managing allocated physical memory.

Any practical system needs some structures to store data related with allocated physical memory (even as simple information as the "share count" and "lock count"). If you can't fit 4bytes/page into your address space, what hope do you have for fitting the per used page data, which will likely amount to much more? I think you'd be better off just implementing a 64-bit physical memory manager.

As for a stack requiring 4 bytes per page and locks? No, it doesn't. Stop thinking "contiguous stack" and think "linked list". Atomic allocations can be done trivially using cmpxchg.

Brendan · Post by **Brendan** » Mon Sep 10, 2012 7:46 pm

Hi,

Owen wrote:As for a stack requiring 4 bytes per page and locks? No, it doesn't. Stop thinking "contiguous stack" and think "linked list". Atomic allocations can be done trivially using cmpxchg.

It can be done with a lockless algorithm. This still isn't "perfect" (e.g. there's still contention, fairness and "read for ownership" cache line issues); but most of those problems can be reduced by using many stacks rather than a single global stack.

Stacks can also be done where the "next" field for the linked list is inside the free page itself, and the free pages still don't need to be mapped into any virtual address space. This means that if there's a stack managing 1234 GiB of free pages, it might costs you about 8 bytes of kernel space to manage that stack. If you need to consume 1 MiB of the kernel's virtual address space for each 1 GiB of free pages, then you're probably doing it wrong.

For an algorithm for managing swap space; you never need to send a free pages to swap and therefore you never need any swap management data for free pages. In addition; unless/until you reach some threshold, you don't need any swap management data for individual allocated pages either. For example, you can track when each process was used last, and only begin tracking "least recently used" for individual pages when there's less than 10% of memory left (e.g. if free physical pages plus pages used for file system caches, etc is less than 10% of total pages). Of course this only applies if the computer is configured to use swap space (if there is no swap space, then there's no point keeping track of "least recently used" pages even if less than 10% of memory is left).

For shared memory, you only need data to keep track of for pages that are actually being shared. This is likely to be far less than "all physical pages" and likely to be far less than "all allocated pages".

Finally; allocated pages are always mapped into at least one virtual address space somewhere. For "present" pages, you may be surprised how useful the 3 (or more) "available" bits in the page table entry can be (e.g. they could be used to indicate if the page is part of a shared memory area or copy on write area or whatever). For "not present" pages there's 31 (or more) "available" bits in page table entries (for e.g. you could use 30 of those bits to determine where the data is in swap space, and this would be enough for up to 4 TiB of swap space).

Essentially; for physical memory management you shouldn't need much more than free page stacks at about 8 bytes each (plus something extra for bus mastering/DMA buffers, which is the only case where the physical address of allocated page/s actually matters). For virtual address space management, you only need extra data for things that don't fit in the "available" bits of page tables - lists of virtual address space ranges that are shared, etc.

Cheers,

Brendan

rdos · Post by **rdos** » Mon Sep 10, 2012 11:16 pm

Owen wrote:I have to question how you're managing allocated physical memory.

There is no global list of allocated memory. As soon as a page is allocated, it is no longer recorded.

Owen wrote: Any practical system needs some structures to store data related with allocated physical memory (even as simple information as the "share count" and "lock count"). If you can't fit 4bytes/page into your address space, what hope do you have for fitting the per used page data, which will likely amount to much more? I think you'd be better off just implementing a 64-bit physical memory manager.

I have a single "shared" bit in the page tables to allow page-sharing, and avoiding double-frees. I've also reserved a bit for COW, but other than that, there is no information for allocated memory.

I don't support locking pages, and I also don't support swapping to disk. I do provide a function to free physical memory in low-memory situations, but this is not implemented as lists, rather is "hook-based" in that each device that can free physical memory registers a callback. That way, the memory low recovery function becomes decentralized and there is no need for page locks. Generally, it is mostly disk caches and file caches that can be freed in order to increase free physical memory.

rdos · Post by **rdos** » Tue Sep 11, 2012 1:21 am

Brendan wrote:Stacks can also be done where the "next" field for the linked list is inside the free page itself, and the free pages still don't need to be mapped into any virtual address space.

How do you propose to solve that? By mapping them on demand, and flushing TLB?

Brendan wrote:This means that if there's a stack managing 1234 GiB of free pages, it might costs you about 8 bytes of kernel space to manage that stack. If you need to consume 1 MiB of the kernel's virtual address space for each 1 GiB of free pages, then you're probably doing it wrong.

I prefer the bitmap version, which would use 39,488 page bitmaps at a size of 158MB. Actually, I would probably limit usable physical memory to 256GB, which uses 32MB linear address space for mapping, which I consider the limit.

Brendan · Post by **Brendan** » Tue Sep 11, 2012 3:00 am

Hi,

rdos wrote:
Brendan wrote:Stacks can also be done where the "next" field for the linked list is inside the free page itself, and the free pages still don't need to be mapped into any virtual address space.
How do you propose to solve that? By mapping them on demand, and flushing TLB?

Kernel has a "physical address for the page on the top of the stack" variable. To free a page at a virtual address:

kernel stores the current "top of stack" in the page (which is still mapped)
kernel unmaps the page and flushes the TLB for that page (INVLPG)
kernel sets the "top of stack" to the physical address from the page table

To allocate a page at a virtual address:

kernel maps the "top of stack" page into the page table
kernel flushes the TLB for that page (INVLPG)
kernel gets the "next" field from the (now mapped) page and puts stores it in it's "top of stack" variable for next time

Note: When allocating, the INVLPG is theoretically optional (e.g. lazy TLB invalidation) but in practice it's probably faster to do it (to avoid a potential page fault) than not doing it.

Also, when a task allocates a (virtual) page it's likely that it will write to that page as soon as it's allocated, and a TLB miss is inevitable (it doesn't matter much if the TLB miss occurs in the kernel during alloc or in the task after). If you do "allocate on demand" (e.g. where pages are allocated in the page fault handler because a task is trying to write) then the task is guaranteed to write to the page immediately after the page fault handler returns.

rdos wrote:
Brendan wrote:This means that if there's a stack managing 1234 GiB of free pages, it might costs you about 8 bytes of kernel space to manage that stack. If you need to consume 1 MiB of the kernel's virtual address space for each 1 GiB of free pages, then you're probably doing it wrong.
I prefer the bitmap version, which would use 39,488 page bitmaps at a size of 158MB. Actually, I would probably limit usable physical memory to 256GB, which uses 32MB linear address space for mapping, which I consider the limit.

I don't know why you prefer the (slower, more expensive) bitmap.

rdos wrote:I have a single "shared" bit in the page tables to allow page-sharing, and avoiding double-frees. I've also reserved a bit for COW, but other than that, there is no information for allocated memory.

When you're freeing a virtual page that is marked as "shared"; do you have to search all pages of all virtual address spaces to determine if the corresponding physical page should also be freed?

Cheers,

Brendan

rdos · Post by **rdos** » Tue Sep 11, 2012 5:29 am

Brendan wrote:Kernel has a "physical address for the page on the top of the stack" variable. To free a page at a virtual address:
kernel stores the current "top of stack" in the page (which is still mapped)

kernel unmaps the page and flushes the TLB for that page (INVLPG)

kernel sets the "top of stack" to the physical address from the page table
To allocate a page at a virtual address:
kernel maps the "top of stack" page into the page table

kernel flushes the TLB for that page (INVLPG)

kernel gets the "next" field from the (now mapped) page and puts stores it in it's "top of stack" variable for next time

That's smart, although needs locks in order to work.

The main reason why I don't want to use this is that allocate physical / free physical is not done from page-tables, but are separate functions. Thus, they have no knowledge which page the new address should be mapped into. I don't think I want to redo this to link it with pagetables.

Besides, the main drawback of lists, being unable to detect double-frees cannot be fixed like this.

Also, when allocating / freeing memory on behalf of a 64-bit process, this will need to be done in an 64-bit context, which would duplicate the code. It is far easier to just call the 32-bit kernel and ask it to allocate / free a physical memory page.

Brendan wrote:I don't know why you prefer the (slower, more expensive) bitmap.

Primarily because it allows double-free detection. Additionally, because I don't want to duplicate physical memory handling in 64-bit mode, and because I don't want to connect memory allocation with page table allocation.

Brendan wrote:When you're freeing a virtual page that is marked as "shared"; do you have to search all pages of all virtual address spaces to determine if the corresponding physical page should also be freed?

No, because I don't support this. I do have a memmap API, primarily to allow mapping files into a process address space, but it keeps track of which pages it uses and aliases itself, and thus don't need general kernel support. The preferred method of process communication is IPC, and shared memory is not supported. The shared bit is used for aliasing in kernel, and is used by the alias so that the physical page will not be freed when the alias is freed.

OSDev.org

Physical page allocation

Physical page allocation

Re: Physical page allocation

Re: Physical page allocation

Re: Physical page allocation

Re: Physical page allocation

Re: Physical page allocation

Re: Physical page allocation

Re: Physical page allocation

Re: Physical page allocation

Re: Physical page allocation

Re: Physical page allocation

Re: Physical page allocation

Re: Physical page allocation

Re: Physical page allocation

Re: Physical page allocation