OSDev.org

Posted: **Wed Mar 31, 2021 7:31 pm**

Not present does not imply free. To keep track of allocated addresses you could use a linked list of control blocks, or some sort of balanced tree.

I just put the initial top level page tables at a fixed address directly after the kernel itself.

Posted: **Wed Mar 31, 2021 10:03 pm**

ngx wrote:I was writing my memory manager and finished nearly everything except the allocation method. I have no idea of how I should keep track of which pages are free and which aren't, so I could easily give one on a user request - should I just search through the whole page table tree to find one that has the present bit off and then set the bit to on and map it to a physical address, but if a user needs several pages one after another, should I search for place where enough pages are free - considering that I am in long mode with PML4, it could take tons of time?

No. You use your own data structures to keep track of the allocations, and only write to the page table as necessary. For virtual memory, the normal recommendation is to create a self-balancing range-tree containing all virtual memory areas. Indeed, creating one for every process. In it you keep track of where the nominal allocations are, what files they connect to (at what offset) and what access permissions are set. If someone needs memory, you can walk through the VMAs and find if there is a hole somewhere matching the requirements of the allocation. Then add a new VMA for that address. Note that you don't need to write to the page tables right away when adding a mapping, and when removing or changing a mapping, it is enough to clear out the affected mappings in the system call itself. In the page-fault handler, you would then look up the VMA for the fault address, and either allocate a new page (if mapping is anonymous), enqueue a request to read a file (if mapping is file-backed), or fault the process (in UNIX-like systems: Send SIGSEGV).

For physical memory, you also use your own data structures. Here it is generally alright to only keep track of free pages. One simple implementation would be a stack of free pages. You keep in your kernel a pointer to the first free page. Every free page contains a pointer to the next free page in its first word, with possibly a special "end of chain" marker. When a page is allocated, you remove the first node from the list and return its address. When a page is freed, you add it to the list at the front. Pro: Constant allocation and free time, so long as you are only interested in pages, which will probably be good enough for a start.

Posted: **Thu Apr 01, 2021 12:11 am**

Posted: **Thu Apr 01, 2021 2:00 am**

ngx wrote:I don't really understand, is this like a linked list?

What @nullplan tried to say is, use two separate structures. It doesn't really matter what data structure you use:

There's the PMM (physical memory manager) level, where you keep track of the free RAM. You can use a bitmap, a linked list, a stack based list whatever you like. This is global to all processes.

At the VMM (virtual memory manager) level, you keep track of what pages are mapped in the paging structures, and what parts of the address space are used, free (not allocated but mapped) or free and not mapped. One property (mapped or not) is stored in the paging tables, but not the rest. Again, you're free to choose the data structure to be used. This is different per process.

ngx wrote: If yes, then I would think it would be enormous as each node is at least 128 bits wide(for the page address and the pointer to next node) if we have no metadata like you said and that is only for one process. SO if I understand correctly - I just add the allocated addresses at the end like in a linked list, and then to allocate I just have a variable set to some certain address which I increase with every allocation, and I just search if this address is in the tree and if it isn't I allocate it or otherwise I increase it and search again if it is free? Am I right?

Nope, don't mix allocated address and free pages. You'll need two separate allocation layer at a minimum: a PMM and a VMM. For the PMM the address doesn't matter (you just ask for a physical page and add it to the paging table), handling the address space (and hence mapped addresses) is a job for the VMM. Since both the PMM and VMM should deal with pages, you'll probably need a third level too, libc malloc, which can allocate arbitrary sized memory and returns virtual addresses to your applications.

Take a look at the big picture.

Cheers,
bzt

Posted: **Thu Apr 01, 2021 2:38 am**

I don't have lists of free pages. The present bit in the page table will tell if a page is in use, and there are a few more bits that can be used for aliasing, second-chance and copy-on-write. When the present bit is not set, there are 31 (or 63) bits free for house keeping. I reserve bit 1 for "allocated" but not yet referenced. This is the way I allocate memory in the application by scanning for pages where bit 1 & 0 are both 0 and then set bit 1 to 1. When the application tries to use it the pagefault handler will see that it is reserved and allocate a physical page, set the present bit and re-execute the instruction. For images, I use selector:offset in the page table that will be called by the pagefault handler. The PE loader uses this so it can dynamically load pages in the executable on demand.

The only thing that this model cannot handle is fork. When I added fork I needed to create lists of pages & reference counts for forked processes, but if you don't support fork, you really don't need this.

Posted: **Thu Apr 01, 2021 3:31 am**

rdos wrote:I don't have lists of free pages. The present bit in the page table will tell if a page is in use

And how do you record the RAM used by the page table itself? You need to keep track of free RAM in the system independently to page tables. The first is global (you have only one RAM), and the second changes on each task switch (you have many page tables, one for each process).

Cheers,
bzt

Posted: **Thu Apr 01, 2021 6:21 am**

Posted: **Thu Apr 01, 2021 6:25 am**

Posted: **Thu Apr 01, 2021 7:13 am**

ngx wrote:I could track which pages are mapped/not mapped using the present flag on the page. The problem I have is that I can't think of a data structure that could be used to know which virtual memory pages are free(by free I mean not mapped and not allocated)

You said you know which pages are mapped or not. Then this is simple, as unmapped pages must be unallocated too.

ngx wrote:and which are used(allocated and mapped/not mapped) so I could allocate them on a request or map them to MMIO...

The request in this case would be a page fault, meaning you'll know exactly where to map the new page (CR2). On exit(), you'll have to walk through the page tables, and free all physical addresses (recorded in the tables as well as the physical address of the table itself). Not sure what this has to do with MMIO. You just map those as you would map any free memory (maybe with different bits set in the table, but that's all). It doesn't matter where the MMIO appears in the address space.

ngx wrote:There could be a linked list of all allocated pages, but searching through it every time to check if some page is free would take loads of time and with more pages being allocated it would take longer and longer

Actually this is exactly what dlmalloc does for example. With one little modification, it records what's not allocated. When you free a memory area, dlmalloc puts a linked list pointer in that memory area, which points to the next free memory block. It is on you to choose the correct algorithm that doesn't need to iterate through the entire chain in order to work correctly. Such algorithms do exist.

ngx wrote:stack would not work here and a bitmap would take several million of GB(and also I don't really like bitmaps bitmaps), so I would really appreciate advice on the data structure to use and how to use it in this context.

For PMM, keeping track of free pages I believe bitmap is still the best. This is RAM size / page size / 8 bytes, for example with 4k pages and 16G RAM would require 512K of memory, definitely less than gigabytes. Other alternatives buddy and SLAB.
For VMM, you can use the page tables to figure out which memory is mapped and which isn't. You can use the available bits too. Alternatives are RLE encoded stacks. The Linux kernel calls these struct vm_area_struct, also stores protection bits in them and the Linux kernel has a list of this struct for each and every address space.
For allocated memory, take a look at dlmalloc (or other simple memory allocator's) source to see how they use the linked list without iterating on the entire chain. Here I really don't like to say any alternative data structure, because literally all allocators use a different one with a different approach.

Cheers,
bzt

Posted: **Thu Apr 01, 2021 8:30 am**

Posted: **Thu Apr 01, 2021 10:17 am**

rdos wrote:I don't have lists of free pages. The present bit in the page table will tell if a page is in use, and there are a few more bits that can be used for aliasing, second-chance and copy-on-write. When the present bit is not set, there are 31 (or 63) bits free for house keeping. I reserve bit 1 for "allocated" but not yet referenced. This is the way I allocate memory in the application by scanning for pages where bit 1 & 0 are both 0 and then set bit 1 to 1. When the application tries to use it the pagefault handler will see that it is reserved and allocate a physical page, set the present bit and re-execute the instruction. For images, I use selector:offset in the page table that will be called by the pagefault handler. The PE loader uses this so it can dynamically load pages in the executable on demand.

So if a program asks your OS for more memory, you iterate over the page tables to find a free spot in VM space to put the new memory? Seems inefficient, especially with address space as vast as it is on x86_64.

I have a binary range tree. If someone tries to allocate memory, I will start an in-order walk of that tree to find a hole big enough. Since VMM ranges can be arbitrary size, I will not have to descend too deeply.

rdos wrote:The only thing that this model cannot handle is fork. When I added fork I needed to create lists of pages & reference counts for forked processes, but if you don't support fork, you really don't need this.

I don't anticipate supporting fork() myself, I think it is a bad way to start a new process. All of its use cases can be handled by either pthread_create() or posix_spawn(). Well, almost all. But I am surprised to hear you of all people proclaim your support for fork().

But since I do intend to allow shared memory, I shall have to implement reference counting for mapped files at least. By disallowing fork(), it becomes impossible to share anonymous mappings across processes, so they can be used at most once.

ngx wrote:The problem I have is that I can't think of a data structure that could be used to know which virtual memory pages are free(by free I mean not mapped and not allocated) and which are used(allocated and mapped/not mapped) so I could allocate them on a request or map them to MMIO...

Binary tree. Self-balancing binary range tree. You will have to do some searching. The red-black tree seems to be favored among OS enthusiasts, though I have always been partial to the Anderson tree myself. The "range" part of that description means that every tree node has a range of keys instead of just one key. In this case, all nodes are keyed by their address and length. When handling page faults, the correct VMA can be quickly identified by just walking the tree, and even failure can be handled quickly. And once you have the VMA, you can quickly identify if it just needs a physical page to back it, or if the page fault was spurious (apparently it can sometimes happen that you get a page-fault although nothing was wrong, and just returning from the #PF handler makes the problem go away). Or if it really was a bad fault and you need to reprimand the process in whatever way you see fit.There could be a linked list of all allocated pages, but searching through it every time to check if some page is free would take loads of time There could be a linked list of all allocated pages, but searching through it every time to check if some page is free would take loads of time and with more pages being allocated it would take longer and longer, stack would not work here and a bitmap would take several million of GB(and also I don't really like bitmaps bitmaps), so I would really appreciate advice on the data structure to use and how to use it in this context.and with more pages being allocated it would take longer and longer, stack would not work here and a bitmap would take several million of GB(and also I don't really like bitmaps bitmaps), so I would really appreciate advice on the data structure to use and how to use it in this context.

For allocation, you only need to do an in-order walk of the tree to see if there is any hole in virtual memory large enough to accommodate the request. The walk must be In-order so you get all VMAs in sorted order.

ngx wrote:There could be a linked list of all allocated pages, but searching through it every time to check if some page is free would take loads of time and with more pages being allocated it would take longer and longer, stack would not work here and a bitmap would take several million of GB(and also I don't really like bitmaps bitmaps), so I would really appreciate advice on the data structure to use and how to use it in this context.

Stack and bitmap are more for physical memory, anyway.

ngx wrote:So I would have a pointer to the start of the heap(4096 aligned) right after the end of the executable and when a user requests for a page I would just check if the pointer points to a free page and if yes mark it as used and return or otherwise move forwards until a free page is found and do the same thing, if multiple pages are required then there will be a place found where all of them fit one after another. If a page is freed then it's present bit will be removed and it will become free and will be added to the linked list. The linked list of free pages will be used by the allocator to allocate a page if the user only requests for one page, otherwise the pages will be allocated starting from the heap pointer.

The problem with iterating over the page tables rather than the VMAs is that this way, allocation becomes linear in the number of pages rather than the number of VMAs. To illustrate the difference: One firefox instance on my system right now has 567 VMAs, but a virtual size of 642410 pages (if all of them are small pages, but I think they are).

Posted: **Thu Apr 01, 2021 12:47 pm**

Posted: **Thu Apr 01, 2021 12:52 pm**

ngx wrote:Ok thanks. I thought of an algorithm and would be grateful if you could point out some things that I could change in it:
So I would have a pointer to the start of the heap(4096 aligned) right after the end of the executable and when a user requests for a page I would just check if the pointer points to a free page and if yes mark it as used and return or otherwise move forwards until a free page is found and do the same thing, if multiple pages are required then there will be a place found where all of them fit one after another. If a page is freed then it's present bit will be removed and it will become free and will be added to the linked list. The linked list of free pages will be used by the allocator to allocate a page if the user only requests for one page, otherwise the pages will be allocated starting from the heap pointer.

I still don't get it as you're mixing free pages with allocation. You shouldn't. So "free page" only exists in context of physical memory allocator, but not on other levels. At VMM, there's no free page, just mapped ones. At libc malloc level, you have allocated and free areas of arbitrary size (not page sized). If the application frees memory, then it is up to the allocator if the mapped pages are returned to the PMM or not. With dlmalloc, you cannot return the pages because you have free memory pointers on them. On the other hand if the application allocates again, there's no need to call the VMM and the PMM (which can be slow), because pages are already mapped into the address space, and the allocator does not need extra memory to keep track of memory because pointers are simply stored in free memory areas.

Here's how it's supposed to go:
1. at start, heap points to the end of the executable
2. the application requests (let's say) 256 bytes of memory
3. your allocator tries to allocate that on the heap, but since nothing is mapped after the executable, writing the end of free memory pointer causes a page fault
4. the page fault handler calls the PMM for a free physical page, and maps it to the address space after the executable
5. the execution is returned to the allocator, which knows nothing about the memory mapping
6. the allocator sets the free pointer to heap + 256 and returns the previous pointer (heap)
(Alternatively after step 2 the allocator could have checked that the end of the address space is at the end of the executable, so it could have called sbrk/mmap to map more memory and not relying on the page fault handler.)

Let's say the application allocates another 512 bytes:
1. the allocator adds 512 to the current free pointer (which is heap + 256), and returns the previous pointer (heap + 256)
2. there's no mapping involved, because page is already mapped after the executable

Now let's say the application frees the first 256 bytes:
1. the allocator places the current free pointer (heap + 256 + 512) at the area being freed
2. points the current free memory pointer to that area
3. page isn't unmapped, it's still mapped in the application's address space
Now memory looks like this: after the executable, there's a 256 bytes free memory, with a pointer to heap + 256 + 512. The next 512 bytes (not included in the free memory list) is allocated. The list's tail points to heap + 256 + 512, meaning there's no more free memory holes, just at the end of the heap.
If the application tries to allocate memory again, then the allocator would return the pointer in the first area (if that's big enough), and would set the current free pointer to the pointer in that area. If the allocation is bigger then the free area, then the allocator would allocate at the end, just like it did with the 512 bytes allocation.

The simple solution (walk the free list from the chain's head to the tail until you find a big enough area or add the allocation to the tail if not found) is O(n), the more areas you have, the slower it became. However you could keep track of the sizes in multiple lists instead, and then you'd only have to choose the list with big enough areas, or use the tail. This could be O(1) (because you always use the first entry in the list, no walking through the chain). So instead

Code: Select all

 head -> block at 0 size 256 -> block at 768 end of chain

You could have

Code: Select all

head[size 8] -> end
head[size 16] -> end
head[size 32] -> end
...
head[size 256] -> block at 0 -> end
head[size 512] -> end
...
overall tail -> block at 768 end of chain

What data structure and algorithm you use is totally up to you. There's no good solution here, only compromises: whether you prefer fast alloc and slower free or not so fast alloc and not so slow free; use more memory but faster alloc or less memory but slower alloc etc. It all depends on your priorities and design decisions, what's the best for your design.

Cheers,
bzt

Posted: **Thu Apr 01, 2021 2:04 pm**

bzt wrote:
rdos wrote:I don't have lists of free pages. The present bit in the page table will tell if a page is in use
And how do you record the RAM used by the page table itself?

Those are allocated automatically by the pagefault handler. When the process terminates, these are freed again automatically.

bzt wrote: You need to keep track of free RAM in the system independently to page tables. The first is global (you have only one RAM), and the second changes on each task switch (you have many page tables, one for each process).

Physical memory is handled using bitmaps.

Process memory (user part) is handled by scanning page tables when 4k aligned areas are requested, and by the compiler's libc when it comes to byte aligned (like malloc and new).

Shared kernel memory (same in all address spaces) has a 4k pool (allocation by scanning page tables), and a byte aligned pool that uses control blocks just like OpenWatcom's libc does for byte alligned allocation.

OSDev.org

How do I keep track of which pages are free/used?

How do I keep track of which pages are free/used?

Re: How do I keep track of which pages are free/used?

Re: How do I keep track of which pages are free/used?

Re: How do I keep track of which pages are free/used?

Re: How do I keep track of which pages are free/used?

Re: How do I keep track of which pages are free/used?

Re: How do I keep track of which pages are free/used?

Re: How do I keep track of which pages are free/used?

Re: How do I keep track of which pages are free/used?

Re: How do I keep track of which pages are free/used?

Re: How do I keep track of which pages are free/used?

Re: How do I keep track of which pages are free/used?

Re: How do I keep track of which pages are free/used?

Re: How do I keep track of which pages are free/used?

Re: How do I keep track of which pages are free/used?