Hi,
The way I do memory management in my OS is by separating it in two layers: Physical mem management and virtual mem management. I've implemented a malloc() function that only works with virtual memory. It reserves a virtual area without allocating it to physical memory. On #PF, the the pages are allocated. I later found out that Linux works pretty much the same way. So that's a good hint for me that I was on the right path. It seems they coined the term "overcommitment" for this behaviour.
So I read a lot of information about this overcommitment strategy. Everyone talks about malloc, but not so much about free.
My #PF handler is rather simple: allocate the page if it is needed. It doesn't validate if the page belongs to a valid reserved virtual buffer. It doesn't check that because I don't want #PF to know anything about how is virtual memory is managed. #pf should be quick and dumb (unless maybe I am on the wrong path with this idea).
My problem is with freeing physical pages. With free(), only virtual memory is freed. But the physical pages will still be allocated. I want to avoid making free() manage physical mem.
So what would be the best way to deallocate that memory? I thought about a garbage collecting system where when the OS requires more physical pages, it would go through the heap of all processes and deallocate all pages that reside in a free virtual area.
As you can image, when such garbage collection kicks in, the system would take a huge hit on performances.
Even if I make free() deallocate physical mem (which I really want to avoid. Separation of concerns),
I get the same problem with the stack. If an application starts and grows its stack down STACK_TOP-4mb and then returns in the range of STACK_TOP-8k, then all those extra pages would stay allocated for nothing. So the "garbage collector" could probably go through all threads and free any allocated pages that reside under that thread's RSP value.
I can't think of any better way of doing this. Is there?
Freeing physical memory
Re: Freeing physical memory
One idea is to run a background task of very low priority that scans for unused physical memory. It's very easy to mark a page as unused (or to decrement its usage count) when you free the virtual memory that references it.
Re: Freeing physical memory
Indeed. So the "garbage collector" becomes a low priority task instead of a burst when needed. This is better idea.
But if free is to mark the page as unused, then it might as well deallocate it right?
So mixing both our ideas, I would get a low priority task that scans all process/threads to find unused pages. The algorithm would stay the same tough: detecting free stack pages by checking the value of RSP, and detecting free heap pages by scanning the heap. But then the garbage collector needs to know how the heap works. I'm wondering if there is something else I didn't think about that could avoid this.
Thanks.
But if free is to mark the page as unused, then it might as well deallocate it right?
So mixing both our ideas, I would get a low priority task that scans all process/threads to find unused pages. The algorithm would stay the same tough: detecting free stack pages by checking the value of RSP, and detecting free heap pages by scanning the heap. But then the garbage collector needs to know how the heap works. I'm wondering if there is something else I didn't think about that could avoid this.
Thanks.
Re: Freeing physical memory
Hi,
Let's start from the start...
You've got "heap" that is like a pool of virtual memory. When you call "malloc()" it grabs some memory from the pool if it can. When you call "free()" it returns the memory back into the pool. None of this has anything to do with the kernel's virtual memory manager. The goal is to minimise kernel API calls. For example, if an application is constantly allocating and freeing things the kernel API may not be used at all. However; if you call "malloc()" and there isn't enough memory left in the pool, then "malloc()" has to increase the size of the pool, and does call the kernel's virtual memory manager. Also, if "free()" decides that there's too much memory in the pool it can return some.
Then you've got a virtual memory manager that uses tricks. One of the tricks is "allocate on demand" - pretending something is allocated when it actually wasn't. If you call "malloc()" and there isn't enough memory left in the pool, then "malloc()" has to increase the size of the pool and calls the kernel's virtual memory manager to "allocate" virtual memory; but the virtual memory manager only marks the area as "in use, allocate on demand" and doesn't allocate physical pages. When a page fault occurs the page fault handler checks if the page is marked as "in use, allocate on demand" and allocates a physical page if it was (and marks the page as "allocated"); otherwise the virtual memory manager knows something has gone wrong (e.g. the application crashed) and does whatever the OS decides to do in that case (terminate the process, send a "SIGSEGV" signal or whatever).
Also, if "free()" decides that there's too much memory in the pool it can return some, and the virtual memory manager frees the physical pages (if any) and marks the pages as "not used" again (so that they're no longer marked as "in use, allocate on demand"). This means that if the program touches these pages the OS knows the program is buggy; even though the program was allowed to touch those pages before.
This on its own is good enough. However, nothing really prevents "free()" from asking the virtual memory manager to convert a page from "allocated" back to the "in use, allocate on demand" state. This means that if you free a large area (return a large area back to the pool) but "free()" doesn't want to reduce the size of the pool, then "free()" can still ask the kernel to free the underlying physical pages.
Note that you can't have some sort of garbage collector in the kernel that scavenges pages from a process' heap/pool itself, because kernel shouldn't know or care if the process actually uses a heap/pool or how its implemented. Kernel could notify the process when its running low on physical memory though (and the process could scavenge/return pages from its pool when it receives this notification) and this would also be a nice feature for other things (e.g. could be used by any software that caches anything - "undo history", HTTP documents, etc).
You'll notice that the page table entries have some "available for OS use" bits. These are the bits that the virtual memory manager uses to keep track of the state of pages (e.g. if they're "in use, allocate on demand" or not). Sadly there's a limited number of these bits (e.g. for "plain 32-bit paging" there's only 3 of them), and there's a lot of things a virtual memory manager would like to use them for; so it takes a little cleverness to figure out how an OS can best use the available bits.
The first piece of cleverness is realising that when a page is not present all the other bits are "available for OS use" (e.g. for "plain 32-bit paging" you can use 31 bits and not just 3 of them whenever a page is not present). For example; when a page is on swap space you could store a 31 bit "page number in swap space" in those bits and that'd work for up to 8 TiB of swap space. Of course you can't actually do that because there are other reasons for a page to be "not present" (e.g. part of a memory mapped file); but maybe you can use 30-bits and support 4 TiB of swap space (where one bit is used for a "swap space or memory mapped file" flag).
So... the first step is to identify all of the different "page states" and sort them into 2 lists - one list for pages that are present, and one list for pages that aren't present. This makes it easy to decide how you're going to use those "available for OS use" bits. The goal is to cram as much functionality as possible into those bits, because you're going to need additional data structure/s to handle whatever doesn't fit (which makes things more complicated and consumes more RAM).
Cheers,
Brendan
Let's start from the start...
You've got "heap" that is like a pool of virtual memory. When you call "malloc()" it grabs some memory from the pool if it can. When you call "free()" it returns the memory back into the pool. None of this has anything to do with the kernel's virtual memory manager. The goal is to minimise kernel API calls. For example, if an application is constantly allocating and freeing things the kernel API may not be used at all. However; if you call "malloc()" and there isn't enough memory left in the pool, then "malloc()" has to increase the size of the pool, and does call the kernel's virtual memory manager. Also, if "free()" decides that there's too much memory in the pool it can return some.
Then you've got a virtual memory manager that uses tricks. One of the tricks is "allocate on demand" - pretending something is allocated when it actually wasn't. If you call "malloc()" and there isn't enough memory left in the pool, then "malloc()" has to increase the size of the pool and calls the kernel's virtual memory manager to "allocate" virtual memory; but the virtual memory manager only marks the area as "in use, allocate on demand" and doesn't allocate physical pages. When a page fault occurs the page fault handler checks if the page is marked as "in use, allocate on demand" and allocates a physical page if it was (and marks the page as "allocated"); otherwise the virtual memory manager knows something has gone wrong (e.g. the application crashed) and does whatever the OS decides to do in that case (terminate the process, send a "SIGSEGV" signal or whatever).
Also, if "free()" decides that there's too much memory in the pool it can return some, and the virtual memory manager frees the physical pages (if any) and marks the pages as "not used" again (so that they're no longer marked as "in use, allocate on demand"). This means that if the program touches these pages the OS knows the program is buggy; even though the program was allowed to touch those pages before.
This on its own is good enough. However, nothing really prevents "free()" from asking the virtual memory manager to convert a page from "allocated" back to the "in use, allocate on demand" state. This means that if you free a large area (return a large area back to the pool) but "free()" doesn't want to reduce the size of the pool, then "free()" can still ask the kernel to free the underlying physical pages.
Note that you can't have some sort of garbage collector in the kernel that scavenges pages from a process' heap/pool itself, because kernel shouldn't know or care if the process actually uses a heap/pool or how its implemented. Kernel could notify the process when its running low on physical memory though (and the process could scavenge/return pages from its pool when it receives this notification) and this would also be a nice feature for other things (e.g. could be used by any software that caches anything - "undo history", HTTP documents, etc).
You'll notice that the page table entries have some "available for OS use" bits. These are the bits that the virtual memory manager uses to keep track of the state of pages (e.g. if they're "in use, allocate on demand" or not). Sadly there's a limited number of these bits (e.g. for "plain 32-bit paging" there's only 3 of them), and there's a lot of things a virtual memory manager would like to use them for; so it takes a little cleverness to figure out how an OS can best use the available bits.
The first piece of cleverness is realising that when a page is not present all the other bits are "available for OS use" (e.g. for "plain 32-bit paging" you can use 31 bits and not just 3 of them whenever a page is not present). For example; when a page is on swap space you could store a 31 bit "page number in swap space" in those bits and that'd work for up to 8 TiB of swap space. Of course you can't actually do that because there are other reasons for a page to be "not present" (e.g. part of a memory mapped file); but maybe you can use 30-bits and support 4 TiB of swap space (where one bit is used for a "swap space or memory mapped file" flag).
So... the first step is to identify all of the different "page states" and sort them into 2 lists - one list for pages that are present, and one list for pages that aren't present. This makes it easy to decide how you're going to use those "available for OS use" bits. The goal is to cram as much functionality as possible into those bits, because you're going to need additional data structure/s to handle whatever doesn't fit (which makes things more complicated and consumes more RAM).
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: Freeing physical memory
Thanks. What you described there is very usefull. What you seem to be saying is that there are 3 layers in your design. Although mine uses only 2 (for different reasons), I can apply some of the notions you said. So I get two good ideas from this thread:
- The kernel will not scavenge pages, it will only inform processes that it would like to get some pages back. It will be the process's duty to do this. So I could have a scavenge() function in my heap manager that would scan the heap for unallocated space.
- Instead of waiting for low-memory, I will do this in a background low-priority task.
By implementing the scavenge() function out of the physical memory manager, I can change the heap implementation anytime without affecting the other layer. The scanvenge() function could be something that would pass a functor and says: "Invoke this function for all free areas that are larger than 4k".
Thanks, you guys were really usefull.
- The kernel will not scavenge pages, it will only inform processes that it would like to get some pages back. It will be the process's duty to do this. So I could have a scavenge() function in my heap manager that would scan the heap for unallocated space.
- Instead of waiting for low-memory, I will do this in a background low-priority task.
By implementing the scavenge() function out of the physical memory manager, I can change the heap implementation anytime without affecting the other layer. The scanvenge() function could be something that would pass a functor and says: "Invoke this function for all free areas that are larger than 4k".
Thanks, you guys were really usefull.