First of all, sorry for the seriously long post (the long poster is back, please panic). I hope that the text is relatively easy to read though.
Ok, this thing assumes there's some IPC facility, because it's going to be built on top of a microkernel, but the same thing could be used for macrokernels, in which case the whole thing would be even simpler.
I'm going to implement this hopefully today, but thought I'd throw it here, so people can review if it has some stupid flaws.
Terminology I'm gonna use:
- frame: a 4k chunk of physical memory
- page: a 4k chunk of virtual memory
- object: something that can be sent messages
- (page)directory: "hardware" table from virtual to physical
The design is (basically) this:
1. There's a big array with one entry per frame. Each entry has a map_count and an object*.
2. Each "present" directory entry points to a frame. Obviously.
Each "not present" directory entry contains an object*.
3. The magical object is the eternal representation of a page. This object knows whether the page is resident on some frame. It also knows how many times the page is mapped. Finally, it knows whether the resident page is dirty, and where it has backing store.
So starting from a virtual address, we have either a pointer to an object or a frame number; the object knows how to give us a frame with the page, and the frame array tells us what page is on a given frame.
The object knows how many times it's mapped (that is, virtual mappings) and a frame knows how many times it's mapped (that is, physical mappings). There's no way to figure out WHERE a given page (or frame) is mapped, though.
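A minimal sketch of the bookkeeping in points 1-3 above; all names and the exact layout here are my own invention, not the author's actual code:

```c
#include <stdint.h>

struct vpage;                       /* the "eternal" page object           */

/* One entry per physical frame (the big array from point 1). */
struct frame_entry {
    uint32_t      map_count;        /* how many PTEs map this frame        */
    struct vpage *page;             /* which page currently lives here     */
};

/* The page object: knows residency, mapping count, dirtiness (point 3). */
struct vpage {
    int32_t  frame;                 /* frame number, or -1 if not resident */
    uint32_t map_count;             /* virtual mappings of this page       */
    int      dirty;                 /* resident copy differs from store?   */
};

/* A directory entry is either a frame number (present) or an object*
 * (not present) - the present bit tells us which interpretation holds. */
union dir_entry {
    uintptr_t     pte;              /* present: frame number + flags       */
    struct vpage *page;             /* not present: pointer to the object  */
};

/* A page is resident iff its object records a valid frame number. */
static int vpage_is_resident(const struct vpage *p)
{
    return p->frame >= 0;
}
```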
So there's a clock-collector, which has access to a process list, where processes are popped to the top of the list each time they either visit a CPU or have been scanned. The clock then just scans through the page directories one process at a time, starting from the process at the bottom of the list (that is, the one longest not run and not scanned).
For each accessed page, just reset access.
For each unaccessed page, notify the page object if it's dirty, and unmap the frame. Normal clock policy. For added bonus, one can declare the process "collected" after a given number of frames have been unmapped, to avoid stealing all its pages at once.
Once a frame has a zero mapping count, it is either freed (if it's clean) or sent to backing store if it's dirty (and freed once written, if nobody has needed it in the meantime).
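The clock pass above can be sketched as a toy function over a flat array of software PTEs standing in for real hardware tables; all names here are mine, and a real implementation would walk the actual directory:

```c
#include <stdint.h>
#include <stddef.h>

/* Toy model: each entry has mapped/accessed/dirty bits. */
struct pte { int mapped, accessed, dirty; };

struct vpage_sim { int dirty; };    /* stand-in for the page object */

/* One clock pass: accessed pages get their bit reset (second chance);
 * unaccessed pages are unmapped, propagating dirtiness to the object.
 * Stops after 'budget' unmappings so we don't steal everything at once.
 * Returns the number of frames unmapped. */
static int clock_scan(struct pte *dir, struct vpage_sim *obj,
                      size_t n, int budget)
{
    int stolen = 0;
    for (size_t i = 0; i < n && stolen < budget; i++) {
        if (!dir[i].mapped)
            continue;
        if (dir[i].accessed) {
            dir[i].accessed = 0;    /* recently used: just reset access */
        } else {
            if (dir[i].dirty)
                obj[i].dirty = 1;   /* remember it must go to store */
            dir[i].mapped = 0;      /* steal the frame */
            stolen++;
        }
    }
    return stolen;
}
```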
Shared memory can be done simply by mapping the same page object in several address spaces (or even several places in one address space) and copy-on-write is trivially done by having the object know it's a copy of something else.
Since everything is done with single pages/frames, there are no complicated data structures or anything like that to care about. Once the page has been unmapped (map count zero), the page object can be destroyed.
If backing store is in userspace, the page object simply needs to message it with "read me this", "write this", and maybe "I don't need you anymore".
As long as the page is on some frame, the same frame can be mapped elsewhere. As long as the frame is mapped somewhere, we never write it to store. So there is (slow) userspace traffic only when the operation would be slow anyway, namely, loading page from store, and writing page to store.
Finally, demand loading needs page objects created when a file is mapped, but then proceeds exactly like a normal page-in.
Disadvantages:
1. Microkernel purists won't like the central paging policy.
2. Doing everything per-page consumes some memory for each frame, and some memory for each page. Some memory will be wasted anyway, and I think a few dozen bytes for each virtual page is ok. (I'm gonna keep them in non-pageable kernel heap for now, possibly moving them elsewhere if it turns out to be a problem.)
3. There's no easy way to unmap a specific frame, other than scanning directories until nobody has it mapped. I don't think I want to support that anyway.
----
The big question now is: why haven't I done this before?
(Somebody here will probably tell me in a minute.)
PS: has the post length limit finally been raised?
Seriously simple VMM design I came up yesterday
Re:Seriously simple VMM design I came up yesterday
Bookkeeping overhead for 32-bit machines: (bytes used per 4k page) * 1M pages.
And the costs in terms of clock cycles sound prohibitive...
...but don't listen to me, I've never written a VMM yet.
Every good solution is obvious once you've found it.
Re:Seriously simple VMM design I came up yesterday
There's a fixed 8 bytes of overhead per 4k of physical memory. There's no need for a naive array, so memory is only wasted for frames that actually exist.
Then there's a fixed overhead of about 24 bytes per 4k virtual page that is mapped somewhere at least once; there's no need to keep objects for all "possible" virtual memory, since that isn't even bounded. This is a bit worse. In the worst case I have to move these out of the unpageable kernel heap into virtual memory, but that ain't very hard to do.
But about clock cycles: if I cheat around the messaging for the fast cases (like mapping a page that's already resident), then most operations are either IO-bound or fast O(1).
Of the rest, unmapping a specific physical frame is prohibitively expensive, but I'm not going to support that. Supporting it would mean tons of extra CPU and space overhead.
Mapping/unmapping virtual pages in an address space is O(n) in the number of pages, but I think that's ok.
The collector's overhead can be ignored, because it's only going to run when the system is either idle or low on memory, and in the latter case we're gonna do some (slow) IO anyway.
Re:Seriously simple VMM design I came up yesterday
Oh, you're not the only one cursing about too little space in one post to rant at satisfying length - and all at once, without the risk of losing all the typed stuff after having pushed 'send'
*gg*
BTW: why don't you merely keep the list of available pages (sorta a stack) already filled in with the physical addresses, so if you need to hand out fresh memory you pop it off that stack and attach the dealt-out entities to whatever list you want? I currently keep a global list of dealt-out pages with the page directory as an additional property. That approach is a bit search-intensive (I need to find all the pages belonging to the process - that takes time if the list is lengthy - gonna revise it). Shared memory entities are grouped in regions which any process can attach to.
Yeah ... just my 2 cents.
stay safe
... the osdever formerly known as beyond infinity ...
BlueillusionOS iso image
Re:Seriously simple VMM design I came up yesterday
beyond infinity wrote: Oh you're not the only one cursing about too little space in one post...
If you're referring to the disadvantages section, that was kind of a stupid joke really, because there's gonna be overhead anyway. I'm quite happy with the amount of memory I'm currently going to waste.
beyond infinity wrote: BTW: why don't you merely keep the list of available pages (sorta stack) already filled in with the physical addresses...
Hmmh. I'm using a stack for free physical memory, yes, but as it is, I don't think I care where my physical memory is while it's in use. I deal with other problems (ISA DMA?) separately.
I also don't care who is using a certain virtual page. If it's using it, it probably has the right to do so. So all I need is really a mapping from address space to virtual page. The virtual page might be shared, or whatever, and is represented as a small object, the nature of which is unimportant really.
Since hardware needs to know physical addresses for present pages, I can keep the pointer to the virtual page object in the page directory only when the page is not present; when it is present, I can use the physical address as an index into an array that keeps those pointers to virtual pages that I can't keep in the page directory directly.
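That trick might look roughly like this; `vpage_of_frame`, the 12-bit shift, and the toy table size are my own assumptions for the sketch:

```c
#include <stdint.h>

#define PRESENT  0x1u
#define NFRAMES  16                 /* toy size; really one slot per frame */

struct vp_obj { int id; };          /* stand-in for the virtual page object */

/* vpage_of_frame[f] remembers which page object currently sits on
 * frame f - this is the side array a present entry indexes into. */
static struct vp_obj *vpage_of_frame[NFRAMES];

/* Recover the page object from a directory entry: present entries hold
 * a frame number (plus flags), non-present entries hold a raw pointer. */
static struct vp_obj *resolve(uintptr_t entry)
{
    if (entry & PRESENT)
        return vpage_of_frame[(entry >> 12) % NFRAMES];
    return (struct vp_obj *)entry;  /* aligned pointer, so bit 0 is clear */
}
```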
I don't even keep track of the objects other than by the pointers. I reference count them (could just as well garbage collect, but there's little point since there are no cycles anyway) and free them once the last reference goes away.
So, basically, why would I need a list of dealt-out pages? All the pages are referenced somewhere. Same for virtual pages and physical frames, really. Can't iterate them easily, but why would I want to? Except to steal memory, of course, but that's solved by stealing from the least-recently-run process first, because...
All the pages dealt out to a single process? The page directory is there already, so why bother duplicating the information? Slow to go through, yeah, but hardly a problem for stealing.
-----
So a summary of the whole scheme is:
Page directory contains pointers to virtual pages (or physical frames which can be used to find out the virtual page) and that's it. We reference count enough to free stuff to be freed, and virtual page object deals with reading/writing backing store.
Then there's a nasty worm that goes through the processes when it's allowed to and steals physical memory, so that later those processes fault and ask the virtual page object for the new address of the relevant page.
Backing store can be pretty much anything anywhere around the globe (with a local proxy, naturally).
So basically, I've got a very simple VMM model that allows userspace paging, but with just enough logic in the kernel to avoid going to userspace for anything but IO. And since the kernel does global page stealing, there's no need for some ugly, complicated, distributed policy management. Finally, stealing pages in the kernel means they go back to the kernel's list of free pages, so the kernel can deal with its own memory requirements as well.
Re:Seriously simple VMM design I came up yesterday
Fortunately I have Firefox at work, by default and recommended.
Oh, and Thunderbird is there by default (and recommended for mail) too.
Not that anyone would care if I install anything else on my workstation though, at least as long as it's legal. It's not like I needed Dev-C++ or Maxima for my work. Or X-Chat for that matter.
<edit who="Pype"> The interleaved thread about Firefox portable and the like is now in a thread of its own. </edit>
Re:Seriously simple VMM design I came up yesterday
Ok, while implementing this thing, I figured there's little point in using messaging at all between the faulting thread and the page object. It'd just add complexity, when the thread can do little but wait for the page.
So it seems I only need a list of threads waiting for a specific vpage. Single pointer in vpage should be enough.
I'm using names vpage (for virtual) and ppage (for physical) in kernel (didn't want to make the internal naming convention more confusing than it is already).
So my pagefault handler now goes like this:
1. Fault
2. Check the page directory for a vpage reference, or if it's a write fault on a read-only page, check the ppage descriptor for the vpage reference.
3. Map if possible, do COW if possible, else request the page from the store and go into the queue.
4. When the store gives us the page, it'll fix up the vpage object with the ppage number, and free any waiting threads.
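The four steps above could be sketched as a toy decision function; the store round-trip and thread queueing of step 4 are only hinted at in comments, and all names are invented:

```c
/* Toy walk-through of the fault path; everything here is simulated. */
enum fault_result { MAPPED, COW_DONE, QUEUED };

struct fault_vpage {
    int resident_frame;   /* >= 0 if some frame already holds the page */
    int refcount;         /* shared: COW must copy before writing      */
};

/* write_fault: did we trap on a write to a read-only mapping? */
static enum fault_result handle_fault(struct fault_vpage *vp, int write_fault)
{
    if (vp->resident_frame >= 0) {
        if (write_fault && vp->refcount > 1)
            return COW_DONE;    /* step 3: copy the frame, remap writable */
        return MAPPED;          /* step 3: just map the existing frame    */
    }
    /* step 3 fallthrough: page not resident - ask the store and sleep;
     * step 4 (the store's reply waking the waiters) happens elsewhere. */
    return QUEUED;
}
```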
The remaining problem is how to map from the store's reply to the vpage object when the store is "untrusted" userspace code, but I think I'm going to solve that later.
Throwing some bits around, memory usage is currently:
per ppage: vpageref and counter; 2*4 bytes = 8 bytes
per vpage: ppageref, counter, storeref, storeoffset and waitlist; 5*4 bytes = 20 bytes
Oh, and some amortized overhead for the fact that holding these is going to require some mapped memory, which needs to have a page table, but I'm going to ignore that.
storeoffset is there so I don't need a separate store object for each vpage.
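Spelled out as structs with 32-bit refs (my naming, not the kernel's), the byte counts above check out:

```c
#include <stdint.h>

/* refs are 32-bit indices rather than pointers, which is what makes
 * these totals platform-independent. */
struct ppage_desc {                 /* 8 bytes per physical frame */
    uint32_t vpage_ref;             /* which vpage is resident here */
    uint32_t count;                 /* physical mapping count       */
};

struct vpage_desc {                 /* 20 bytes per virtual page */
    uint32_t ppage_ref;             /* frame number, if resident    */
    uint32_t count;                 /* virtual mapping count        */
    uint32_t store_ref;             /* backing store object         */
    uint32_t store_offset;          /* offset within that store     */
    uint32_t waitlist;              /* first thread waiting on us   */
};
```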