How to handle virtual memory allocation?

Discussions on more advanced topics such as monolithic vs micro-kernels, transactional memory models, and paging vs segmentation should go here. Use this forum to expand and improve the wiki!
mrvn
Member
Posts: 43
Joined: Tue Mar 11, 2008 6:56 am

How to handle virtual memory allocation?

Post by mrvn »

Hi,

I'm at a point in my kernel where I have to seriously work out how to handle my memory, and there doesn't seem to be much good information on how to do this.

First let me explain my basic design because a solution that doesn't fit won't be of much use to me. My memory layout looks currently like this:

0x000000 unused
0x100000 kernel image
0x130000 end of kernel
begin of usable physical ram for page tables
0x???FFF end of physical ram

0x000020000000 begin of kernel virtual ram
0x000100000000 begin of user virtual ram
0x7FFFFFFFFFFF end usable virtual ram

As you can see I work with a 64bit cpu and I use a 1:1 mapping of physical to virtual to manage page tables. I have a stack based handling of physical pages and can map pages to virtual addresses as I like. I have 2 memory pools for kernel and user. I want to use a flat address space and be able to share pages between processes for IPC.

Now the problem:
  • For each process I need to know which pages it may read/write/execute. Specifically: given an address, what rights does the process have?
  • I need to know globally which addresses are free, so allocations in different processes never overlap.
  • I need to quickly find a free region large enough to contain an allocation request.
  • I need to free a region of memory from a process's address space, see if any other process still has pages in that region, and mark all totally unused pages as free again.
I obviously can't use a bitmap to mark free regions; with a 64-bit address space that would be way too big. I also can't use a linked list of regions; finding an entry in the list would be far too slow.

The only viable standard solution for memory management seems to be the buddy system. But to allocate a 4K page I would end up with 35 splits of my topmost buddy, and looking up a specific address would mean following up to 35 links, which would be rather slow.

So what else is out there?
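For reference, the address-to-rights lookup asked about above is commonly answered by keeping per-process region descriptors sorted by address (in an array or a balanced tree rather than a plain list), so a lookup is O(log n). A minimal sketch, with all names hypothetical and not taken from any particular kernel:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-process region descriptor: a half-open
 * virtual range plus its access rights. */
typedef struct {
    uint64_t start, end;   /* [start, end) */
    unsigned rights;       /* bit 0 = read, bit 1 = write, bit 2 = exec */
} vm_region;

/* Binary search over regions sorted by start address:
 * O(log n) instead of the O(n) walk of a linked list. */
const vm_region *region_lookup(const vm_region *r, size_t n, uint64_t addr)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (addr < r[mid].start)
            hi = mid;
        else if (addr >= r[mid].end)
            lo = mid + 1;
        else
            return &r[mid];    /* addr falls inside this region */
    }
    return NULL;               /* address is unmapped */
}
```

The same sorted structure also answers "is this range free?" and "where is a large enough hole?" by looking at the gaps between neighbouring entries.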

MfG
Mrvn
Life - Don't talk to me about LIFE!
So long and thanks for all the fish.
bewing
Member
Posts: 1401
Joined: Wed Feb 07, 2007 1:45 pm
Location: Eugene, OR, US

Post by bewing »

I am close to the same stage as you, so I can't give you info with any hindsight. Many of the others here can.

But with allocatable resources like this, you generally have a tradeoff between speed of allocation and the overhead of "garbage collection", i.e. recovering and reusing resources after they have been freed.

With such a huge potential availability of VM pages, I would think you would need a mixture of a few different kinds of allocation schemes. I think I'd probably suggest a combo of "stack based" and regions.

A stack based page allocation scheme puts a big armload of free pages, one at a time, on a big stack. When a process wants one, it pops it off the top of the stack. On a free, the system pushes the freed page back on the stack. It is fast for allocate and free operations.
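That stack scheme fits in a few lines of C. A minimal sketch, array-backed with a fixed capacity, all names made up:

```c
#include <stdint.h>
#include <stddef.h>

#define STACK_CAP 1024

/* Free physical pages, stored as page-frame addresses. */
static uint64_t free_stack[STACK_CAP];
static size_t   free_top;           /* number of pages on the stack */

/* Push a freed page frame; returns 0 on success, -1 if full. */
int page_free(uint64_t frame)
{
    if (free_top == STACK_CAP)
        return -1;                  /* full: spill back into a free region */
    free_stack[free_top++] = frame;
    return 0;
}

/* Pop a free page frame; returns 0 on success, -1 if empty. */
int page_alloc(uint64_t *frame)
{
    if (free_top == 0)
        return -1;                  /* empty: refill from a free region */
    *frame = free_stack[--free_top];
    return 0;
}
```

Both operations are O(1), which is exactly why the scheme is fast; the costly part, as described below, is shuffling pages between the stack and the contiguous free regions.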

Obviously you don't want to do this with umpty billion single free pages.
So, almost all of your free pages would be stored as contiguous free regions. You only put a relatively small number of pages out of one region on the stack, when needed. If "too many" pages get freed onto the stack, then you can remap any remaining allocated pages in order to free up a region (to turn it back into a contiguous free region), and put it back on the free region list. But that particular process will be quite slow -- you would need to locate pages in use (a process of elimination?), find the process that is using that page (a list search?), and modify its page tables to point to a remapped page.
mrvn
Member
Posts: 43
Joined: Tue Mar 11, 2008 6:56 am

Post by mrvn »

The problem with a stack of pages is allocations that aren't page sized. I actually expect a fair number of page-sized allocations (or a few pages at most) for IPC messages, and then large requests for heap allocations. I will provide library functions that do malloc/free of small objects by allocating, say, 1MB from the kernel and then splitting it up in user space.

I do use a stack to manage my physical pages, because they can be combined into larger virtual regions at will; no need for contiguous physical pages. But for virtual addresses that won't work.

MfG
Mrvn
Life - Don't talk to me about LIFE!
So long and thanks for all the fish.
Krox
Posts: 23
Joined: Fri Mar 21, 2008 3:52 am
Location: Germany

Post by Krox »

Not sure if the original topic is solved, but I'd like to present my implementation of virtual memory, because I'm in pure 64-bit too, and I didn't want to start a new thread. Comments are welcome :)

I split the virtual memory into "segments" of 1 GiB each (a practical size, because that's exactly what one level-2 page table, i.e. one page directory, maps). The used part of a segment can of course be smaller, but owner, permissions, and the like are per segment. Furthermore, the virtual address space is identical for every task, except for the very first segment, which contains the individual code section; even that is mapped a second time into some higher region for simple access by the kernel.

When starting a task, it gets one code segment and one heap segment, but it is free to request more. It will probably need one for its heap and one for every region of memory it shares with another task.

Advantages of this layout:
:arrow: pointers are unambiguous, so pointers into a task's private heap or even its stack can be passed to another task (only the permissions have to be adjusted)
:arrow: there is no real difference between private and shared memory, only the length of the "permission list" differs
:arrow: the page-fault handler is simple: the upper 18 bits of the address (remember, addresses are only 48 bits) act as a kind of "segment ID", so it's easy to check the current task's permissions for that segment
:arrow: resizing memory segments is simple (but limited to 1 GiB). I think that's especially nice for memory-mapped files, which will be standard in my OS
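Under the assumptions above (1 GiB segments, 48-bit canonical addresses), extracting the segment ID in the page-fault handler is a single shift. A small sketch with hypothetical names:

```c
#include <stdint.h>

#define SEG_SHIFT 30                 /* 1 GiB = 2^30 bytes per segment */
#define SEG_COUNT (1u << 18)         /* 2^48 bytes / 1 GiB = 2^18 segments */

/* Map a 48-bit virtual address to its segment ID (upper 18 bits). */
static inline uint32_t seg_id(uint64_t vaddr)
{
    return (uint32_t)((vaddr >> SEG_SHIFT) & (SEG_COUNT - 1));
}
```

The fault handler would then look up `seg_id(fault_addr)` in the current task's permission list, whatever form that list takes.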

Before I forget... physical memory is also mirror-mapped into one segment. Yes, at the moment that limits me to 1 GiB of RAM; otherwise I would have to get a contiguous block of segments... well, that breaks the idea a little, but it should not be a problem.

As for allocations that aren't page sized: the library will have to handle those. The CPU provides memory protection and fragmentation on a page basis, so that is all the Alloc syscall can use. I would furthermore suggest that the library split malloc into two cases: "small" requests are carved out of its heap, while "big" ones (> 1 or 2 pages) get new memory from the OS (rounded up to whole pages). That way heap fragmentation is minimized (freeing a "big" chunk immediately returns it to the OS). I think Linux uses that method (the "big" malloc goes through mmap).
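A sketch of that two-case malloc. The threshold and the `sys_alloc_pages` call are invented for illustration; the kernel call is stubbed out with the C library here so the sketch runs in user space:

```c
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SIZE     4096
#define BIG_THRESHOLD (2 * PAGE_SIZE)   /* > 2 pages goes straight to the OS */

/* Hypothetical kernel interface: allocate whole zeroed pages.
 * Stubbed with calloc so this sketch is runnable. */
static void *sys_alloc_pages(size_t pages)
{
    return calloc(pages, PAGE_SIZE);
}

/* Round a byte count up to whole pages. */
size_t pages_for(size_t bytes)
{
    return (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
}

/* Two-tier malloc: small requests come from the user-space heap
 * (stand-in: plain malloc), large ones get fresh pages from the OS. */
void *lib_malloc(size_t bytes)
{
    if (bytes > BIG_THRESHOLD)
        return sys_alloc_pages(pages_for(bytes));
    return malloc(bytes);
}
```

The key property is the one described above: freeing a big chunk can hand its pages straight back to the kernel, so only the small-object heap ever fragments.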
mrvn
Member
Posts: 43
Joined: Tue Mar 11, 2008 6:56 am

Post by mrvn »

Having to share a full 1GiB segment for each large IPC is not really practical. And you are still left with 2^27 segments to manage. Although I guess a 16MB bitmap might be manageable.

MfG
Goswin
Life - Don't talk to me about LIFE!
So long and thanks for all the fish.
piranha
Member
Posts: 1391
Joined: Thu Dec 21, 2006 7:42 pm
Location: Unknown. Momentum is pretty certain, however.

Post by piranha »

mrvn wrote:there doesn't seem to be good information on how to do this.
Have you looked at all?

Have a look at JamesM's tutorials, they're great. Although there is a small resizing bug in there...
http://www.jamesmolloy.co.uk/tutorial_html/index.html

-JL
SeaOS: Adding VT-x, networking, and ARM support
dbittman on IRC, @danielbittman on twitter
https://dbittman.github.io
Krox
Posts: 23
Joined: Fri Mar 21, 2008 3:52 am
Location: Germany

Post by Krox »

mrvn wrote:Having to share a full 1GiB segment for each large IPC is not really practical. And you are still left with 2^27 segments to manage. Although I guess a 16MB bitmap might be manageable.
Actually it's only 2^18 segments (remember, virtual addresses are 48-bit in long mode). That's a bitmap of 32 KiB. And why is sharing 1 GiB not practical? That's only virtual memory, and in long mode you've got plenty of that. That's the main reason I chose AMD64-only in the first place :wink: (right, it's not portable back to 32-bit)
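The 32 KiB figure checks out: 2^18 segments at one bit each is 2^15 = 32768 bytes. A first-fit bitmap allocator over those segments might look like this (names hypothetical):

```c
#include <stdint.h>

#define SEG_BITS  (1u << 18)            /* 2^18 one-bit entries */
#define SEG_BYTES (SEG_BITS / 8)        /* = 32768 bytes = 32 KiB */

static uint8_t seg_bitmap[SEG_BYTES];   /* 0 = free, 1 = in use */

/* First-fit: find and claim a free segment, or return -1. */
long seg_alloc(void)
{
    for (uint32_t i = 0; i < SEG_BITS; i++) {
        if (!(seg_bitmap[i / 8] & (1u << (i % 8)))) {
            seg_bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
            return (long)i;
        }
    }
    return -1;                          /* address space exhausted */
}

/* Mark a segment free again. */
void seg_release(uint32_t i)
{
    seg_bitmap[i / 8] &= (uint8_t)~(1u << (i % 8));
}
```

A linear scan over 32 KiB is already cheap, and it could be sped up further by scanning a word at a time or caching the index of the last free hit.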
21 is only half the truth...
einsteinjunior
Member
Posts: 90
Joined: Tue Sep 11, 2007 6:42 am

Post by einsteinjunior »

I also had the same difficulties until someone here helped me out.
Please read these tutorials, they helped me a lot.
http://www.osdever.net/tutorials/pdf/memory1.pdf
http://www.osdever.net/tutorials/pdf/memory2.pdf
mrvn
Member
Posts: 43
Joined: Tue Mar 11, 2008 6:56 am

Post by mrvn »

Krox wrote:
mrvn wrote:Having to share a full 1GiB segment for each large IPC is not really practical. And you are still left with 2^27 segments to manage. Although I guess a 16MB bitmap might be manageable.
Actually it's only 2^18 segments (remember, virtual addresses are 48-bit in long mode). That's a bitmap of 32 KiB. And why is sharing 1 GiB not practical? That's only virtual memory, and in long mode you've got plenty of that. That's the main reason I chose AMD64-only in the first place :wink: (right, it's not portable back to 32-bit)
Right, I was thinking of 2 MB chunks.

But let's say I use 1 GiB chunks. Every process will need at least one segment for its IPC message box. That limits me to 2^18 processes. Another segment for stdin/out/err, pipe, and tty buffers, if I put them into a single shared segment (which risks leaking information between file descriptors). That brings me to 2^17 processes. And that is without processes dynamically allocating segments to talk to each other or to different drivers, e.g. filesystems and networking. Let's say 6 more segments per process. That brings me down to 2^15 processes before running out of segments. Oops, Xen also claims half the address space for itself: 2^14 processes.

Not practical.

MfG
Mrvn
Life - Don't talk to me about LIFE!
So long and thanks for all the fish.
mrvn
Member
Posts: 43
Joined: Tue Mar 11, 2008 6:56 am

Post by mrvn »

einsteinjunior wrote:I also had the same difficulties until someone here helped me out.
Please read these tutorials, they helped me a lot.
http://www.osdever.net/tutorials/pdf/memory1.pdf
http://www.osdever.net/tutorials/pdf/memory2.pdf
Which is absolutely useless for the question at hand.

As I said, I already have my physical memory allocator done as a stack of pages, plus all its support code. That's memory1.pdf out the window.

As for memory2.pdf, that is slightly more on topic. But it only says what to do, not how to do it well. The relevant part for my question is actually just one sentence:
For now it's enough to record all allocations taking place in a list.
Which, I hope everyone agrees, is purely a first-try, give-me-anything-that-works approach and will not work in real life. Looking up or changing the page permissions on a page fault will become horribly expensive as processes allocate more and more chunks of memory.
Anyone suggesting a list for random access deserves to be shot.

Unfortunately he doesn't mention what to replace the list with to do it properly. Where is the memory3.pdf? :)

MfG
Mrvn
Life - Don't talk to me about LIFE!
So long and thanks for all the fish.
Krox
Posts: 23
Joined: Fri Mar 21, 2008 3:52 am
Location: Germany

Post by Krox »

mrvn wrote:2^14 processes
Yes, that's the theoretical limit. But I doubt that's relevant on normal home computers. Physical RAM isn't big enough to hold 16,000 tasks, and we are not talking about tiny tasks which only need a code segment plus IPC here. Have you ever seen more than a few hundred tasks on your PC? I don't think so. But anyway, you may be right on big servers and supercomputers (but only if those also run with 48-bit virtual addresses), or when you have an extremely modular system. So, okay, you are right, but at the moment I don't mind having that limit. :wink:

By the way: my message passing currently works without segments (that doesn't really hurt your calculation, though). There are simply SendMsg/GetMsg syscalls. The latter puts the message either into a user-provided struct (pointer as GetMsg param) or directly into registers (in the AMD64 calling convention, 6 registers are used for parameter passing, and it's logical to use those here too; 6 is a little limited, I know, but enough for sender/receiver/time/msgID + 2 params)... but now I'm getting off topic :lol:

To get back to your original topic: the buddy system is the only standard approach I know of, too. To minimize the number of splits, you could modify it so that every chunk is split into 4 parts instead of 2. If you put the lower bound at 16 KiB and the upper at 4 GiB, you end up with only about 10 splits (the actual numbers 4/16K/4G are only spontaneous guesses). The only thing wasted here is virtual memory... and in long mode you've got plenty of it.
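That split count is easy to check. The helper below is just for illustration; it counts how many k-way splits it takes to carve a chunk of the minimum size out of the topmost one:

```c
#include <stdint.h>

/* Number of `ways`-way buddy splits needed to get from an
 * `upper`-sized top chunk down to a `lower`-sized allocation. */
int splits_needed(uint64_t upper, uint64_t lower, int ways)
{
    int splits = 0;
    /* each split divides the current chunk size by `ways` */
    for (uint64_t size = upper; size > lower; size /= (uint64_t)ways)
        splits++;
    return splits;
}
```

With the guessed numbers (4 GiB down to 16 KiB, 4-way) this comes out to 9 splits, in line with the estimate above; with 2-way splits from 2^47 bytes down to 4 KiB pages, the same loop reproduces the 35 splits counted earlier in the thread.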

greetz
Krox
21 is only half the truth...