
General questions regarding where a memory manager belongs

Posted: Fri Nov 14, 2014 4:43 pm
by Sanchezman
I've planned out how I want my memory manager to work, but I have a few questions about best practices regarding where I should put it.

My general idea for a basic memory manager is that each block will have a header containing:

Status (ie, free/allocated)
Size of block
Pointer to next block
Pointer to previous block
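
A minimal sketch of a header with these four fields, in C. The names `block_header`, `block_status` and `block_init` are made up for illustration and are not taken from any existing library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this only mirrors the four fields listed above. */
enum block_status { BLOCK_FREE = 0, BLOCK_ALLOCATED = 1 };

struct block_header {
    enum block_status    status; /* free or allocated */
    size_t               size;   /* usable bytes that follow the header */
    struct block_header *next;   /* next block, or NULL at the end */
    struct block_header *prev;   /* previous block, or NULL at the start */
};

/* Turn a raw, suitably aligned region into one free block covering all of it. */
static struct block_header *block_init(void *region, size_t region_size)
{
    assert(region_size > sizeof(struct block_header));
    struct block_header *b = region;
    b->status = BLOCK_FREE;
    b->size   = region_size - sizeof(struct block_header);
    b->next   = NULL;
    b->prev   = NULL;
    return b;
}
```

Splitting a block would then mean writing a second header inside the payload and fixing up the `size`, `next` and `prev` fields of both.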

Blocks will be split/combined as needs dictate. However, I'm wondering where I should put the code for the initialization of my memory manager. I'm thinking that the kernel should check the GRUB multiboot header to learn where all the initial blocks are and create the initial headers. Obviously my C library will have malloc() and such to allocate the blocks and stuff, but if my memory manager exists in the kernel, how should I get malloc() to communicate with it? Shouldn't my C library be as kernel independent as possible? Is it 'best practice' to add something like a malloc_init() function that takes over from the memory manager once it's done reading the multiboot header?

Re: General questions regarding where a memory manager belon

Posted: Fri Nov 14, 2014 6:04 pm
by KemyLand
Sanchezman wrote:My general idea for a basic memory manager is that each block will have a header containing:

Status (ie, free/allocated)
Size of block
Pointer to next block
Pointer to previous block
It's your OS, it's your decision. Just try to use algorithms that don't depend on a "pointer to X block". You can simply add the base address of the header, the size of the header, and the "size" field (then cast back to MyFoo*...) and you'll get the base address of the next block. This is handled by m/kalloc(), which is really an algorithm wrapped around a page frame allocator. It's completely different in kernel-land and user-land due to paging.
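
A rough sketch of that address arithmetic in C. The struct `blk` and the function `blk_next` are hypothetical names, and the sketch assumes payload sizes are kept a multiple of `sizeof(size_t)` so that each header stays aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header without link pointers: the size alone locates the neighbour. */
struct blk {
    size_t size;   /* payload bytes following this header; assumed to be a
                      multiple of sizeof(size_t) so every header stays aligned */
};

/* Next header = this header's address + header size + payload size.
   Returns NULL once that address reaches the end of the heap. */
static struct blk *blk_next(struct blk *b, void *heap_end)
{
    assert(b->size % sizeof(size_t) == 0);
    uint8_t *next = (uint8_t *)b + sizeof(struct blk) + b->size;
    return next < (uint8_t *)heap_end ? (struct blk *)next : NULL;
}
```

Walking backwards is the part this layout does not give you for free; allocators that need it usually store some extra information (for example a footer), which is a design choice left open here.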

Both have one thing in common: high-level optimizations. Both tend to use a single bit for free/allocated and the rest of the word for the block size, so on a 32-bit machine this model can malloc() objects of up to 2GB. Did you want a 3GB array? I'm sorry, but this way is more efficient. If you use 5 bytes (4 for size, 1 for status), you can run into inefficiencies when allocating small objects (e.g. a 4-byte int) in large quantities. So try to keep the header as small as possible :D. If you won't allocate anything bigger than 32KB, use 16 bits for both size and status. Also, free() does something you shouldn't skip over, not even for a single 1GHz tick (bad joke...): joining adjacent freed blocks.
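
A small sketch of the one-word header and of joining two neighbouring free blocks, in C. Everything here is an assumption made for illustration: the flag lives in bit 31, sizes are in bytes, exclude the 4-byte header, and are multiples of 4. With 31 bits for the size, the largest block is just under 2 GiB.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit header word: bit 31 = allocated flag, bits 0..30 = size. */
#define HDR_USED_BIT  UINT32_C(0x80000000)
#define HDR_SIZE_MASK UINT32_C(0x7FFFFFFF)

static uint32_t hdr_pack(uint32_t size, int used)
{
    assert(size <= HDR_SIZE_MASK);
    return size | (used ? HDR_USED_BIT : 0);
}

static uint32_t hdr_size(uint32_t h) { return h & HDR_SIZE_MASK; }
static int      hdr_used(uint32_t h) { return (h & HDR_USED_BIT) != 0; }

/* Merge the block at h with the block right after it if both are free.
   Returns 1 if a merge happened, 0 otherwise. */
static int hdr_coalesce(uint32_t *h, const uint32_t *heap_end)
{
    assert(hdr_size(*h) % 4 == 0);
    uint32_t *next = (uint32_t *)((uint8_t *)h + sizeof *h + hdr_size(*h));
    if (next >= heap_end || hdr_used(*h) || hdr_used(*next))
        return 0;
    /* The merged payload swallows the second header as well. */
    *h = hdr_pack(hdr_size(*h) + (uint32_t)sizeof *next + hdr_size(*next), 0);
    return 1;
}
```

Another common variant keeps the flag in the lowest bit instead, relying on sizes being aligned; which bit to use is a matter of taste.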

In user-land, your malloc() function will run some algorithms to get the next block in the heap. If there's no more space, a system call occurs and the kernel maps a new page at the end of the heap. The process starts over. A common malloc() uses exactly what you say, except for the pointers.
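
The growth path described above might look roughly like this in C. To keep the sketch runnable on any host, a static array stands in for the address space and `sys_map_page()` stands in for the system call; both are hypothetical, as is `sketch_malloc()`, which only bumps a pointer and never searches or frees blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE   4096u
#define ARENA_PAGES 4u

/* Stand-in for the kernel side: a fixed arena plays the address space. */
static _Alignas(16) uint8_t arena[ARENA_PAGES * PAGE_SIZE];
static size_t   mapped   = 0;  /* bytes the "kernel" has mapped so far */
static size_t   used     = 0;  /* bytes already handed out */
static unsigned syscalls = 0;  /* how often we had to ask the kernel */

/* Hypothetical system call: map one more page at the end of the heap.
   Returns 0 on success, -1 when no memory is left. */
static int sys_map_page(void)
{
    if (mapped + PAGE_SIZE > sizeof arena)
        return -1;
    mapped += PAGE_SIZE;
    syscalls++;
    return 0;
}

/* Bump allocation only, to keep the growth path visible; a real malloc()
   would first search its existing free blocks. */
static void *sketch_malloc(size_t n)
{
    if (n == 0 || n > sizeof arena)
        return NULL;
    n = (n + 15u) & ~(size_t)15u;      /* keep 16-byte alignment */
    while (used + n > mapped)          /* not enough mapped heap yet */
        if (sys_map_page() != 0)
            return NULL;               /* the kernel has nothing left */
    void *p = arena + used;
    used += n;
    return p;
}
```

Small requests are served from the page that is already mapped, so the kernel is only involved when the heap actually has to grow.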

In kernel-land, your Page Directory can be slightly different from physical memory (higher-half kernel, IPC and system calls via paging...), but kalloc() is really different. This kalloc() should be able to do the same as malloc(), but it must also be able to stop the page manager from remapping those pages. It should claim them as kernel heap. Remember that you're not hosted, you're the host! :wink:
Sanchezman wrote: Blocks will be split/combined as needs dictate. However, I'm wondering where I should put the code for the initialization of my memory manager. I'm thinking that the kernel should check the GRUB multiboot header to learn where all the initial blocks are and create the initial headers. Obviously my C library will have malloc() and such to allocate the blocks and stuff, but if my memory manager exists in the kernel, how should I get malloc() to communicate with it? Shouldn't my C library be as kernel independent as possible? Is it 'best practice' to add something like a malloc_init() function that takes over from the memory manager once it's done reading the multiboot header?
Calm down. It's easy! You can use the Multiboot Header (<multiboot.h>) to get access to this information. When your kernel gets control from GRUB, EBP contains the address of the Multiboot Information Structure. If you passed the correct options from GRUB, you can get a memory map simply by parsing an array.
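
A hedged sketch of walking that memory map in C, assuming the Multiboot 1 entry layout (a 32-bit size, a 64-bit base address, a 64-bit length and a 32-bit type, with the size field not counting itself). The function name `mmap_available_bytes` is made up; its two parameters correspond to the mmap_addr and mmap_length fields of the information structure, which are only valid when bit 6 of its flags is set. Fields are read with memcpy() so the sketch does not depend on packed structs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MMAP_TYPE_AVAILABLE 1u   /* Multiboot 1: type 1 means usable RAM */

/* Sum the bytes of usable RAM described by a Multiboot 1 memory map. */
static uint64_t mmap_available_bytes(const uint8_t *mmap, uint32_t mmap_length)
{
    uint64_t total = 0;
    uint32_t off = 0;

    while (mmap_length >= 24 && off <= mmap_length - 24) {
        uint32_t size, type;
        uint64_t len;
        memcpy(&size, mmap + off,      sizeof size);  /* entry size, excluding this field */
        memcpy(&len,  mmap + off + 12, sizeof len);   /* region length (base is at +4) */
        memcpy(&type, mmap + off + 20, sizeof type);  /* region type */

        /* Stop on a malformed entry instead of looping forever or overrunning. */
        if (size < 20 || size > mmap_length - off - 4)
            break;
        if (type == MMAP_TYPE_AVAILABLE)
            total += len;

        off += size + 4;   /* entries are chained by their own size field */
    }
    return total;
}
```

In a kernel the same loop would feed each available region to the physical memory manager rather than just adding up lengths.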

As for the C library, it's all about system calls. If you intend your C library to be independent of your OS, just *declare* a someSystemcall() function, then *define* it for every architecture/OS pair you target. If you follow POSIX.1 or another standard, look for an equivalent, such as sbrk() or mmap(). Then build up from there.
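
One way to picture the declare/define split, as a C sketch. The hook `port_sbrk()` (given sbrk()-like semantics here) and the wrapper `libc_alloc_raw()` are hypothetical names. The "port" below is backed by a fixed buffer purely so the example runs anywhere; a real port would issue the target's system call instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* --- portable C library side: only a declaration --- */
/* Hypothetical hook: extend the heap by `bytes` and return the old end
   of the heap, or NULL on failure. */
void *port_sbrk(size_t bytes);

/* Portable code builds on the hook without knowing which OS is underneath. */
static void *libc_alloc_raw(size_t bytes)
{
    return port_sbrk(bytes);
}

/* --- one port: there would be one definition per architecture/OS pair --- */
static uint8_t port_heap[1024];   /* stand-in for memory the kernel would map */
static size_t  port_brk;          /* current end of the heap within port_heap */

void *port_sbrk(size_t bytes)
{
    if (bytes > sizeof port_heap - port_brk)
        return NULL;
    void *old_end = port_heap + port_brk;
    port_brk += bytes;
    return old_end;
}
```

The library never sees how the memory is obtained, so retargeting it means replacing only the definition of the hook.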

A little tip: Read the multiboot.h file *CAREFULLY*. It uses programming techniques from the ol' days, so you can easily get confused if you don't distinguish every single symbol.

Re: General questions regarding where a memory manager belon

Posted: Sat Nov 15, 2014 12:23 am
by Sanchezman
Hmm, so the way you're talking about this suggests that I've actually been thinking about memory allocation wrongly from the start. I'd completely forgotten about paging. I'd just assumed that all memory would be allocated freely for as long as there was any to give. That is to say: A program asking for memory could be given any addressable chunk and there would be no guarantee where it would exist or that any memory blocks would be contiguous.

I guess my reticence to use paging is due to how I'm not sure if I understand it correctly. Here's how I see it, and correct me if I'm wrong:

The kernel starts up, it gets a list of memory addresses from the multiboot header.
The kernel chooses a place to begin the first page. This address is usually a multiple of 4096 (because a page is usually 4KiB).
...stuff happens where a program gets loaded into however many pages are needed...
If a program needs more memory, it calls malloc(). Somehow, malloc() checks whether or not the current page (or pages) that the program has is enough.
If it is not enough, the program is given a new page. malloc() somehow knows how to check all pages that a program has been given.
...I'm not sure what else needs to happen...

So, I guess my new questions involve the interplay between paging and functions like malloc(). Obviously the answers can be different between any OS, but I'd like to understand what's common as I don't quite understand how it all fits together despite trying to read up as much on the subject as I can. What's the difference between a page frame and a page? What's bad about just handing out memory whenever a program wants? What's the best way to differentiate memory asked for by a program and memory asked for by the kernel? How does the kernel even know how much memory it needs/takes up?

Re: General questions regarding where a memory manager belon

Posted: Sat Nov 15, 2014 6:11 am
by Brendan
Hi,
Sanchezman wrote:I guess my reticence to use paging is due to how I'm not sure if I understand it correctly. Here's how I see it, and correct me if I'm wrong:

The kernel starts up, it gets a list of memory addresses from the multiboot header.
The kernel chooses a place to begin the first page. This address is usually a multiple of 4096 (because a page is usually 4KiB).
...stuff happens where a program gets loaded into however many pages are needed...
If a program needs more memory, it calls malloc(). Somehow, malloc() checks whether or not the current page (or pages) that the program has is enough.
If it is not enough, the program is given a new page. malloc() somehow knows how to check all pages that a program has been given.
...I'm not sure what else needs to happen...
I like to think of it a little more like this:
  • Something (kernel or boot code) uses the firmware's memory map to pre-initialise a physical memory manager
  • Something (kernel or boot code) sets up paging.
  • Kernel has a physical memory manager to manage physical pages
  • Kernel has a virtual memory manager to manage virtual address spaces (which uses the physical memory manager to allocate/free physical pages). This provides an interface that:
    • allows you to allocate, free and change the attributes (read/write/exec, write-back/write-through/write-combining/uncached) of arbitrary virtual pages
    • allows you to create and destroy entire virtual address spaces
    • takes care of the trickier stuff, like allocate on demand, copy on write, memory mapped files and swap space (either directly, or coordinating other pieces like a swap space manager).
  • Kernel may (or may not) have its own "heap manager" (e.g. a "kmalloc()" and "kfree()" thing); which allows the kernel to allocate arbitrary sized pieces of memory, and uses the virtual memory manager's interface to allocate/free virtual pages if/when necessary.
  • Processes do whatever they want. This may include:
    • Having a standard C or C++ library that implements "malloc()", "free()", "new" and/or "delete"; which uses the virtual memory manager's interface (via the kernel API).
    • Having a garbage collector of some kind; which uses the virtual memory manager's interface (via the kernel API).
    • Lower level processes that use the virtual memory manager's interface (via the kernel API) directly (without any library, garbage collector or anything).
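
As one possible shape for the bottom layer of that list, here is a sketch of a bitmap-based physical memory manager in C. It is only one of several common designs (free lists and stacks of pages are others), and every name and constant in it (`pmm_init`, `pmm_add_region`, `pmm_alloc_frame`, `pmm_free_frame`, `MAX_FRAMES`) is an assumption made for illustration. The virtual memory manager would call the allocate and free functions whenever it maps or unmaps a page.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAME_SIZE 4096u
#define MAX_FRAMES 1024u         /* hypothetical limit: covers 4 MiB of RAM */
#define NO_FRAME   UINT32_MAX

static uint8_t frame_used[MAX_FRAMES / 8];   /* 1 bit per frame, 1 = in use */

/* Start with everything marked used; boot code then frees the usable ranges. */
static void pmm_init(void)
{
    memset(frame_used, 0xFF, sizeof frame_used);
}

static void pmm_free_frame(uint32_t frame)
{
    assert(frame < MAX_FRAMES);
    frame_used[frame / 8] &= (uint8_t)~(1u << (frame % 8));
}

/* Mark a usable range from the firmware's memory map as free.
   Partial pages at either end are left out. */
static void pmm_add_region(uint64_t base, uint64_t length)
{
    uint64_t first = (base + FRAME_SIZE - 1) / FRAME_SIZE;  /* round up */
    uint64_t end   = (base + length) / FRAME_SIZE;          /* round down */
    for (uint64_t f = first; f < end && f < MAX_FRAMES; f++)
        pmm_free_frame((uint32_t)f);
}

/* Return the index of a free frame and mark it used, or NO_FRAME if none is left. */
static uint32_t pmm_alloc_frame(void)
{
    for (uint32_t f = 0; f < MAX_FRAMES; f++) {
        if (!(frame_used[f / 8] & (1u << (f % 8)))) {
            frame_used[f / 8] |= (uint8_t)(1u << (f % 8));
            return f;
        }
    }
    return NO_FRAME;
}
```

Starting from "everything used" means that any range the memory map does not mention is never handed out by accident.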
Sanchezman wrote:What's the difference between a page frame and a page?
Different people use different terminology. I like to use "physical page" and "virtual page"; some like to use "page" (for a virtual page) and "page frame" (for a physical page - a "frame" of RAM that a virtual page could be mapped onto).
Sanchezman wrote:What's bad about just handing out memory whenever a program wants?
Nothing, maybe. The thing is that you'd actually be handing out virtual pages, not real/physical pages; and there is a problem called "overcommit" where you've handed out more pages than you actually have.

For an example, imagine you've got 512 MiB of RAM and 1 GiB of swap space; and someone asks to memory map a 2 GiB file. You tell them that's fine; and (later) when they access a page in that area you load data from disk and pretend it was there all along. If they only read from the virtual memory area you can't run out of physical RAM - you can just free previously used RAM because the file/data is still on disk. If they modify half the pages in the virtual memory area it's still fine (you can use RAM and swap to store all the modified pages that can't be fetched from the file on disk anymore). However, if they modify all of the pages, you're in trouble - you've told them they can have 2 GiB of space, but "RAM + swap" is only 1.5 GiB.
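
The arithmetic of that example can be checked with a few lines of C. This is a deliberately simplified model: the function name `dirty_pages_fit` is made up, and it ignores RAM used by the kernel or by other processes.

```c
#include <assert.h>
#include <stdint.h>

#define MIB (UINT64_C(1) << 20)
#define GIB (UINT64_C(1) << 30)

/* Modified pages can no longer be re-read from the file, so they must fit
   in RAM + swap. Unmodified pages cost nothing here: they can be dropped
   and fetched from disk again later. */
static int dirty_pages_fit(uint64_t dirty_bytes, uint64_t ram, uint64_t swap)
{
    return dirty_bytes <= ram + swap;
}
```

With 512 MiB of RAM and 1 GiB of swap, `dirty_pages_fit(1 * GIB, 512 * MIB, 1 * GIB)` holds (half of the 2 GiB mapping modified), while `dirty_pages_fit(2 * GIB, 512 * MIB, 1 * GIB)` does not, because 2 GiB exceeds the 1.5 GiB of backing store.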

There are good reasons to allow "overcommit" - most processes don't actually use everything they ask for, so if you don't allow overcommit you end up wasting RAM and denying allocations. It's one of the things where there is no right answer.
Sanchezman wrote:What's the best way to differentiate memory asked for by a program and memory asked for by the kernel?
That depends on "where". For the kernel API (that's used by processes) you'd have additional security checking, to ensure that processes can't (e.g.) ask to free kernel's RAM. For inside the virtual memory manager itself, you'd just differentiate using the virtual address itself - e.g. you can know that all virtual addresses at or above 0xC0000000 are for kernel; and all virtual addresses below 0xC0000000 are for processes.
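
A sketch of both checks in C, assuming a 32-bit address space with the kernel in the top 1 GiB. The split address is just the one from the example, not a rule, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define KERNEL_BASE UINT32_C(0xC0000000)   /* assumed user/kernel split */

/* Inside the virtual memory manager: the address alone tells you whose it is. */
static int is_kernel_address(uint32_t vaddr)
{
    return vaddr >= KERNEL_BASE;
}

/* At the kernel API boundary: a range requested by a process must lie
   entirely below the split and must not wrap around the top of memory. */
static int user_range_ok(uint32_t start, uint32_t length)
{
    if (length == 0)
        return 0;
    uint32_t last = start + (length - 1);
    if (last < start)                /* wrapped past 4 GiB */
        return 0;
    return !is_kernel_address(last);
}
```

Checking the last byte rather than only the start address is what stops a process from passing a range that begins in user space and runs into the kernel's area.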
Sanchezman wrote:How does the kernel even know how much memory it needs/takes up?
Does it need to know (e.g. can kernel just use whatever RAM it needs, and handle "out of memory" if/where it happens)?

Note: I would recommend keeping track of various statistics in various places; including the amount of free physical pages of RAM, how much virtual space each process is using, how much RAM the kernel is using, etc.


Cheers,

Brendan

Re: General questions regarding where a memory manager belon

Posted: Sat Nov 15, 2014 10:47 am
by eryjus
KemyLand wrote:When your kernel gets control from GRUB, EBP contains the address of the Multiboot Information Structure.

I'm sure it's a typo, but the spec is actually EBX.