Page 1 of 1

Setting Up Memory Management

Posted: Thu Aug 24, 2017 5:36 pm
by horizon
I am working on setting up a memory manager as the next step of my kernel.

I have already obtained a memory map from GRUB that shows which regions are available, reserved, their addresses, and how long they are. My next step would be to set up a page frame allocator for physical memory, but I have a few issues I feel I need to address first. Here is my general thought process:

1. Reserve some space for the kernel's data structures and memory usage. Reserve the remaining portion for user space.

2. Write some kind of malloc() function for the Kernel to use in its data structures (probably using some kind of watermark allocator, without any kind of paging/virtual address space--just using physical addresses and managing memory blocks).

3. Set up paging for the the user space memory and build a memory manager for that from there, including a physical memory manager (page frame allocation), page table, page directory, and enabling paging.

Is this a correct approach?

I am wondering whether the kernel needs to use paging too when allocating its data structures. And if so, and all memory should be under paging (kernel or userspace), how do I set up the data structures that are used in writing a page frame allocator in the first place? If I need a AVL Tree or Linked List for a physical memory manager, don't I need a malloc function for those in the first place?

Re: Setting Up Memory Management

Posted: Thu Aug 24, 2017 5:48 pm
by heat
Hi,
1. Reserve some space for the kernel's data structures and memory usage. Reserve the remaining portion for user space.
Kernel data structures aren't statically allocated, they are allocated on demand. Memory management inside the kernel is practically the same as in a user-space program, except in special cases where you can't allocate memory at all(i.e IRQs); for those cases, you'll need a special memory pool. You can't reserve space for the data structures, the memory will just be allocated from the general pool of pages.
2. Write some kind of malloc() function for the Kernel to use in its data structures (probably using some kind of watermark allocator, without any kind of paging/virtual address space--just using physical addresses and managing memory blocks).

3. Set up paging for the the user space memory and build a memory manager for that from there, including a physical memory manager (page frame allocation), page table, page directory, and enabling paging.
Kernels run in virtual memory as well, paging isn't just enabled for the user-space's sake. You'll want to have your malloc() run on top of your virtual memory manager, that runs on top of the physical memory manager. Asking malloc for memory for the first time should look like this: malloc(8) -> malloc checks if it has any available block=no -> ask the virtual memory manager for some memory -> vmm finds a region and maps it using pages given by the pmm -> malloc returns a block.

Also, watermark allocators are just a Bad Idea™ in general. Freeing memory is almost as common as allocating memory, especially in a big library/kernel. Not properly freeing things right from the beginning will give you lots of grief.

Re: Setting Up Memory Management

Posted: Fri Aug 25, 2017 12:39 am
by Octacone
This is how I think of memory management. It will take you some time to understand the whole concept.
There are 3 distinct memory management levels:
  • 1.Physical memory. Here you can use a general purpose stack/bitmap allocator. Your allocator needs to be able to return page aligned addresses (either 4 KB, 2 MB or 4 MB). Here we use a concept called blocks, where each block is either 4 KB, 2 MB or 4 MB in size. You will need a function (you will also need allocate_block(s), free_block(s)) that can reserve certain space from X to Y. Make sure you use your multiboot memory map. If your memory map says something is used then make sure to reserve that area. Everything that is free has to be marked as free. Note: some memory is likely to be above 4 GB = not accessible in 32 bit protected mode, you will need to either enable PAE or not use that memory (wasteful). Remember, memory does not have to be contiguous and you PMM needs to be aware of this. There can be holes as big as 1 GB between the memory map areas. Making a huge bitmap/something else that can cover everything from 0x0 to 0xFFFFFFFFFFFFFFFF is not recommended, better split your implementation per each free area, aka 1 free area = 1 separate bitmap. You don't have to use a bitmap, there are other options as well, but it is the easiest one. Make sure you reserve and save those ACPI areas accordingly, you can reclaim some of them after reallocating them. This memory manager is the one that you are supposed to reserve your kernel with. Make sure you reserve your entire kernel from A to Z so things don't get overwritten. Consider making a physical address space manager. (it could manage all of your devices and structures inside your memory used by the BIOS).
  • 2.Virtual memory. This memory includes paging, everyone's favorite subject. This is really straight forward. Enable paging, map all of its structures, use recursive page directory trick if you want. Make sure you map all of your PMM structures so you don't get unexpected triple faults. As stated before you can enable PAE, or not (if you don't care about wasting a few hundred MB or even up to 1 GB). Now, you will need to replicate the same exact functions as used in PMM but they will have to be virtual address space aware. You don't have to make an entirely new system. You could just use your PMM and map/unmap the addresses returned by it. Don't let paging scare you, once you figure it out you will like it. Intel manuals are always a great thing. You could also make functions such as page directory switching or page table cloning. When we talk about VMM we also talk about the higher half kernel. You can choose to use it or not. It has its pros and cons, I personally don't use it, yet. If your kernel is loaded at 1 MB you don't need to worry about first MB being used, you will need it for a Virtual 8086 monitor (if you decide to use it for graphics mode switching).
  • 3.Dynamic memory. This is the part where we talk about malloc, free, calloc, realloc, etc... This is mostly user space related. This is a manager that people will use to create applications for your OS. (you primarily) Dynamic memory manager has to return a certain amount of bytes as requested. It is suggested to use an AVL tree for performance. It is "hard" to implement it at first, but it is the most efficient implementation (consider self balancing trees and red black ones). A linked list can be used for this and it is easy to work with, but it is really really slow and not recommended. You don't have to go through all this you can just port an existing allocator. Such as: dlmalloc, liballoc, tlfs, nedmalloc, ptmalloc, etc... I would personally suggest porting if you don't know what are you doing, but if you are familiar with binary tress go ahead. But what about the kernel, you didn't talk about that. Well you could make a simple allocator that would return 2, 4, 8, 16, 32, 64 bytes for kernel to use. You should use this allocator for kernel structures and "data" keeping. It is supposed to be on top of VMM. Ask VMM for couple of pages (pages in VMM) (blocks in PMM) and then parcel/slice/chop those pages into fragments/smaller pieces and that could be your pool (not the swimming one!).
I hope I was even the tinniest bit helpful.

Re: Setting Up Memory Management

Posted: Wed Aug 30, 2017 7:44 am
by LtG
Octacone wrote:You don't have to use a bitmap, there are other options as well, but it is the easiest one.
I know I've said this before, but I'll say it again, bitmaps aren't easier than a stack. Or if they are, I'd like to hear some explanation in what way are they easier?

In the proposed bitmap arrangement you have multiple bitmaps which you must search in some way and make sure there aren't bugs, how is that easier than a stack? If you want your bitmap to be even remotely performant the optimizations become quite convoluted, a stack is very easy to implement and has pretty good performance. Even a bitmap without any optimizations doesn't seem easier compared to a stack. I'm mainly objecting on Octacone suggesting that a bitmap is _easier_. If you want to use a bitmap, go for it, but I don't think it's easier.

The main issue with a simple stack is that it doesn't inherently support multiple page sizes, but I wouldn't suggest large page support until you're a lot further ahead. Large pages can also hurt performance, which means you need pretty full OS implementation (including userspace apps) so that you can measure large page effect on performance.

Aside from multiple page size support I'm not sure if bitmaps has any advantage, and even in multiple page size support it's not stellar. So my general suggestion is to use a stack..

Re: Setting Up Memory Management

Posted: Wed Aug 30, 2017 8:12 am
by Octacone
LtG wrote:
Octacone wrote:You don't have to use a bitmap, there are other options as well, but it is the easiest one.
I know I've said this before, but I'll say it again, bitmaps aren't easier than a stack. Or if they are, I'd like to hear some explanation in what way are they easier?

In the proposed bitmap arrangement you have multiple bitmaps which you must search in some way and make sure there aren't bugs, how is that easier than a stack? If you want your bitmap to be even remotely performant the optimizations become quite convoluted, a stack is very easy to implement and has pretty good performance. Even a bitmap without any optimizations doesn't seem easier compared to a stack. I'm mainly objecting on Octacone suggesting that a bitmap is _easier_. If you want to use a bitmap, go for it, but I don't think it's easier.

The main issue with a simple stack is that it doesn't inherently support multiple page sizes, but I wouldn't suggest large page support until you're a lot further ahead. Large pages can also hurt performance, which means you need pretty full OS implementation (including userspace apps) so that you can measure large page effect on performance.

Aside from multiple page size support I'm not sure if bitmaps has any advantage, and even in multiple page size support it's not stellar. So my general suggestion is to use a stack..
You really couldn't resist replying to this thing, couldn't you? As if you had a detector that could sniff when I mention bitmaps. :-k
Let's solve this I guess:
This is based of my personal experience, to me I can wrap my head around it more easily that I can with a stack.
Also there was an old topic speaking about how bad the stack was. I think it had something to do with DMA pages + not being space efficient.
When it comes to allocation performance, I really don't care about 2.000001 ms more or less. "User doesn't care about things their eye can't notice."
It is very easy to implement a bitmap based allocator, there are plethora of examples and other documents that can get you started, while you can't say that for your stack.
Looks like both of our opinions are very subjective and that the author should decided himself/herself what he/she thinks is the best.

Re: Setting Up Memory Management

Posted: Wed Aug 30, 2017 10:52 am
by LtG
Octacone wrote: You really couldn't resist replying to this thing, couldn't you? As if you had a detector that could sniff when I mention bitmaps. :-k
I couldn't, and I've been away for a few days, so...

The main reason was that I really consider bitmap to be a tiny bit more complex than a stack.
Octacone wrote: This is based of my personal experience, to me I can wrap my head around it more easily that I can with a stack.
You're of course free to use whatever you like, but I do think that wrapping your head around a stack should be fairly trivial.
Octacone wrote: Also there was an old topic speaking about how bad the stack was. I think it had something to do with DMA pages + not being space efficient.
Not sure which thread, and not sure what the comparison was. Assuming the simple case of only allocating a single 4KiB physical page frame stack is probably the fastest and best performing option there is.

If you need to handle special memory ranges (for example for DMA), then things get slightly more complex. A simple solution is to have two (or more) stacks, where the low memory is put in the special DMA stack and rest goes to the normal stack.

A bitmap needs to search the bitmap for a free page, a stack just pops it. To get even remotely decent performance from bitmap you need to "remember" where you last looked in the bitmap so you can start the next search there, instead of the beginning (where all memory is likely to be in use), but even that isn't very good, which means the complexity keeps rising to get better performance.

Octacone wrote: When it comes to allocation performance, I really don't care about 2.000001 ms more or less. "User doesn't care about things their eye can't notice."
Not sure what you are saying here, I'm guessing you're trying to say that the difference between stack and bitmap is on the order of 0.000001, or in other words 1ns? The difference between a simple stack vs a simple bitmap in non-pathological cases would be massively bigger than 1ns.

I'm not accounting for the syscall overhead, just the bitmap vs stack algorithm. Best case scenario (the bitmap "pointer" points to free bit) would likely give very close performance to stack and everything else increasingly worse. On average I'd expect 2x to 1000x worse performance. I don't have any tests to provide more useful numbers.
Octacone wrote: It is very easy to implement a bitmap based allocator, there are plethora of examples and other documents that can get you started, while you can't say that for your stack.
Looks like both of our opinions are very subjective and that the author should decided himself/herself what he/she thinks is the best.
When a page is allocated you return the current "stack pointer", it points to a free page. Before returning it you read from that page what is the next "stack pointer" and that gets stored in the "stack pointer" for next time. This way the stack uses the free pages themselves, thus the stack itself only needs memory for a couple of variables, so effectively it uses no memory at all.

When you deallocate (free) a page you do the reverse, you write to the newly free page the current "stack pointer" and then set the "stack pointer" to point to the newly deallocated page.

I hope that explanation was simple =)

There's of course various optimizations you could do with that, like putting more than one address in each page (better cache usage), allowing multiple 4KiB pages to be returned simultaneously, etc..

Re: Setting Up Memory Management

Posted: Wed Aug 30, 2017 2:30 pm
by Korona
The requirements of DMA indeed make stack implementations less useful. DMA often has requirements like
  • Buffers must be contiguous in physical memory and have a size of x bytes (where x > page size). For example, UHCI requires buffers to be physically contiguous.
  • Buffers must be below x GiB. Many devices can only address 32-bits.
  • Buffers must be aligned to x bytes (where x > page size). However, this should be somewhat rare.
  • Buffers must not cross x byte boundaries. For example XHCI requires buffers not to cross 64 KiB boundaries.
All these things are much easier to implement if your have a bitmap (or better, a buddy allocator) and not a stack.

Re: Setting Up Memory Management

Posted: Wed Aug 30, 2017 3:01 pm
by LtG
Korona wrote:The requirements of DMA indeed make stack implementations less useful. DMA often has requirements like
  • Buffers must be contiguous in physical memory and have a size of x bytes (where x > page size). For example, UHCI requires buffers to be physically contiguous.
  • Buffers must be below x GiB. Many devices can only address 32-bits.
  • Buffers must be aligned to x bytes (where x > page size). However, this should be somewhat rare.
  • Buffers must not cross x byte boundaries. For example XHCI requires buffers not to cross 64 KiB boundaries.
All these things are much easier to implement if your have a bitmap (or better, a buddy allocator) and not a stack.
How does bitmap make that easier? If the memory is in use, then it's in use, whether you use stack or bitmap.

For instance you can have three stacks:
First 16MiB
16MiB - 4GiB
4GiB+

And the granularity with these can be larger than 4KiB if you like, further, you can "easily" evict something from physical memory and thus clear contiguous memory. It also should be a one time thing, allocate the DMA memory when the driver requests it (at driver startup), so the cost is minimal and you get a lot faster "normal" memory allocation.

I haven't tested buddy for physical allocation, but I'm not sure if even that will provide significant benefits over stack if large pages aren't used. I would imagine stack being faster at 4KiB page allocation than buddy.