Suggestions About Memory Allocators

arabasso · Post by **arabasso** » Wed Mar 31, 2010 9:49 pm

I was thinking about implementing a good memory manager for my OS project and I thought of several possibilities. The first was related to paging. In my OS project, reserve the first 1GB of memory to the kernel, and the rest for the user programs.

If each program has its own address space, so I conclude it would be interesting to allocate all the kernel Page Tables (for 1GB, 256 PTs would be used). Thus, 1 MB of memory is used, but has the advantage that when I was a new process, just copy the kernel page tables for pages tables of the processes, then all these pages mapped on kernel page tables would be available instantaneously.

Or I could go by placing the page tables when necessary, but there would be a serious problem if I change the page directory (in my OS, each process has its own page directory). With this, if I carry a module or anything else once I've uploaded a process, I would have to remap the page tables of all processes loaded.

Which of the options would be better? Or is there something else?

Another thing is with respect to the algorithm to allocate / deallocate memory for the kernel and modules. I'm using double linked list heap algorithm to allocate / deallocate memory for the kernel. As I understand it, the heap algorithm seems "somewhat slow." The first thing I did to improve the heap algorithm was to remove it the page aligned allocation. Instead I created a bitmap containing all the pages in use (a bitmap for stack modules and kernel threads, another bitmap for processes's PTs and PDs, and so on). All these structures are stored in the kernel heap.

There are so many possibilities that I am very confused on who to use. Suggestions will be helpful.

gerryg400 · Post by **gerryg400** » Tue Apr 13, 2010 9:58 pm

This is my first post to osdev. Hope I don't show my lack of knowledge too quickly.

My kernel allocator frees and allocates memory as required. I use the upper 1G for kernel space so frequently had to modify the pgdir of every running process. I came up with a couple of solutions.

1. My kernel uses the top 256 entries in the current process' pgdir for its memory. The kernel keeps a 64bit counter that increments every time it changes one of those entries. That counter is copied to a field in the process table of the currently running process when that process is being switched from. When the scheduler switches to a new process, it checks the counter for the new process and if it's less than the value the kernel has, it copies the top 256 entries (actually for me it's less than 256 because the malloc area is only from 64 entries) from the outgoing process to the incoming process and updates the counter of the outgoing process. It adds a small overhead (1024byte memcpy) to the task switch. The only problem is that if a process doesn't run for a long time, the counter will wrap around. The scheduler needs to monitor that.

2. If you're on a i386 in 32bit mode you could use PAE. I think then you could set one one the 4 PDPTE regs to point to a common value for all processes. I've not tried this nor thought it through completely. Just a thought.

Right now I'm moving to 64bits and rethinking everything.

- gerryg400

NickJohnson · Post by **NickJohnson** » Wed Apr 14, 2010 12:36 pm

arabasso wrote:I was thinking about implementing a good memory manager for my OS project and I thought of several possibilities. The first was related to paging. In my OS project, reserve the first 1GB of memory to the kernel, and the rest for the user programs.

If each program has its own address space, so I conclude it would be interesting to allocate all the kernel Page Tables (for 1GB, 256 PTs would be used). Thus, 1 MB of memory is used, but has the advantage that when I was a new process, just copy the kernel page tables for pages tables of the processes, then all these pages mapped on kernel page tables would be available instantaneously.

Or I could go by placing the page tables when necessary, but there would be a serious problem if I change the page directory (in my OS, each process has its own page directory). With this, if I carry a module or anything else once I've uploaded a process, I would have to remap the page tables of all processes loaded.

Usually the page tables that map the kernel are created only once, and pointed to by all of the page directories. Having more than one copy of the kernel page tables is a recipe for disaster, because you need the kernel to be the same in all processes, and managing a hundred copies of them would be very hard and slow.

Another thing is with respect to the algorithm to allocate / deallocate memory for the kernel and modules. I'm using double linked list heap algorithm to allocate / deallocate memory for the kernel. As I understand it, the heap algorithm seems "somewhat slow." The first thing I did to improve the heap algorithm was to remove it the page aligned allocation. Instead I created a bitmap containing all the pages in use (a bitmap for stack modules and kernel threads, another bitmap for processes's PTs and PDs, and so on). All these structures are stored in the kernel heap.

I'm not exactly sure what you mean by a "double linked list heap algorithm", because many heap algorithms use doubly linked lists, so I can't really tell you if you are doing it correctly.

gerryg400 · Post by **gerryg400** » Wed Apr 14, 2010 6:20 pm

Another thing is with respect to the algorithm to allocate / deallocate memory for the kernel and modules. I'm using double linked list heap algorithm to allocate / deallocate memory for the kernel. As I understand it, the heap algorithm seems "somewhat slow." The first thing I did to improve the heap algorithm was to remove it the page aligned allocation. Instead I created a bitmap containing all the pages in use (a bitmap for stack modules and kernel threads, another bitmap for processes's PTs and PDs, and so on). All these structures are stored in the kernel heap.

Arabasso,

I guess by 'heap algorithm' you mean something like userspace malloc. Have you considered a buddy allocator or a slab allocator ? A slab allocator is particularly good for allocating memory for kernel objects that are created and destroyed like process tables, thread structures etc. A buddy allocator might be better for allocating memory (especially bigger contiguous blocks) for loadable modules and kernel stacks. I wrap all my kernel allocation routines up in one function called kmalloc and let it decide which algorith to use depending on the amount of memory being allocated.

- gerryg400

arabasso · Post by **arabasso** » Thu Apr 15, 2010 8:11 am

Thanks for the answers.

Usually the page tables that map the kernel are created only once, and pointed to by all of the page directories. Having more than one copy of the kernel page tables is a recipe for disaster, because you need the kernel to be the same in all processes, and managing a hundred copies of them would be very hard and slow.

I thought the following about it: I created a structure that stores the page tables (for the x86 platform it has 1024 entries and occupies 4096 bytes). The case is that each process starts at 1GB> then 1GB< must be mapped to all processes communicate with the kernel.

The big problem is that if each process has its own address space up to 1GB, so I can not simply use the same page directory for all processes. So what I thought: reverse the first 256 page tables, and when a new process is created, I create a page directory structure (occupying 4096 bytes) and copy all the first 256 entries. Like structure "PageTable" is only a pointer to the pages, if I mark a page like "present", in all processes will be present automatically (even if I change the page directory). The only thing "bad" is that you lose 1MB of memory just for first 256 page directory entries.

I guess by 'heap algorithm' you mean something like userspace malloc. Have you considered a buddy allocator or a slab allocator ? A slab allocator is particularly good for allocating memory for kernel objects that are created and destroyed like process tables, thread structures etc. A buddy allocator might be better for allocating memory (especially bigger contiguous blocks) for loadable modules and kernel stacks. I wrap all my kernel allocation routines up in one function called kmalloc and let it decide which algorith to use depending on the amount of memory being allocated.

I think I was a little vague when I referred to "the heap algorithm". I'm referring to a way to allocate memory for the kernel (processes, threads structs, filesystem objects, modules, etc). With respect to userspace malloc, I leave it up to the process itself. I created a system call to allocate / deallocate pages for process, so the process itself must manage its heap (if the process needs to expand its heap, it should make a system call for reserve more pages).Buddy allocator is not very suitable in my case, given the fact that the size of structures and objects in my project vary greatly in size (except in cases where I need to allocate page aligned space). I have heard something "Slab Allocator", but totally unaware of any implementation...

Regarding the "kernel heap allocator" to speed things up I created a structure like this:

Code: Select all

struct Block
{
    unsigned short magic;
    bool free;
    unsigned long size;

    Block * next;
    Block * prev;
    Block * nextFree;
};

I put the field "nextFree" to try "speed" things up. When I call "kmalloc" (which in my kernel is only a call to the new operator), it does a search only in the free blocks. The last block is always marked as "free", thus "kmalloc" can not find a block, it expands the heap of the kernel. There was a huge performance gain by doing this, because I only researching the free blocks, and not simply in all blocks.

Comment on all that I have discussed. I would also understand the slab allocator algorithm, any site or material thank you very much.

gerryg400 · Post by **gerryg400** » Thu Apr 15, 2010 8:40 am

http://ptgmedia.pearsoncmg.com/images/0 ... n_book.pdf

This book has an entire chapter on the linux slab allocator.

- gerryg400

Selenic · Post by **Selenic** » Thu Apr 15, 2010 10:36 am

arabasso wrote:The big problem is that if each process has its own address space up to 1GB, so I can not simply use the same page directory for all processes.

I believe the usual solution to this is to keep a 'blank' page directory, which contains only the kernel stuff; this is then copied and the user part filled in when a new address space is required.

Also, usually 32-bit systems have a user/kernel memory split of 2/2 (Windows) or 3/1 (Windows /3GB and Linux); in the latter case, you can probably sacrifice 1MB for the 256 kernel page tables which are mapped into everything; then you only need to allocate things in the shared lower levels.

About heap allocation, I quite like one idea I've seen in Python's code (it requests large chunks from malloc, then sub-allocates itself) - nothing is touched until needed (ie, there is a 'next to allocate' pointer, as well as a free list which only contains space which has been allocated and then freed)

quanganht · Post by **quanganht** » Sun Apr 18, 2010 1:56 am

gerryg400 wrote:http://ptgmedia.pearsoncmg.com/images/0 ... n_book.pdf

This book has an entire chapter on the linux slab allocator.

- gerryg400

A very awesome book. Thanks

OSDev.org

Suggestions About Memory Allocators

Suggestions About Memory Allocators

Re: Suggestions About Memory Allocators

Re: Suggestions About Memory Allocators

Re: Suggestions About Memory Allocators

Re: Suggestions About Memory Allocators

Re: Suggestions About Memory Allocators

Re: Suggestions About Memory Allocators

Re: Suggestions About Memory Allocators