OSDev.org

Posted: **Wed Dec 31, 2003 12:00 am**

hi

I am currently doing my memory mangement part of my operating system. And I have a page based memory model for my os. The problem is : What algorithm to use for finding free pages - linear search consumes too much time and having separate lists(ie active list, inactive list and free list) seems ridiculous to me(imagine one entry for each page & the no of pages = 8*2^10 (for 32Mb memory)).

Help me.

Posted: **Wed Dec 31, 2003 12:00 am**

Different methods are around, the beautiful thing about programming something is you decide how it works.

The free list as you call it is what most OS's use. MS Windows uses it, as does linux (so im told, i always thought it used something called the buddy system).

As mentioned above, the buddy system, uses ranges of space in lists like pages but bigger.
So you have a list of blocks of 32kb, 16kb, 8kb, 4kb and 2kb. Going as high as you like, and when you need for example, 30kb, you allocate 30kb from the 32kb list of ranges, and the 2kb left overs gets placed in the 2kb list. Its space efficient and can resize your blocks when needed without waste.

If you're worried about the memory consumption of the free list, how about you list the page numbers that are free, and make sure that the pages are aligned to 4kb when you store them. Then you would take a page off the list (which wouldnt require as many bytes to store) and multiply out by the page sizes. you then have the base pointer to your page which you can return as an allocation.

Find Tim Robinsons guide to memory management. Its based on the 'free list' idea.
And a site about memory management, might be useful in your quest for an allocation algorithm - http://www.memorymanagement.org/

Dangamoose

Posted: **Wed Dec 31, 2003 12:00 am**

I use a stack... looks something like this:

The Stack Allocator:
         -------  <- length
        | alloc |
        | pages |
        | ===== | <- pointer
        |       |
        | ----- |
        |       |
        | ----- |
        | free  |
        | pages |
         -------  <- base

Where each element is a 4 byte address, page aligned.  The process of allocating a page becomes a simple decrement of the pointer.  The deallocate a page, increment the counter, and place to newly available address on the top of the free pages stack.  Simple, very fast, and only consumes 4 bytes per 4kb.

Cheers,
Jeff

Posted: **Wed Dec 31, 2003 12:00 am**

Personally i use the same system but without technically using a stack.
It means i don't need to disrupt my kernels stack with new stack pointers of a memory heap.
Instead i emulate the pointers inside a register and change the 'stack offset' manually rather than the cpu doing it every time you use the push instruction.

And, I have a question about that.

If you have more than 4Gb or ram then would it be true to say you would have to accomadate a bigger pointer (8bytes) for each page, or, in order to accomodate more than 4gb of ram you have to use a bigger size page anyway, thus no need to store larger pointers?

Im not sure.

Dangamoose

Posted: **Wed Dec 31, 2003 12:00 am**

Of course... so do I! It would be absolutely rediculous to use the actual stack segment and pointer!!! How would your kernel call any methods?

And yes... if you had over 4GB of RAM, you'd need to use larger pointers... 4 bytes can't reference anything larger then 4.2GB, of course.

However, if you've got more then 4GB of RAM, then you're using a 64-bit architecture, and you can afford to have 8 byte pointers.

You could, conceivably, use larger page sizes... but I wouldn't. With a 64-bit address space, you'd need a HUGE page size to make your pointers fit in 32 bits... AKA 4.2GB pages... that's insane.

Cheers,
Jeff

Posted: **Thu Jan 01, 2004 12:00 am**

I would be using a 64bit architecture for more than 4.2Gb?

Whys my cpu (AMD Athlon XP 2100+), a 32bit arch cpu, able to support and run pages sizes for purposes of having more than 4.2Gb of RAM? If it only works under 64bit archs then why implement it under 32bit cpu's?

Dangamoose.

Posted: **Thu Jan 01, 2004 12:00 am**

Yeah... you're talking about the PAE bit, of course, which anything past a pentium has.

Depending on that bit, you've got 4kb or 4mb page sizes... that doesn't change the fact that the bus is only 32-bits, and can only reference 4.2GB of "whatever"... be it bytes, 4kb pages, or 4mb pages...

Soo....

With PAE enabled, pages are 4MB, with a maximum of 64GB of memory (if I recall...)

4MB can be represented with 23 bits, ergo the lower 23 bits are always going to be 0 anyway (4MB aligned) so you can right shift all pointers by 23.

64GB == 2^36

Ergo, when you think about it (and assuming my calculations are right) even WITH 4MB pages enabled, you can represent the entire address space in a 32-bit integer... in fact... in less then a 32-bit integer.

Cheers,
Jeff

Posted: **Sat Jan 03, 2004 12:00 am**

Hope you guys don't mind if I interject a few things into this:

First, the solution of using a stack to list all free pages on a 32-bit system seems like a bad idea. A simple bitmap indicating whether each possible page is allocated or not would use 1/32 the memory and provide the same information. Although finding the next available page would be slightly slower (a linear search versus a constant time pop), it would only use 0x20000 (128K) bytes of memory as opposed to 0x400000 (4M!!!) to represent the entire 32-bit address space of 4k pages. However, neither strategy is scalable to much larger address spaces, so I would probably use something else altogether, for instance, organizing the free pages into a linked list.

Second, your information on PAE is not complete or fully correct. The physical address bus is extended beyond 32-bits, and in the page tables, the physical addresses have 64-bit fields, although at present no processors actually have more than a 48-bit physical bus (I think). 4MB pages are supported, but 4KB pages are also still supported, with an additional level of page tables. So, unless you are content with supporting no memory allocations of less than 4MB (which is kinda inefficient) you will in fact have to keep track of at least 52-bits of the physical address of a page unless you want to set an upper limit on the amount of memory you can have that is lower than the logical potential.

Posted: **Sun Jan 04, 2004 12:00 am**

4MB/4GB is still not even 0.1%, however, so it doesn't seem like too big of a deal to me.

Personal preference, of course.  I like the idea of the stack because it's incredibly fast, and very simple to implement.  I don't like bitmaps, especially for finding contiguous blocks.  I did use a bitmap in my first OS, however.

Indeed, using a 4MB granularity allocator would be much too inefficient.  I, personally, would probably take care of that with another allocator on top the 4MB one, however... that is, of course, assuming we're still going with the original trend and trying to fit 64GB of memory space into the standard 32-bit stack.

Honestly, it's all heresay on my part, because I haven't, nor do I intend to use, the PAE extensions.

If I were to start from scratch, using PAE, I would actually probably use a fully 64-bit wide stack, anyway.  It'd be more scalable to fully 64-bit architectures, at least... perhaps have an option to scale down to the 32-bit stack with less then 4GB of memory to save up on a lot of space.

In any event, again, it's all conjecture on my part

Cheers,
Jeff

Posted: **Sun Jan 04, 2004 12:00 am**

Thats pretty much what i've written in my os memory setup routine so far.
Not complete mind you.
Basically, when i boot, i boot from Grub with the memory map it gives me, i check if my allocatable free ram goes upwards of 4.2Gb, if so, i use 8byte addressing, if not, 4byte addressing.
A single integer sets the length of the address size, i.e. 4 or 8, and all addresses are page size aligned in a stack.

On that note, i thought about not allowing tasks to share pages which at the virtual memory management level stops all fragmentation and stops tasks screwing with each other. Any space left over from an allocation is stored as a range within the kernels memory manager for if the task decides to try and allocate some more memory later.
I don't think its a bad approach with a 4kb page but obviously 4mb pages would tie up a lot of memory if it never got allocated to the same task.

Do you think thats wise? im not too sure, i would like some other opinions.
If im using 4mb pages in the first place, then the memory is upwards of 4.2Gb and i can afford to use the method?

Maybe i could allow 4mb pages to be shared by tasks but not 4kb.

Dangamoose.

Posted: **Tue Jan 06, 2004 12:00 am**

hi friends,

Thanks for ur replies. I am implementing my memory manager by using bitmaps. As rexlunae said: 1bit for every 4kb page. Thus I can cramp 8 pages in one byte bitmap.

And as far as the linear search time is concerned I'm working on an algorithm that takes O([n/32]+5) in worst case and O([n/32]+9) in best case.

Thus to get a free page in a bitmap which has already allocated 63 consequtive pages we will need from 6 to 10 searches. Also the deallocation is far faster as it deals with bitmaps.

I am expecting your views.

Thank you

Posted: **Tue Jan 06, 2004 12:00 am**

Bear in mind that you won't find a faster solution for allocations and deallocations than the stack method.

The bitmap method uses less space, but costs much more time for allocations, and deallocations are somewhat slower (each dealloc will require a computation to determine what bit in what byte id's the page, then you must read, binary op, and write that byte.)

Given the large quantities of cheap RAM to be had for PCs these days, I think more people will implement the simple stack and not look back. Managing your virtual pages per process is a bit more perplexing.

Keep on rockin'

Posted: **Wed Jan 07, 2004 12:00 am**

"Bear in mind that you won't find a faster solution for allocations and deallocations than the stack method."

The time taken to search a bitmap of 128K for a single unmarked bit is really quite small. Compared to the other steps in making a system call for page allocation, it will not be noticed. Additionally, if you use a stack, and you are not content to just use a static-sized array (which would have to be 4MB) you will have to use some sort of dynamic memory allocation, which will have its own overhead.

"... and deallocations are somewhat slower (each dealloc will require a computation to determine what bit in what byte id's the page, then you must read, binary op, and write that byte.)"

Well, any solition will require a few operations. There are also several operations that will have to be performed to push a page address onto a stack. How many and how fast they are depends on the implentation, but both can be done O(1). It may be that the deallocations of a page in a bitmap implementation are faster than a stack implementation.

Posted: **Thu Jan 08, 2004 12:00 am**

The "several operations" is simply incrementing (or decrementing, depending which way you go) a pointer into the stack.

Here's an example of a page allocation routine. The deallocation routine is almost exactly the opposite.

I wouldn't recommend using this code in a real OS without bullet-proofing it. (That's a disclaimer, folks)

[code]
// Array of pointers initialized to the base addresses of all available pages
// on the system
void *page_heap[TOTAL_PAGES];
void **sp = page_heap;

void *allocate_page(void)
{
        if (sp >= page_heap + TOTAL_PAGES) {
                return NULL;
        } else {
                return *sp++;
        }
}
[/code]

Posted: **Thu Jan 08, 2004 12:00 am**

I use the stack method and My stack is not STATIC. Its size is determined from the amount of physical memory available.
if the system has 32mb physical ram the the stack size needed is
stack_size=(memory_installed+4095)/4096;
the addition of 4095 is to round the division up.
all that is then needed is to reserve stack_size bytes starting from a know position ie end o kernel code & data.
each entry in the stack is 4 bytes. And there you have is.

more_code can be implemented as a "POP" instruction
free_core can be implemented as a "PUSH" instruction
and alot of other functions can be implemented quickly.

I doubt that a deallocation of a page using the bitmap method is faster. All those shifts and bit tests will take more time than just placing the page back on the stack.
ie

more_core:
     if stack_ptr < stack_base_address stop due to underflow
     else
      page_address=*stack_ptr
      stack_ptr--;
      return page_address

free_core:
   if stack_ptr > stack_top_address stop due to overflow
   else
     *stack_ptr=page_address
      stack_ptr++;

how slow is that? This beats any bitmap method I've seen so far.

OSDev.org

Page allocation algorithm

Page allocation algorithm

RE:Page allocation algorithm

RE:Page allocation algorithm

RE:Page allocation algorithm

RE:Page allocation algorithm

RE:Page allocation algorithm

RE:Page allocation algorithm

RE:Page allocation algorithm

RE:Page allocation algorithm

RE:Page allocation algorithm

RE:Page allocation algorithm

RE:Page allocation algorithm

RE:Page allocation algorithm

RE:Page allocation algorithm

RE:Page allocation algorithm