My memory manager

Jeko · Post by **Jeko** » Fri Nov 03, 2006 8:25 am

I wrote this memory manager. Do you think it is good? How can I optimize it?

unsigned int bitmap[32768];

void initmemory(){

	for (unsigned int i = 0; i<16; i++)
	{
		bitmap[i] = 0xFFFFFFFF;
	}
	
	for (unsigned int i = 16; i<32768; i++)
	{
		bitmap[i] = 0x00000000;
	}
}

void *malloc(unsigned int size)
{

	unsigned int pagenum = (size / 4096) + 1;
	unsigned int pagecontinue = 0;
	int bit = 0;
	int i = 0;
	int j = 0;
	unsigned long int address;
	
	for (i = 16; i<32768; i++)
	{
		
		while(j<32)
		{
			bit = bitmap[i] & (1<<i);
			
			if (bit == 0)
			{
			
				pagecontinue++;
				
				if (pagecontinue == pagenum)
					break;
				
				j++;
			}
			else
			{
 				j++;
				pagecontinue = 0;
			}
		}
		
		if (pagecontinue == pagenum)
			break;
		
		j=0;
	}
	
	if (pagecontinue != pagenum)
		return 0;

	address = ((i*32+j+1)-pagecontinue)*4096;
	
	i = (address/4096)/32;
	j = (address/4096)%32;
	
	pagecontinue = 0;
	
	while (pagecontinue < pagenum)
	{
		bitmap[i] = (bitmap[i] | (1<<j));
		j++;
		pagecontinue++;
		
		if (j==32 && pagecontinue < pagenum)
		{
			i++;
			j=0;
		}
	}
	
	return (void*)address;
}

Cheery · Post by **Cheery** » Fri Nov 03, 2006 9:24 am

Where is the free?

Jeko · Post by **Jeko** » Fri Nov 03, 2006 9:38 am

I not yet implemented it... I'm working on it now.

ces_mohab · Post by **ces_mohab** » Fri Nov 03, 2006 1:37 pm

first:
it's confusing that you use malloc because malloc in the standard library allocates arbitrary memory blocks. while I think you use a page allocator.
second:
implementing free is fairly easy than malloc
third

unsigned int bitmap[32768];

why 32768 ints i.e. 32768*4*4096 i.e. 512 MB you can use a dynamic array at the end of kernel and either calculate memory size or leave this task to boot loader.
optimization is possible.
you can increase the code about 32 times

by using code like this:

Code: Select all

       /* Get a unit of free bit */
	for( i=0 ; i<bitmap_size ; i++ )
		if( bitmap[i] != 0xFFFFFFFF ) 
			break ;
			
	/* Bad luck! We run out of pages */
	if( i==bitmap_size )
		return NULL ;
		
	/* We found a unit of free bit! set this bit */
	for( j=0 ; j<BTMPUNTSZ ; j++ )
		if( !GETBIT(bitmap[i],j) )
		{
			SETBIT(bitmap[i],j) ;
			break ;
		}
		
	/* compute address */
	return (void*) ( (i*BTMPUNTSZ+j)*PAGE_SIZE ) ;

GOod lUck

B.E · Post by **B.E** » Fri Nov 03, 2006 11:09 pm

ces_mohab wrote:first:
it's confusing that you use malloc because malloc in the standard library allocates arbitrary memory blocks. while I think you use a page allocator.
second:

implementing free is fairly easy than malloc
third

first, it's he's choose on what function name to use. and also there are many ways of implementing a memory allocator.
second, you can not have a free without a malloc

ces_mohab wrote:
unsigned int bitmap[32768];
why 32768 ints i.e. 32768*4*4096 i.e. 512 MB you can use a dynamic array at the end of kernel and either calculate memory size or leave this task to boot loader.

there is nothing wrong with 32768 ints(although to be 100% correct they should be unsigned longs), you have calculated the ammount of memeory it can access wrong. it should be 32768*32*4096(32 bits in an int).

Brendan · Post by **Brendan** » Sat Nov 04, 2006 12:28 am

Hi,

MarkOS wrote:I wrote this memory manager. Do you think it is good?

If it's a heap manager, then it's wasteful (e.g. to allocate space for 32 seperate 16 byte data structures, you'd consume 128 KB of RAM instead of 2 KB), and all that searching would be too slow.

MarkOS wrote:How can I optimize it?

If you must use a bitmap, then:

To speed up the searching, keep track of the lowest "possibly free" page, so that if the first half has already been allocated you don't keep searching it anyway. I'd also keep track of the total number of free pages, so that if there isn't enough free you don't go searching for it.

Then pre-check each dword before testing individual bits. For example:

Code: Select all

if (bimap[i] != 0xFFFFFFFF) {
     /* check bits */
}

A better idea would be to use "REP CMPSD" instead.

Lastly, consider using the BSF instruction to get rid of the inner loops.

BTW if someone does "malloc(4096)" they'd get 8192 bytes. It might be a good idea to protect against "malloc(0)" too...

It's also occured to me that it will be impossible for you to implement "free()" because you have no idea how much space was allocated. This isn't that hard to fix though - store the size of the allocated space at "address" and return "address + 4".

Taking this into account, you might want to do:

Code: Select all

unsigned int pagenum = ( (size + sizeof(void *) - 1) / 4096) + 1;

Of course this makes it as fragile as most heap managers - it's easy for common programmer's bugs to corrupt the heap management. For example:

Code: Select all

    unsigned int myArray;
    unsigned int myString;

    myArray = malloc ( 1024 * sizeof(unsigned int) );
    myString = malloc (20);
    myArray[1024] = 0xFFFFFFFF;
    free(myString);

In this case, your "free()" code could decide to free 4 GB instead of 20 bytes - lucky it's part of the C library and not running in kernel space...

Cheers,

Brendan

ces_mohab · Post by **ces_mohab** » Sat Nov 04, 2006 4:41 am

B.E wrote: first, it's he's choose on what function name to use. and also there are many ways of implementing a memory allocator.

let's say that choosing malloc as a name is confusing. Instead page_alloc() is more convenient, personally i use printf() to print kernel messages although it will be confusing when i support libc it's suitable to use kprintf or any thing else.

B.E wrote: there is nothing wrong with 32768 ints(although to be 100% correct they should be unsigned longs), you have calculated the amount of memory it can access wrong. it should be 32768*32*4096(32 bits in an int).

sorry you are right

now we use 512*8 MB i.e. 4 GB address space but using this waste memory on a machine most probably has much less memory than 4GB

instead a dynamic array can be used. total available memory must be known.

other trick

for repeated search. if you search the bitmap 10 times consecutively don't start from 0 instead use a static variable and start search from last found address.

if you are using paging i guess then you don't need to allocate multiple pages except for DMA. this will reduce memory overhead and linked lists structures.

it's well known that shifting is much faster than multiplcation. Hardware engineers know that multiplication is a sequential operation which means multiplying a 8 bit numbers may take 8 clocks or cycles. so, use shifting as i remember takes 1 or 2 cycles. Intel now uses barrel shifters. that's absolutely true for division and modulus operations.

Assembler · Post by **Assembler** » Sat Nov 04, 2006 1:04 pm

Hello,

ces_mohab wrote: it's confusing that you use malloc because malloc in the standard library allocates arbitrary memory blocks. while I think you use a page allocator.

Your are a bit right, its a page allocator, and should be named to something else, but he is free to choose whatever he want.
About the confusion between std C malloc, and kernel's malloc, there is no problem in naming the kernel's malloc just malloc, FreeBSD does the same.
if u got FreeBSD just type
man 3 malloc or http://www.freebsd.org/cgi/man.cgi?quer ... ormat=html
for libc malloc, and
man 9 malloc or http://www.freebsd.org/cgi/man.cgi?quer ... ormat=html
for kernel's malloc.
The only need to differentiate both malloc's exists in virtual machines VM, this leads Matt Dillon to rename dragonfly kernel's malloc() to kmalloc() & kernel's free() to kfree().

MarkOS wrote: I wrote this memory manager. Do you think it is good? How can I optimize it?

Its a very simple MM, and will pay the tax from the performance.
For the performance i've some notes:
you don't have to group the whole bitmaps in one array of longs, u can divide them into for example 3 groups called Zones, one for himem, one for normalmem, and one for dma.
linux & freebsd both uses that type of allocater called Zone Allocator which is sometimes known as Slab Allocator.
linux divides the whole pages into 3 types:
1- Highmem (896 MB - ....) (ZONE_HIGHMEM)
2- Normal (16 MB - 895) (ZONE_NORMAL)
3- DMA (0 MB - 16 MB) (ZONE_DMA)

besides we should keep track of the last free bit in each of this zones.

For FreeBSD:
The kernel memory allocator uses a hybrid strategy. Small allocations are done using a power-of-2 list strategy. Using the zone allocator, the kernel creates a set of zones with one for each power-of-two between 16 and the page size. The allocation simply requests a block of memory from the appropriate zone. Usually, the zone will have an available piece of memory that it can return. Only if every piece of memory in the zone is in use will the zone allocator have to do a full allocation. When forced to do an additional allocation, it allocates an entire page and carves it up into the appropriate-sized pieces. This strategy speeds future allocations because several pieces of memory become available as a result of the call into the allocator.
Freeing a small block also is fast. The memory is simply returned to the zone from which it came.
Because of the inefficiency of power-of-2 allocation strategies for large allocations, the allocation method for large blocks is based on allocating pieces of memory in multiples of pages. The algorithm switches to the slower but more memory-efficient strategy for allocation sizes larger than a page. This value is chosen because the power-of-2 algorithm yields sizes of 1, 2, 4, 8, . . ., n pages, whereas the large block algorithm that allocates in multiples of pages yields sizes of 1, 2, 3, 4,. . ., n pages. Thus, for allocations of greater than one page, the large block algorithm will use less than or equal to the number of pages used by the power-of-2 algorithm, so the threshold between the large and small allocators is set at one page.

Thanks alot

JAAman · Post by **JAAman** » Sun Nov 05, 2006 11:14 am

it's well known that shifting is much faster than multiplcation. Hardware engineers know that multiplication is a sequential operation which means multiplying a 8 bit numbers may take 8 clocks or cycles. so, use shifting as i remember takes 1 or 2 cycles. Intel now uses barrel shifters. that's absolutely true for division and modulus operations.

multiplication is the same speed as shifting on newer chips ... both take 1/2 cycle

on northwood, there are 4 ALUs, which operate on double-clock, and can perform any operation in a single half-clock, but only 1 can do multiplication (though the instruction translator will automatically translate mul/div by powers of two into shift instructions, iirc)

on prescott all 4 ALUs can perform mul/div in a single half-cycle (means a 3GHz Prescott can perform 24billion mul/div per second -- the same speed as shift operations)

not sure about conroe, but its prob faster (certainly not slower)

Assembler · Post by **Assembler** » Sun Nov 05, 2006 2:07 pm

JAAman wrote:
it's well known that shifting is much faster than multiplcation. Hardware engineers know that multiplication is a sequential operation which means multiplying a 8 bit numbers may take 8 clocks or cycles. so, use shifting as i remember takes 1 or 2 cycles. Intel now uses barrel shifters. that's absolutely true for division and modulus operations.
multiplication is the same speed as shifting on newer chips ... both take 1/2 cycle

on northwood, there are 4 ALUs, which operate on double-clock, and can perform any operation in a single half-clock, but only 1 can do multiplication (though the instruction translator will automatically translate mul/div by powers of two into shift instructions, iirc)

on prescott all 4 ALUs can perform mul/div in a single half-cycle (means a 3GHz Prescott can perform 24billion mul/div per second -- the same speed as shift operations)

not sure about conroe, but its prob faster (certainly not slower)

This is not correct for old micro processors.
Here is the data i found in "The Intel Microprocessors 5th Ed" Book by "Barrey B. Brey"

Code: Select all

SAL/SAR/SHL/SHR (SHIFT)

Example : SHL reg, 1
Clocks:
8086 = 2
8088 = 2
80286 = 2
80386 = 3
80486 = 3
Pentium = 1 or 3

Example: SHL mem, 1
Clocks:
8086 = 15 + ea
8088 = 23 + ea
80286 = 7
80386 = 7
80486 = 4
Pentium = 1 or 3

Example: SHL reg, imm
Clocks:
8086 = ---
8088 = ---
80286 = 5 + n
80386 = 3
80486 = 2
Pentium = 1 or 3

MUL (Multiply)

Example: MUL reg
Clocks:
8086 = 118
8088 = 143
80286 = 21
80386 = 38
80486 = 42
Pentium = 10 or 11

Example: MUL mem
Clocks:
8086 = 139
8088 = 143
80286 = 24
80386 = 41
80486 = 42
Pentium = 11

This data indicates that for at least Petium microprocessor the "Shifts" is Faster than the "Multiplies".

OSDev.org

My memory manager

My memory manager

Re: My memory manager