My memory manager

Question about which tools to use, bugs, the best way to implement a function, etc should go here. Don't forget to see if your question is answered in the wiki first! When in doubt post here.
Post Reply
User avatar
Jeko
Member
Member
Posts: 500
Joined: Fri Mar 17, 2006 12:00 am
Location: Napoli, Italy

My memory manager

Post by Jeko »

I wrote this memory manager. Do you think it is good? How can I optimize it?

Code: Select all

unsigned int bitmap[32768];

void initmemory(){

	for (unsigned int i = 0; i<16; i++)
	{
		bitmap[i] = 0xFFFFFFFF;
	}
	
	for (unsigned int i = 16; i<32768; i++)
	{
		bitmap[i] = 0x00000000;
	}
}

void *malloc(unsigned int size)
{

	unsigned int pagenum = (size / 4096) + 1;
	unsigned int pagecontinue = 0;
	int bit = 0;
	int i = 0;
	int j = 0;
	unsigned long int address;
	
	for (i = 16; i<32768; i++)
	{
		
		while(j<32)
		{
			bit = bitmap[i] & (1<<i);
			
			if (bit == 0)
			{
			
				pagecontinue++;
				
				if (pagecontinue == pagenum)
					break;
				
				j++;
			}
			else
			{
 				j++;
				pagecontinue = 0;
			}
		}
		
		if (pagecontinue == pagenum)
			break;
		
		j=0;
	}
	
	if (pagecontinue != pagenum)
		return 0;

	address = ((i*32+j+1)-pagecontinue)*4096;
	
	i = (address/4096)/32;
	j = (address/4096)%32;
	
	pagecontinue = 0;
	
	while (pagecontinue < pagenum)
	{
		bitmap[i] = (bitmap[i] | (1<<j));
		j++;
		pagecontinue++;
		
		if (j==32 && pagecontinue < pagenum)
		{
			i++;
			j=0;
		}
	}
	
	return (void*)address;
}
User avatar
Cheery
Member
Member
Posts: 52
Joined: Wed Oct 18, 2006 4:39 am

Post by Cheery »

Where is the free?
Windows Vista rapes you, cuts you and pisses inside. Thought these are just nifty side-effects.
User avatar
Jeko
Member
Member
Posts: 500
Joined: Fri Mar 17, 2006 12:00 am
Location: Napoli, Italy

Post by Jeko »

I not yet implemented it... I'm working on it now.
User avatar
ces_mohab
Member
Member
Posts: 77
Joined: Wed Oct 18, 2006 3:08 am

Post by ces_mohab »

first:
it's confusing that you use malloc because malloc in the standard library allocates arbitrary memory blocks. while I think you use a page allocator.
second:
implementing free is fairly easy than malloc
third
unsigned int bitmap[32768];
why 32768 ints i.e. 32768*4*4096 i.e. 512 MB you can use a dynamic array at the end of kernel and either calculate memory size or leave this task to boot loader.
optimization is possible.
you can increase the code about 32 times :D
by using code like this:

Code: Select all

       /* Get a unit of free bit */
	for( i=0 ; i<bitmap_size ; i++ )
		if( bitmap[i] != 0xFFFFFFFF ) 
			break ;
			
	/* Bad luck! We run out of pages */
	if( i==bitmap_size )
		return NULL ;
		
	/* We found a unit of free bit! set this bit */
	for( j=0 ; j<BTMPUNTSZ ; j++ )
		if( !GETBIT(bitmap[i],j) )
		{
			SETBIT(bitmap[i],j) ;
			break ;
		}
		
	/* compute address */
	return (void*) ( (i*BTMPUNTSZ+j)*PAGE_SIZE ) ;


GOod lUck :roll:
User avatar
B.E
Member
Member
Posts: 275
Joined: Sat Oct 21, 2006 5:29 pm
Location: Brisbane Australia
Contact:

Post by B.E »

ces_mohab wrote:first:
it's confusing that you use malloc because malloc in the standard library allocates arbitrary memory blocks. while I think you use a page allocator.
second:

implementing free is fairly easy than malloc
third
first, it's he's choose on what function name to use. and also there are many ways of implementing a memory allocator.
second, you can not have a free without a malloc
ces_mohab wrote:
unsigned int bitmap[32768];
why 32768 ints i.e. 32768*4*4096 i.e. 512 MB you can use a dynamic array at the end of kernel and either calculate memory size or leave this task to boot loader.
there is nothing wrong with 32768 ints(although to be 100% correct they should be unsigned longs), you have calculated the ammount of memeory it can access wrong. it should be 32768*32*4096(32 bits in an int).
Image
Microsoft: "let everyone run after us. We'll just INNOV~1"
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: My memory manager

Post by Brendan »

Hi,
MarkOS wrote:I wrote this memory manager. Do you think it is good?
If it's a heap manager, then it's wasteful (e.g. to allocate space for 32 seperate 16 byte data structures, you'd consume 128 KB of RAM instead of 2 KB), and all that searching would be too slow.
MarkOS wrote:How can I optimize it?
If you must use a bitmap, then:

To speed up the searching, keep track of the lowest "possibly free" page, so that if the first half has already been allocated you don't keep searching it anyway. I'd also keep track of the total number of free pages, so that if there isn't enough free you don't go searching for it.

Then pre-check each dword before testing individual bits. For example:

Code: Select all

if (bimap[i] != 0xFFFFFFFF) {
     /* check bits */
}
A better idea would be to use "REP CMPSD" instead.

Lastly, consider using the BSF instruction to get rid of the inner loops.

BTW if someone does "malloc(4096)" they'd get 8192 bytes. It might be a good idea to protect against "malloc(0)" too...

It's also occured to me that it will be impossible for you to implement "free()" because you have no idea how much space was allocated. This isn't that hard to fix though - store the size of the allocated space at "address" and return "address + 4".

Taking this into account, you might want to do:

Code: Select all

unsigned int pagenum = ( (size + sizeof(void *) - 1) / 4096) + 1;
Of course this makes it as fragile as most heap managers - it's easy for common programmer's bugs to corrupt the heap management. For example:

Code: Select all

    unsigned int myArray;
    unsigned int myString;

    myArray = malloc ( 1024 * sizeof(unsigned int) );
    myString = malloc (20);
    myArray[1024] = 0xFFFFFFFF;
    free(myString);
In this case, your "free()" code could decide to free 4 GB instead of 20 bytes - lucky it's part of the C library and not running in kernel space... ;)


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
User avatar
ces_mohab
Member
Member
Posts: 77
Joined: Wed Oct 18, 2006 3:08 am

Post by ces_mohab »

B.E wrote: first, it's he's choose on what function name to use. and also there are many ways of implementing a memory allocator.
let's say that choosing malloc as a name is confusing. Instead page_alloc() is more convenient, personally i use printf() to print kernel messages although it will be confusing when i support libc it's suitable to use kprintf or any thing else.
B.E wrote: there is nothing wrong with 32768 ints(although to be 100% correct they should be unsigned longs), you have calculated the amount of memory it can access wrong. it should be 32768*32*4096(32 bits in an int).
sorry you are right :-k now we use 512*8 MB i.e. 4 GB address space but using this waste memory on a machine most probably has much less memory than 4GB :?: instead a dynamic array can be used. total available memory must be known.

other trick :idea: for repeated search. if you search the bitmap 10 times consecutively don't start from 0 instead use a static variable and start search from last found address.

if you are using paging i guess then you don't need to allocate multiple pages except for DMA. this will reduce memory overhead and linked lists structures.

it's well known that shifting is much faster than multiplcation. Hardware engineers know that multiplication is a sequential operation which means multiplying a 8 bit numbers may take 8 clocks or cycles. so, use shifting as i remember takes 1 or 2 cycles. Intel now uses barrel shifters. that's absolutely true for division and modulus operations.
User avatar
Assembler
Member
Member
Posts: 30
Joined: Fri Oct 27, 2006 5:26 am
Contact:

Post by Assembler »

Hello,
ces_mohab wrote: it's confusing that you use malloc because malloc in the standard library allocates arbitrary memory blocks. while I think you use a page allocator.
Your are a bit right, its a page allocator, and should be named to something else, but he is free to choose whatever he want.
About the confusion between std C malloc, and kernel's malloc, there is no problem in naming the kernel's malloc just malloc, FreeBSD does the same.
if u got FreeBSD just type
man 3 malloc or http://www.freebsd.org/cgi/man.cgi?quer ... ormat=html
for libc malloc, and
man 9 malloc or http://www.freebsd.org/cgi/man.cgi?quer ... ormat=html
for kernel's malloc.
The only need to differentiate both malloc's exists in virtual machines VM, this leads Matt Dillon to rename dragonfly kernel's malloc() to kmalloc() & kernel's free() to kfree().
MarkOS wrote: I wrote this memory manager. Do you think it is good? How can I optimize it?
Its a very simple MM, and will pay the tax from the performance.
For the performance i've some notes:
you don't have to group the whole bitmaps in one array of longs, u can divide them into for example 3 groups called Zones, one for himem, one for normalmem, and one for dma.
linux & freebsd both uses that type of allocater called Zone Allocator which is sometimes known as Slab Allocator.
linux divides the whole pages into 3 types:
1- Highmem (896 MB - ....) (ZONE_HIGHMEM)
2- Normal (16 MB - 895) (ZONE_NORMAL)
3- DMA (0 MB - 16 MB) (ZONE_DMA)

besides we should keep track of the last free bit in each of this zones.

For FreeBSD:
The kernel memory allocator uses a hybrid strategy. Small allocations are done using a power-of-2 list strategy. Using the zone allocator, the kernel creates a set of zones with one for each power-of-two between 16 and the page size. The allocation simply requests a block of memory from the appropriate zone. Usually, the zone will have an available piece of memory that it can return. Only if every piece of memory in the zone is in use will the zone allocator have to do a full allocation. When forced to do an additional allocation, it allocates an entire page and carves it up into the appropriate-sized pieces. This strategy speeds future allocations because several pieces of memory become available as a result of the call into the allocator.
Freeing a small block also is fast. The memory is simply returned to the zone from which it came.
Because of the inefficiency of power-of-2 allocation strategies for large allocations, the allocation method for large blocks is based on allocating pieces of memory in multiples of pages. The algorithm switches to the slower but more memory-efficient strategy for allocation sizes larger than a page. This value is chosen because the power-of-2 algorithm yields sizes of 1, 2, 4, 8, . . ., n pages, whereas the large block algorithm that allocates in multiples of pages yields sizes of 1, 2, 3, 4,. . ., n pages. Thus, for allocations of greater than one page, the large block algorithm will use less than or equal to the number of pages used by the power-of-2 algorithm, so the threshold between the large and small allocators is set at one page.

Thanks alot
Systems and Computer Engineering Researcher
"Do you pine for the nice days of Minix-1.1, when men were men and wrote their own device drivers?" -- Linus Torvalds
http://sce.carleton.ca/~maslan
User avatar
JAAman
Member
Member
Posts: 879
Joined: Wed Oct 27, 2004 11:00 pm
Location: WA

Post by JAAman »

it's well known that shifting is much faster than multiplcation. Hardware engineers know that multiplication is a sequential operation which means multiplying a 8 bit numbers may take 8 clocks or cycles. so, use shifting as i remember takes 1 or 2 cycles. Intel now uses barrel shifters. that's absolutely true for division and modulus operations.
multiplication is the same speed as shifting on newer chips ... both take 1/2 cycle

on northwood, there are 4 ALUs, which operate on double-clock, and can perform any operation in a single half-clock, but only 1 can do multiplication (though the instruction translator will automatically translate mul/div by powers of two into shift instructions, iirc)

on prescott all 4 ALUs can perform mul/div in a single half-cycle (means a 3GHz Prescott can perform 24billion mul/div per second -- the same speed as shift operations)

not sure about conroe, but its prob faster (certainly not slower)
User avatar
Assembler
Member
Member
Posts: 30
Joined: Fri Oct 27, 2006 5:26 am
Contact:

Post by Assembler »

JAAman wrote:
it's well known that shifting is much faster than multiplcation. Hardware engineers know that multiplication is a sequential operation which means multiplying a 8 bit numbers may take 8 clocks or cycles. so, use shifting as i remember takes 1 or 2 cycles. Intel now uses barrel shifters. that's absolutely true for division and modulus operations.
multiplication is the same speed as shifting on newer chips ... both take 1/2 cycle

on northwood, there are 4 ALUs, which operate on double-clock, and can perform any operation in a single half-clock, but only 1 can do multiplication (though the instruction translator will automatically translate mul/div by powers of two into shift instructions, iirc)

on prescott all 4 ALUs can perform mul/div in a single half-cycle (means a 3GHz Prescott can perform 24billion mul/div per second -- the same speed as shift operations)

not sure about conroe, but its prob faster (certainly not slower)
This is not correct for old micro processors.
Here is the data i found in "The Intel Microprocessors 5th Ed" Book by "Barrey B. Brey"

Code: Select all

SAL/SAR/SHL/SHR (SHIFT)

Example : SHL reg, 1
Clocks:
8086 = 2
8088 = 2
80286 = 2
80386 = 3
80486 = 3
Pentium = 1 or 3

Example: SHL mem, 1
Clocks:
8086 = 15 + ea
8088 = 23 + ea
80286 = 7
80386 = 7
80486 = 4
Pentium = 1 or 3

Example: SHL reg, imm
Clocks:
8086 = ---
8088 = ---
80286 = 5 + n
80386 = 3
80486 = 2
Pentium = 1 or 3

MUL (Multiply)

Example: MUL reg
Clocks:
8086 = 118
8088 = 143
80286 = 21
80386 = 38
80486 = 42
Pentium = 10 or 11

Example: MUL mem
Clocks:
8086 = 139
8088 = 143
80286 = 24
80386 = 41
80486 = 42
Pentium = 11

This data indicates that for at least Petium microprocessor the "Shifts" is Faster than the "Multiplies".
Systems and Computer Engineering Researcher
"Do you pine for the nice days of Minix-1.1, when men were men and wrote their own device drivers?" -- Linus Torvalds
http://sce.carleton.ca/~maslan
Post Reply