best way to implement malloc and free?

TwixEmma · Post by **TwixEmma** » Mon Jul 22, 2013 12:23 pm

Hi, I'm working my memory management system for memory allocation and I've come up with the following idea, I'm wondering if it'll be any good or if there is a better way of implementing it.

G:/system files/ring0/kernel32/memorymap.asm

Code: Select all

label MemMapStart at 0x00000500

BootDevice              db 0
SectorCount             db 0
HeadCount               db 0
CylinderCount           dw 0
DriveCount              db 0

MemorySize              dd 0

MemoryAllocationTable:
AllocatedMemory dw 0
FreeMemory dw 0
TotalAllocatedEntries dw 0
TotalFreeEntries dw 0
;an alloc table entry is 8 bytes, 3 words and 2 bytes
;dw address start
;dw address end
;dw allocated bytes
;db system reserved
;db allocated flag (0 if unallocated, 1 if allocated)
times 2048 dq 0 ;dq = 8bytes, 8x2048=16384=16kb of Ram.

label MemMapEnd at 0x00001500

The above is what I have for my memory map so far for storing information, I intend to implement any tables etc here, such as my memory allocation table. The way this works, a program calls my equivelant of MALLOC with an amount of bytes. The kernel function will then look at the allocation table, if there are no allocated entries, it'll assign start address as 0x0 relative to where memory starts after the kernel, knowing total available memory in the system, it'll test the number of bytes, and if there are enough bytes it'll return via a register 0, if there are not enough bytes it'll return -1. If there are enough bytes, it'll save the number of bytes in the allocated bytes entry, and calculate the end address by adding the size to the start address. Then it'll finally mark the last db allocated flag entry as 1 to mark as allocated. If the function comes across entries it looks for the first unallocated entry, checks its size, if it doesn't have enough bytes, it'll keep looking, if it doesn't find one big enough, it checks if there is any entry space available and if theres enough memory to allocate and creates a new entry, with a limit of 2048 entries. if it cant allocate enough memory it allocates the largest allowed portion of memory thats available (allowed would be a constant defined in the memory map, untouchable by programs, that says a program can only allocate so many bytes in one malloc call. Finally it returns how many bytes were allocated.

I know there could be problems with memory allocation being fragmented, for example:
[memory total=64bytes max-entries=8]
entry 1 allocated 9
entry 2 allocated 10+9 = 19
entry 3 unallocated 27+19=46
entry 4 allocated 1+46=47
entry 5 unallocated 1+47=48
entry 6 allocated 1+48=49
entry 7 allocated 1+49=50
entry 8 unallocated 14+50=64

if you wanted to allocate the remaining 32bytes, the allocations would have to be reorganized, except a program can't be notified easily in this system if it's allocations change. Any ideas or information pertaining to memory management techniques and models with this much detail would be much appreciated, thanks in advance.

EDIT: I forgot to mention that my allocation idea also reclaims memory if entries become filled up, and notify apps that they should do something about it. This is for in the event of (very unlikely) a virus gaining control (doubtedly but not unforeseen), so if a virus program was run successfully (like a trojan) and it tried to allocate all available memory, the system takes priority, so if memory is over allocated it will reclaim massive memory blocks that aren't already allocated by system components and notify the owner app to close. I want security in my memory management, so any advice, tips and ideas on this will help too thanks.

serviper · Post by **serviper** » Mon Jul 22, 2013 9:18 pm

The kernel memory manager should work on a large granularity, e.g. 4KB page, perhaps in the form of a buddy system. Early in the initialization of the kernel, memory can be allocated linearly without deallocation. After that, you may implement malloc-like functions based on virtual pages from the kernel memory manager.

Well that's the Linux way...

Brendan · Post by **Brendan** » Mon Jul 22, 2013 11:36 pm

Hi,

TwixEmma wrote:Hi, I'm working my memory management system for memory allocation and I've come up with the following idea, I'm wondering if it'll be any good or if there is a better way of implementing it.

There are 3 completely different types of memory management.

The first is physical memory management. This is best done with "free page stacks" (extremely fast alloc/dealloc). The problem with free page stacks is that you can't allocate contiguous pages, which are sometimes needed by some types of device drivers (especially device drivers for old ISA devices that use the old DMA chips, where there's additional restrictions). For this reason free page stacks alone isn't enough. For example; you might have the first 16 MiB of physical RAM managed with a bitmap (to satisfy drivers with special requirements), and all of the remaining RAM managed by one or more free page stacks (for normal use). Also note that you may have N different types of pages with N different free page stacks (a simple example of this would be using one free page stack for each NUMA domain).

The second type of memory management is virtual memory management. This does things like creating and destroying virtual address spaces, handling "copy on write", handling memory mapped files, handling swap space, etc. The virtual memory manager doesn't decide anything and only really does what it's told to do. For example; if some code asks the virtual memory manager to allocate a page of normal RAM at virtual address 0x12345000; then the virtual memory manager asks the physical memory manager for a page and maps it where it was told to map it (the virtual memory manager does not decide which physical page and also does not decide where to map it). Because the virtual memory manager only really does what it's told to do (excluding determining which pages to send to swap space), it doesn't need anything for allocation/de-allocation.

The third type of memory management is things like heap (malloc/free) or objstacks or a garbage collector or whatever. These tell the virtual memory manager what to do (for example, you might call "malloc()" and it might need to call "sbrk()", and "sbrk()" might tell the virtual memory manager to allocate some more pages at the "previous top of allocated space"). This can be very different for different processes; and for that reason it should be implemented in user-space (e.g. part of a shared library or something).

It's important to understand that the heap/objstacks/garbage collector/whatever manager is running on top of virtual memory and only manages space - it has very little to do with physical RAM at all. Typically you give it a huge amount of space to manage (e.g. 3 GiB of space) and you don't need to care much about that space becoming fragmented because there's lots of it anyway.

For the kernel itself, you might implement some sort of heap, but you might implement something very different instead. For my micro-kernels I tend have a special purpose allocator for each different type of thing. For example, I might have "thread data structures" that are 1 KiB each, and set aside 256 MiB of space for a huge array of these "thread data structures", where the kernel recycles previously freed "thread data structures" if it can and lets the virtual memory manager's page fault handler allocate more RAM if/when a previously unused entry in the array is actually accessed.

Cheers,

Brendan

bluemoon · Post by **bluemoon** » Tue Jul 23, 2013 4:39 am

Brendan wrote: For the kernel itself, you might implement some sort of heap, but you might implement something very different instead. For my micro-kernels I tend have a special purpose allocator for each different type of thing. For example, I might have "thread data structures" that are 1 KiB each, and set aside 256 MiB of space for a huge array of these "thread data structures", where the kernel recycles previously freed "thread data structures" if it can and lets the virtual memory manager's page fault handler allocate more RAM if/when a previously unused entry in the array is actually accessed.

I do this in similar way, but make it general purpose. malloc work this way:

Code: Select all

void* kmalloc(size_t size) {
  size += 16 (for meta info header)
  pad size to multiples of 16-byte
  choose a free list based on size (32, 64, 512, etc)
  if free list has entry, return it
  otherwise allocate one with from heap (sbrk)
}

I also inject a prefix header into the allocation, which record the size of this piece of memory.

Since each allocation is at least 16 bytes, when free'ing it i reuse the same piece of memory to construct free list's node.

Brendan · Post by **Brendan** » Tue Jul 23, 2013 10:19 pm

Hi,

bluemoon wrote:
Brendan wrote: For the kernel itself, you might implement some sort of heap, but you might implement something very different instead. For my micro-kernels I tend have a special purpose allocator for each different type of thing. For example, I might have "thread data structures" that are 1 KiB each, and set aside 256 MiB of space for a huge array of these "thread data structures", where the kernel recycles previously freed "thread data structures" if it can and lets the virtual memory manager's page fault handler allocate more RAM if/when a previously unused entry in the array is actually accessed.
I do this in similar way, but make it general purpose. malloc work this way:
Code: Select all
void* kmalloc(size_t size) {
  size += 16 (for meta info header)
  pad size to multiples of 16-byte
  choose a free list based on size (32, 64, 512, etc)
  if free list has entry, return it
  otherwise allocate one with from heap (sbrk)
}
I also inject a prefix header into the allocation, which record the size of this piece of memory.

Since each allocation is at least 16 bytes, when free'ing it i reuse the same piece of memory to construct free list's node.

That would work too; but like all generic allocators it causes locality problems.

For example; imagine you've got a structure called "foo" that is 32 bytes, and a structure called "bar" that is also 32 bytes. With an "array of structure" approach all of the "foo" structures are packed together in the same area, so that if you're doing something to every allocated "foo" you get the least number of cache and TLB misses. With a generic allocator the same pages and the same cache lines will hold a mixture of both "foo" and "bar" structures, and if you're doing something to every allocated "foo" you get more cache and TLB misses.

The other thing is indexing. With "array of structure" you naturally end up with IDs; with extremely fast ID validity checks (e.g. "if( (threadID > highestAllocatedThreadID) || (array_of_thread_data[threadID].state == UNUSED) ) {") and extremely fast lookup (e.g. "name = array_of_thread_data[threadID].name"). With normal heap you can't do this and have to invent something worse (e.g. construct and manage a binary tree, and search the tree for the ID while causing potential cache/TLB misses at each entry you check, even when the ID isn't valid).

Of course the "array of structure" approach does have disadvantages - you have to set a fixed limit for the number of entries. For example; if you use a 256 MiB area to store an array of 1 KiB "thread data" structures then you'll only be able to support a maximum of 262144 threads, even if there's plenty of space that isn't being used for other things. For a micro-kernel (where there isn't that many things being stored in kernel space to begin with) it can work out fine (e.g. you'd run out of RAM before you hit any of the limits); but I wouldn't recommend the "array of structure" approach for a monolithic kernel (where there's a lot more data being stored in the same amount of "kernel space" and you can't really afford to reserve space for things that aren't currently being used), especially for 32-bit machines.

Cheers,

Brendan

bluemoon · Post by **bluemoon** » Wed Jul 24, 2013 4:18 am

Thanks, good explanation.

bemk · Post by **bemk** » Wed Jul 24, 2013 6:46 am

@Brendan: Why does your solution sound so much like a less flexible form of slab allocation? XD

Brendan · Post by **Brendan** » Wed Jul 24, 2013 9:39 am

Hi,

bemk wrote:@Brendan: Why does your solution sound so much like a less flexible form of slab allocation? XD

I don't know.

I guess it sounds like slab allocation to you because you don't really understand how my way works and/or you have very little experience/knowledge of anything other than slab allocators. For example, if you spent time figuring out why objstacks exist and how they work then you'd think my method sounds like objstacks; and if you spent time writing code that doesn't use any memory management at all, then you'd think my method sounds like "no memory management at all".

Note: To me, it's sounds exactly like "no memory management at all", where all data is statically allocated (e.g. huge arrays in the ".bss" section), and where the virtual memory manager plays tricks so that these huge statically allocated arrays don't cost actual RAM.

Cheers,

Brendan

Jezze · Post by **Jezze** » Wed Jul 24, 2013 10:11 am

I didnt know there was a name for that approach but I use obstacks I suppose. The reason is they make instant lookup. The bad thing is that there is a limit to how many you can use but in most cases you know it is highly unlikely youd need more than say ten of one type or twenty of another. I feel often that solutions are too general purpose where they allow you to allocate and deallocate as many as you like when in a typical scenario there will never be more than x.

Cognition · Post by **Cognition** » Wed Jul 24, 2013 10:56 am

Never heard the term objstacks. It's basically just an object cache though, they don't need to be massive either I use smaller caches within a region (64k) of space and a threadcache per logical processor for heap allocations under 16k. If you need larger caches you can always call the cache allocator directly as well. Works well and lowers contention without upping memory usage by too much.

bluemoon · Post by **bluemoon** » Wed Jul 24, 2013 11:07 am

Cognition wrote:Works well and lowers contention without upping memory usage by too much.

I suppose Brendan meant a reserve of logical address, he could implement allocate on demand, and release physical page if possible - and these are things only possible in (and carefully crafted for) kernel development, and beyond general data structure design for normal application.

Brendan · Post by **Brendan** » Wed Jul 24, 2013 1:22 pm

Hi,

bluemoon wrote:
Cognition wrote:Works well and lowers contention without upping memory usage by too much.
I suppose Brendan meant a reserve of logical address, he could implement allocate on demand, and release physical page if possible - and these are things only possible in (and carefully crafted for) kernel development, and beyond general data structure design for normal application.

I use allocate on demand. In case there's anyone not familiar with the idea; you have a physical page full of zeros and map that same physical page as "read only" everywhere. When something attempts to write to the read only page full of zeros it causes a page fault, and your page fault handler allocates a new physical page (also full of zeros) and maps the new page as "read write" and returns. In this way you end up with a large area that looks like it's full of zeros and looks like it can be written to, but it only uses 4 KiB of RAM until/unless something is modified. As more pages in the area are modified more pages of RAM are allocated.

Note: I'm simplifying a little - you can also have a page table where every page table entry points to your "read only page full of zeros" and map that same page table everywhere to avoid consuming lots of physical pages of RAM for page tables. In a similar way, (for 64-bit systems) you can do the same trick with page directories and PDPTs; and you can have (e.g.) a 100 TiB area full of zeros that only actually costs 16 KiB of RAM for all pages, page tables, page directories, etc (until/unless it's modified).

With nothing to free previously allocated pages when they're no longer being used, it would be a memory leak. For my kernels, when I'm running out of RAM I ask different pieces of the kernel to check for (and free) any of these "no longer needed" pages. This means that if there's a lot of RAM that isn't being used the RAM stays allocated (which avoids the overhead of freeing and then re-allocating if/when it's needed again), and when there isn't much RAM left it can be freed quickly.

It's not too hard to make it work in user-space. Allocation on demand would work the same (and you'd want it for a process' normal ".bss" section anyway). For freeing pages you could have a kernel API function that pretends to quickly fill an area with zeros (which would actually free any pages and map the "read only page full of zeros" in their place without filling anything with zeros). The kernel can also ask the process to free (or "fill with zeros") any spare pages it no longer needs when it's running out of free RAM (e.g. kernel can send a message or signal or something to the process). If the process doesn't do anything to free its "no longer being used" pages, then it'd be no worse than a process using "malloc()" and not bothering to "free()", and you'd end up swapping pages to disk just the same.

You can even have a hybrid system without any real problem; with a few areas used for large "allocate on demand" arrays, and the remainder of user space managed with a traditional malloc()/free() heap.

The only real problems are that it wouldn't work for "generic/portable/POSIX/C" software; and normal compilers/linkers might not like the idea of having several 1234 GiB arrays in your 64-bit ELF executable. Of course neither of these things are a problem for me.

Cheers,

Brendan

rdos · Post by **rdos** » Wed Jul 24, 2013 3:13 pm

I use two different types of memory allocators in kernel space. One allocates linear pages in a pool of kernel pages that is reserved in all address spaces. The physical page will only be allocated when something accesses the allocated page(s). I also have a byte-granular allocator (similar to malloc/free) that uses another linear address pool that contains 16-byte block descriptors per allocated object. All kernel memory is global. The memory allocation functions comes in two versions: Return a selector, or return a linear address. Most uses are for selectors. Linear address allocation practically only is page aligned.

In user space, there are malloc/free implementations that are part of the C library, which uses per-process kernel allocators. These are also available either as page aligned or as byte aligned allocations, although the flat C implementation only uses the page aligned version, and sub-divides it into smaller chunks in the C library. I have no sbrk.

Physical memory is managed with bitmaps arranged in a 2-dimensional matrix. This allows easy continuous allocation, and also lock-free allocation of physical memory.

bemk · Post by **bemk** » Wed Jul 24, 2013 3:26 pm

Brendan wrote:Hi,

bemk wrote:@Brendan: Why does your solution sound so much like a less flexible form of slab allocation? XD
I don't know.

I guess it sounds like slab allocation to you because you don't really understand how my way works and/or you have very little experience/knowledge of anything other than slab allocators. For example, if you spent time figuring out why objstacks exist and how they work then you'd think my method sounds like objstacks; and if you spent time writing code that doesn't use any memory management at all, then you'd think my method sounds like "no memory management at all".

Note: To me, it's sounds exactly like "no memory management at all", where all data is statically allocated (e.g. huge arrays in the ".bss" section), and where the virtual memory manager plays tricks so that these huge statically allocated arrays don't cost actual RAM.

Cheers,

Brendan

Well, mostly, in my implementation, slabs are just arrays of objects, with the number of elements made to fit the slab size. Group them together in a cache, and give that cache to some sub system to allocate its structures with. That's where I got the idea of similarity from.

Main difference, to me, seems to be that you allocate your structures at compile time, while I allocate them at run time.

h0bby1 · Post by **h0bby1** » Sat Aug 31, 2013 5:05 am

i don't use paging, i use something similar to a free space stack, at first i intialize a memory area, with a start address and a size, and a list free zone, that i initialize with a single 'free zone' of the total size of the area

when allocation is requested, i parse the list of free zone to find one that is big enough, and i increment the start of the free zone by the size of the zone to be allocated and decrement the size

when the free is requested, i parse free zone to find if one is contigous with the memory area to be freed, if one contigous free zone is found, the zone to be freed is merged with it, if no contigous area is found, the free zone is pushed into free zone stack ready to be used again

but internally, i don't just handle pointer, but i have a table of structure that contain the actual pointer, the size of the zone, the memory area id in which it's allocated, and a reference counter, and it only output a reference to this memory zone structure, actually the structure passed by the application just contain a pointer to this structure, but the type is opaque to the system, and then you have to use some function to get the actual memory pointer and the size of the zone

normally with this system, there is no direct call to the 'free' method, and there is no pointer ownership at all anywhere, as the memory allocation just output reference to the zone, each time a program don't need to use the memory zone, it release its reference to it with a function, and it's freed automically when the reference count is zero, and there are function to copy and manage those 'memory zone reference', and the allocation system only initialize a memory zone reference that contain an opaque pointer to the internal zone definition structure, and only internal function can access and manipulate the data in the structure

normally with this system, it can use easily dynamic relocation without pointer ownership, in sort that the area can be resized/moved without having to notify any other function that use the pointer, as normally direct pointer to the memory zone are not supposed to be stored internally, but fetched from the memory zone reference each time an access to the memory is needed

and like this also it is very easy to make boundary check before to access an ofset from a memory zone, if loops are managed correctly, a function can return a pointer to the memory zone + ofset , and check if the boundary is ok, and i can easily trace when out of boundary access are attempted, and it can avoid also to have to deal with 'junk pointers' who could contain invalid value for some reason, it is very easy to check if the memory access is valid or not, it might add little over head to check everything at each access, but not even that much

i still implemented simple free/malloc as well, it just align the memory to the next 16 byte boundary, store the pointer to the internal memory structure at the aligned beginning of the memory zone and return a pointer to where the useable zone start, and i use the zone structure stored at ptr-16 to free it, but i only implemented them for C compatibility for external libraries

OSDev.org

best way to implement malloc and free?

best way to implement malloc and free?

Re: best way to implement malloc and free?

Re: best way to implement malloc and free?

Re: best way to implement malloc and free?

Re: best way to implement malloc and free?

Re: best way to implement malloc and free?

Re: best way to implement malloc and free?

Re: best way to implement malloc and free?

Re: best way to implement malloc and free?

Re: best way to implement malloc and free?

Re: best way to implement malloc and free?

Re: best way to implement malloc and free?

Re: best way to implement malloc and free?

Re: best way to implement malloc and free?

Re: best way to implement malloc and free?