(Answered)Heap/Pool Dynamic Memory Allocation Implementation

Question about which tools to use, bugs, the best way to implement a function, etc should go here. Don't forget to see if your question is answered in the wiki first! When in doubt post here.
User avatar
Octacone
Member
Member
Posts: 1138
Joined: Fri Aug 07, 2015 6:13 am

Re: Heap/Pool Dynamic Memory Allocation Implementation

Post by Octacone »

LiamT wrote:
Octacone wrote:
LiamT wrote:...

I recommend figuring out which part of your kernel you're trying to implement, and then trying to solve the problem in user space first. It's a lot easier to debug an allocator in userland than in a custom kernel, especially if you haven't set up proper kernel debugging yet.
I already have both PMM and VMM implemented and working. That is not what I am talking about. Specifically I am talking about Malloc/Free/Realloc/Calloc functions and their implementation. I am not talking about user space. I am trying to implement a dynamic memory/heap/pool/whatever allocator for my kernel. I am not referring to user space. There is a long way until I can even think of user space. What I need is malloc so my kernel can say hey memory manager give me 27 bytes of memory so I can use it for this specific struct. The thing I am writing is meant for the entire OS global heap/pool/whatever you call it allocator that both kernel and user space can use. The point of this topic turned into how to store memory data sort of. Well, it doesn't matter anymore. I decided to go for an easier solution until I "grow up" (abstraction-ally) so I can do it by myself the "best" way possible. Thanks for spending time trying to explain it a bit more.
Sure. What I mean by trying it first in user space though is to do it in a regular program on an already fully functioning OS, like wherever you write your kernel code. You might try writing a simple application, but replacing your default c library allocator with the one you write. The use case is almost exactly the same from an API point of view, and you will get a much better environment to debug your work than a homebrew kernel.

In any case, the wiki has a pretty good page on hooking in an existing discrete allocator. http://wiki.osdev.org/Memory_Allocation. I chose to do this in my kernel using LibAlloc.
(In my case, the hook functions are a tad more complex because I switch the lock that the allocator uses at a well-defined time in kernel setup.) This file is where I define the allocator hooks into the kernel.
Ohhh! I see what you meant. I thought you were talking about my OS and not the preexisting one. :wink: That is actually not a bad idea at all, but will Windows/Linux let me poke around with their memory?
LibAlloc seems really neat and compact. I am seriously thinking of porting it right now. What do you think of it, how well does it behave. Did you have any memory leaks / obvious problem? I do not have any kind of synchronization primitives implemented, yet, I hope that won't be a problem. Also can I use version 1_1, is that a newer version? Is there a way to fake them (lib locks)? Is it okay if I make it C++ compatible? Also I don't see a hook for virtual memory mapping, which means every access will result in a page fault.
Last edited by Octacone on Thu Jun 22, 2017 2:43 pm, edited 3 times in total.
OS: Basic OS
About: 32 Bit Monolithic Kernel Written in C++ and Assembly, Custom FAT 32 Bootloader
LtG
Member
Member
Posts: 384
Joined: Thu Aug 13, 2015 4:57 pm

Re: Heap/Pool Dynamic Memory Allocation Implementation

Post by LtG »

Octacone wrote:Ohhh! I see what you meant. I thought you were talking about my OS and not the preexisting one. :wink: That is actually not a bad idea at all, but will Windows/Linux let me poke around with their memory?
Probably not, but it doesn't matter. You can request some memory from the OS (using malloc/new), for example let's say 100MiB, then you pretend that is the memory and you "parcel" it out using your _own_ malloc/new. That way you can test your malloc/new implementation within an existing OS where you have decent debugging tools as well.

Note, while it's a bit more upfront effort, I like to make everything testable and usable without having to ever start a VM/emulator. Though I don't do it for the boostrap as that has its own issues and I decided it's not worth it (for now). This has the benefit that I can "fully" test everything in isolation and everything is tested at build time (TDD) and I can more easily fake "buggy" hardware for my drivers and make sure my software works correctly. As said, it takes a bit more effort upfront but gives better reliability (IMO) and saves in debugging a lot.
User avatar
Octacone
Member
Member
Posts: 1138
Joined: Fri Aug 07, 2015 6:13 am

Re: Heap/Pool Dynamic Memory Allocation Implementation

Post by Octacone »

LtG wrote:
Octacone wrote:Ohhh! I see what you meant. I thought you were talking about my OS and not the preexisting one. :wink: That is actually not a bad idea at all, but will Windows/Linux let me poke around with their memory?
Probably not, but it doesn't matter. You can request some memory from the OS (using malloc/new), for example let's say 100MiB, then you pretend that is the memory and you "parcel" it out using your _own_ malloc/new. That way you can test your malloc/new implementation within an existing OS where you have decent debugging tools as well.

Note, while it's a bit more upfront effort, I like to make everything testable and usable without having to ever start a VM/emulator. Though I don't do it for the boostrap as that has its own issues and I decided it's not worth it (for now). This has the benefit that I can "fully" test everything in isolation and everything is tested at build time (TDD) and I can more easily fake "buggy" hardware for my drivers and make sure my software works correctly. As said, it takes a bit more effort upfront but gives better reliability (IMO) and saves in debugging a lot.
That is what I thought, it doesn't matter how big it is, what matters is how it is managed. Modern OSes have really good debugging capabilities so why not to use them. I will definitively do that at some point, probably when I decide to read up on those algorithms.
OS: Basic OS
About: 32 Bit Monolithic Kernel Written in C++ and Assembly, Custom FAT 32 Bootloader
User avatar
LiamT
Posts: 9
Joined: Sun Jul 12, 2015 2:21 am
Contact:

Re: Heap/Pool Dynamic Memory Allocation Implementation

Post by LiamT »

Octacone wrote:
LiamT wrote: Sure. What I mean by trying it first in user space though is to do it in a regular program on an already fully functioning OS, like wherever you write your kernel code. You might try writing a simple application, but replacing your default c library allocator with the one you write. The use case is almost exactly the same from an API point of view, and you will get a much better environment to debug your work than a homebrew kernel.
Ohhh! I see what you meant. I thought you were talking about my OS and not the preexisting one. :wink: That is actually not a bad idea at all, but will Windows/Linux let me poke around with their memory?
LibAlloc seems really neat and compact. I am seriously thinking of porting it right now. What do you think of it, how well does it behave. Did you have any memory leaks / obvious problem? I do not have any kind of synchronization primitives implemented, yet, I hope that won't be a problem. Also can I use version 1_1, is that a newer version? Is there a way to fake them (lib locks)? Is it okay if I make it C++ compatible? Also I don't see a hook for virtual memory mapping, which means every access will result in a page fault.
Oh yeah, it's perfectly possible not to use the default allocator in a linux program. That being said, there's nothing wrong with just using malloc/free to reserve a large chunk and then working with that to write your allocator :)
From what i've seen, LibAlloc seems to perform pretty well, at least for my use case. I try and keep kernel dynamic allocation to a minimum usually, since it's a lot harder/riskier to rely on paging kernel memory than user-space memory, but I haven't noticed any leaks or problems to be sure.

As for the synchronization primitives, it's perfectly fine to just leave the hook functions empty for now, so long as you aren't going to be using the allocator from more than one thread. (I would assume this is true anyway if you don't yet have sync primitives). Just make sure to return 0 from them, so liballoc doesn't think some kind of error happened :wink:

As for where the virtual memory happens in my hook functions, I assume you mean my call to mm_pmalloc. In my kernel, that's the mapped page allocator, which automatically maps and allocates virtual pages in kernel space. (Basically it's like malloc, but returns chunks of n*PAGE_SIZE),
User avatar
dchapiesky
Member
Member
Posts: 204
Joined: Sun Dec 25, 2016 1:54 am
Libera.chat IRC: dchapiesky

Re: Heap/Pool Dynamic Memory Allocation Implementation

Post by dchapiesky »

Octacone wrote: I will definitely not give up. I really want to learn about different algorithms because they are very applicable and usable. What is the full title of that book? Is it worth buying it? Right now I am not capable enough to understand such a deep concept, but in the future as my knowledge grows and expands I think I will be able to understand such abstract concepts.

This is serious advice....

Go to Wikipedia

type in whatever you are keen on learning about...

enjoy the content shown...

BUT THEN...

follow the external references, the PDF's, the research papers, the example code... all listed at the bottom...

If you are low on cash and need to learn - wikipedia is your friend.... the cross reference at the very bottom helps as well...

so for starters....

AVL Tree - https://en.wikipedia.org/wiki/AVL_tree

and from the bottom grouping you see it is part of

https://en.wikipedia.org/wiki/Tree_(data_structure)

For learning about algorithms on the cheap cheap... this is invaluable...

good luck

Daniel
Plagiarize. Plagiarize. Let not one line escape thine eyes...
User avatar
Schol-R-LEA
Member
Member
Posts: 1925
Joined: Fri Oct 27, 2006 9:42 am
Location: Athens, GA, USA

Re: Heap/Pool Dynamic Memory Allocation Implementation

Post by Schol-R-LEA »

Just to reiterate this point: OS dev, the forum or the topic as a whole, isn't a suitable venue for learning basic software techniques. You may need to take a step back for a bit before proceeding.

As for the textbook @Zaval mentioned, I assume he means Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein (and yes, that's the same Ron Rivest of RSA encryption fame). It is one of the classics of the field, and while it is rather... dense, and not especially approachable, it is still a pretty good introductory and reference work which is used in many, many universities for their courses on algorithms and data structures.

There are many other books on the topic as well, though you many need to fiddle with a few different ones to find which is most suitable to how you learn. If you have access to a college or university library - or even just a larger public library - you should be able to find a few. A search on Amazon for 'algorithm' or 'data structures' ought to find more books on the topic than any one person could hope to read.

Wikipedia does have extensive material on data structures, including trees of many different kinds, but you need to take into consideration that a) anything that can be a topic of disagreement (i.e., anything at all) may be subject to dispute, edit wars, inconsistent editing, defacement, incorrect information, deliberate disinformation, and flat out lies, and b) it is best used as a starting point, not an end point. While it is more likely to be reliable on a subject such as this, you need to follow up on anything said there to make sure they got it somewhere in the vicinity of correct.

While I would not recommend trying to learn the data structures or algorithms from YouTube videos, in this case they are useful as supplementary material. They would be useful primarily in helping to visualize the dynamic structures and their performance; however, you will want to take anything they say about specific implementation methods with a grain of... no, on second thought, a whole salt mine's worth of salt.

I would read the wiki articles first, then dig deeper with the books, and only turn to the videos if you are having trouble imagining how they work.
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
simeonz
Member
Member
Posts: 360
Joined: Fri Aug 19, 2016 10:28 pm

Re: Heap/Pool Dynamic Memory Allocation Implementation

Post by simeonz »

I agree with the other posters that deeper understanding of basic data structures and algorithms is required to tackle and appreciate the issues of dynamic memory allocation. Yet, this particular topic is poorly covered in the standard theoretical books, focusing on linear time best-fit, first-fit, etc. Those are used here and there, but are impractical most of the time. For future reference, here is a very congested overview.

Allocators need to do essentially two things.

1. To find appropriate chunks of free memory that will fit a particular requested size
2. To locate chunks around a given address that is freed and to coalesce (join) them into a bigger chunk

Those tasks can be solved in a number of ways and need to synergize. The first can be addressed with a simple array index. That is - the free chunks are placed into linked lists the heads of which are stored in an array. The indexes correspond to size partitions. Every chunk is put into a list that matches its size.

For example, free chunks of size 16 are chained in the list whose head resides in index 0, size 32 is put in index 1, followed by 48, 64, 80, 96, 112, ect. Or in another scheme, size 16 is in index 0, followed by 32, 48, 64, 96, 128, 192, 256, etc. The first partitioning is linear, the second is roughly exponential. The upper limit is around 4K-32K, because the page allocator is more suitable for big allocations. Note that this means that the array index is bounded to a reasonable limit. Edit: The schemes mentioned also provide easy means of computing the index corresponding to a given size.

The coalescing issue is a more complex one, not so much algorithmically, but in terms of tradeoffs involved. The simplest solution in my opinion (not necessarily the most effective) is to place all the chunks, free and occupied, into a doubly linked list, sorted according to the chunk address. The sorting is a misnomer, because the items in the list are added or removed only by splitting and joining other items, thus seeking position for insertion/removal is never an issue. The links for a chunk are usually stored right before it. When an address is freed, the status of its neighbours is checked (indicated by flags or such), and those neighbours are joined with the newly vacated memory if possible. Essentially, with this method, the memory is sliced into portions that are linked together with their neighbors. As an optimization, the links can be made more compact by observing that they can be encoded using the chunk sizes.

This is not the only way to deal with coalescing. The use of memory slabs is another option, and is considered superior by many. (Though the arguments are convoluted and workload dependent.) With this approach entire pages are dedicated to chunks of a given size. This eliminates the need for a global linked list that connects all memory chunks, because every chunk is adjacent to another chunk of the same size, and the page is essentially an array of such chunks. The size can be stored in a header that is pointed by the corresponding entry in the kernel page database or can be specified in the start/end of the page. Thus, the size of each allocation can be extracted by first computing the address of the page that encloses it and then looking up in the relevant place, depending on the method used.

There are other issues, such as alignment, non-blocking methods, per-cpu caches, tiered allocators, various security mitigations (checks for corruption and overflow), etc. And please, note that the above exposition is not all-inclusive by any means.

Again, I agree that anyone would benefit by studying basic data structures first, before dealing with allocators.
User avatar
bzt
Member
Member
Posts: 1584
Joined: Thu Oct 13, 2016 4:55 pm
Contact:

Re: Heap/Pool Dynamic Memory Allocation Implementation

Post by bzt »

Octacone wrote:I already have both PMM and VMM implemented and working. That is not what I am talking about. Specifically I am talking about Malloc/Free/Realloc/Calloc functions and their implementation. I am not talking about user space. I am trying to implement a dynamic memory/heap/pool/whatever allocator for my kernel. I am not referring to user space. There is a long way until I can even think of user space. What I need is malloc so my kernel can say hey memory manager give me 27 bytes of memory so I can use it for this specific struct.
I'm not sure I get what you mean. For one, malloc/free/realloc/calloc are libc functions, and as such 100% userspace. You can create a library that works for both kernel and applications, but you really don't want to do that. Kernel usually has different needs. I mean kernel usually allocates a lot at boottime, and maybe frees and allocs later sometimes. It also has only a few memory sizes it allocates (I mean required memory is always a multiple of a kernel struct, and rounding it to 16 bytes narrows even more down the number of allocation units). An application on the other hand uses any arbitrary allocation size, in any order of alloc and free.

So I would recommend you to use a different allocator in your kernel, which is more simplistic but takes advantage of knowing memory allocation characteristics. If you already have a PMM it's easy. Example: you want to allocate a struct for let's say device driver. You shouldn't use any arbitrary address for that, but the next slot in your struct device_driver_t *drivers array. You know what I mean? It's enough to allocate a page when that array size passed page boundary (by checking an integer) and free the last page when it shrinks.

Code: Select all

if ( ((numitems * sizeof(item)) >> 12) != ((numitems+1) * sizeof(item)) >> 12)) {
   pmm_alloc((numitems+1) * sizeof(item) + &items);
}
if ( ((numitems * sizeof(item)) >> 12) != ((numitems-1) * sizeof(item)) >> 12)) {
   pmm_free(numitems * sizeof(item) + &items);
}
That's because an efficient implementation would end up with a

Code: Select all

drivers = realloc(drivers, (numdriver+1) * sizeof(device_driver_t));
anyway, there's no real need to implement

Code: Select all

driver_ptrs = realloc(driver_ptrs, (numdriver+1) * sizeof(void*));
driver_ptrs[numdriver] = malloc(sizeof(device_driver_t));
Which would mean double call overhead and more memory wasted by fragmentation. I want to say that your kernel should never say hey memory manager give me 27 bytes of memory, rather it should say hey memory manager give me 32 bytes more to my driver struct array.
User avatar
zaval
Member
Member
Posts: 659
Joined: Fri Feb 17, 2017 4:01 pm
Location: Ukraine, Bachmut
Contact:

Re: Heap/Pool Dynamic Memory Allocation Implementation

Post by zaval »

Octacone wrote: I will definitely not give up. I really want to learn about different algorithms because they are very applicable and usable. What is the full title of that book? Is it worth buying it? Right now I am not capable enough to understand such a deep concept, but in the future as my knowledge grows and expands I think I will be able to understand such abstract concepts.
i missed this post. yes, that is the book, Schol-R-LEA mentions.
I have it (the second edition, the link below shows third), and I think it's well worth buying. As I've said, there the trees are explained very well (with B-trees includingly, but I didn't read about those yet, honestly :mrgreen: ). But not only trees it's huge actually.
Schol-R-LEA wrote: As for the textbook @Zaval mentioned, I assume he means Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein (and yes, that's the same Ron Rivest of RSA encryption fame). It is one of the classics of the field, and while it is rather... dense, and not especially approachable, it is still a pretty good introductory and reference work which is used in many, many universities for their courses on algorithms and data structures.

There are many other books on the topic as well, though you many need to fiddle with a few different ones to find which is most suitable to how you learn. If you have access to a college or university library - or even just a larger public library - you should be able to find a few. A search on Amazon for 'algorithm' or 'data structures' ought to find more books on the topic than any one person could hope to read.
PS. As an addiotional bonus, you can work out with this book. :lol:
ANT - NT-like OS for x64 and arm64.
efify - UEFI for a couple of boards (mips and arm). suspended due to lost of all the target park boards (russians destroyed our town).
User avatar
Geri
Member
Member
Posts: 442
Joined: Sun Jul 14, 2013 6:01 pm

Re: Heap/Pool Dynamic Memory Allocation Implementation

Post by Geri »

i use magic stamp (for valid or freed chunks), process id stamp, and size then an extra few byte at end to avoid memory corruption with multithreaded access. once if i run out from the ram, i restart the walking. this gives me a few 100k alloc+free / second (sized 1-100 byte). my memory management algo is like 200-400 lines so i didnt overcomplicated it, i directly implemented malloc/free/realloc/comalloc/cofree in the kernel, so i dont have to bother about multiple layers of memory management (but loosing a lot of speed as serving this would needs the kernel to sloth up).
Operating system for SUBLEQ cpu architecture:
http://users.atw.hu/gerigeri/DawnOS/download.html
User avatar
Octacone
Member
Member
Posts: 1138
Joined: Fri Aug 07, 2015 6:13 am

Re: Heap/Pool Dynamic Memory Allocation Implementation

Post by Octacone »

Geri zaval bzt simeonz Schol-R-LEA dchapiesky LiamT

Wow, so many replies.
Thanks for all the book suggestions! I will definitely have this post bookmarked, when the time comes for me to buy/read them. As I said earlier I am not quite ready for them, yet. They require a deeper understanding of UNDEFINED(something). :)

I don't really think I like the idea of wasting 5 (32 - 27) bytes, why would I do that? Using PMM for kernel memory needs would be very inefficient. If I need 4097 bytes that means I need 2 pages which means 4095 bytes wasted for nothing. That is why I need dynamic memory manager pool heap. /// Edit: @bzt I like your second example, it is roughly what I want. I managed to understand what you wanted to say. :=) \\ So my plan is to have an allocator that would be used for both kernel and user space allocations. I don't really want to screw around with two different sets of allocators. Right now I am having some rather stupid issues. First of all, I successfully ported LibAlloc, then I tried to use it and I got a page fault. Later on I discovered that the faulting address was 0x400000 -> location of my PMM bitmap -> that wasn't identity mapped of course -> then I realized I couldn't just map 0x400000 -> because it is 0x400000 + (Total_Blocks * 4) / 32 bytes in length -> so I identity mapped everything from 0x400000 to 0x40FFEF (size of the memory bitmap in bytes is 0xFFEF) -> then I also PMM reserved an area from 0x400000 to 0x40FFEF so it couldn't be used for anything else -> then I fixed it, sort of -> only allocating 1 block worked and when I tried to allocate >1 block, it failed -> that is what I am trying to fix right now. It is really hard to take care of all the addresses that have to be identity mapped and reserved and that are used for kernel structures. This is so exhausting. :roll:
Also I've read all the other useful comment, that I will study more deeply, once I start working on my own allocator (early 2018 hopefully). Thanks for putting your effort into writing them.
Last edited by Octacone on Mon Jun 26, 2017 3:49 pm, edited 2 times in total.
OS: Basic OS
About: 32 Bit Monolithic Kernel Written in C++ and Assembly, Custom FAT 32 Bootloader
LtG
Member
Member
Posts: 384
Joined: Thu Aug 13, 2015 4:57 pm

Re: Heap/Pool Dynamic Memory Allocation Implementation

Post by LtG »

Octacone wrote: Also I don't really like the idea of wasting 5 (32 - 27) bytes, why would I do that?
It depends, in some cases wasting a few bytes improves performance drastically. For instance assuming 64B cache lines (safe assumption on current x86, both 32 and 64-bit), and assuming you do the _sane_ thing and align the 32B allocations on 32B boundary guarantees that whenever you need to access any byte of that 32B structure, all the other bytes are also in cache, if you use 27B then you essentially guarantee that _some_ of those 27B structures occupy two separate cache lines and you get worse performance.

Assuming for instance that this was a structure used for each process and there's just one for each process, and assuming Windows like environment where there's usually around 100 processes you'll waste a total of 100x5B = 500B, so who cares? Even if it gives you just 1% performance increase it's worth it. If this is something you need millions of then you may not want to waste 5MiB of RAM..

The choice is yours.
Octacone wrote: Using PMM for kernel memory needs would be very inefficient. If I need 4097 bytes that means I need 2 pages which means 4095 bytes wasted for nothing. That is why I need dynamic memory manager pool heap.
You use PMM for both kernel and user memory, and in both cases the VMM talks to the PMM and maps the memory where it's needed. However you may want to have have either two VMM's (one for kernel and one for user) or at least have two malloc's (kmalloc for kernel and malloc for user).

Normally the code uses malloc, which uses VMM, which uses PMM.

In most cases the kernel only uses a few different types of structs, suppose you have 10 different structs and after rounding them off to closest reasonable size (ie. 32B or 64B, etc) you might end up with 1-5 different size of structs, at that point your kmalloc might have 5 pools, each of which get a new 4KiB page (from VMM) when they run out of memory and parceling the memory becomes very simple because there's only one size to worry about for each pool.

The point here is to keep the kmalloc simple and the point is also to take advantage of the fact that you _know_ the different allocations that you are _ever_ going to need for your kernel, so you can "special case" it for the kmalloc, the same doesn't apply for user malloc which needs to be more generic and that means less efficient and more complex and thus more likely to contain bugs. No point to mix malloc and kmalloc when they are so different, even if they have almost identical responsibilities.

Note: I said _ever_ above, of course a newer version might have other memory needs, but with said newer kernel version you of course ship newer kmalloc and thus it works out.
Octacone wrote: So my plan is to have an allocator that would be used for both kernel and user space allocations. I don't really want to screw around with two different sets of allocators.
As I said above, you should consider having separate kmalloc, it makes things simpler and better.
Octacone wrote: Right now I am having some rather stupid issues. First of all, I successfully ported LibAlloc, then I tried to use it and I got a page fault. Later on I discovered that the faulting address was 0x400000 -> location of my PMM bitmap -> that wasn't identity mapped of course -> then I realized I couldn't just map 0x400000 -> because it is 0x400000 + (Total_Blocks * 4) / 32 bytes in length -> so I identity mapped everything from 0x400000 to 0x40FFEF (size of the memory bitmap in bytes is 0xFFEF) -> then I also PMM reserved an area from 0x400000 to 0x40FFEF so it couldn't be used for anything else -> then I fixed it, sort of -> only allocating 1 block worked and when I tried to allocate >1 block, it failed -> that is what I am trying to fix right now. It is really hard to take care of all the addresses that have to be identity mapped and reserved and that are used for kernel structures. This is so exhausting. :roll:
Also I've read all the other useful comment, that I will study more deeply, once I start working on my own allocator (early 2018 hopefully). Thanks for putting your effort into writing them.
Personally I would avoid porting any allocator until you have finished your own. Make your own allocator, make it slow and inefficient but do it right. After that's done you better understand everything and porting at that point a more efficient allocator is easier because you understand the topic better.

Also I would keep things really simple and separate everything to the smallest possible pieces, don't worry about performance at all, you can worry about it later. I would also avoid identity mapping unless you can actually use it exclusively.

So the PMM becomes extremely simple, it only has two interface functions, alloc and free, which only ever give (or take back) 4KiB physical pages. At that point the PMM is dead simple. The VMM doesn't care what you want to allocate of where, you have to tell it, so it only does what it's told (it's kmalloc/malloc's problem to figure out where they want their allocations), the VMM only does sanity checks (userland can't allocate on top of kernel space, etc), which makes the VMM also quite simple. Then comes the kmalloc, if it only supports a couple of different sizes then it's quite simple and the most complex is the userland malloc, but that's in userland and thus can't crash the system and each app/runtime can have their own for all I/you care. The userland malloc should not be something that is enforced by the system anyway..

Pretty long post, hope it makes sense and is useful =)
User avatar
Octacone
Member
Member
Posts: 1138
Joined: Fri Aug 07, 2015 6:13 am

Re: Heap/Pool Dynamic Memory Allocation Implementation

Post by Octacone »

LtG wrote:
Octacone wrote: Also I don't really like the idea of wasting 5 (32 - 27) bytes, why would I do that?
It depends, in some cases wasting a few bytes improves performance drastically. For instance assuming 64B cache lines (safe assumption on current x86, both 32 and 64-bit), and assuming you do the _sane_ thing and align the 32B allocations on 32B boundary guarantees that whenever you need to access any byte of that 32B structure, all the other bytes are also in cache, if you use 27B then you essentially guarantee that _some_ of those 27B structures occupy two separate cache lines and you get worse performance.

Assuming for instance that this was a structure used for each process and there's just one for each process, and assuming Windows like environment where there's usually around 100 processes you'll waste a total of 100x5B = 500B, so who cares? Even if it gives you just 1% performance increase it's worth it. If this is something you need millions of then you may not want to waste 5MiB of RAM..

The choice is yours.
Honestly, I have never done anything alignment related so I know nothing about that topic. (except page alignment xP) 500 bytes doesn't sound like much, but why 5 and not 3? Also how can I assume that my kernel will only require rounded values? For example: a structure requires 7247 bytes for its existence to be persistent, that is already 200+ bytes wasted in there.
LtG wrote:
Octacone wrote: Using PMM for kernel memory needs would be very inefficient. If I need 4097 bytes that means I need 2 pages which means 4095 bytes wasted for nothing. That is why I need dynamic memory manager pool heap.
You use PMM for both kernel and user memory, and in both cases the VMM talks to the PMM and maps the memory where it's needed. However you may want to have have either two VMM's (one for kernel and one for user) or at least have two malloc's (kmalloc for kernel and malloc for user).

Normally the code uses malloc, which uses VMM, which uses PMM.

In most cases the kernel only uses a few different types of structs, suppose you have 10 different structs and after rounding them off to closest reasonable size (ie. 32B or 64B, etc) you might end up with 1-5 different size of structs, at that point your kmalloc might have 5 pools, each of which get a new 4KiB page (from VMM) when they run out of memory and parceling the memory becomes very simple because there's only one size to worry about for each pool.

The point here is to keep the kmalloc simple and the point is also to take advantage of the fact that you _know_ the different allocations that you are _ever_ going to need for your kernel, so you can "special case" it for the kmalloc, the same doesn't apply for user malloc which needs to be more generic and that means less efficient and more complex and thus more likely to contain bugs. No point to mix malloc and kmalloc when they are so different, even if they have almost identical responsibilities.

Note: I said _ever_ above, of course a newer version might have other memory needs, but with said newer kernel version you of course ship newer kmalloc and thus it works out.
So you suggest a completely different approach for malloc and kmalloc altogether. That means that I would have to round 33 bytes to 64, ouch. (27 bytes wasted in there). I always thought I would use the same allocator for both user and kernel space, looks like there is a different approach to that. I need to consider this deeply. Syncing 5 allocators is not an easy job. Do you think I should have a function called VMM_Allocate_Block/s(1/n) that would automatically identity map the given/allocated address? While keeping PMM_Allocate_Block/s(1/n) purely user can choose how to map it scenario?
LtG wrote:
Octacone wrote: So my plan is to have an allocator that would be used for both kernel and user space allocations. I don't really want to screw around with two different sets of allocators.
As I said above, you should consider having separate kmalloc, it makes things simpler and better.
Okay, will do.

LtG wrote:
Octacone wrote: Right now I am having some rather stupid issues. First of all, I successfully ported LibAlloc, then I tried to use it and I got a page fault. Later on I discovered that the faulting address was 0x400000 -> location of my PMM bitmap -> that wasn't identity mapped of course -> then I realized I couldn't just map 0x400000 -> because it is 0x400000 + (Total_Blocks * 4) / 32 bytes in length -> so I identity mapped everything from 0x400000 to 0x40FFEF (size of the memory bitmap in bytes is 0xFFEF) -> then I also PMM reserved an area from 0x400000 to 0x40FFEF so it couldn't be used for anything else -> then I fixed it, sort of -> only allocating 1 block worked and when I tried to allocate >1 block, it failed -> that is what I am trying to fix right now. It is really hard to take care of all the addresses that have to be identity mapped and reserved and that are used for kernel structures. This is so exhausting. :roll:
Also I've read all the other useful comment, that I will study more deeply, once I start working on my own allocator (early 2018 hopefully). Thanks for putting your effort into writing them.
Personally I would avoid porting any allocator until you have finished your own. Make your own allocator, make it slow and inefficient but do it right. After that's done you better understand everything and porting at that point a more efficient allocator is easier because you understand the topic better.

Also I would keep things really simple and separate everything to the smallest possible pieces, don't worry about performance at all, you can worry about it later. I would also avoid identity mapping unless you can actually use it exclusively.

So the PMM becomes extremely simple, it only has two interface functions, alloc and free, which only ever give (or take back) 4KiB physical pages. At that point the PMM is dead simple. The VMM doesn't care what you want to allocate of where, you have to tell it, so it only does what it's told (it's kmalloc/malloc's problem to figure out where they want their allocations), the VMM only does sanity checks (userland can't allocate on top of kernel space, etc), which makes the VMM also quite simple. Then comes the kmalloc, if it only supports a couple of different sizes then it's quite simple and the most complex is the userland malloc, but that's in userland and thus can't crash the system and each app/runtime can have their own for all I/you care. The userland malloc should not be something that is enforced by the system anyway..

Pretty long post, hope it makes sense and is useful =)
My PMM has a couple of extra functionalities, but everything else is as "expected/done by the book". I don't understand the VMM part. Currently the only related (public, meant to be used by me, not speaking of initialization functions) function I have is Map_Virtual_To_Physical and that is it. My VMM_Allocate_Block is basically:

Code: Select all

//this function does not exist but these two lines of code can simulate VMM_Allocate_Block
uint32_t test = PMM.Allocate_Block(); 
VMM.Map_Virtual_To_Physical(test, test, true, true, false); //identity mapping example
I am definitely going to enforce that. I don't want any third party allocators to mess with my operating system. (allocators not there by default, aka not compiled with the OS itself).
So as far I managed to understand -> kmalloc ==> kernel, common sizes that are even numbers; malloc ==> user space, any number of bytes (both odd and even).
Yeah, most of this makes sense to me ("most of it" :P). Thanks for writing such a long post, I enjoy reading them.
OS: Basic OS
About: 32 Bit Monolithic Kernel Written in C++ and Assembly, Custom FAT 32 Bootloader
User avatar
Geri
Member
Member
Posts: 442
Joined: Sun Jul 14, 2013 6:01 pm

Re: Heap/Pool Dynamic Memory Allocation Implementation

Post by Geri »

you cant avoid wasting large quantity of bytes. it will escpecially hurt you when you do small allocations (like if an idiot writes a paint program to your platform where every pixel is a malloc). wasting 16, 24 or so byte at least is unavoidable as you have to deal with a lot of information, you can be even happyer if you can come out from 32 byte actually
Operating system for SUBLEQ cpu architecture:
http://users.atw.hu/gerigeri/DawnOS/download.html
LtG
Member
Member
Posts: 384
Joined: Thu Aug 13, 2015 4:57 pm

Re: Heap/Pool Dynamic Memory Allocation Implementation

Post by LtG »

@Octacone..

I'm in a bit of a hurry, so don't have time to quote properly, also let me know if I missed anything..

Wrt to the "why 5 and not 3", not sure what you meant by that..? If you are referring to the "5" in the "100x5B", then that 5 is picked up from your earlier post (32-27), so I was just continuing with your example and pointing out that wasting 5B per allocation is irrelevant if there's only ever going to be a 100 such allocations (ie. 500B). Also it's irrelevant if there's 10k such allocations on modern x86 systems (though embedded may of course be different). So I was trying to point out that while you want to be frugal with most of your memory, in some cases (especially wrt to the kernel) it really doesn't matter.

Whether you want to optimize for cache lines is of course up to you, but only supporting a couple of allocation sizes for the kernel malloc (kmalloc) makes things simpler on that side. If your userland malloc crashes it's easier to diagnose, if your kmalloc crashes it's more difficult, so keeping kmalloc simple has benefits, keeping malloc simple isn't really feasible..

Didn't really get your 7247 => 200+B, where's the 200 come from? But regardless, if there's only handful of such structs allocated then 200B wasted on each is again meaningless.



Wrt to 33 -> 64 rounding, I was just giving some examples, the exact rounding you want to do is up to you. There's two things to consider, one is having a relatively low number of allocation sizes to keep the allocator simple and the other is for cache line optimizations. You can ignore the cache line optimizations for now if you like, but keeping things simple does help. So 33 could also be rounded to 36 or 48, etc.. Once you know all the structs you need you can decide how you want to round them, for starters you could just give each 4KiB to get the ball rolling and fix those once you have all of your structs ready, since you probably don't know all the sizes yet..

The basic idea is that when you call kmalloc:
void* ptr = kmalloc(32); // ok
void* ptr = kmalloc(33); // kpanic, invalid allocation size
void* ptr = kmalloc(64); // ok
etc..

So then inside kmalloc you just check the requested size, and based on it you have different page for each size from which you allocate, that way you don't have to deal with odd sizes, etc. The idea here is to keep it as simple as possible. So if you only needed three different allocation sizes, say 32B, 64B and 256B you might have:
void* free32BList;
void* free64BList;
void* free256BList;
And then just keep track of those, but if this complicates things for you, you can certainly do it anyway you like =)

Note also that I would not use identity mapping at all, what benefit do you think you get from it?

Finally, the userland malloc is not something that comes with the OS at all, it's part of the language runtime, yes, you'll have to create it if you want to compile C apps for your OS, but it's not part of the OS and you should _NOT_ want to enforce any malloc on anyone.

Even if you did enforce your malloc on your userland, you couldn't. Consider the following code on _any_ OS:

Code: Select all

void* lotsOfMem; // allocate ram from OS, using "enforced" malloc

void* myMalloc(size_t count) {
void* addr;
// somehow parcel out the large 10GiB block for internal allocations
return addr;
}

void main() {
lotsOfMem = malloc(10GiB);
uint32_t* manyInts = myMalloc(10 * 1000 * 1000 * 4);
}
The point here is that you can't enforce your malloc nor should you even attempt. AFAIK Windows apps that use malloc, the malloc internally uses VirtualAlloc (part of Win32 API) or something similar.

So the idea is that the language runtime malloc uses the OS's VMM to allocate what it wants. And you can offload the burden of figuring out what they want to userland, which makes simpler/stabler kernel, leaves more room for userland optimizations, etc.

So when userland calls:
VMM_Alloc(/*start*/ 0x10000, /*stop*/ 0x20000);

All your kernel VMM has to do is check that the userland is not requesting allocation on top of kernel memory. You may also want to check it's not doing something stupid, like requesting allocation on top of something that's already allocated to it (or you may want to allow it as a quick way of allowing to change the allocation type (read-only vs read-write vs execute, etc)). My VMM_Alloc example above didn't include "type", you may want to do that though.

So all the VMM_Alloc needs to do is basic sanity and security checking and then just do what it was asked to do, which makes its implementation quite simple.

Then as a starting malloc (for userland) you can do something simple (though inefficient) and just always round up the requested allocation to the nearest 4KiB and allocate that (yeah, I said it's inefficient). And couple of days later fix it when it becomes a problem.

So all four pieces (PMM, VMM, kmalloc and malloc) become really simple, and only reason to change them is to improve efficiency, but that's just optimization.
Post Reply