Page 1 of 2

Why is free(ptr) preferable to free(ptr, length)?

Posted: Sat Feb 06, 2021 11:15 am
by j4cobgarby
The POSIX spec defines the free function to just take the address of the beginning of the allocated memory, and then it will internally handle how long that region is so it can free the correct amount of memory. However, my operating system doesn't have to follow that specification, which brings me to my question.

When looking at ways of implementing the heap for my kernel malloc function, it seems to be unavoidable to have a significant amount of metadata in each block which the heap consists of, so that free can know how long each allocation is. I would much prefer the metadata in each block to be just a simple bitmap of which sub-blocks in the block of the heap are available (that was phrased horribly, but I can't think of a better way...)

Essentially, my free function would instead be declared as:

Code: Select all

void kfree(void *ptr, size_t len);
so that instead the caller will have to keep track of the length. I don't think that's a disadvantage though, as I can't think of an instance where the caller won't know how long the allocation was.

Does anyone know why free is usually declared in that way, then? It seems like it would make the heap allocator just use more memory for metadata.

Re: Why is free(ptr) preferable to free(ptr, length)?

Posted: Sat Feb 06, 2021 11:36 am
by iansjack
Keeping track of the length of an allocated block of memory is inconvenient for the application programmer. More importantly, it's error prone. It's far better for the operating system to look after this housekeeping task. One of the aims of an operating system should be to help to ensure simple, error-free application programming.

And if the application program has to keep track of the length of allocated memory, any memory savings are illusory.

Re: Why is free(ptr) preferable to free(ptr, length)?

Posted: Sat Feb 06, 2021 11:40 am
by PeterX
Yes, what iansjack said. It would lead to strange errors (including violations of process rights) if the application programmer could set any length.

Greetings
Peter

Re: Why is free(ptr) preferable to free(ptr, length)?

Posted: Sat Feb 06, 2021 11:48 am
by Gigasoft
There are almost no heap implementations that don't already know the length of each allocation, so requiring programs to provide it would just waste bytes. A bitmap allocator is very slow and would be considered an especially poor design choice at the time when computers were already slow enough.

Re: Why is free(ptr) preferable to free(ptr, length)?

Posted: Sun Feb 07, 2021 8:07 am
by rdos
I agree that it is usually best if the OS (or rather the compiler's runtime library) handle this, but the usual design of the heap where allocated data is mixed with control blocks is pretty bad too. It basically means that if an application overwrites the allocation, it will kill the control block for the next allocation and trash the allocation chain.

There has to be better methods to do this in a flat environment where allocations cannot be protected with segmentation.

Re: Why is free(ptr) preferable to free(ptr, length)?

Posted: Sun Feb 07, 2021 8:11 am
by nexos
rdos wrote:but the usual design of the heap where allocated data is mixed with control blocks is pretty bad too.
In theory, that is a bad problem. But, heap overflows are still problems even if control blocks are moved away. Also, if we have a magic number at the footer or header, we could check those for heap corruption. That means that heap corruption would be easier to catch.

I think that free should only take a pointer, not a size. This is because, as has already been pointed out, that would be more error prone then free(ptr).

Re: Why is free(ptr) preferable to free(ptr, length)?

Posted: Sun Feb 07, 2021 11:34 am
by vvaltchev
Hi j4cobgarby,
I'm so glad you asked this question because during the development of my operating system, Tilck, I've asked myself exactly the same question and, I considered both the interfaces.

Therefore, here are my conclusions so far. The free(ptr) interface is less error-prone and convenient in all the cases where the length of the buffer is not known at compile-time. I believe there's no point in explaining further the PROs of this approach. But, as you pointed out, it has some overhead: it requires some metadata before each "chunk" of memory (that's what glibc's malloc() does, see http://gee.cs.oswego.edu/dl/html/malloc.html). Also, it has other limitations as well: the fact that it needs some metadata, means that it's not possible to have contiguous chunks of data allocated by different calls (this feature is convenient in a kernel).

With an interface such as free(ptr, length) instead, the callers would have to manually keep track of "length". Typically, in userspace that's very inconvenient, but in a kernel project, where you care almost about every single "bit", it might have some nice PROs as well. Let me share my experience with the free(ptr, length) interface in a few points:

1. Yes, callers need to pass "length", but what it most (70-80%) of the kmalloc/kfree (in my project) are about allocating "objects" (structs) on the heap. In that case, the size is known at compile-time and you could just introduce simple macros like allocate_obj(type), free_obj(ptr, type).

2. In other cases, "length" is not known at compile-time, BUT the caller needs it for other reasons anyways (15%). For example, a ring buffer. Sure, it would be nicer to destroy it with just kfree(rb->buf), but the caller REALLY needs to know the buffer size the whole time, for writes and read. So, if the allocator had "length" in its metadata it would be a little waste. We have already length, so we can free the buffer with kfree(rb->buf, rb->length). Zero overhead.

3. In other cases (maybe 5% in my project), the "length" is not known at compile-time AND the caller doesn't need to keep it for other reasons. In such cases, I initially kept track of the "length" explicitly. Later, I created a trivial kmalloc/kfree wrapper that added a metadata with the chunk size before the chunk itself. BUT, I didn't like that interface because that meant that chunks allocated with the special kmalloc wrapped needed to be released with the special kfree wrapper. Very error-prone. At the end, I figured a smart way to make the length parameter optional in any case. So, if you pass 0, the allocator will still figure out the size of the chunk, without metadata at the price of some minimal runtime overhead. Now, I continue to pass length to kfree() every time I can, but it's totally optional.

It all depends on your allocator
It is essential at this point to remark that such decisions depend on the kind of allocator you're implementing. Real-world allocators typically made by a composition of several types of allocators. Some kind of allocators, like free-lists ones (see malloc()), really need a "length" field in the chunk's metadata and in no case could infer that length. In my case instead, I implemented a "Buddy allocator" (see: https://en.wikipedia.org/wiki/Buddy_memory_allocation). Therefore, because of the structure of the allocator: 1. have minimal (1 byte/chunk, really 4 bits) metadata, 2. can infer the size of an allocated chunk, by walking the binary tree.

OK, in my case, it was more complicated than that, because I had to use multiple heaps to fill all the free memory, simply because the size of a heap must be a power of 2. Also, to make the allocator faster, I had to create "small heaps" for allocating smaller objects in a more efficient way and keeping all the binary trees shorter (regular heaps have 2k as smallest chunk size, small heaps have 32 as smallest chunk size). So, as you can imagine, when I have the "length" parameter, can it's a bit faster because I immediately know if the chunk is in a small heap or not.

Waste of the buddy allocator alone
So, my allocator is fast and good for allocating objects with size 2^N, but wastes a lot of memory in case objects have a size like 2^N + something. In theory, on the top of my buddy allocator, I had to implement something like Linux's slab allocator, but I left that as a FUTURE todo. Some memory waste was fine, at the time. But, during the whole development, I took in mind that my allocator need to work with 2^N chunks.

At some point, I wanted to measure how much memory EXACTLY I was wasting with such allocator. Considering at it can waste at most 50% of the memory, with an average of 25% if the size of the allocations have an uniform distribution, I was expecting something like 10%, because most of allocations have a power-of-two size. In reality, it turned out to be something like 2% or less. With some tuning, I made the waste to be around 0.1%, which raised up to ~1.0% when I added ACPI support. That's because ACPICA doesn't care about my 2^N "rule".

Conclusion
My whole point is that, while the kfree(ptr) interface is generally better, in some cases, with some allocators, the kfree(ptr, length) interface can be also very good and avoid some overhead. In particular, if the length parameter is optional: that way, you pass it when you can, gaining some performance. When you can't, there's always a fallback that works. Still, if I had enough time to implement something like a slab allocator on the top of my buddy allocator, I could reduce much more the waste, because the top allocator would only allocate chunks of a given size, no need for the "small heaps" trick. Overall, by looking at the Linux kernel, I believe that, when your allocator(s) is sophisticated enough, you can both skip the "length" parameter and have minimum overhead with good performance, but that's an incredible amount of work. Therefore, I don't regret my decision to have a "length" parameter, overall.

P.S.: sorry for the long post.

Re: Why is free(ptr) preferable to free(ptr, length)?

Posted: Sun Feb 07, 2021 5:54 pm
by j4cobgarby
vvaltchev wrote:Hi j4cobgarby,
I'm so glad you asked this question because during the development of my operating system, Tilck, I've asked myself exactly the same question and, I considered both the interfaces.

Therefore, here are my conclusions so far. The free(ptr) interface is less error-prone and convenient in all the cases where the length of the buffer is not known at compile-time. I believe there's no point in explaining further the PROs of this approach. But, as you pointed out, it has some overhead: it requires some metadata before each "chunk" of memory (that's what glibc's malloc() does, see http://gee.cs.oswego.edu/dl/html/malloc.html). Also, it has other limitations as well: the fact that it needs some metadata, means that it's not possible to have contiguous chunks of data allocated by different calls (this feature is convenient in a kernel).

With an interface such as free(ptr, length) instead, the callers would have to manually keep track of "length". Typically, in userspace that's very inconvenient, but in a kernel project, where you care almost about every single "bit", it might have some nice PROs as well. Let me share my experience with the free(ptr, length) interface in a few points:

1. Yes, callers need to pass "length", but what it most (70-80%) of the kmalloc/kfree (in my project) are about allocating "objects" (structs) on the heap. In that case, the size is known at compile-time and you could just introduce simple macros like allocate_obj(type), free_obj(ptr, type).

2. In other cases, "length" is not known at compile-time, BUT the caller needs it for other reasons anyways (15%). For example, a ring buffer. Sure, it would be nicer to destroy it with just kfree(rb->buf), but the caller REALLY needs to know the buffer size the whole time, for writes and read. So, if the allocator had "length" in its metadata it would be a little waste. We have already length, so we can free the buffer with kfree(rb->buf, rb->length). Zero overhead.

3. In other cases (maybe 5% in my project), the "length" is not known at compile-time AND the caller doesn't need to keep it for other reasons. In such cases, I initially kept track of the "length" explicitly. Later, I created a trivial kmalloc/kfree wrapper that added a metadata with the chunk size before the chunk itself. BUT, I didn't like that interface because that meant that chunks allocated with the special kmalloc wrapped needed to be released with the special kfree wrapper. Very error-prone. At the end, I figured a smart way to make the length parameter optional in any case. So, if you pass 0, the allocator will still figure out the size of the chunk, without metadata at the price of some minimal runtime overhead. Now, I continue to pass length to kfree() every time I can, but it's totally optional.

It all depends on your allocator
It is essential at this point to remark that such decisions depend on the kind of allocator you're implementing. Real-world allocators typically made by a composition of several types of allocators. Some kind of allocators, like free-lists ones (see malloc()), really need a "length" field in the chunk's metadata and in no case could infer that length. In my case instead, I implemented a "Buddy allocator" (see: https://en.wikipedia.org/wiki/Buddy_memory_allocation). Therefore, because of the structure of the allocator: 1. have minimal (1 byte/chunk, really 4 bits) metadata, 2. can infer the size of an allocated chunk, by walking the binary tree.

OK, in my case, it was more complicated than that, because I had to use multiple heaps to fill all the free memory, simply because the size of a heap must be a power of 2. Also, to make the allocator faster, I had to create "small heaps" for allocating smaller objects in a more efficient way and keeping all the binary trees shorter (regular heaps have 2k as smallest chunk size, small heaps have 32 as smallest chunk size). So, as you can imagine, when I have the "length" parameter, can it's a bit faster because I immediately know if the chunk is in a small heap or not.

Waste of the buddy allocator alone
So, my allocator is fast and good for allocating objects with size 2^N, but wastes a lot of memory in case objects have a size like 2^N + something. In theory, on the top of my buddy allocator, I had to implement something like Linux's slab allocator, but I left that as a FUTURE todo. Some memory waste was fine, at the time. But, during the whole development, I took in mind that my allocator need to work with 2^N chunks.

At some point, I wanted to measure how much memory EXACTLY I was wasting with such allocator. Considering at it can waste at most 50% of the memory, with an average of 25% if the size of the allocations have an uniform distribution, I was expecting something like 10%, because most of allocations have a power-of-two size. In reality, it turned out to be something like 2% or less. With some tuning, I made the waste to be around 0.1%, which raised up to ~1.0% when I added ACPI support. That's because ACPICA doesn't care about my 2^N "rule".

Conclusion
My whole point is that, while the kfree(ptr) interface is generally better, in some cases, with some allocators, the kfree(ptr, length) interface can be also very good and avoid some overhead. In particular, if the length parameter is optional: that way, you pass it when you can, gaining some performance. When you can't, there's always a fallback that works. Still, if I had enough time to implement something like a slab allocator on the top of my buddy allocator, I could reduce much more the waste, because the top allocator would only allocate chunks of a given size, no need for the "small heaps" trick. Overall, by looking at the Linux kernel, I believe that, when your allocator(s) is sophisticated enough, you can both skip the "length" parameter and have minimum overhead with good performance, but that's an incredible amount of work. Therefore, I don't regret my decision to have a "length" parameter, overall.

P.S.: sorry for the long post.
Thanks for the detailed response! That's exactly the sort of information I was looking for.
I've decided I most likely will make the allocator keep track of the length of each block, and just read more about different ways of doing allocations.

Re: Why is free(ptr) preferable to free(ptr, length)?

Posted: Sun Feb 07, 2021 6:10 pm
by vvaltchev
j4cobgarby wrote: Thanks for the detailed response! That's exactly the sort of information I was looking for.
I've decided I most likely will make the allocator keep track of the length of each block, and just read more about different ways of doing allocations.
You’re welcome! I’m happy that it helped you to make a decision.

P.S.: sorry for huge amount of typos and other kind of orthographic errors: I was in a rush and didn’t have the time to re-read the whole post carefully.

Re: Why is free(ptr) preferable to free(ptr, length)?

Posted: Sun Feb 07, 2021 11:54 pm
by wxwisiasdf
You make free to give up a portion of memory.

Want to shrink it?, That is what realloc is for.

Re: Why is free(ptr) preferable to free(ptr, length)?

Posted: Mon Feb 08, 2021 3:50 am
by OSwhatever
This is an interesting question, but not necessarily only connected to kernel programming but programming in general. I already have a few allocators that works like this inside my kernel. One example is the buddy allocator and since you cannot store metadata inside the memory chunk without destroying the natural alignment provided with the buddy allocator, the meta data must be outside the allocated memory including the length.

There is more to this that storing the length together with the pointer can actually be a good idea. This is really ties more into the language than the kernel. Some languages provide fat pointers where you can store additional meta data like reference count and more. Also fat pointers makes the language more versatile where you can swap out GC algorithms to whatever you a like. Nim comes to mind that has this versatility. In practice with the length provided all allocated memory in the language then become slices and it also enables runtime bound checks for all memory. This means that if you break out a slice from an existing slice you can do a runtime bounds check.

Re: Why is free(ptr) preferable to free(ptr, length)?

Posted: Mon Feb 08, 2021 1:56 pm
by rdos
OSwhatever wrote:This is an interesting question, but not necessarily only connected to kernel programming but programming in general. I already have a few allocators that works like this inside my kernel. One example is the buddy allocator and since you cannot store metadata inside the memory chunk without destroying the natural alignment provided with the buddy allocator, the meta data must be outside the allocated memory including the length.
The kernel is a bit of a special case. I have several different allocators in kernel.

One allocates byte-aligned blocks and uses control blocks before the actual memory block. Most of these are then linked to a selector with exact limit checking, and so memory corruption because of bad pointers doesn't happen here. Freeing will use the limit in the selector, but also will be enforced by the control block.

Another allocates pages from the page tables and marks them as "allocate on access". This one has no control blocks and thus when freeing these objects a size must be specified to the kernel knowns how many pages to free.

The third one is an allocator that uses bitmaps and is mapped to either 32-bit or 64-bit physical addresses. It's typically integrated with device data and is used to allocate many smaller same-size objects for memory-based schedules. It's very useful for memory mapped hardware as linear to physical translations (and the reverse) can be done very fast. I use this for all USB hardware devices, but might move AHCI and network drivers to it too in the future. It also has a garbage collection function as all blocks allocated will be freed when the main allocator object is freed.
OSwhatever wrote: There is more to this that storing the length together with the pointer can actually be a good idea. This is really ties more into the language than the kernel. Some languages provide fat pointers where you can store additional meta data like reference count and more. Also fat pointers makes the language more versatile where you can swap out GC algorithms to whatever you a like. Nim comes to mind that has this versatility. In practice with the length provided all allocated memory in the language then become slices and it also enables runtime bound checks for all memory. This means that if you break out a slice from an existing slice you can do a runtime bounds check.
True, but applications have less choices than the kernel that also can control page tables & physical memory mapping.

Re: Why is free(ptr) preferable to free(ptr, length)?

Posted: Mon Feb 08, 2021 3:01 pm
by AndrewAPrice
For the majority of cases, you probably would know the size of the object you want to free. E.g. Your variable is "struct Thread*". Arrays are a little more difficult, but you're probably pairing the pointer with its size somewhere. Interfaces would require the most work to figure out what size to pass to free (maybe each implementation set a 'destruct' function pointer, or the base struct contains metadata such as the size.)

Re: Why is free(ptr) preferable to free(ptr, length)?

Posted: Mon Feb 08, 2021 3:09 pm
by AndrewAPrice
rdos wrote:True, but applications have less choices than the kernel that also can control page tables & physical memory mapping.
If you're allocating large page-aligned objects in the kernel, you probably don't want to use the same malloc/free as you do for arbitrary sized objects. Feel free to, but I think it'll make your implementation more difficult. It might be better to call your Virtual Memory Manager directly (e.g. "void* AllocatePages(size_t num_pages)", "ReleasePages(void* first_page, size_t pages)") and just leave comments that this is allocated via the virtual memory manager rather than mallloc.

Does your kernel allocate objects that are larger than 1 page in size? Mine doesn't (I'm building a microkernel) so I'm curious if you do.

Re: Why is free(ptr) preferable to free(ptr, length)?

Posted: Mon Feb 08, 2021 10:26 pm
by eekee
@vvaltchev: I'm thankful for your detailed explanation too. I'm thinking of a single-language OS. If this language is strongly typed, your point #1 would apply to 100% of allocations. I'm also happy to hear your buddy allocator wasted surprisingly little memory even before usage optimization.