Hi j4cobgarby,
I'm so glad you asked this question because, during the development of my operating system, Tilck, I asked myself exactly the same question and considered both interfaces.
Here are my conclusions so far. The free(ptr) interface is less error-prone and more convenient in all the cases where the length of the buffer is not known at compile-time. I believe there's no point in explaining the PROs of this approach any further. But, as you pointed out, it has some overhead: it requires some metadata before each "chunk" of memory (that's what glibc's malloc() does, see http://gee.cs.oswego.edu/dl/html/malloc.html). It has another limitation as well: because it needs that metadata, it's not possible to have contiguous chunks of data allocated by different calls (a feature that is convenient in a kernel).
With an interface such as free(ptr, length), instead, the callers have to manually keep track of "length". Typically, in userspace that's very inconvenient, but in a kernel project, where you care about almost every single "bit", it can have some nice PROs as well. Let me share my experience with the free(ptr, length) interface in a few points:
1. Yes, callers need to pass "length", but most (70-80%) of the kmalloc/kfree calls in my project are about allocating "objects" (structs) on the heap. In that case, the size is known at compile-time and you can just introduce simple macros like allocate_obj(type) and free_obj(ptr, type).
2. In other cases (15%), "length" is not known at compile-time, BUT the caller needs it for other reasons anyway. For example, a ring buffer. Sure, it would be nicer to destroy it with just kfree(rb->buf), but the caller REALLY needs to know the buffer size the whole time, for reads and writes. So, if the allocator also stored "length" in its metadata, that would be a small waste. We already have the length, so we can free the buffer with kfree(rb->buf, rb->length). Zero overhead.
3. In the remaining cases (maybe 5% in my project), the "length" is not known at compile-time AND the caller doesn't need to keep it for other reasons. In such cases, I initially kept track of the "length" explicitly. Later, I created a trivial kmalloc/kfree wrapper that stored the chunk size as metadata right before the chunk itself. BUT, I didn't like that interface because it meant that chunks allocated with the special kmalloc wrapper had to be released with the special kfree wrapper. Very error-prone.
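To make the first two cases concrete, here is a minimal userspace sketch. It is not Tilck's actual code: kmalloc() and kfree() are stand-ins that simply forward to malloc()/free(), and struct task and struct ringbuf are made up for illustration. Only the macro names allocate_obj and free_obj come from the text above.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a kernel's sized allocator interface.
 * They forward to malloc/free so that the sketch runs in userspace. */
static void *kmalloc(size_t size) { return malloc(size); }
static void kfree(void *ptr, size_t size) { (void)size; free(ptr); }

/* Case 1: the size is known at compile-time, so macros can supply it. */
#define allocate_obj(type)   ((type *)kmalloc(sizeof(type)))
#define free_obj(ptr, type)  kfree((ptr), sizeof(type))

struct task { int pid; int state; };   /* illustrative object type */

/* Case 2: the caller needs the length anyway. */
struct ringbuf {
    char *buf;
    size_t length;       /* needed for wrap-around on every read/write */
    size_t read_pos;
    size_t write_pos;
};

static int ringbuf_init(struct ringbuf *rb, size_t length)
{
    rb->buf = kmalloc(length);
    if (!rb->buf)
        return -1;
    rb->length = length;
    rb->read_pos = rb->write_pos = 0;
    return 0;
}

static void ringbuf_destroy(struct ringbuf *rb)
{
    /* The length is already stored in the struct: no allocator metadata needed. */
    kfree(rb->buf, rb->length);
    rb->buf = NULL;
    rb->length = 0;
}
```

With these macros, the call sites for objects look just as short as with the free(ptr) interface: free_obj(t, struct task) instead of kfree(t).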
In the end, I figured out a smart way to make the length parameter optional in every case. If you pass 0, the allocator will still figure out the size of the chunk, without metadata, at the price of some minimal runtime overhead. Now, I continue to pass length to kfree() every time I can, but it's totally optional.
It all depends on your allocator
It is essential at this point to remark that such decisions depend on the kind of allocator you're implementing. Real-world allocators are typically a composition of several types of allocators. Some kinds of allocators, like free-list ones (see malloc()), really need a "length" field in the chunk's metadata and can in no case infer that length. In my case, instead, I implemented a "Buddy allocator" (see: https://en.wikipedia.org/wiki/Buddy_memory_allocation). Therefore, because of the structure of the allocator, I: 1. have minimal metadata (1 byte/chunk, really 4 bits), 2. can infer the size of an allocated chunk by walking the binary tree.
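Here is a toy sketch of how a buddy allocator can make "length" optional. It is a simplified illustration, not Tilck's implementation: the single 1 KB heap, the node states, and all the names (alloc_in, find_block, kchunk_size) are assumptions made for the example. When kfree() gets a length, it computes the tree node directly; when it gets 0, it walks down from the root to find the chunk.

```c
#include <stddef.h>
#include <stdint.h>

#define HEAP_SIZE  1024u                          /* must be a power of 2 */
#define MIN_BLOCK  32u                            /* smallest chunk size */
#define NODE_COUNT (2 * (HEAP_SIZE / MIN_BLOCK) - 1)

enum { NODE_FREE = 0, NODE_SPLIT = 1, NODE_USED = 2 };

static unsigned char heap[HEAP_SIZE];
/* Implicit binary tree: the children of node i are 2*i+1 and 2*i+2.
 * One byte per node for simplicity; a couple of bits would be enough. */
static uint8_t nodes[NODE_COUNT];

static size_t round_up_pow2(size_t n)
{
    size_t p = MIN_BLOCK;
    while (p < n)
        p <<= 1;
    return p;
}

/* Allocate a block of exactly `want` bytes inside node i, which covers
 * heap[off .. off+size). Returns NULL if the subtree has no room. */
static void *alloc_in(size_t i, size_t off, size_t size, size_t want)
{
    if (nodes[i] == NODE_USED)
        return NULL;
    if (size == want) {
        if (nodes[i] != NODE_FREE)
            return NULL;
        nodes[i] = NODE_USED;
        return heap + off;
    }
    nodes[i] = NODE_SPLIT;   /* no-op if the node was already split */
    void *p = alloc_in(2 * i + 1, off, size / 2, want);
    return p ? p : alloc_in(2 * i + 2, off + size / 2, size / 2, want);
}

void *kmalloc(size_t size)
{
    if (size == 0 || size > HEAP_SIZE)
        return NULL;
    return alloc_in(0, 0, HEAP_SIZE, round_up_pow2(size));
}

/* Slow path: infer the node (and chunk size) by walking down the tree. */
static size_t find_block(size_t off, size_t *size_out)
{
    size_t i = 0, start = 0, size = HEAP_SIZE;
    while (nodes[i] == NODE_SPLIT) {
        size /= 2;
        if (off < start + size) {
            i = 2 * i + 1;
        } else {
            i = 2 * i + 2;
            start += size;
        }
    }
    *size_out = size;
    return i;
}

/* Size of the chunk that `ptr` (returned by kmalloc) lives in. */
size_t kchunk_size(void *ptr)
{
    size_t size;
    find_block((size_t)((unsigned char *)ptr - heap), &size);
    return size;
}

/* `size` is optional: pass 0 to let the allocator infer it. */
void kfree(void *ptr, size_t size)
{
    size_t off = (size_t)((unsigned char *)ptr - heap);
    size_t i, bsize;

    if (size != 0) {
        /* Fast path: the level and the index follow from size and offset. */
        bsize = round_up_pow2(size);
        i = (HEAP_SIZE / bsize - 1) + off / bsize;
    } else {
        i = find_block(off, &bsize);
    }
    nodes[i] = NODE_FREE;

    /* Merge with the buddy for as long as both halves are free. */
    while (i != 0) {
        size_t buddy = (i % 2) ? i + 1 : i - 1;   /* left children are odd */
        if (nodes[buddy] != NODE_FREE)
            break;
        i = (i - 1) / 2;
        nodes[i] = NODE_FREE;
    }
}
```

The sketch does no validation of the pointer or of the length passed to kfree(); a real allocator would want at least some debug checks there, since a wrong length would free the wrong node.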
OK, in my case, it was more complicated than that, because I had to use multiple heaps to fill all the free memory, simply because the size of a heap must be a power of 2. Also, to make the allocator faster, I had to create "small heaps" for allocating smaller objects more efficiently and for keeping all the binary trees shorter (regular heaps have 2k as smallest chunk size, small heaps have 32 as smallest chunk size). So, as you can imagine, when I have the "length" parameter, it's a bit faster because I immediately know whether the chunk is in a small heap or not.
Waste of the buddy allocator alone
So, my allocator is fast and good for allocating objects with size 2^N, but wastes a lot of memory when objects have a size like 2^N + something. In theory, on top of my buddy allocator, I should have implemented something like Linux's slab allocator, but I left that as a FUTURE todo. Some memory waste was fine at the time. But, during the whole development, I kept in mind that my allocator needs to work with 2^N chunks.
At some point, I wanted to measure EXACTLY how much memory I was wasting with such an allocator. Considering that it can waste at most 50% of the memory, with an average of 25% if the sizes of the allocations have a uniform distribution, I was expecting something like 10%, because most of the allocations have a power-of-two size. In reality, it turned out to be something like 2% or less. With some tuning, I brought the waste down to around 0.1%, which rose to ~1.0% when I added ACPI support. That's because ACPICA doesn't care about my 2^N "rule".
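The 50% and 25% figures can be checked with a few lines of C. This is just the arithmetic of rounding up to a power of two, not the measurement code I used in the kernel; the function names are made up for the example, and "uniform distribution" is taken to mean one request for every size in a range (lo, hi].

```c
#include <stddef.h>

static size_t next_pow2(size_t n)
{
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Bytes lost when a request of `size` bytes gets a power-of-two chunk. */
static size_t wasted_bytes(size_t size)
{
    return next_pow2(size) - size;
}

/* Fraction of the allocated memory that is wasted when every size in
 * (lo, hi] is requested exactly once. */
static double waste_fraction(size_t lo, size_t hi)
{
    size_t wasted = 0, allocated = 0;
    for (size_t s = lo + 1; s <= hi; s++) {
        wasted += wasted_bytes(s);
        allocated += next_pow2(s);
    }
    return (double)wasted / (double)allocated;
}
```

The worst case is a request of 2^N + 1 bytes: wasted_bytes(1025) is 1023 out of a 2048-byte chunk, just under 50%. Over all the sizes between two powers of two, waste_fraction(1024, 2048) comes out just under 25%.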
Conclusion
My whole point is that, while the kfree(ptr) interface is generally better, in some cases, with some allocators, the kfree(ptr, length) interface can also be very good and avoid some overhead. In particular, if the length parameter is optional: that way, you pass it when you can, gaining some performance. When you can't, there's always a fallback that works. Still, if I had enough time to implement something like a slab allocator on top of my buddy allocator, I could reduce the waste much more, because the top allocator would only allocate chunks of a given size, with no need for the "small heaps" trick. Overall, by looking at the Linux kernel, I believe that, when your allocators are sophisticated enough, you can both skip the "length" parameter and have minimum overhead with good performance, but that's an incredible amount of work. Therefore, I don't regret my decision to have a "length" parameter.
P.S.: sorry for the long post.