Page 1 of 1

[SOLVED] Internal Kernel Pseudo-Garbage Collection

Posted: Wed Sep 03, 2014 10:06 pm
by KemyLand
Hi! Sorry if this is not gramatically correct, I'm not a native English speaker.

I have a little theorical question. I would want to implement a kernel garbage collection system. Please read-on because the title looks a little newbie I know.

Let's say I have a function that uses A LOT of memory allocation, say for example the VFS Path Divisor. I don't want to spend too much time (less in the VFS and Kernel Library Utilities) comparing hash tables and converting Virtual to Physical Memory inside KFree(), so I decided to do a pseudo-garbage collection system.

You would call a MMGC(void*) function, which would append the address, with some other info like Virtual or Physical, to a constant and premade Stream (I have already implemented a high-level stream manipulation system), then return. The Garbage Collector would be a Kernel-Side process which uses its timeslice only, and only for highly-optimized memory freeing from the stream. If the GC finds that the stream is now empty, it calls the process manager and yields its timeslice inmediately.

And for more optimization, when calling MMGC() and the stream is full, the current process yields its timeslice inmediately to the GC. (There's nowhere to dump the stream when full).

I would want to know if this approach is okay. I think it wouldn't cause a large overhead if the kernel is compiled with a good-sized stream, 2KB for example.

Re: Internal Kernel Pseudo-Garbage Collection

Posted: Wed Sep 03, 2014 11:23 pm
by KemyLand
PS: Userspace programs WON'T use this, only the kernel and drivers. That's also why I call it a pseudo-GC, because it really doesn't collects, but recieves and frees. The proper function is responsible for calling MMGC()

Re: Internal Kernel Pseudo-Garbage Collection

Posted: Wed Sep 03, 2014 11:26 pm
by KemyLand
It's still perfomance-talking okay to use free() every time it's needed inside the kernel? How does Linux does it, for example (if not too complex, which is the most probable case)

Re: Internal Kernel Pseudo-Garbage Collection

Posted: Wed Sep 03, 2014 11:34 pm
by KemyLand
Thanks, know I understand, I'll implement a sbrk syscall, and look for another driver model by the way...

(Marking this thread as Closed)

malloc (was: Internal Kernel Pseudo-Garbage Collection)

Posted: Wed Sep 03, 2014 11:58 pm
by Combuster
EDIT: The preceding missing (misinformed) comments and the resulting discussion were split off into a separate thread to give the OP's original question a chance for a proper answer.

Garbage collection is any mechanism that automates the testing of allocations and freeing of memory when it's no longer in use. Commonly it's specifically referring to managed languages with heavy mark-and-sweep style garbage collectors.

Regardless, most kernel designs have at least a form of reference counting somewhere in the system.

That said, the memory needs of a kernel are not unlike that of userspace. The only difference is that the kernel also keeps track of what physical memory is available, and can adjust page tables on the fly. Most designs will still have a (k)malloc and (k)free, which is inevitable because system structures need memory too.

The question is whether running the trailing part of free() on a separate thread is profitable. It might reduce average case latency, and it gives you an excellent option to zero blocks of memory so that you have a fresh supply whenever you might need it and can't leak leftover data from one application to another, but it's also more complicated to pull off.

Re: [SOLVED] Internal Kernel Pseudo-Garbage Collection

Posted: Thu Sep 04, 2014 5:03 am
by max
Okay uhm, a note to KemyLand, I think you should not give your idea up just because someone on this forum said it's not a good idea. You should at least wait for some other opinions, because not everyone on here knows everything best ;)

Well, as Combuster already pointed out, a blank toolchain can compile C code that can be executed. By blank I mean a set of binutils & gcc compiled for a specific target platform & architecture without having a C library or anything else added. This is essentially everything you need for the kernel itself. With this in your hand, you have the freedom to do whatever you want with your memory.

Now you have to make a distinction: kernel space and user space. I don't know how familiar you are with malloc, so I will give a short overview about whats different there.

In the userspace malloc is (usually!) just an algorithm that uses some way to interface with the system to first find out which memory areas it can use, and then manage that area of memory. If you port a C library like newlib to your userspace, there will already be an implementation of malloc. As I said, malloc needs some way to interface with the system to get it's usable memory area: sbrk is one of the functions that malloc uses to ask the system 1. where this usable memory lays and 2. to extend or shrink this memory. There are also other malloc implementations that for example use the mmap which is a bit more complex and relies on yet other system functions. So malloc is absolutely nothing but a function that manages a chunk of memory that the kernel told the process that it could use.

In the kernelspace malloc (usually!) is also just an algorithm, but the difference is that the kernel itself knows what areas of memory it can use for its own heap, so it doesn't have to ask anyone for it. For example my kernel is a higher half kernel, and has a it's heap in the higher half. Then theres a simple malloc/free implementation that just manages the memory inside that heap. If the heap runs out, then theres an extendHeap function that makes that heap bigger, so malloc doesn't fail.

Note: Even if this is the most common case, malloc and free could do something completely different in both spaces, the only thing thats important is that it does what it's standardized for.

So what I want to say is basically: it is absolutely okay to create any kind of memory management in your kernel, it's perfectly okay to write something like gcmalloc that has some kind of garbage collection algorithm in the background. The only thing you have to consider is if it makes sense to do it, if you have performance improvements, it makes your code simpler or whatever is necessary to fulfill your needs.

Greets, Max


EDIT:
DaemonR wrote:Not lies, any person who's ever used C knows you don't have to include any extra libraries to use malloc
Please man, check your stuff... "Normal" developers usually only use toolchains that are made for their specific system and that already contain a C library.

malloc is usually part of the C library, but not necessarily. If you make a OS specific toolchain for your userspace, what you really should, there will be NO malloc until you bring something in - like newlib - that provides an implementation for exactly this feature. I think you shouldn't feed new users on this forum with wrong information.

Re: [SOLVED] Internal Kernel Pseudo-Garbage Collection

Posted: Thu Sep 04, 2014 5:21 am
by Brendan
Hi,
KemyLand wrote:You would call a MMGC(void*) function, which would append the address, with some other info like Virtual or Physical, to a constant and premade Stream (I have already implemented a high-level stream manipulation system), then return. The Garbage Collector would be a Kernel-Side process which uses its timeslice only, and only for highly-optimized memory freeing from the stream. If the GC finds that the stream is now empty, it calls the process manager and yields its timeslice inmediately.
I can't think of a reason why this couldn't work (in kernel and/or in user space).

However; the overhead involved in communication (e.g. inserting the details onto a queue; then unblocking a task, doing a task switch, retrieving the details from the queue, then blocking again when the queue is empty) is likely to be significantly higher than simply freeing it immediately (with none of the communication overheads).

I'd also suggest that "pseudo-GC" might not be the best name for it. If it doesn't "auto collect" then it's explicit memory management (where "free()" is deferred) and calling it something like "deferred free" might avoid some confusion. ;)


Cheers,

Brendan

Re: [SOLVED] Internal Kernel Pseudo-Garbage Collection

Posted: Thu Sep 04, 2014 7:32 am
by Rusky
Depending on your allocation pattern it might be easier to allocate a buffer once at the beginning of those allocation-intensive areas, then just pointer-bump allocate from it until you're done, and free it in one go.

Re: [SOLVED] Internal Kernel Pseudo-Garbage Collection

Posted: Thu Sep 04, 2014 8:28 am
by sortie
C is a bit too low-level for built-in language garbage collection due to it permitting some odd things (you can xor-encrypt pointers and later retrieve them - or is that strictly speaking UB?), but I suppose it's possible if you add more strict requirements on what you are allowed to do in C (i.e. your kernel is in a subset of C). It's reasonable, though, to construct a higher-level voluntary garbage-collection written in low-level C. That's probably the best approach. I think talloc that was mentioned earlier is one good example of that. I would generally be wary of garbage-collection in a kernel due to its asynchronous nature. You don't want it to run at particular times and I'm uncertain whether it is a good idea from a security point of view.

Besides, the idea of garbage collection is generally that you can pretend you have infinite memory. This is very much not the case in kernel code, where you have very finite resources and you need to manage them meticulously. I would recommend using the traditional malloc/free (with error checking) pattern as it is very fitting for kernel use.

Re: [SOLVED] Internal Kernel Pseudo-Garbage Collection

Posted: Thu Sep 04, 2014 8:50 am
by Combuster
sortie wrote:Besides, the idea of garbage collection is generally that you can pretend you have infinite memory.
I personally consider garbage collection as a tool that provides mental relief for the developer. I have no illusions that garbage collection solves physical memory issues, but only to make it less repetitive and error-prone.