Kernel level garbage-collection discussion

Discussions on more advanced topics such as monolithic vs micro-kernels, transactional memory models, and paging vs segmentation should go here. Use this forum to expand and improve the wiki!
User avatar
dozniak
Member
Member
Posts: 723
Joined: Thu Jul 12, 2012 7:29 am
Location: Tallinn, Estonia

Re: Kernel level garbage-collection discussion

Post by dozniak »

Just remember, that when GC is running all other threads are stopped.

If you put GC at the lowest level, this means that for running a GC loop you will have to stop all the threads on the machine in entirety.
Learn to read.
OSwhatever
Member
Member
Posts: 595
Joined: Mon Jul 05, 2010 4:15 pm

Re: Kernel level garbage-collection discussion

Post by OSwhatever »

tjmonk15 wrote:bluemoon and OSWhatever: Locks aren't really problem in terms of GC due to the fact the the GC should not need locks period. It simply "loops" over all "objects" and determines if there are any references to it. If the reference count changes mid check, it is likely decreasing by 1, in which case it will be collected next check (if it reached 0), or it is increasing by 1 (where the count was already >= 1) and shouldn't be collected anyways.

As long as at object creation the reference count is set to 1 before being added to the GC Manager, and following those assumptions above, you will never need a lock.

There are, of course, edge cases to consider, such as objects that shouldn't be GC'd, because they are found/modified using pointer arithmetic (page tables? :)) and there should be a way in the memory manager to request non-GC'd memory for this purpose.

- Monk
GC usually also implicitly mean that you have some functionality that move objects in memory in order to reduce fragmentation. Typically your Java runtime has something like this. This is however not mandatory for a GC system.
User avatar
Owen
Member
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom
Contact:

Re: Kernel level garbage-collection discussion

Post by Owen »

dozniak wrote:Just remember, that when GC is running all other threads are stopped..
Only with non-concurrent collector designs. Concurrent collector designs have existed for years (and both Java and .net use them)
User avatar
dozniak
Member
Member
Posts: 723
Joined: Thu Jul 12, 2012 7:29 am
Location: Tallinn, Estonia

Re: Kernel level garbage-collection discussion

Post by dozniak »

Owen wrote:
dozniak wrote:Just remember, that when GC is running all other threads are stopped..
Only with non-concurrent collector designs. Concurrent collector designs have existed for years (and both Java and .net use them)
As far as I understand .net concurrent collector, it uses N (where N usually equals number of processor cores) garbage collector threads to compact things faster, while all other threads are frozen waiting for GC to complete. Not sure about Java ones.
Learn to read.
FallenAvatar
Member
Member
Posts: 283
Joined: Mon Jan 03, 2011 6:58 pm

Re: Kernel level garbage-collection discussion

Post by FallenAvatar »

Owen wrote:
dozniak wrote:Just remember, that when GC is running all other threads are stopped..
Only with non-concurrent collector designs. Concurrent collector designs have existed for years (and both Java and .net use them)
For an OS level, instead of a VM level, GC solution, putting this in an idle thread and allowing it to be interupted/pre-empted would make it concurrent I guess? But really, it wouldn't matter. Especially, when you make the assumptions you can when you are at the OS level instead on at the VM(user-space) level.
OSwhatever wrote:
tjmonk15 wrote:bluemoon and OSWhatever: Locks aren't really problem in terms of GC due to the fact the the GC should not need locks period. It simply "loops" over all "objects" and determines if there are any references to it. If the reference count changes mid check, it is likely decreasing by 1, in which case it will be collected next check (if it reached 0), or it is increasing by 1 (where the count was already >= 1) and shouldn't be collected anyways.

As long as at object creation the reference count is set to 1 before being added to the GC Manager, and following those assumptions above, you will never need a lock.

There are, of course, edge cases to consider, such as objects that shouldn't be GC'd, because they are found/modified using pointer arithmetic (page tables? :)) and there should be a way in the memory manager to request non-GC'd memory for this purpose.

- Monk
GC usually also implicitly mean that you have some functionality that move objects in memory in order to reduce fragmentation. Typically your Java run-time has something like this. This is however not mandatory for a GC system.
As you point out, this is not a required part of a GC implementation, and I think more importantly, that is it part of a VM implementation (not GC at all.) And in this case you can not make the same assumptions you can about VM/fragmentation routines, that you can about GC routines. The important part, that I see anyways, is segregating your code properly, so that you can make as many assumptions as possible (separation of concerns/responsibilities) such that your code can be more performant.

- Monk
OSwhatever
Member
Member
Posts: 595
Joined: Mon Jul 05, 2010 4:15 pm

Re: Kernel level garbage-collection discussion

Post by OSwhatever »

tjmonk15 wrote:As you point out, this is not a required part of a GC implementation, and I think more importantly, that is it part of a VM implementation (not GC at all.) And in this case you can not make the same assumptions you can about VM/fragmentation routines, that you can about GC routines. The important part, that I see anyways, is segregating your code properly, so that you can make as many assumptions as possible (separation of concerns/responsibilities) such that your code can be more performant.

- Monk
If there is no memory fragmentation involved, then I have some sort of GC system in my kernel. First of all, many objects are reference counted and when the reference reaches zero it is moved to list that a low prio thread is picking from and deletes the objects. The reference counting is not for the purpose of GC but that ensuring that there are no references to the object before deleting it. In an SMP system, reference counting is many times necessary as if you want to delete an object on one CPU you have no idea if you really can do it because some other CPU might be playing with the object and this is conveniently solved with reference counting.
FallenAvatar
Member
Member
Posts: 283
Joined: Mon Jan 03, 2011 6:58 pm

Re: Kernel level garbage-collection discussion

Post by FallenAvatar »

OSwhatever wrote:
tjmonk15 wrote:As you point out, this is not a required part of a GC implementation, and I think more importantly, that is it part of a VM implementation (not GC at all.) And in this case you can not make the same assumptions you can about VM/fragmentation routines, that you can about GC routines. The important part, that I see anyways, is segregating your code properly, so that you can make as many assumptions as possible (separation of concerns/responsibilities) such that your code can be more performant.

- Monk
If there is no memory fragmentation involved, then I have some sort of GC system in my kernel. First of all, many objects are reference counted and when the reference reaches zero it is moved to list that a low prio thread is picking from and deletes the objects. The reference counting is not for the purpose of GC but that ensuring that there are no references to the object before deleting it. In an SMP system, reference counting is many times necessary as if you want to delete an object on one CPU you have no idea if you really can do it because some other CPU might be playing with the object and this is conveniently solved with reference counting.
I think you are confusing some ideas here. GC is all about reference counting, and automatically (without [delete *ptr] or similar {my c++ is a lil rusty with ptrs}) deleting objects. If you do not have GC you MUST [delete] objects your self. You can reference count yourself and then delete when it reaches 0, but the idea of GC is that some other process/thread/mechanism does this for you.

By what we both said above, I would agree that you have GC in your kernal. But I fail to see what fragmentation has to do with this? TBH, I fail to see the benefit of "fixing" (memory) fragmentation of a process that is running. To me, relocating objects in memory would break pointer's abound, and may cause some "nifty"/fast pointer math to fail. Unles you do it in physical memory and update virtual memory accordingly. But since ALL memory access go through the page tables/cache I think you are more likely to shoot yourself in the foot than cause a speed-up, no?

I may be way off base with that previous statement, but as I think about it logically, my statement makes sense, if you think about how things work at a low level. Please, anyone here, correct me if I'm wrong.

- Monk
greyOne
Member
Member
Posts: 58
Joined: Sun Feb 03, 2013 10:38 pm
Location: Canada

Re: Kernel level garbage-collection discussion

Post by greyOne »

My kernel actually has something remotely similar to a garbage collector. It less of a collector, more like insurance against memory leaks in the for a task-specific table of allocated memory. When a call to malloc() or similar is made within the space of a Task, a reference to that memory is save in the table. When a call to free() is made, the reference is removed.

Once the task is finished, cleanup runs and everything left in the table is deallocated, followed by the table itself. It seems to work just fine, and in some sense, does qualify as garbage collection, in that it prevent garbage from hanging around in memory. It doesn't add all too much overhead, since every entry is 4 only bytes (the size of a pointer). I'm pretty sure though that this is probably something that isn't all too uncommon.

My 2 cents.
gerryg400
Member
Member
Posts: 1801
Joined: Thu Mar 25, 2010 11:26 pm
Location: Melbourne, Australia

Re: Kernel level garbage-collection discussion

Post by gerryg400 »

tjmonk15 wrote:
OSwhatever wrote:
tjmonk15 wrote:As you point out, this is not a required part of a GC implementation, and I think more importantly, that is it part of a VM implementation (not GC at all.) And in this case you can not make the same assumptions you can about VM/fragmentation routines, that you can about GC routines. The important part, that I see anyways, is segregating your code properly, so that you can make as many assumptions as possible (separation of concerns/responsibilities) such that your code can be more performant.

- Monk
If there is no memory fragmentation involved, then I have some sort of GC system in my kernel. First of all, many objects are reference counted and when the reference reaches zero it is moved to list that a low prio thread is picking from and deletes the objects. The reference counting is not for the purpose of GC but that ensuring that there are no references to the object before deleting it. In an SMP system, reference counting is many times necessary as if you want to delete an object on one CPU you have no idea if you really can do it because some other CPU might be playing with the object and this is conveniently solved with reference counting.
I think you are confusing some ideas here. GC is all about reference counting, and automatically (without [delete *ptr] or similar {my c++ is a lil rusty with ptrs}) deleting objects. If you do not have GC you MUST [delete] objects your self. You can reference count yourself and then delete when it reaches 0, but the idea of GC is that some other process/thread/mechanism does this for you.

By what we both said above, I would agree that you have GC in your kernal. But I fail to see what fragmentation has to do with this? TBH, I fail to see the benefit of "fixing" (memory) fragmentation of a process that is running. To me, relocating objects in memory would break pointer's abound, and may cause some "nifty"/fast pointer math to fail. Unles you do it in physical memory and update virtual memory accordingly. But since ALL memory access go through the page tables/cache I think you are more likely to shoot yourself in the foot than cause a speed-up, no?

I may be way off base with that previous statement, but as I think about it logically, my statement makes sense, if you think about how things work at a low level. Please, anyone here, correct me if I'm wrong.

- Monk
Some garbage collectors do use compaction of live objects. Remember not all languages allow pointers and pointer arithmetic. The benefit is that it is trivially easy to allocate memory from a compacted memory pool. ( pret = p; p += size; return pret; )
If a trainstation is where trains stop, what is a workstation ?
FallenAvatar
Member
Member
Posts: 283
Joined: Mon Jan 03, 2011 6:58 pm

Re: Kernel level garbage-collection discussion

Post by FallenAvatar »

greyOne wrote:My kernel actually has something remotely similar to a garbage collector. It less of a collector, more like insurance against memory leaks in the for a task-specific table of allocated memory. When a call to malloc() or similar is made within the space of a Task, a reference to that memory is save in the table. When a call to free() is made, the reference is removed.

Once the task is finished, cleanup runs and everything left in the table is deallocated, followed by the table itself. It seems to work just fine, and in some sense, does qualify as garbage collection, in that it prevent garbage from hanging around in memory. It doesn't add all too much overhead, since every entry is 4 only bytes (the size of a pointer). I'm pretty sure though that this is probably something that isn't all too uncommon.

My 2 cents.
When a process is ended (either a process initiated quit/stop/end message, or a user initiated quit) all of its address space (in a paged protected mode setup) is invalidated and therefore free for use, without any extra steps.

The main way to solve a memory leak as a user on windows is to kill the task and restart it, see above for why this works. This is NOT GC. GC runs while any other/all other processes are running, and collects memory that is not being used and frees it as it can.

- Monk
Last edited by FallenAvatar on Thu Mar 21, 2013 6:06 pm, edited 1 time in total.
greyOne
Member
Member
Posts: 58
Joined: Sun Feb 03, 2013 10:38 pm
Location: Canada

Re: Kernel level garbage-collection discussion

Post by greyOne »

tjmonk15 wrote: When a process is ended (either a process initiated quit/stop/end message, and a user initiated quit) all of its address space (in a paged protected mode setup) is invalidated and therefore free for use, without any extra steps.

The main way to solve a memory leak as a user on windows is to kill the task and restart it, see above for why this works. This is NOT GC. GC runs while any other/all other processes are running, and collects memory that is not being used and frees it as it can.

- Monk
When a process ends, sure.
However threads share their parent process's address space, so my point stands. (At least, they do in my implementation.)
They do however have their own stack.
FallenAvatar
Member
Member
Posts: 283
Joined: Mon Jan 03, 2011 6:58 pm

Re: Kernel level garbage-collection discussion

Post by FallenAvatar »

greyOne wrote:
tjmonk15 wrote: When a process is ended (either a process initiated quit/stop/end message, and a user initiated quit) all of its address space (in a paged protected mode setup) is invalidated and therefore free for use, without any extra steps.

The main way to solve a memory leak as a user on windows is to kill the task and restart it, see above for why this works. This is NOT GC. GC runs while any other/all other processes are running, and collects memory that is not being used and frees it as it can.

- Monk
When a process ends, sure.
However threads share their parent process's address space, so my point stands. (At least, they do in my implementation.)
They do however have their own stack.
I don't think a thread, and all of it's objects, should be collected when a thread ends. It should be perfectly valid for a thread to do some work on input (process variables) and provide any size of output (thread variables) and have that information stay resident/available to the parent Process.

What I suggested should only happen on Process end (the above quote and my previous point alludes to a combination of system and user input instead of either or, I have corrected it.) And on thread end would be a perfect importunity to invoke the GC for a few ms.
Post Reply