Im in the process of writing my own programming language and I have decided to use garbage collection as a means for automatic memory management.
I have decided that I would like to implement a two-stage generational copying garbage collector. I know how a copying and generational collector works, but I've not sure how they work when put together. In my mind generational and copying are similar in being that they both seperate the heap into two semi-spaces. The only differnece is in how they collect the unused objects.
If someone could shield some light on this, I would appreciate it. thanks in advance.
Garbage collection
Re:Garbage collection
Hi,
well, generational GCs don't neccessarily split the heap into two semi-spaces in general, but in your case that's true, of course (although I don't think that people generally refer to it as "semi-spaces" in that case, in order not to confuse casual listeners any more than strictly neccessary).
Anyway, there's no reason for why what you suggest shouldn't work. In fact, Wilson's excellent (though no longer complete, due to age) survey of garbage collection mechanisms, http://www.cs.colorado.edu/~diwan/class-papers/gcsurvey.ps, discusses exactly this setup as an example for generational GC. I'd suggest you to have a look at pp25, and follow up any references you need-- basically, each generation will have its own two semispaces, and GC taking place in one generation will copy objects either to the other semispace in the same generation or to the target semispace in the "older" generation, depending on how many collections/scavenges it has survived.
-- Christoph
well, generational GCs don't neccessarily split the heap into two semi-spaces in general, but in your case that's true, of course (although I don't think that people generally refer to it as "semi-spaces" in that case, in order not to confuse casual listeners any more than strictly neccessary).
Anyway, there's no reason for why what you suggest shouldn't work. In fact, Wilson's excellent (though no longer complete, due to age) survey of garbage collection mechanisms, http://www.cs.colorado.edu/~diwan/class-papers/gcsurvey.ps, discusses exactly this setup as an example for generational GC. I'd suggest you to have a look at pp25, and follow up any references you need-- basically, each generation will have its own two semispaces, and GC taking place in one generation will copy objects either to the other semispace in the same generation or to the target semispace in the "older" generation, depending on how many collections/scavenges it has survived.
-- Christoph
Re:Garbage collection
Garbage collection is a set of methods for reclaiming dynamically allocated memory without having to explicitily deallocate it.
The three best-known methods of garbage collection are reference counting, mark-and-sweep, and stop-and-copy.
A reference counting collector works by having a counter for each allocated object (in the sense of something which is allocated, not necessarily objects in the OO sense). the counter is incremented whenever a reference to it is assigned, and decremented whenever a reference to it is eliminated (this obviously requires you to manage all references; you cannot freely assign pointers as is done in C). When the count reaches zero, the object is considered 'dead' and the memory can be freed. While they are simple to implement, they are vulnerable to cyclical structures (e.g., a points to b, which points to c, which points to a; since they all have a live reference, they can never get deallocated unless you specifically search for the circular references).
A mark-and-sweep collector periodically stops the running program whenever more than a certain percentage of memory is in use. It goes through each reference, and marks each object that it finds a reference to. When it is done, any unmarked objects are freed. However, because it has to stop the program, programs using a mark-and-sweep collector tend to come to a halt frequently, which can seriously impact performance.
In stop-and-copy, the system sets aside half of the memory at any given time. When the use half of the memory is full, the garbage collector stops the program, then goes through the references like in mark-and-sweep. However, instead of marking them, it copies them to the unused half of memory. when all of the live objects have been copied, the previously used half of memory is set as unused. The major advantage over mark-and-sweep is that it can be faster with many large objects, and has the side effect of reducing memory fragmentation; however, it can be much slower than mark-and-sweep if there are a lot of small objects.
These are just the simplest and best known approaches; many variants of these exist, and more sophisticated techniques are available as well, many of which have little or no impact on perceived performance. They all have considerable advantages over manual memory management in regards to stability, security and ease of use.
The three best-known methods of garbage collection are reference counting, mark-and-sweep, and stop-and-copy.
A reference counting collector works by having a counter for each allocated object (in the sense of something which is allocated, not necessarily objects in the OO sense). the counter is incremented whenever a reference to it is assigned, and decremented whenever a reference to it is eliminated (this obviously requires you to manage all references; you cannot freely assign pointers as is done in C). When the count reaches zero, the object is considered 'dead' and the memory can be freed. While they are simple to implement, they are vulnerable to cyclical structures (e.g., a points to b, which points to c, which points to a; since they all have a live reference, they can never get deallocated unless you specifically search for the circular references).
A mark-and-sweep collector periodically stops the running program whenever more than a certain percentage of memory is in use. It goes through each reference, and marks each object that it finds a reference to. When it is done, any unmarked objects are freed. However, because it has to stop the program, programs using a mark-and-sweep collector tend to come to a halt frequently, which can seriously impact performance.
In stop-and-copy, the system sets aside half of the memory at any given time. When the use half of the memory is full, the garbage collector stops the program, then goes through the references like in mark-and-sweep. However, instead of marking them, it copies them to the unused half of memory. when all of the live objects have been copied, the previously used half of memory is set as unused. The major advantage over mark-and-sweep is that it can be faster with many large objects, and has the side effect of reducing memory fragmentation; however, it can be much slower than mark-and-sweep if there are a lot of small objects.
These are just the simplest and best known approaches; many variants of these exist, and more sophisticated techniques are available as well, many of which have little or no impact on perceived performance. They all have considerable advantages over manual memory management in regards to stability, security and ease of use.
Re:Garbage collection
If you have 4kb of memory, and allocate 4 1kb objects A, B, C and D and free B and D now, in total you have 2kb memory free, however, you can't allocate a 2kb object as C is still in the way. That is why memory fragmentation can be a problem