Native Garbage Collection
Posted: Sat Jan 15, 2011 12:53 am
I'm writing an OS, which is designed to be 100% object oriented. That is to say, the entire OS consists of logical units that can be thought of as something like instances of C++ classes. Because references to objects would get passed around quite a bit in this design, making manual memory management problematic, I think it would make a lot of sense to support garbage collection for the entire system at the Kernel level.
I was looking at Wikipedia's article on garbage collection, where it talks about "Tri-colour marking":
How would one get around this issue? I can think of a few ways, but they would all require user level code to cooperate with the garbage collector. For instance you could notify the garbage collector whenever a reference to an object is assigned. But this would add significant overhead, and may potentially pose a security risk, so it's far from a viable solution.
I was looking at Wikipedia's article on garbage collection, where it talks about "Tri-colour marking":
I think I understand what they are saying, but there is an important case that appears to be unaccounted for. Let's say the garbage collector runs during the following code:Wikipedia wrote:Because of these pitfalls, most modern tracing garbage collectors implement some variant of the tri-colour marking abstraction, but simple collectors (such as the mark-and-sweep collector) often do not make this abstraction explicit. Tri-colour marking works as follows:
Create initial white, grey, and black sets; these sets will be used to maintain progress during the cycle.
Initially the white set or condemned set is the set of objects that are candidates for having their memory recycled.
The black set is the set of objects that cheaply can be proven to have no references to objects in the white set; in many implementations the black set starts off empty.
The grey set is all the objects that are reachable from root references but the objects referenced by grey objects haven't been scanned yet. Grey objects are known to be reachable from the root, so cannot be garbage collected: grey objects will eventually end up in the black set. The grey state means we still need to check any objects that the object references.
The grey set is initialised to objects which are referenced directly at root level; typically all other objects are initially placed in the white set.
Objects can move from white to grey to black, never in the other direction.
Pick an object from the grey set. Blacken this object (move it to the black set), by greying all the white objects it references directly. This confirms that this object cannot be garbage collected, and also that any objects it references cannot be garbage collected.
Repeat the previous step until the grey set is empty.
When there are no more objects in the grey set, then all the objects remaining in the white set have been demonstrated not to be reachable, and the storage occupied by them can be reclaimed.
The 3 sets partition memory; every object in the system, including the root set, is in precisely one set.
The tri-colour marking algorithm preserves an important invariant:
No black object points directly to a white object.
This ensures that the white objects can be safely destroyed once the grey set is empty. (Some variations on the algorithm do not preserve the tricolour invariant but they use a modified form for which all the important properties hold.)
The tri-colour method has an important advantage: it can be performed 'on-the-fly', without halting the system for significant time periods. This is accomplished by marking objects as they are allocated and during mutation, maintaining the various sets. By monitoring the size of the sets, the system can perform garbage collection periodically, rather than as-needed. Also, the need to touch the entire working set each cycle is avoided.
Code: Select all
A = B;
C = D;
//garbage collector thread starts
//at the start B and D have been allocated and are referenced by A and C respectively (no other references to these objects exist)
//A and C are accessible from the root, and are placed in the grey set
//B and D are placed in the white set, as it is not yet known if they are referenced
//A's references are checked. A is moved to the black set, and B is moved into the grey set
//garbage collector thread yields
A, C = C, A; //swap A's reference with C's
//garbage collector thread resumes
//C's references are checked and B is found. C is moved to the black set, and B stays in the grey set
//D remains in the white set
//no other references to D are found, so it is deallocated