When to Collect Garbage

AndrewAPrice · Post by **AndrewAPrice** » Wed Jun 11, 2014 2:55 pm

tjmonk15 wrote:Do you use any kind of reference counting? Or any easy way to add it to your vm?

No, but I could. However, reference counting doesn't handle circular references.

I found this interesting teardown() solution to circular references: http://stackoverflow.com/a/1069406/2268205
But it seems like it'd be awefully slow if you called teardown() on a large object with many properties (each which may point to other objects with many properties of their own) just because you reassigned a variable?

Candy · Post by **Candy** » Wed Jun 11, 2014 11:01 pm

I found this interesting teardown() solution to circular references: http://stackoverflow.com/a/1069406/2268205
But it seems like it'd be awefully slow if you called teardown() on a large object with many properties (each which may point to other objects with many properties of their own) just because you reassigned a variable?

The typical solutions to cyclic references in GC are:

- Mark and sweep
- Generational move-everything-all-the-time
- Explicitly setting to NULL, using dispose, ... any solution you're actually telling your user to do for you.

The first two depend on something being still reachable; Mark&Sweep just tags them & removes the rest, Generational assumes the majority is dead & moves the stuff that's still alive and then does a bulk delete. The latter is admitting that it's broken & telling your user to sweep up the crap.

Your link is an example of the 3rd - making all references into weak references so they don't count anymore.

embryo · Post by **embryo** » Thu Jun 12, 2014 7:02 am

MessiahAndrw wrote:how do I tell the GC to not collect a white object that has been moved to a black object, mid-GC cycle?

I do not use terms like black or white. For me an object is reachable or not, and that's all possible states. According to the JVM specification every object must be placed on the bytecode stack before it can be used. Then I just intercept the placement action and mark the object as reachable. Does it became white or gray or black - is irrelevant for me, because I know - it shouldn't be collected.

MessiahAndrw wrote:My OS's 'native' executable format is bytecode (example assembly) - I developed a high level language that compiles down into this bytecode (example). The OS is essentially a bare metal virtual machine.

Why have you invented your version of a bytecode? Is there some pros and cons?

And the language looks like JavaScript. Or is it different? What is the difference?

AndrewAPrice · Post by **AndrewAPrice** » Thu Jun 12, 2014 9:46 am

embryo wrote:
MessiahAndrw wrote:how do I tell the GC to not collect a white object that has been moved to a black object, mid-GC cycle?
I do not use terms like black or white. For me an object is reachable or not, and that's all possible states. According to the JVM specification every object must be placed on the bytecode stack before it can be used. Then I just intercept the placement action and mark the object as reachable. Does it became white or gray or black - is irrelevant for me, because I know - it shouldn't be collected.

'Grey' in the incremental GC papers I've read are aggregate types (arrays, objects) that are confirmed reachable, but haven't yet had their children scanned (they're added to a 'to scan' list.) 'Black' means the item is confirmed reachable, and we've already scanned all their children.

So anyway, my question with incremental garbage collection is, say we have this code:

Code: Select all

  ObjectA = new Object();
  ObjectB = new Object();

  ObjectB.Child = new Object(); // add child object

  // incremental GC gets called here

  ObjectA.Child = ObjectB.Child; // move child object from B to A
  ObjectB.Child = null;

  // incremental GC gets called here

We have 3 objects. We assign a 'Child' object to ObjectB. Then the garbage collector runs and performs a 'mark'. However, it's an incremental garbage collector, so rather than marking everything at once recursively (because this could potentially be a very large program) we incrementally mark.

On the first GC call, ObjectA and ObjectB are considered 'unmarked'. We walk up the call stack, we notice that ObjectA and ObjectB are unmarked, so we mark them, and notice that they are aggregate types that could potentially have children, so we add them to a "to scan". We can ObjectA, mark each of it's children (which in this case are none), and remove it from ObjectA from the "to scan" list. Our incremental GC considered enough time has run, so it returns control to the program, with ObjectB still sitting on the "to scan" list.

Back in our code, we move our "Child" object from ObjectB (on the to scan list) to ObjectA (already scanned).

Our GC is called again. We walk up the call stack, notice that ObjectA has already been marked (this is an incremental GC, we don't restart from scratch every time the GC is called, otherwise it wouldn't be incremental), and ObjectB is already on the to scan list, so we walk the 'to scan' list. We mark each of ObjectB's children, however it no longer has any child objects because we assigned ObjectB.Child to null.

And so our 'Child' object is never marked, and is collected, because we assigned it to an object that was already 'marked' and not scanned again in an incremental GC until the GC collector has finished collecting and begins the next cycle, in which case it's too late because our 'Child' has already been assumed to be garbage and cleaned up!

The only solution I've found online is to have a write barrier (put it in my array/object/closure's assignment instruction) that says "if my parent is marked, also mark me, so the GC doesn't overlook me".

There is a case where garbage isn't collected, for example:

Code: Select all

  ObjectA = new Object();
  ObjectB = new Object();

  ObjectB.Child = new Object(); // add child object

  // incremental GC gets called here

  ObjectA.Child = new Object(); //  SPECIAL CASE
  ObjectA.Child = ObjectB.Child; // move child object from B to A
  ObjectB.Child = null;

  // incremental GC gets called here

Between GC cycles, where ObjectA is already marked, we assign it a temporary object (our write barrier noticed it has already ObjectA is marked, so also marks the temporary object). Then, we overwrite the property with another object. We end up with a temporary object that is marked, but is also garbage.

We will have to wait until the next pass to collect that temporary garbage. I don't think this is actually a _bad thing_ because if the incremental GC continues to run at frequent intervals, it will eventually detect that.

embryo wrote:
MessiahAndrw wrote:My OS's 'native' executable format is bytecode (example assembly) - I developed a high level language that compiles down into this bytecode (example). The OS is essentially a bare metal virtual machine.
Why have you invented your version of a bytecode? Is there some pros and cons?

For the same reason we're all writing operating systems -for fun and learning.

embryo wrote:And the language looks like JavaScript. Or is it different? What is the difference?

Yes - it does, and at the basic level it's a curly-brace language with dynamic objects, first class functions, and closures. My language lacks JS features like prototypes and exceptions, adds features like memory buffers as native types, and eventually I want to add coroutines, threads (JS is not multithreaded), metaproperties (LUA style metatables), static types (so you can declare 'unsigned a;' instead of 'var a;' to help with eventual JIT compilation), interfaces, lazy evaluation, and structures. What you see now is the core get-something-running language.

embryo · Post by **embryo** » Fri Jun 13, 2014 7:25 am

MessiahAndrw wrote:'Grey' in the incremental GC papers I've read are aggregate types (arrays, objects) that are confirmed reachable, but haven't yet had their children scanned

Then my objects became grey when they got to the bytecode's stack.

MessiahAndrw wrote:So anyway, my question with incremental garbage collection is, say we have this code:

Code: Select all

  ObjectA = new Object();
  ObjectB = new Object();

  ObjectB.Child = new Object(); // add child object

  // incremental GC gets called here

  ObjectA.Child = ObjectB.Child; // move child object from B to A
  ObjectB.Child = null;

  // incremental GC gets called here

Essentially, your question is about reference leak in between the incremental GC runs. I catch all such leakers right at the moment they are used. In the code above the usage moment is:

Code: Select all

  ObjectA.Child = ObjectB.Child; // move child object from B to A

Right here the reference is places at the stack, marked as reachable, and (if it has any reference children) is placed into the "to be scanned" queue. That's how I catch the leakers.

MessiahAndrw wrote:My language lacks JS features like prototypes and exceptions, adds features like memory buffers as native types, and eventually I want to add coroutines, threads (JS is not multithreaded), metaproperties (LUA style metatables), static types (so you can declare 'unsigned a;' instead of 'var a;' to help with eventual JIT compilation), interfaces, lazy evaluation, and structures.

It seems you are playing not only with OS, but also with a language design. May be it is more simple just to try existing languages? But of course, it can be more interesting to play with your personal language.

AndrewAPrice · Post by **AndrewAPrice** » Fri Jun 13, 2014 11:54 am

embryo wrote:Then my objects became grey when they got to the bytecode's stack.

Essentially, your question is about reference leak in between the incremental GC runs. I catch all such leakers right at the moment they are used. In the code above the usage moment is:
Code: Select all
  ObjectA.Child = ObjectB.Child; // move child object from B to A
Right here the reference is places at the stack, marked as reachable, and (if it has any reference children) is placed into the "to be scanned" queue. That's how I catch the leakers.

That's exactly what I'm thinking. It looks like we're on the same page!

embryo wrote:It seems you are playing not only with OS, but also with a language design. May be it is more simple just to try existing languages? But of course, it can be more interesting to play with your personal language.

Yes. I've logically separated the code between the virtual machine and kernel - so I can integrate the stand-alone VM in other projects - but I have an interest in topics like language design and software isolation, so I decided to build my whole OS around it.

The disadvantage (but I personally consider it fun) is that there's no legacy libraries or code to support or port. I have to re-implement everything.

embryo · Post by **embryo** » Sat Jun 14, 2014 8:01 am

MessiahAndrw wrote:I've logically separated the code between the virtual machine and kernel - so I can integrate the stand-alone kernel in other projects

And also you can integrate your VM in other projects. You can have a lot of fun

OSDev.org

When to Collect Garbage

Re: When to Collect Garbage

Re: When to Collect Garbage

Re: When to Collect Garbage

Re: When to Collect Garbage

Re: When to Collect Garbage

Re: When to Collect Garbage

Re: When to Collect Garbage