OSDev.org

Posted: **Mon Jun 09, 2014 7:20 pm**

My operating system runs all applications inside of a virtual machine. Due to the design of the virtual machine (and the language that runs on top of it) garbage collection is required.

I'm doing a stop-the-world mark and sweep algorithm. I may eventually go for an incremental GC but for now a simple mark and sweep algorithm works. The actual collection algorithm is not what I'm asking about. My question is: when should the garbage collector run?

The most common answer I see online is to call the garbage collector when you run out of memory. This would work if I had a single garbage collection pool for the entire system, but I dislike the idea of having a single pool for the entire system because;

- Having a GC pool per process (by which I mean each process contains its own pool of allocated objects and the garbage collector only collects for the specific process calling it) would be faster because it would be much more fine grained (scanning just your process's object, not every allocation in the system.)
- A system wide GC would stop the entire system, and a malicious program could keep invoking the collector slowing down the entire system. If it's per process, only the process that is collecting needs to stop.
- If each process has its own independent instance of the garbage collector multiple processes can collect in parallel.

However, if I only invoke the GC when the system is low on memory, and a process cannot invoke another's collector, how will the garbage another process is sitting on be returned to the system?

I've though about this and possible solutions. I'm not a garbage collector expert, and I'd really like other people's thoughts on this.

My first solution is not to only invoke the collector when the system runs out of memory, but also when a process attempts to cross a memory threshold, for example, 1MB, 10MB, 25MB, 50MB, 100MB, 250MB, etc.

My other solution to avoid processes from sitting on garbage is to start a timer when a process performs a memory allocation, and call the GC, say, 30 seconds since the first memory allocation in the process. If memory isn't allocated, the timer never starts and the process will never freeze.

What do you think?

Posted: **Mon Jun 09, 2014 11:38 pm**

Hi,

MessiahAndrw wrote:What do you think?

For performance, the best possible time to collect a piece of garbage is immediately after the last time its used, because:

It maximises the chance that the same memory can be re-allocated while its still in the CPUs cache, and therefore reduces the chance of "cache miss after alloc" later
It spreads out the load of collecting many pieces of garbage - instead of ending up with "lag spikes" caused by collecting many pieces of garbage at once you end up with lots of tiny little collections spread everywhere; which is extremely important for anything involving latency (which is anything that involves responding to any kind of request, which is about 90% of all software - e.g. responding to network packets, user mouse clicks, etc)
It avoids the need to search for garbage (you know where it is because you just used it for the last time), which severely reduces overhead
It maximises the amount of free RAM available at any point in time

The problem becomes knowing when something is used for the last time. The most sensible way to do that is to have some sort of special/explicit "this won't be used again" notification that software uses.

There is only one case where "immediately after the last time its used" is not ideal. When a process exits, it's more efficient to just let the OS destroy the virtual address space as a whole without bothering to free any/all individual little pieces separately beforehand. In this case, if you were using "this won't be used again notifications" then it'd be simple for software to omit those notifications in "clean-up code" used during process exit.

Finally; for a lot of cases data is cached by processes. For a simple example, for a web browser you might have DNS lookups being cached, plus a few PNG/JPEG pictures where the decoded pixel data is cached, plus several different web page's HTML/CSS data cached, plus pre-compiled javascript cached, etc. All of this cached data is technically "in use", but it may still be freed despite being in use (as long as software is given a chance to tidy up any loose ends). Ideally, if the OS is running out of RAM it should ask processes to free up some space, and processes should respond by freeing (some or all, depending on how desperate the OS is) of the RAM they're using to cache things (by tying up loose ends and using the "this won't be used again notifications").

Of course when I say "this won't be used again notifications" what I really mean is something like "free()" or "delete()".

Basically, what I think is that garbage collection is crap.

NOTE 1: I am not saying that memory management in languages like C/C++ is good either - the processes aren't asked to free cached data when the OS is running out of RAM, there's poor cache locality, and it's harder than it should be for programmers to identify memory leaks. I think "leak detection tools" could and should be built into the system; and (for example) "malloc()" could and should accept a second argument saying what the data is for and/or where the data was allocated to make it easy for tools to report where memory is being allocated and/or what it's being used for. Maybe (e.g. if/when a processes is being run in "debug mode") you could use "mark and sweep" to detect memory leaks and inform developers if/when they've failed to free something.

NOTE 2: In general; memory is just one of many different types of resource. There's also things like file handles, network connections, timer alarms, threads, etc. It seems strange to me that some environments assume programmers are too incompetent to get memory management right, but also assume that programmers are competent enough to manage all the other types of resources without an automated babysitter.

Cheers,

Brendan

Posted: **Tue Jun 10, 2014 2:11 am**

MessiahAndrw wrote:The most common answer I see online is to call the garbage collector when you run out of memory. This would work if I had a single garbage collection pool for the entire system, but I dislike the idea of having a single pool for the entire system because;

- Having a GC pool per process (by which I mean each process contains its own pool of allocated objects and the garbage collector only collects for the specific process calling it) would be faster because it would be much more fine grained (scanning just your process's object, not every allocation in the system.)
- A system wide GC would stop the entire system, and a malicious program could keep invoking the collector slowing down the entire system. If it's per process, only the process that is collecting needs to stop.
- If each process has its own independent instance of the garbage collector multiple processes can collect in parallel.

However, if I only invoke the GC when the system is low on memory, and a process cannot invoke another's collector, how will the garbage another process is sitting on be returned to the system?

What I'd do is have each process do its own GC, and have its garbage collector return memory to its own free pool rather than to the system. When a process makes an allocation, it first tries to make that allocation from its own free pool. If that does not succeed, it runs its garbage collector to try to expand the free pool. If the free pool is still not large enough, the process requests memory from the system.

When the system is unable to fulfill a memory request from a process, it does a GC run on its own memory. If that does not provide sufficient memory, the system starts signaling processes that a low memory condition exists. The process can choose to handle the low memory signal as it wishes, but the default handler for the signal would involve running the process's garbage collector and then releasing the process's free pool back to the system. If there still isn't enough memory, then there isn't really a good option, and you have a choice between either returning to the process that requested the memory with an allocation failure or invoking something like the Linux OOM killer.

Posted: **Tue Jun 10, 2014 3:40 am**

Out of interest, do you allow processes to exchange pointers with each other? In which case you will need to examine the state of all processes to perform collection for just one process, as the only live reference to an object in one process's pool could be within the stack or heap of another process. In this situation, a global pool may be more beneficial. Alternatively, have a separate pool for shared objects where a process needs to state explicitly at allocation time that an object may be shared.

Also, I'd probably consider separating timing of collections from out of physical memory conditions. The normal way to deal with these is to swap to disk. You'd probably want to consider collections at pre-determined time intervals, as well as when a process is approaching a certain heap size limit, which you can dynamically change at run-time. For example, if a process reaches this limit, and yet you find there is nothing to free on multiple GC runs, you may want to increase the limit for that process to reduce the GC overhead. You may also find that you exhaust all available physical memory and swap space in the system, and yet there is no dead data to free, at which point you need to look at killing processes as a whole.

Finally, as you have control over the VM and can analyze the byte-code, would it make sense to consider scheduling collections at certain points in the control-flow? e.g. after large loops or on return from complex functions?

Regards,
John.

Posted: **Tue Jun 10, 2014 4:36 am**

MessiahAndrw wrote:However, if I only invoke the GC when the system is low on memory, and a process cannot invoke another's collector, how will the garbage another process is sitting on be returned to the system?

It can be treated as a part of process memory pool. As Brendan has said above - the OS can ask processes to free their pools when OS memory is low, then asked process can start GC and free other not important caches.

MessiahAndrw wrote:My first solution is not to only invoke the collector when the system runs out of memory, but also when a process attempts to cross a memory threshold, for example, 1MB, 10MB, 25MB, 50MB, 100MB, 250MB, etc.

My other solution to avoid processes from sitting on garbage is to start a timer when a process performs a memory allocation, and call the GC, say, 30 seconds since the first memory allocation in the process. If memory isn't allocated, the timer never starts and the process will never freeze.

Such enforcement is a bit inefficient. GC will clean only small part of garbage and overhead of frequent runs will be great. A bit better solution is to start incremental GC at these thresholds. It also will lead to overhead, but the overhead in this case is less significant because of non invasive nature of the incremental GC (better user experience).

In general - the GC research has a great body with many options considered. It is better to look at your goals and optimize the GC for them using your runtime statistics. At least if important goals are achieved then you can consider your GC as a good one.

Posted: **Tue Jun 10, 2014 4:57 am**

Brendan wrote:For performance, the best possible time to collect a piece of garbage is immediately after the last time its used

Yes. But as everywhere in this world we have to trade superiority for achievability. And such trade can end with some unexpected results.

Brendan wrote:It spreads out the load of collecting many pieces of garbage - instead of ending up with "lag spikes" caused by collecting many pieces of garbage at once you end up with lots of tiny little collections spread everywhere

Yes. But spread ratio can be greater. Why not to spread small collections instead of tiny?

Brendan wrote:which is extremely important for anything involving latency

Is the latency of a few milliseconds acceptable?

Brendan wrote:It avoids the need to search for garbage (you know where it is because you just used it for the last time), which severely reduces overhead

Yes. But it is traded for programmer efforts concentrated on manual memory allocation and deallocation with following heavy bug hunting. Time to market is paid more than time efficiency in microseconds, this is also part of our world (but may be not very good part).

Brendan wrote:It maximises the amount of free RAM available at any point in time

Yes. But having 4 GB PC have you ever bothered to free memory for another window of your browser? And even in the server environment it is much better to use the best compiler instead of the best efficient programmer. At least it is cheaper.

Brendan wrote:Basically, what I think is that garbage collection is crap.

I understand you, but unfortunately you seems still have to rethink the trade between the best performance and the best development speed.

Brendan wrote:In general; memory is just one of many different types of resource. There's also things like file handles, network connections, timer alarms, threads, etc. It seems strange to me that some environments assume programmers are too incompetent to get memory management right, but also assume that programmers are competent enough to manage all the other types of resources without an automated babysitter.

And yes again. But now here is no "but". I agree - we need all those "automated babysitters" for file handles, network connections, timer alarms, threads, etc. Let the system manage our resources and free our time! But if we see an opportunity to make performance better with some hints to the system - why not to do it?

Posted: **Tue Jun 10, 2014 5:02 am**

linguofreak wrote:What I'd do is have each process do its own GC, and have its garbage collector return memory to its own free pool rather than to the system.

Your description looks like jEmbryoS actually works. Nice to meet something resembling my vision. But even my vision was inspired by existing JVMs. It means such approach already has a long story behind and also has a lot of supporters.

Posted: **Tue Jun 10, 2014 5:13 am**

jnc100 wrote:Out of interest, do you allow processes to exchange pointers with each other?

It can be a great problem. In my case only global classes are shared. But most of those classes have been borrowed from Open JDK and was not intended for such usage (as the common base for operating system). As a result there still is some hidden pointer exchange behind the scene which leads to freeing still used data. So - it is important to plan process interaction in details and as early as possible.

jnc100 wrote:as you have control over the VM and can analyze the byte-code, would it make sense to consider scheduling collections at certain points in the control-flow? e.g. after large loops or on return from complex functions?

It depends on analysis quality. But if you can make the tool that will see a great memory allocation followed by a great pile of a garbage, then you just have no need to run GC, but all you have to do is to free the pile without any search for garbage. It is better to include such tool in a compiler part of the system.

Posted: **Tue Jun 10, 2014 11:35 am**

As with most things involving development, there is no universal "best" way to do things. Some things work better in certain cases, but not in others. Very rarely do you end up with the best solution for every occasion.

Garbage collection is one of those things that has been rolling around in my head for a few years, and something that I will probably need to tackle at some point in my OS.

One approach that I like to take when I'm trying to decide on how to implement something is to find a real-world corollary situation that has already been solved, and try to apply that solution to my technical problem.

For instance, in your case, think about the "Recycle Bin" sitting on your desktop. It's very much the same type of situation as garbage collection.

Or, even better, an actual garbage can in your house. You probably have more than one garbage can in your house, because it's not terribly convenient to carry all of your garbage to a central location every time you need to throw something away. Then, at regular intervals, you (or someone) probably "collects" all of the garbage from all of the garbage cans and takes them to a central location and empties them. When the garbage can gets full, you can't really put more garbage in it until it has been emptied. So, you can either wait until it is full, then stop what you are doing and empty it, and then go back to working on your task, or you can put a recurring task on your to-do list to empty the trash every day, when you are not busy doing something else. Then the trash will, for all intents and purposes, always be empty when you need to throw something away.

Moving back to the problem of memory management, collecting garbage at idle time makes a lot of sense, considering your machine is idle a lot. But collecting garbage when you are out of memory is probably something you are going to have to do, regardless.

Posted: **Tue Jun 10, 2014 12:03 pm**

jnc100 wrote:Out of interest, do you allow processes to exchange pointers with each other?

No, I do copy-on-share for objects (except for memory buffers which will be a special case of shared memory, not covered by the GC.) Sharing pointers (especially function pointers) between processes is really cool, but would be a pain to cleanup unless I had a single global garbage collector.

I want to avoid the single global garbage collector because I don't want to stop-the-world, just a single process (e.g. if a badly written program keeps invoking the GC, I only want that process to stop.) Also, garbage collection per process is a very easy way to spatially divide the garbage collector, making a single collection faster since it only garbage collectors for a single process.

embryo wrote:A bit better solution is to start incremental GC at these thresholds. It also will lead to overhead, but the overhead in this case is less significant because of non invasive nature of the incremental GC (better user experience).

I like the sound of incremental sweeping. Right now, my GC is a simple mark-and-sweep. It walks up the local stack, marking each item it encounters (objects, arrays, function pointers (closures) are slightly more complicated because I then walk their properties) and mark each object. I then free every unmarked object.

A very simple way to make this incremental is to turn it into a coroutine:

Code: Select all

gc_incremental() {
start:
  scan_objects();
  yield;
  
  start_time = now;
  while(objects_to_free) {
    free_object;

     if(now > start_time + 100ms) {
         yeild;
         start_time = now;
     }
  }
  yeild;
  goto start; // restart
}

On the first pass, mark each object. On subsequent passes, free for 100 ms. When there are no more objects to free, and gc_incremental is called again restart the first pass. The only problem with this method is if marking takes a long time.

Re incremental marking: it would require a write barrier that marks objects on assignment. I don't see how I could avoid rewalking the call stack on each marking pass (because the call stack will differ between incremental markings)? However, walking the call stack would be fairly quick - I expect in a real world process, the majority of the time would be spent walking through objects not the call stack.

Brendan wrote:Finally; for a lot of cases data is cached by processes

My VM caches some objects like strings into a string table. This saves memory and makes string based property look-ups and comparisons very fast (any two strings will have the same pointer).

Brendan wrote:The most sensible way to do that is to have some sort of special/explicit "this won't be used again" notification that software uses.

I have a special case object (buffers) that do allow this. They can be garbage collected, but the user can also explicitly dispose of them, immediately freeing them. This will be useful for storing large items like media (images, audio), binary data, etc.

linguofreak wrote:What I'd do is have each process do its own GC, and have its garbage collector return memory to its own free pool rather than to the system. When a process makes an allocation, it first tries to make that allocation from its own free pool. If that does not succeed, it runs its garbage collector to try to expand the free pool. If the free pool is still not large enough, the process requests memory from the system.

That is a nice idea.

Anyway, when should the GC be called (incremental or full)?

I'm thinking:
- Continuous - trigger an incremental collection once every, say, 5 seconds as long as the process isn't sleeping. This will ensure that garbage will never sit around.
- Out of memory - trigger a full garbage collection if the system runs out of memory.
- On demand - the process explicitly calls the garbage collector (e.g. the user closed a file, or finished playing a level.)

Posted: **Tue Jun 10, 2014 1:40 pm**

Triggering when crossing some threshold (tuned so you don't GC too often but also don't have long collections of large heaps) is a good idea.

Collecting periodically will probably get in the way more than it will help, but you can trigger GC when the thread blocks and thus would ordinarily be doing nothing.

Collecting on a signal from the OS that it needs more memory is also a good idea. If you combine this with process control over swapping, you could even run the GC instead of swapping out, which will probably be a huge performance win.

You do also need to deal with non-memory resource management, which could also benefit from a similar signal from the OS.

Posted: **Tue Jun 10, 2014 2:46 pm**

Well, you can always mix and match algorithms. If you keep track of how many stack and heap references an object has, you can free immediately when both counts reach 0, and you can also guarantee that something with stack references never needs collecting.

Posted: **Wed Jun 11, 2014 6:02 am**

MessiahAndrw wrote:I like the sound of incremental sweeping.

And there is one more thing - concurrent GC. But it's only my term, outside of any classification.

MessiahAndrw wrote:Right now, my GC is a simple mark-and-sweep. It walks up the local stack, marking each item it encounters (objects, arrays, function pointers (closures) are slightly more complicated because I then walk their properties) and mark each object. I then free every unmarked object.

Do all of your objects sit on the stack?

MessiahAndrw wrote:A very simple way to make this incremental is to turn it into a coroutine

In fact you need to consider the ongoing activity of your process. It creates new objects and leaves them unused between the incomplete runs of your GC.

MessiahAndrw wrote:Re incremental marking: it would require a write barrier that marks objects on assignment.

If incremental GC works in the target process then there is no need for barriers.

MessiahAndrw wrote:Anyway, when should the GC be called (incremental or full)?

In my case the concurrent GC is started when memory usage threshold is passed. If concurrent works not fast enough, then (as a fall-back) the stop-the-world is started.

And by the way - what is your OS about? Why it needs GC?

Posted: **Wed Jun 11, 2014 1:42 pm**

embryo wrote:In fact you need to consider the ongoing activity of your process. It creates new objects and leaves them unused between the incomplete runs of your GC.

I was talking about write barriers to mark objects that are accessed during cycles, so yes, those mid-cycle created objects would be marked by the write barrier and would need to wait until the next cycle to clean up.

embryo wrote:If incremental GC works in the target process then there is no need for barriers.

I don't understand how you could not use barriers in an incremental GC? Using the colour coding I keep encountering on GC papers (white = not scanned, grey = to scan, black = scanned) - how do I tell the GC to not collect a white object that has been moved to a black object, mid-GC cycle?

The only way to not use barriers would be a copy-on-start algorithm - but the overhead making a duplicate copy of the heap, stack, and all objects would be greater than just doing a full mark pass?

embryo wrote:And by the way - what is your OS about? Why it needs GC?

My OS's 'native' executable format is bytecode (example assembly) - I developed a high level language that compiles down into this bytecode (example). The OS is essentially a bare metal virtual machine.

Posted: **Wed Jun 11, 2014 2:32 pm**

Do you use any kind of reference counting? Or any easy way to add it to your vm?

If you have reference counting, obviously some code gets generated to increment and decrement the reference count. If the reference count <= 0 in the decrement operation, couldn't you just add the object to a global queue for the GC to look at to clean up?

- Monk

OSDev.org

When to Collect Garbage

When to Collect Garbage

Re: When to Collect Garbage

Re: When to Collect Garbage

Re: When to Collect Garbage

Re: When to Collect Garbage

Re: When to Collect Garbage

Re: When to Collect Garbage

Re: When to Collect Garbage

Re: When to Collect Garbage

Re: When to Collect Garbage

Re: When to Collect Garbage

Re: When to Collect Garbage

Re: When to Collect Garbage

Re: When to Collect Garbage

Re: When to Collect Garbage