Page 1 of 1

Garbage collection

Posted: Sun May 21, 2006 3:25 pm
by dc0d32
well

we all know the cost of cache misses in any kind of application, specially operating systems. IMHO, the garbage collection and compaction techniques are better for cache hits which work on locality of reference. so why do we usu. implement our malloc as something like a budy system, whereas we could add an extra level of indirection and have everything(or say parts of it) GC'd?
(you can always make GC'able and non GC'able areas separate, so that things that are frequently used are not GC'd, but things required not too often can be, to improve cacheing)
this is not limited to just OS. it is applicable to everything that executes on a machine.

what do you say?

Re:Garbage collection

Posted: Mon May 22, 2006 4:13 am
by guest
GC is most likely avoided in general in kernels because of the lack of responsiveness caused by collect cycles, the kernel is perceived to need to respond instantly to any and all events and as such the kernel will almost never be idle for a prolonged period of time, either processing an IRQ or otherwise waiting for an IRQ0 after scheduling a userspace app. To get around this you'd probably need a kernel thread to do the collect cycle using an incremental or concurrent collector which should hopefully reduce the amount of unresponsiveness. However, it may be desirable to consider reference counting first.

If you really want to use GC, the main problem with compaction GCing is that the x86 has no way of distinguishing a pointer from an integer so during a collection cycle it is impossible to rearrange the memory unless you want to risk an integer counter, which happens to be the same value as a pointer into a buffer, being changed arbitrarily.

Only two ways around this have come to mind. The first is to have a table of addresses to the malloc buffers and use double pointers for everything, ie.

Code: Select all

void *gc_table[];

int main()
{
      void **res = malloc(1024);
      printf("gc_table array entry ptr: 0x%X", (uintptr_t)*res);
      gc_lock_for_access(res);
      do_something_with_buffer(*res);
      gc_unlock(res);
      return 0;
}
The problems with this approach should be immediately obvious, the fact that you need to lock the GC everytime you access a memory area [to prevent a compaction cycle whilst reading/writing the buffer] combined with double pointer indirection makes it rather inefficient.

The other possible solution is a table of pointers to the real pointers, ie.

Code: Select all

void *gc_ptr_regtable[];

int main()
{
     void *res = NULL;
     gc_register_ptr(&res);
     res = gc_malloc(1024);

     do_stuff_with_buffer_that_is_gc_aware(res);

     gc_uncollectable(res); //Mark res as being unmanaged
     do_stuff_with_buffer_that_is_not_gc_aware(res);
     gc_collectable(res); //Mark it as managed again (assuming the function doesn't keep a pointer)

     gc_unregister_ptr(&res);
     return 0;
}
This approach is a bit nicer in that you only have to register the pointer variables before using them [which will greatly speed up the mark phase of the collector as an added bonus], however this is still rather fiddly with registering and unregistering every pointer before and after use [which will add the possibility of forgetting to register/unregister unless you want to encapsulate in a CREATEPTR(type, name [, size]) macro or use a smart pointer].

There may be more efficient ways of integrating precise/compacting GC in a conservative environment, I'm not an expert in the subject area. I came up with these designs myself when investigating the possibility a while ago. (The smoothness of GC in .Net and Java is because they're interpreted which allows the runtime to know exactly what is and is not a pointer which eliminates the indeterminism of scanning the memory).

If you forget about the compaction and just use the sweep part of the GC with a buddy allocator backend then you may be able to still get some degree of compaction without any fiddling (By nature of a buddy allocator, it should tend to naturally compact), of course, rearranging of long standing objects isn't possible though but having less code complexity may make it worthwhile.

Re:Garbage collection

Posted: Mon May 22, 2006 5:00 am
by Solar
Garbage Collection works excellently with languages that have been designed to work with it. With C/C++/ASM and others, problems arise.

Re:Garbage collection

Posted: Mon May 22, 2006 6:00 am
by mystran
Since concurrent GC is rather trivial to write if you can have a word in compiler's codegeneration, and GCs are more efficient than reference counting almost by definition, I'd have to agree with Solar:

The principal reason for not using GCs in kernel is because we use language that are ill-suited for garbage collection.

Mind you, there's ofcourse no real problem with ASM, because you can have full control of the code generated.

That said, I'm considering (among other things) to possibly garbage collect some of my kernel heap at some point.

Re:Garbage collection

Posted: Mon May 22, 2006 9:55 am
by dc0d32
then (atleast in theory) we can say that these languages like java should give better runtime performance (provided you have the best of JIT and highly optimized gc)?

Re:Garbage collection

Posted: Mon May 22, 2006 1:39 pm
by mystran
Performance of Java has nothing to do with GC. To get an idea of the GC overhead for a typical of garbage collection, get Boehm GC, and strip your favourite big C++ (or C) application of all reference counting and other manual memory management. I'll run faster.

In fact, in some cases it might run faster even if you leave the reference counting there, and just make free() a NOP.

The surprising thing is that with all the JIT optimizations and what not, Java still manages to be slower than C++ in many cases... and C++ compilers could do many more optimizations that they typically do. I don't think it's safe to claim bad Java compilers either, not anymore at least.

Now, obviously there's a good reason why it's slower in many cases; it's more safe. C++ doesn't promise very much about runtime safety, so it doesn't need that many runtime checks. Java does promise a lot more, so it also needs to check a lot more, although some of it can be eliminated by static analysis (althought it's kinda limited unless you want to do global analysis, which is problematic in a language like Java where you load classes at runtime all the time).

In applications with complex data structures that don't need that many runtime checks (just the null-checks you need in any language anyway), Java isn't exactly much slower than C++ (especially with C++ doing manual memory management), it's actually sometimes faster.

In the other end of the spectrum, in applications where you do a lot of computed array indexing and floating point math, Java performance can be fractions of what C++ has (especially if you use optimizations like -ffast-math with C++). But at least it promises to throw ArrayIndexOutOfBounds or some such exception.

Honestly, don't blame Java on GC. Java performance is fine in most situations where GC is useful in the first place.

This is actually the same thing as with Lisp: people always remember to complain about bad Lisp performance (which is true if you want to play with bits) while in fact once you start doing the things Lisp is good in, a good implementation can outperform whatever you come up with in C++ or Java. In fact, while you try to get your initial implementation working, the Lisp programmer has probably benchmarked several optimization approaches in production environment. Oh well..

Btw, the fastest possible GC approach is to define the problem so that you got the things you save, and the things you just play with separate. Then you can just throw the whole heap away every time you're done -> zero GC overhead. Especially with DB backed applications this is valid approach.

Re:Garbage collection

Posted: Tue May 23, 2006 1:23 am
by Colonel Kernel
You all (probably) knew it was coming... ;)

At this point I must provide the obligatory link to Singularity -- an OS written mostly in C# that has an in-kernel concurrent mark-sweep GC. Regardless of everybody's own opinions on the subject, this is a working example of the concept in action that can show you some of the benefits and drawbacks.

Re:Garbage collection

Posted: Sun Jun 04, 2006 5:59 pm
by Crazed123
Solar wrote: Garbage Collection works excellently with languages that have been designed to work with it. With C/C++/ASM and others, problems arise.
D, anyone?