Page 1 of 1

Garbage Collection

Posted: Mon Apr 28, 2008 2:43 am
by pcmattman
Hi everyone,

I've been thinking lately a lot about how to efficiently clean up after a process finishes running (or a thread) and how to avoid running out of memory because x number of threads have used up too much space.

At the moment I'm liking the sounds of reference counting, which isn't too hard to implement and works pretty well for what I want to do. However, I'm pretty sure there are other methods out there that are better suited to an operating system environment.

Basically, I'm looking for algorithms that the kernel can use when a thread dies to free up any resources it was using, or any relevant discussion that can assist myself and others in this.

Thanks.

Posted: Mon Apr 28, 2008 3:00 am
by JamesM
Hi,

What resources in particular can't you track? Shared memory and file descriptors for any thread should be accessible by the kernel anyway, and when a process dies you can just destroy its entire virtual address space and release everything attached to it.

Cheers,

James

Posted: Mon Apr 28, 2008 3:32 am
by pcmattman
The problem is shared page tables (ie, for the kernel data) - you can't free those without breaking everything, so I can't just destroy an address space.

Specifically, I'm interested in thread stacks, mainly in the kernel. If a kernel thread dies, how does the resources it uses get cleaned up without destroying other relevant data?

Is there a garbage collection system that can work for this idea? And are there any OS-specific garbage collection algorithms that people have used in the past?

Posted: Mon Apr 28, 2008 3:39 am
by JamesM
pcmattman wrote:The problem is shared page tables (ie, for the kernel data) - you can't free those without breaking everything, so I can't just destroy an address space.

Specifically, I'm interested in thread stacks, mainly in the kernel. If a kernel thread dies, how does the resources it uses get cleaned up without destroying other relevant data?

Is there a garbage collection system that can work for this idea? And are there any OS-specific garbage collection algorithms that people have used in the past?
I still don't see why you need GC for this.

Shared page tables would surely be recorded somewhere - you should know where your kernel code and heap is! You can just not free those tables/entries.

You should also know where your thread's kernel stack is - it surely should be recorded in your thread control block or something.

But I still don't understand what 'resources' you're talking about - It's very vague. Most 'resources' (semaphores, file descriptors, etc) should be enumerated in your thread control block anyway, surely?

Cheers,

James

Posted: Mon Apr 28, 2008 4:03 am
by pcmattman
Shared page tables would surely be recorded somewhere - you should know where your kernel code and heap is! You can just not free those tables/entries.
I did some test cases, and it's actually faster and works better to use something like reference counting for cleaning up process address spaces, if only in my own kernel's case. At the same time, I do use the "for system programmers" bits in the page table and directory entries to mark kernel and userspace pages serperately.
You should also know where your thread's kernel stack is - it surely should be recorded in your thread control block or something.
Of course, but you can see my thread in the OS Development forum to find out it isn't as easy as just freeing that stack.
But I still don't understand what 'resources' you're talking about - It's very vague. Most 'resources' (semaphores, file descriptors, etc) should be enumerated in your thread control block anyway, surely?
Most resources, yes. The main things I'm talking about are stacks and binary images, in the context of a thread (or a context in my system, but that wouldn't work in the sentence).

Lastly,
I still don't see why you need GC for this.
I probably don't, but it seemed a logical extension to my system (hence the reason I posted).

Thanks for the constructive criticism.

Posted: Mon Apr 28, 2008 9:38 am
by bewing
I'm not sure I agree with JamesM. There are always abnormal terminations, where a task's resources don't get freed normally. Every OS that I know of has some sort of backup "resource deallocation" mechanism that gets called when an app dies, just in case it didn't clean up after itself properly.

I agree that the thesis of this thread seems a little vague though. To be more specific, I'll assume that you are talking specifically about userspace resources. Every virtual memory allocation must go through the VM manager (assuming a modular system), and it needs to keep track of all allocations according to "task number". When a task dies, it certainly needs to free any VM pages that were not properly freed during the process termination. But every task needs to have its own stack. Since that is true, I put a "stack buffer pointer" in my job table, for every task -- specifically for the purpose of freeing that space on termination. When a task is killed, one of the first things I do is free its stack. I also have a pointer to the code/data buffer; but that space is shared between threads and the main app -- so it contains a resource count, and is only freed when the count hits 0.

Shared resources need a reference count, without any question. You cannot free a shared resource until everyone is done using it.

Unshared resources all need to be tracked by their various managers, by task number. Any time a task dies, all the managers need to be informed of that, so they can all check their lists of allocated resources, and free any that were exclusively used by that task.

Alternately, you can abstract the concept of a "resource", and only have one manager that is responsible for de-allocating all zombie resources still allocated by dead tasks.

Posted: Mon Apr 28, 2008 10:48 am
by Korona
Linux uses reference counting to manage shared objects like thread descriptions and other structures.
My own kernel uses reference counting, too. I think it's the fastest/easiest method to manage resources that are used by more than one thread at the same time. Tracing garbage collection does not work in kernels as you cannot put all threads to sleep at one time in order to scan their kernel stacks for object references.