Garbage Collection

Discussions on more advanced topics such as monolithic vs micro-kernels, transactional memory models, and paging vs segmentation should go here. Use this forum to expand and improve the wiki!
Post Reply
pcmattman
Member
Member
Posts: 2566
Joined: Sun Jan 14, 2007 9:15 pm
Libera.chat IRC: miselin
Location: Sydney, Australia (I come from a land down under!)
Contact:

Garbage Collection

Post by pcmattman »

Hi everyone,

I've been thinking lately a lot about how to efficiently clean up after a process finishes running (or a thread) and how to avoid running out of memory because x number of threads have used up too much space.

At the moment I'm liking the sounds of reference counting, which isn't too hard to implement and works pretty well for what I want to do. However, I'm pretty sure there are other methods out there that are better suited to an operating system environment.

Basically, I'm looking for algorithms that the kernel can use when a thread dies to free up any resources it was using, or any relevant discussion that can assist myself and others in this.

Thanks.
User avatar
JamesM
Member
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom
Contact:

Post by JamesM »

Hi,

What resources in particular can't you track? Shared memory and file descriptors for any thread should be accessible by the kernel anyway, and when a process dies you can just destroy its entire virtual address space and release everything attached to it.

Cheers,

James
pcmattman
Member
Member
Posts: 2566
Joined: Sun Jan 14, 2007 9:15 pm
Libera.chat IRC: miselin
Location: Sydney, Australia (I come from a land down under!)
Contact:

Post by pcmattman »

The problem is shared page tables (ie, for the kernel data) - you can't free those without breaking everything, so I can't just destroy an address space.

Specifically, I'm interested in thread stacks, mainly in the kernel. If a kernel thread dies, how does the resources it uses get cleaned up without destroying other relevant data?

Is there a garbage collection system that can work for this idea? And are there any OS-specific garbage collection algorithms that people have used in the past?
User avatar
JamesM
Member
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom
Contact:

Post by JamesM »

pcmattman wrote:The problem is shared page tables (ie, for the kernel data) - you can't free those without breaking everything, so I can't just destroy an address space.

Specifically, I'm interested in thread stacks, mainly in the kernel. If a kernel thread dies, how does the resources it uses get cleaned up without destroying other relevant data?

Is there a garbage collection system that can work for this idea? And are there any OS-specific garbage collection algorithms that people have used in the past?
I still don't see why you need GC for this.

Shared page tables would surely be recorded somewhere - you should know where your kernel code and heap is! You can just not free those tables/entries.

You should also know where your thread's kernel stack is - it surely should be recorded in your thread control block or something.

But I still don't understand what 'resources' you're talking about - It's very vague. Most 'resources' (semaphores, file descriptors, etc) should be enumerated in your thread control block anyway, surely?

Cheers,

James
pcmattman
Member
Member
Posts: 2566
Joined: Sun Jan 14, 2007 9:15 pm
Libera.chat IRC: miselin
Location: Sydney, Australia (I come from a land down under!)
Contact:

Post by pcmattman »

Shared page tables would surely be recorded somewhere - you should know where your kernel code and heap is! You can just not free those tables/entries.
I did some test cases, and it's actually faster and works better to use something like reference counting for cleaning up process address spaces, if only in my own kernel's case. At the same time, I do use the "for system programmers" bits in the page table and directory entries to mark kernel and userspace pages serperately.
You should also know where your thread's kernel stack is - it surely should be recorded in your thread control block or something.
Of course, but you can see my thread in the OS Development forum to find out it isn't as easy as just freeing that stack.
But I still don't understand what 'resources' you're talking about - It's very vague. Most 'resources' (semaphores, file descriptors, etc) should be enumerated in your thread control block anyway, surely?
Most resources, yes. The main things I'm talking about are stacks and binary images, in the context of a thread (or a context in my system, but that wouldn't work in the sentence).

Lastly,
I still don't see why you need GC for this.
I probably don't, but it seemed a logical extension to my system (hence the reason I posted).

Thanks for the constructive criticism.
User avatar
bewing
Member
Member
Posts: 1401
Joined: Wed Feb 07, 2007 1:45 pm
Location: Eugene, OR, US

Post by bewing »

I'm not sure I agree with JamesM. There are always abnormal terminations, where a task's resources don't get freed normally. Every OS that I know of has some sort of backup "resource deallocation" mechanism that gets called when an app dies, just in case it didn't clean up after itself properly.

I agree that the thesis of this thread seems a little vague though. To be more specific, I'll assume that you are talking specifically about userspace resources. Every virtual memory allocation must go through the VM manager (assuming a modular system), and it needs to keep track of all allocations according to "task number". When a task dies, it certainly needs to free any VM pages that were not properly freed during the process termination. But every task needs to have its own stack. Since that is true, I put a "stack buffer pointer" in my job table, for every task -- specifically for the purpose of freeing that space on termination. When a task is killed, one of the first things I do is free its stack. I also have a pointer to the code/data buffer; but that space is shared between threads and the main app -- so it contains a resource count, and is only freed when the count hits 0.

Shared resources need a reference count, without any question. You cannot free a shared resource until everyone is done using it.

Unshared resources all need to be tracked by their various managers, by task number. Any time a task dies, all the managers need to be informed of that, so they can all check their lists of allocated resources, and free any that were exclusively used by that task.

Alternately, you can abstract the concept of a "resource", and only have one manager that is responsible for de-allocating all zombie resources still allocated by dead tasks.
Korona
Member
Member
Posts: 1000
Joined: Thu May 17, 2007 1:27 pm
Contact:

Post by Korona »

Linux uses reference counting to manage shared objects like thread descriptions and other structures.
My own kernel uses reference counting, too. I think it's the fastest/easiest method to manage resources that are used by more than one thread at the same time. Tracing garbage collection does not work in kernels as you cannot put all threads to sleep at one time in order to scan their kernel stacks for object references.
Post Reply