Threads and virtual memory layout

NickJohnson · Post by **NickJohnson** » Fri Mar 26, 2010 7:03 pm

Recently, I've been trying to implement full threading in my kernel. I already have a system in place that uses structures that are effectively threads, but they work in a stack, and only one "thread" is active per process. The point is, I already have a design that is oriented around passing around pointers to these threads/continuations, which I want to extend. Threads currently have pointers to their owning processes, but not the other way around; I also want this to be true in the future: the scheduler manages threads, the processor(s) runs threads, and process structures hold only permissions and paging information. This modularity seems like the most flexible option for when I try to implement SMP and just for general cleanliness.

Here is my problem. I have a table of processes that is in kernel space, linked to every address space, but my table of threads is unique to each address space. It would likely take a lot of virtual memory space to allow all threads to be in all address spaces: each thread structure is about a quarter page in size (because of FPU/SSE state saving). I am also likely to have a lot of threads (16+, maybe 64+) in many programs due to the way my IPC works. However, I cannot properly pass around pointers to threads when threads are in separate address spaces! Should I expand kernel space to be large enough to hold all possible threads (which could be up to a gigabyte of address space), or deal with the ton of global state and overhead caused by constantly switching processes to reach other threads? What do those of you with threading do about this?

aeritharcanum · Post by **aeritharcanum** » Fri Mar 26, 2010 7:58 pm

Hi,

Actually, it's very normal for the kernel to take up a gigabyte of the virtual address space in every process. In 32-bit Windows, by default, 2 GB of every process's address space is reserved for the kernel. You may have your reasons for being conservative, but nobody would look at you askance if you used a gigabyte.

Also, in most kernels threads share their parent process's address space. The memory overhead of using separate address spaces for individual threads in a single process is really significant. And in the end, you end up having cloned and created an address space matching the parent's, and pointing to the same code/data frames in physical memory, with little divergence. Many times threads are spawned and are destroyed very soon after. The buildup and tear-down overhead for constantly creating and destroying address spaces may become a hanging point for the kernel.

I could make that point concrete with a simple example: maybe at some point you'll want to consider allowing for asynchronous event notifications to userspace. Events such as a drive being unmounted, or an internet connection going offline, etc., are commonly sent to userspace in more developed kernels, where a process would specify an entry point, and the kernel would call that entry point to notify the process of one event or another.

Imagine a driver reports to the kernel that drive 'D' is no longer plugged in/available. The kernel must close all file handles associated with drive D. As a hook, it also searches an event list which holds the entry points of all processes which have asked to be notified of drive activities. The most efficient way to notify the processes without stalling the kernel is to spawn a thread for each process on behalf of the process as if the process itself had spawned the thread, and have that thread be scheduled to begin execution at the event notification entry point. And have the thread destroyed immediately upon exit.

Would you then be cloning and spawning address spaces for multiple threads every time a notification of any kind needs to be sent to a process, and then freeing the page tables so soon after they were generated?

NickJohnson wrote:However, I cannot properly pass around pointers to threads when threads are in separate address spaces!

Imho, you could increase the kernel's address space. And also, unless you have a firm reason to stray from the norm, maybe have all threads operate in their parents' address spaces.

NickJohnson · Post by **NickJohnson** » Sat Mar 27, 2010 6:33 am

aeritharcanum wrote:Also, in most kernels threads share their parent process's address space. The memory overhead of using separate address spaces for individual threads in a single process is really significant. And in the end, you end up having cloned and created an address space matching the parent's, and pointing to the same code/data frames in physical memory, with little divergence. Many times threads are spawned and are destroyed very soon after. The buildup and tear-down overhead for constantly creating and destroying address spaces may become a hanging point for the kernel.

Actually, I don't have separate address spaces per thread; I only have them per process, as usual. I just have the table of threads for process A in a different address space than the table of threads for process B.

Also, my kernel already does do asynchronous events in userspace. It's actually my main IPC method, and why I'm expecting to have a lot of threads, even if they are short-lived. However, I'm also going to allow queuing of these events in a sort of mailbox, which I expect most user processes to use, because it is easier to write the C library that way. It's just the drivers and system servers that will have 64+ threads, because it helps with response time and is critical for IRQ handling.

After thinking about it overnight, I think it would be reasonable to expand the kernel address space. However, it probably won't be by as much as I had thought. If I put all of the processes' threads in one table with an allocator, it's not the maximum number of threads per process that will have to be accommodated, but instead the average number, which I expect to be less than two per process. I could probably be fine with only about 16 MB of address space for that table (16 MB = 1024 processes (max) * 4 threads (avg) * 4096 bytes/thread (max)).
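A minimal sketch of that sizing and of "one table with an allocator", written as plain C. The names `thread_slot`, `thread_alloc`, and `thread_free` are hypothetical, the slot contents are placeholders, and the static array stands in for a reserved range of kernel virtual address space. Slots are handed out by a bump counter first, so never-used slots are never touched and would not need physical backing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PROCESSES 1024u   /* kernel's process limit, from the post */
#define AVG_THREADS   4u      /* budgeted average threads per process */
#define THREAD_SLOT   4096u   /* bytes per thread structure (max) */
#define MAX_THREADS   (MAX_PROCESSES * AVG_THREADS)

/* Hypothetical thread structure, padded out to one full slot. */
union thread_slot {
    struct {
        uint32_t id;                   /* index of this slot in the table */
        uint32_t owner_pid;            /* owning process */
        union thread_slot *next_free;  /* link used only while on the free list */
    } t;
    unsigned char pad[THREAD_SLOT];
};

/* Stand-in for the table mapped into kernel space in every address space. */
static union thread_slot thread_table[MAX_THREADS];

/* 1024 processes * 4 threads * 4096 bytes = 16 MiB of address space. */
static_assert(sizeof(thread_table) == 16u * 1024u * 1024u,
              "thread table should span 16 MiB");

static union thread_slot *free_list = NULL;  /* previously freed slots */
static uint32_t next_unused = 0;             /* first never-used slot */

/* Take a slot from the free list, else from the untouched tail. */
union thread_slot *thread_alloc(uint32_t owner_pid)
{
    union thread_slot *s;

    if (free_list != NULL) {
        s = free_list;
        free_list = s->t.next_free;
    } else if (next_unused < MAX_THREADS) {
        s = &thread_table[next_unused++];
    } else {
        return NULL;  /* table is full */
    }

    s->t.id = (uint32_t)(s - thread_table);
    s->t.owner_pid = owner_pid;
    s->t.next_free = NULL;
    return s;
}

/* Return a slot to the free list so the next allocation reuses it. */
void thread_free(union thread_slot *s)
{
    s->t.next_free = free_list;
    free_list = s;
}
```

Because every process shares the same table, a pointer returned by `thread_alloc` stays valid no matter which address space is active, which is the property the per-process tables lacked.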

Gigasoft · Post by **Gigasoft** » Sat Mar 27, 2010 11:56 am

You could just allocate each thread dynamically, and put them in a linked list. Then you don't need to waste memory that you don't use.

And I think the average number of processes running at once will be much less than 1024, especially for a homebrew system. You'll probably have a handful of service processes running, and then each user application running will typically consist of one process.

NickJohnson · Post by **NickJohnson** » Sat Mar 27, 2010 12:31 pm

Gigasoft wrote:You could just allocate each thread dynamically, and put them in a linked list. Then you don't need to waste memory that you don't use.

How does that save memory? The kernel clearly has the ability to control paging, so 16 MB of virtual address space != 16 MB of physical memory, even for an array. There's also nothing like the constant-time conversion from thread id to thread pointer that an array gives you.
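To illustrate the constant-time conversion, here is a rough sketch in C. It assumes thread slots of 4096 bytes laid out back to back from a base address; `thread_region`, `thread_from_id`, and `thread_to_id` are hypothetical names, and in a kernel the base would be a fixed virtual address reserved in kernel space rather than a variable.

```c
#include <stddef.h>
#include <stdint.h>

#define THREAD_SLOT 4096u   /* bytes reserved per thread structure */
#define MAX_THREADS 4096u   /* 1024 processes * 4 threads (avg) */

/* Hypothetical thread structure; only the start of each slot is used. */
struct thread {
    uint32_t id;
    uint32_t owner_pid;
};

/* Base of the reserved region (assumption: set once at boot). */
static unsigned char *thread_region;

/* id -> pointer: one bounds check, one multiply, one add. */
static inline struct thread *thread_from_id(uint32_t id)
{
    if (id >= MAX_THREADS)
        return NULL;
    return (struct thread *)(thread_region + (size_t)id * THREAD_SLOT);
}

/* pointer -> id: one subtract, one divide. */
static inline uint32_t thread_to_id(const struct thread *t)
{
    return (uint32_t)(((const unsigned char *)t - thread_region) / THREAD_SLOT);
}
```

With a linked list, the same lookup would mean walking nodes until the id matches. Pages of the region that hold no live thread can simply stay unmapped, so the cost is address space, not physical memory.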

Gigasoft wrote: And I think the average number of processes running at once will be much less than 1024, especially for a homebrew system. You'll probably have a handful of service processes running, and then each user application running will typically consist of one process.

I realize this, but I'm just trying to accommodate my kernel's current limit of 1024 processes. 1024 processes and 4096 threads seems reasonable for a desktop or medium server. There's nothing wrong with a little future proofing.

Seriously though, I think I've got a good plan. Problem solved.

FlashBurn · Post by **FlashBurn** » Sat Mar 27, 2010 4:51 pm

Just a little hint. If you really want to save mem (physical and virtual), then don't save the FPU context in the thread structure; use a pointer to a separately allocated mem block for that. After all, how many of your services and user threads will actually use the FPU?
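A minimal sketch of that idea in C, under a few assumptions: the save area is the 512-byte, 16-byte-aligned block that x86 `FXSAVE` uses, `aligned_alloc` stands in for whatever allocator the kernel provides, and the names `thread_fpu_ensure` and `thread_fpu_release` are hypothetical. The intent is that the kernel would call `thread_fpu_ensure` the first time a thread touches the FPU, for example from a device-not-available (#NM) handler.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FPU_CTX_SIZE  512u  /* size of the x86 FXSAVE area */
#define FPU_CTX_ALIGN 16u   /* FXSAVE requires 16-byte alignment */

/* Hypothetical thread structure: the FPU state lives out of line. */
struct thread {
    uint32_t id;
    void *fpu_ctx;  /* NULL until the thread first uses the FPU */
};

/* Allocate the save area on first use. Returns 0 on success, -1 on failure. */
int thread_fpu_ensure(struct thread *t)
{
    if (t->fpu_ctx != NULL)
        return 0;  /* already has a save area */

    /* aligned_alloc is a stand-in for the kernel's own allocator. */
    void *ctx = aligned_alloc(FPU_CTX_ALIGN, FPU_CTX_SIZE);
    if (ctx == NULL)
        return -1;

    memset(ctx, 0, FPU_CTX_SIZE);
    t->fpu_ctx = ctx;
    return 0;
}

/* Release the save area when the thread is destroyed. */
void thread_fpu_release(struct thread *t)
{
    free(t->fpu_ctx);
    t->fpu_ctx = NULL;
}
```

Threads that never execute an FPU or SSE instruction then cost only a pointer instead of a full save area, which also shrinks the per-thread slot.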
