Hi,
I split the address space into three parts - one for the process, one for the thread and one for the kernel. Kernel space is the highest 1 GB. The boundary between process space and thread space is adjustable (determined by a value in the executable's header), so you can have 1 GB of process space and 2 GB of thread space, or 2 GB of process space and 1 GB of thread space. All boundaries are aligned to 1 GB because of PAE, where each GB has its own page directory.
One day, when I get a 64-bit kernel, the kernel itself will be above 4 GB and 32-bit processes will get an extra 1 GB of thread space (making the split 1 GB/3 GB or 2 GB/2 GB). A thread needs to ask the kernel where the top of thread space is.
Each thread's stack is always at the top of thread space. The highest 4 MB is marked as "allocate on demand", so if/when the thread uses it, it's automatically allocated by the kernel. The thread can mark more memory as allocate on demand if it wants (a 1 GB stack isn't a problem).
Then there's "heap space", which must be initialized before use. This initialization tells the kernel what area to use (for high-level languages it would be hidden from the programmer). There are two sets of heap space functions, one for process space and one for thread space, that are entirely independent.
There are many advantages and disadvantages to this scheme.
Because each thread has its own part of the address space that nothing else can access, one thread can't mess up another thread's data, and each thread can have a huge stack without problems. The maximum amount of memory that a process can use is huge (even though a single thread is limited to 3 GB) - a process with 1500 threads can have 1 GB of process space and 2 GB for each thread (3001 GB in total).
Having thread space also makes it easier to clean up memory after a thread terminates itself - you just free anything that was in its thread space (I'm not sure how you'd clean up properly on OSs that don't have thread space).
Unlike some OSs, my kernel schedules threads, not processes, so on a multi-CPU computer a multi-threaded process can use several CPUs at the same time. Because thread space can only ever be used by one thread, the memory manager doesn't need to lock/unlock linear memory when allocating/de-allocating memory in thread space.
The disadvantages are that porting software to this memory model will be difficult, and that I need to change CR3 on every thread switch. My IPC (non-blocking messaging) makes porting software difficult anyway (and to be honest I'd prefer software was re-written to make proper use of my OS), so I don't care about the first disadvantage. Changing CR3 flushes the CPU's TLB and can affect performance, but I use global pages for kernel space, and the new thread's private data wouldn't have been in the TLB anyway - the only thing I could be losing here is the cached translations for process space.
BTW I'm not suggesting you use anything like this - I'm just showing an alternative to the "standard" method.
For the "standard" method, I think it's easiest for a thread's stack to be determined by whoever spawned the thread. For example:
Code: Select all
int spawn(void *new_EIP, size_t stack_size) {
    void *new_stack;

    new_stack = malloc(stack_size);
    if(new_stack == NULL) return -1;
    /* The stack grows down on x86, so pass the top of the allocation */
    return spawn_thread(new_EIP, (char *)new_stack + stack_size);
}
Cheers,
Brendan