what are some good recommended address space layouts


what are some good recommended address space layouts

Post by proxy »

So I have my user space processes set up, and they appear to work pretty well :)

Anyway, I want to have a specific layout for my user space processes. However, where I place my code, data, heap and threads in the user processes will impose limits on how many threads I can have, how big their stacks can be, and how big my heap can grow per process....

So anyway, I plan to support multiple threads per process of course, and I figure I could put an upper limit on the number of threads per process and an upper limit on how big a thread's stack can grow (say 1 MB?). Based on this I can spread my thread stacks apart by 1 MB + 1 page for the finest granularity, I suppose. Also, I wish to have at least one guard page between sections and such (is one enough?)
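
For example, a rough sketch of the arithmetic I have in mind (the base address, limits and names below are just placeholders, nothing final):

Code: Select all

#include <stdint.h>

/* placeholder layout constants - nothing here is final */
#define STACK_REGION_TOP  0xBFFFF000u  /* just below kernel land at 0xC0000000 */
#define MAX_STACK_SIZE    0x00100000u  /* 1 MB per thread stack                */
#define GUARD_SIZE        0x00001000u  /* one 4 KB guard page between slots    */
#define STACK_SLOT_SIZE   (MAX_STACK_SIZE + GUARD_SIZE)
#define MAX_THREADS       64           /* arbitrary per-process limit          */

/* top of the stack for thread n (stacks grow downwards) */
static inline uintptr_t thread_stack_top(unsigned int n)
{
    return STACK_REGION_TOP - (uintptr_t)n * STACK_SLOT_SIZE;
}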

For those of you who are already past this point, what is a recommended layout? I realize there is no "right" layout, I'm just curious what people have come up with.

Also, since I figure it's relevant, all addresses >= 0xc0000000 are kernel land.

proxy

Re: what are some good recommended address space layouts

Post by AR »

Just a comment on the stack size: some games on Windows can use more than 8 MB of stack during the loading process, so 1 MB may not be enough.

I was also under the impression that the stack was always at the same address for each thread and was swapped during thread context switches, but doing it that way may be faster, provided the limited size doesn't come back to bite you.

Re: what are some good recommended address space layouts

Post by Brendan »

Hi,

I split the address space into 3 parts - one for the process, one for the thread and one for the kernel. Kernel space is the highest 1 GB. The boundary between process space and thread space is adjustable (determined by a value in the executable's header), so you can have 1 GB of process space and 2 GB of thread space, or 2 GB of process space and 1 GB of thread space. All boundaries are aligned to 1 GB because of PAE, where each GB has its own page directory.
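
In case it helps to picture it, here's a rough sketch of how the per-GB page directories could be wired up (the structure and names are only illustrative, not my actual code):

Code: Select all

#include <stdint.h>

/* With PAE, each 1 GB of the 4 GB address space has its own page
 * directory, selected by one of the four PDPT entries.            */
typedef struct {
    uint64_t pdpt[4];         /* one PDPT entry per 1 GB slot                */
    int process_space_gbs;    /* 1 or 2, taken from the executable's header  */
} addr_space_t;

/* On a thread switch, point the thread-space GBs at the new thread's
 * private page directories; slot 3 (0xC0000000+) always stays kernel. */
static void switch_thread_space(addr_space_t *as, const uint64_t thread_pd[])
{
    for (int gb = as->process_space_gbs; gb < 3; gb++)
        as->pdpt[gb] = thread_pd[gb - as->process_space_gbs];
}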

One day, when I get a 64-bit kernel, the kernel itself will be above 4 GB and 32-bit processes will get an extra 1 GB of thread space (making the split 1 GB/3 GB or 2 GB/2 GB). A thread needs to ask the kernel where the top of thread space is.

Each thread's stack is always at the top of thread space. The highest 4 MB is marked as "allocate on demand", so if/when the thread uses it, it's automatically allocated by the kernel. The thread can set more memory to allocate on demand if it wants (a 1 GB stack isn't a problem).
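
A minimal sketch of how the page fault handler could treat that "allocate on demand" area (the region bookkeeping and map_zeroed_page() are stand-ins, not my real code):

Code: Select all

#include <stdbool.h>
#include <stdint.h>

#define THREAD_SPACE_TOP   0xC0000000u
#define DEMAND_STACK_SIZE  0x00400000u   /* highest 4 MB is allocate-on-demand */

/* stand-in for whatever allocates and maps a zero-filled page */
extern void map_zeroed_page(uintptr_t page_vaddr);

/* called from the page fault handler for a not-present user page */
bool handle_demand_fault(uintptr_t fault_addr)
{
    if (fault_addr >= THREAD_SPACE_TOP - DEMAND_STACK_SIZE &&
        fault_addr <  THREAD_SPACE_TOP) {
        map_zeroed_page(fault_addr & ~0xFFFu);
        return true;    /* page mapped - retry the faulting instruction */
    }
    return false;       /* not an allocate-on-demand area - genuine fault */
}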

Then there's "heap space", which must be initialized before use. This initialization tells the kernel what area to use (for high level languages it would be hidden from the programmer). There are two sets of heap space functions, one for process space and one for thread space, which are entirely independent.
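
From user code the two sets might look something like this (the names and signatures are guesses for illustration, not my actual API):

Code: Select all

#include <stddef.h>

/* process-space heap - shared by all threads, so allocation needs locking */
int   process_heap_init(void *start, size_t max_size);
void *process_heap_alloc(size_t size);

/* thread-space heap - private to the calling thread, so no locking needed */
int   thread_heap_init(void *start, size_t max_size);
void *thread_heap_alloc(size_t size);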

There are many advantages and disadvantages to this scheme.

Because each thread has its own part of the address space that nothing else can access, it prevents one thread from messing up another thread's data, and each thread can have a huge stack without problems. The maximum amount of memory that can be used by a process is huge (even though a single thread is limited to 3 GB) - a process with 1500 threads can have 1 GB of process space and 2 GB for each thread (3001 GB).

Having thread space also makes it easier to clean up memory after a thread terminates itself - you just free anything that was in its thread space (I'm not sure how you'd clean up properly on OSs that don't have thread space).

Unlike some OSs, my kernel schedules threads, not processes, so on a multi-CPU computer a multi-threaded process can use several CPUs at the same time. Because thread space can only ever be used by one thread, the memory manager doesn't need to lock/unlock linear memory when it's allocating/de-allocating memory in thread space.

The disadvantages are that porting software to this memory model will be difficult, and that I need to change CR3 on every thread switch. My IPC (non-blocking messaging) makes porting software difficult anyway (and to be honest I'd prefer software was re-written to make proper use of my OS), so I don't care about the first disadvantage. Changing CR3 wipes the CPU's TLB and can affect performance, but I use global pages for kernel space, and the thread's private data wouldn't be in the TLB anyway - the only thing I could be losing here is the cached translations for process space.
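
In rough terms, the global pages part just means enabling CR4.PGE once and setting the G bit on the kernel's page table entries - something like this simplified sketch (not my real code):

Code: Select all

#include <stdint.h>

#define PTE_GLOBAL  (1u << 8)   /* G bit: translation survives CR3 reloads */
#define CR4_PGE     (1u << 7)   /* enable global pages                     */

static inline void enable_global_pages(void)
{
    uint32_t cr4;
    __asm__ volatile ("mov %%cr4, %0" : "=r"(cr4));
    __asm__ volatile ("mov %0, %%cr4" : : "r"(cr4 | CR4_PGE));
}

/* when mapping kernel pages (0xC0000000 and up), OR PTE_GLOBAL into the
 * PTE so kernel-space TLB entries aren't flushed on every thread switch */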

BTW I'm not suggesting you use anything like this - I'm just showing an alternative to the "standard" method.

For the "standard" method, I think it's easiest for a thread's stack to be determined by whoever spawned the thread. For example:

Code: Select all

#include <stdlib.h>

/* spawn_thread() is the kernel's thread creation call */
extern int spawn_thread(void *new_EIP, void *new_stack);

int spawn(void *new_EIP, size_t stack_size) {
    void *new_stack;

    /* the new thread's stack is just allocated from the heap */
    new_stack = malloc(stack_size);
    if(new_stack == NULL) return -1;
    return spawn_thread(new_EIP, new_stack);
}


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.

Re: what are some good recommended address space layouts

Post by proxy »

OK, so basically you're both saying you effectively put the thread stack at the top of the user address space and basically have a separate address space for each thread...

My only concern is that there are some valid constructs which will necessitate a thread having a pointer which refers to another thread's stack, for example...

Code: Select all

// blah lives on the parent thread's stack; the new thread gets a pointer into it
int blah = 10;
Thread *t = new Thread(func, &blah);
t->wait();
std::cout << blah << std::endl;

It is a contrived example, but the concept is pretty clear, and there are other valid ways to get the same net effect.

So if possible I would like to avoid having each thread have its own address space, though I'll keep it in mind as an option.

Anyone else care to share how they set things up?

proxy

Re: what are some good recommended address space layouts

Post by proxy »

I just re-read your post and was thinking about the second, "standard" part. It's actually a nice idea: the "primary" thread's stack would start at the top end and grow downward towards the heap, and the user space thread API could simply take a pointer to the new thread's stack (which could be allocated on the heap).

This actually isn't bad at all, since MANY apps are single threaded, and the programmer can ask for a thread stack of an appropriate size if more threads are needed (and I suppose a default size could be provided for convenience as well).
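
So the user space side could end up looking something like this (just a sketch - the names and the default size are made up):

Code: Select all

#include <stdlib.h>

#define DEFAULT_STACK_SIZE (64 * 1024)   /* convenience default, tune later */

/* stand-in for whatever the kernel's thread creation call ends up being */
extern int create_thread(void (*entry)(void *), void *arg, void *stack_top);

int thread_spawn(void (*entry)(void *), void *arg, size_t stack_size)
{
    if (stack_size == 0)
        stack_size = DEFAULT_STACK_SIZE;

    /* the new thread's stack just lives on the process heap */
    char *stack = malloc(stack_size);
    if (stack == NULL)
        return -1;

    /* x86 stacks grow down, so hand the kernel the top of the block */
    return create_thread(entry, arg, stack + stack_size);
}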

I think I like that one (unless some people have better ideas ;))

proxy

Re: what are some good recommended address space layouts

Post by mystran »

I'm not going to play any MMU tricks with my threads. Threads will have exactly the same memory layout, so if they want they can even use the same stack.

As for the stacks themselves, the kernel doesn't care. In fact, the only point where the kernel cares is when it starts a new thread: the initial ESP value can be specified. So stacks are allocated just like normal memory. Since I'm doing paging in userspace, any policy is possible.

Since my kernel is totally ignorant of userspace stack issues, a few kernel-level threads can easily be multiplexed (in userspace) between any number of user-level threads.

That said, the question remains: how much space to reserve for each thread? I believe it's OK to forget the whole idea of automatically growing stacks. I'm going to try to just allocate some 4 MB (or so) of zero-filled memory for each thread. As long as the pager maps that memory on first write, there is no physical memory wastage.
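
In POSIX-ish terms that's roughly the following (mmap here is just standing in for my own userspace pager, since paging policy lives in userspace on my system):

Code: Select all

#include <stddef.h>
#include <sys/mman.h>

#define THREAD_STACK_SIZE (4 * 1024 * 1024)   /* ~4 MB, backed on first write */

/* Reserve a zero-filled stack; physical pages are only allocated when the
 * thread actually touches them, so unused stack space costs nothing.      */
void *alloc_thread_stack(void)
{
    void *base = mmap(NULL, THREAD_STACK_SIZE, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return (base == MAP_FAILED) ? NULL : base;
}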

Programs that need more can allocate bigger areas if they need to. If one has more than 100 threads in a single process, then one probably doesn't need stacks that big and can adjust for a smaller default. If one has only a few threads, then giving them big stacks shouldn't be a problem either.

But these are specifically the kind of questions that I think are better left to be solved in userspace.

As for thread-private memory areas: the type of problem that this solves is just as well solved by having several processes that share a large portion of their heap. There is little advantage in considering them a single process if you effectively handle them as different address spaces. It also means that you can never access data from another thread's stack efficiently, which means that any inter-thread message-passing must use heap-allocated structures (or copying).

Re: what are some good recommended address space layouts

Post by Brendan »

Hi,
mystran wrote:As for thread-private memory areas: the type of problem that this solves is just as well solved by having several processes that share a large portion of their heap. There is little advantage in considering them a single process if you effectively handle them as different address spaces. It also means that you can never access data from another thread's stack efficiently, which means that any inter-thread message-passing must use heap-allocated structures (or copying).
It could be said that I've got "jobs consisting of one or more processes" rather than "processes consisting of one or more threads" - it's just different terminology. I prefer the latter, despite several past discussions where people have convinced me that the former (jobs consisting of processes) is technically more accurate from a scholar's point of view.

My reason for this is that it's possible for threads to avoid using the thread-private memory areas (in this case my terminology becomes technically correct), and I'll also be trying to encourage the "applications consist of one or more processes, which consist of one or more threads" idea (which gets too confusing when the alternative terminology is applied - "applications consist of one or more jobs, which consist of one or more processes").

As for a thread being able to access another thread's stack data, IMHO (for my OS), at best it's rarely useful and at worst it's a security risk.

For inter-thread communication I expect threads to use the kernel's messaging functions (not process space), as the kernel's messaging is tied in with blocking (ie. "wait for a message") and the scheduler (which schedules threads only). IMHO this is more consistent and makes it easier to change code to talk to a thread in a different process instead of a thread in the same process (or to allow both at the same time).
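
From a thread's point of view the shape of it is roughly this (the function names and message format are only illustrative, not my real API):

Code: Select all

/* illustrative only - the names and message layout are made up */
typedef struct { int type; char data[256]; } message_t;

extern int send_message(unsigned int dest_thread_id, const message_t *msg);
extern int get_message(message_t *msg);   /* blocks until a message arrives */

void worker_thread(void)
{
    message_t msg;
    for (;;) {
        get_message(&msg);   /* "wait for a message" - the scheduler blocks us */
        /* ...handle msg, possibly send_message() replies to other threads... */
    }
}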

If I were to change the terminology I'd avoid "job", "process", "task" or "thread" completely - instead I'd call them "class group" and "object". For example, a "class group" [process] consists of one or more "objects" [threads]. An object should only use private data [thread space] but can use public data [process space]. Any object can communicate with any other object (regardless of where or what) via messaging. Every object is scheduled independently by the kernel according to its priority.

In hindsight this OOP terminology might actually make it easier to describe how software should be written for my OS...

For some strange reason I've always imagined a room full of mouse traps loaded with ping-pong balls, set so that triggering one mouse trap makes one ping pong ball fly. When the first ping pong ball lands and bounces around it triggers more mouse traps causing more ping pong balls to fly. This continues until the room is full of flying/bouncing ping pong balls, and the number of "loaded" mouse traps remaining is depleted, and finally the entire thing comes to rest.

Now imagine that each ping pong ball represents a message, the first ping pong ball is a request from the user and the mouse traps are threads (or objects) spread across many CPUs within many computers, collectively performing a huge amount of processing in parallel to get that initial request completed in record time.

As you can see, a process itself is just a container for the code used by objects/threads, and by itself it lacks any real significance.

Anyway, I think I've strayed too far off-topic for one day, and am overdue for some sleep :).


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.