OSDev.org

Posted: **Fri Jan 29, 2010 1:49 am**

I just thought about how I will implement the user stack, and I've run into some problems.

What should the maximum size of a user stack be? I thought 4088 KiB would be enough. But if I set up every user stack so that it can grow to 4088 KiB (plus two 4 KiB guard pages), I also run into a limit on the number of user threads: with 766 stacks, only 4 MiB are left for the program's code and data (the last 4 MiB of each user process are used for some data structures). How can I solve this?

I thought that maybe the first thread of a program could get a 16 KiB stack; if it needs more, it has to ask the kernel for it at the start of the program. Then, when the user code creates a new thread, it specifies the maximum stack space that thread needs (the maximum allowed would still be 4088 KiB).

Posted: **Fri Jan 29, 2010 2:17 am**

FlashBurn wrote: ...because when I have 766 stacks

Aren't they supposed to be mostly in different address spaces? Even if you have a really large number like 16 threads in the same process, and they all reserve 16MB of stack, that's still only a sixteenth of the address space. 4MB is not going to be enough when a program has the habit of allocating large structures on the stack.

I would definitely have a configurable maximum stack size, but I also wouldn't limit the stack size to anything less than the total address space reserved for stacks (which would be something like 512 MB? Depending on the design, of course). This also lets you support the cases where an application does spawn an idiotic number of threads: you can still give them half a MB each for maximal joy.

Posted: **Fri Jan 29, 2010 6:10 am**

Combuster wrote: Aren't they supposed to be mostly in different address spaces? Even if you have a really large number like 16 threads in the same process, and they all reserve 16MB of stack, that's still only a sixteenth of the address space. 4MB is not going to be enough when a program has the habit of allocating large structures on the stack.

Yeah, I meant that I can then only have 766 stacks per process. I plan to design my app server so that every app has its own thread in the app server, so there can and will be many threads per process. Or imagine a web server that spawns a thread for every connection.

So I should go with a dynamic stack size. I will think about how I can do this with my area allocator.

Combuster wrote: I would definitely have a configurable maximum stack size, but I also wouldn't limit the stack size to anything less than the total address space reserved for stacks (which would be like, 512MB? Depending on design of course)

At the moment I have no space reserved for stacks.

Posted: **Fri Jan 29, 2010 5:21 pm**

I'd allocate thread stacks in user space - except for that of the initial thread, which would start off as 4k and the app can then extend it to its desired size.

An app which spawns a thread per client is generally a scalability nightmare (Though there is an exception for the case of servers under some microkernels)

Posted: **Mon Feb 01, 2010 5:40 pm**

Owen wrote: I'd allocate thread stacks in user space - except for that of the initial thread, which would start off as 4k and the app can then extend it to its desired size.

All process stacks will be in user space, except for the ring 0 (kernel) stacks.

Owen wrote: An app which spawns a thread per client is generally a scalability nightmare (Though there is an exception for the case of servers under some microkernels)

My app server will be such an exception, and maybe my storage server, too.

I have one more question. Is there a particular reason why stacks are (most of the time) allocated at the end of user space? I ask because for my VMM it's easier if I just allocate the stack wherever there is room, without caring about its position in user space. I can't think of a problem with that.

Posted: **Mon Feb 01, 2010 6:11 pm**

The idea is to have the stack, which grows down, at the highest user-space address, and the heap growing toward the stack (or vice versa), so that the last thing that happens is that the stack grows down into the user-space thread's code section, or the heap grows into the kernel-space pages. (The latter is not as scary, though, since the access will fault due to the privilege level difference: with paging, touching a supervisor-only page from user mode raises a page fault.)

--All the best,
gravaera

Posted: **Tue Feb 02, 2010 3:52 am**

I don't know how OSes like Linux and Windows do heap management, but in my design, if you want more memory (what you would call heap), you call the kernel, tell it how much more you need, and get an address back. I know there is also the option of doing this with the help of the page fault exception, but I do not want to go that way.

So there is no other reason for having the stack at the end. Another point is that there are shared libraries too, which are, most of the time, mapped in at the end of user space.

Posted: **Tue Feb 02, 2010 5:53 am**

FlashBurn wrote:What should be the max allowed size of a user stack? I thought that 4088KiB are enough. But If I init every user stack so that it could be 4088KiB in size (+2 4KiB pages as guard pages), then I would also get a problem with the max number of user threads, because when I have 766 stacks I only have left 4MiB for the code+data of the program (the last 4MiB of a user process will be used for some data structures). So how can I solve this?

It's not the case that the maximum number of threads equals the maximum total stack space divided by the maximum stack space per thread. Sure, you can space the stacks 4 megs apart in the virtual address space, but chances are not all 766 are actually using those 4 megs (and if they are, well, that's the app's own damn stupidity). So you can find a thread that has only used, say, 100 kB, and put a new thread's stack at the 2 MB point between it and the one below it.

Then you can make the maximum nice and chunky (e.g., 16 MB) and subdivide later as the number of threads grows.

Posted: **Tue Feb 02, 2010 6:55 am**

@midir

This would complicate things even more. I would have to write a function that checks how much stack space a thread is using, then assume that it won't use more, and then find a region that is big enough for a new thread.

As I understand it, the traditional address space looks something like this:

```
0xC0000000 stack ptr;; grows downwards
0x00400000 heap ptr;; grows upwards
```

The problem is that this makes handling things like multiple threads and shared libraries somewhat difficult. As I see it, it will be easier (on my OS!) if every time the program needs more memory it calls the kernel (like sbrk on Linux), and if I know the maximum stack size of a thread at the time it is created.

Another thing: should I map in the whole stack, or should I do this with the help of the page fault exception? Using the exception would save some memory, but it would be slower.

I read somewhere that there is a function that touches some of the stack memory (at the beginning of the program) so that the OS maps in the pages?!

Posted: **Tue Feb 02, 2010 7:37 am**

FlashBurn wrote:As I see it, it will be easier (on my os!) when everytime the program needs more memory it calls the kernel (like sbrk on linux) and that I need to know the max stack size of thread at the time of creation.

That is not really different. Traditionally, the "break" is the border between heap and stack space. By calling "sbrk", you increase the size reserved for one and decrease the size reserved for the other. This originates from the times when processes were single-threaded, but you may still use the area above the break for all threads' stacks. The heap is usually shared by all threads.

Posted: **Tue Feb 02, 2010 8:33 am**

My question now is: would it be a problem if the heap isn't contiguous? I mean, if I had a memory map like this:

```
0x1000 - 0x400000 code
0x400000 - 0x500000 data
0x500000 - 0x580000 stack
0x580000 - 0x640000 data
0x640000 - 0x700000 shared library
0x700000 - 0x710000 data
0x710000 - 0x720000 stack
...
```

Would this cause problems? (It shouldn't.)

Posted: **Tue Feb 02, 2010 10:31 am**

FlashBurn wrote:
Owen wrote: I'd allocate thread stacks in user space - except for that of the initial thread, which would start off as 4k and the app can then extend it to its desired size.
All process stacks will be in user space, except for the ring 0 (kernel) stacks.

I meant that the process allocates the thread's stack and passes it to the create-thread syscall, rather than the kernel allocating it.

FlashBurn wrote:
Owen wrote: An app which spawns a thread per client is generally a scalability nightmare (Though there is an exception for the case of servers under some microkernels)
My app server will be such an exception, and maybe my storage server, too.

I have one more question. Is there a particular reason why stacks are (most of the time) allocated at the end of user space? I ask because for my VMM it's easier if I just allocate the stack wherever there is room, without caring about its position in user space. I can't think of a problem with that.

It's desirable to keep the stack as far away from everything else as possible from a safety point of view. Additionally, you want to randomize the locations of the application and the libraries it depends on as much as possible to avoid return-to-libc and similar attacks. (Fortunately, such attacks are much harder to pull off on x86_64 and other platforms with register-based calling conventions; syscall vectors are often immune as well.)

Posted: **Tue Feb 02, 2010 10:49 am**

Owen wrote: I meant that the process allocates the thread's stack and passes it to the create thread syscall, rather than the kernel allocating it

The problem with that is that I want to have 2 guard pages so that the stack can't grow beyond its bounds.

Owen wrote: It's desirable to keep the stack as far away from everything else as possible from a safety point of view. Additionally, you want to randomize the locations of the application and the libraries it depends upon as much as possible to avoid return-to-libc and similar attacks. (Fortunately, such attacks are much harder to pull off on x86_64 and other register calling convention platforms; syscall vectors are also thusly often immune)

I have to say that security is a good thing, but I don't think I will care about these problems at the moment. I mean, all such measures do is keep the programmer from writing working code.

I will have a look at what ways there are to protect my os from such things.

Posted: **Tue Feb 02, 2010 1:51 pm**

I haven't reached anywhere near the stage of having separate user mode memory environments yet so be warned there might be glaring stupidities in what I'm suggesting, but...

FlashBurn wrote:I mean then I have to write a function which looks how much stack space a thread is using [...] and I have to find a region which is big enough for a new thread.

That's what the guard pages can help you track. When a page isn't present, you can use the high 31 bits of the page table entry for your own data, such as why the page is unmapped: is it swapped to disk, not meant to be allocated, or a thread stack guard? (There's also enough space there for a thread ID, to make tracking thread stack usage easier.)

and then I need to assume that it won't use more

No need to assume -- insist! If a 32-bit app is creating so many hundreds of threads that the stacks of some threads are starved of virtual space, that's its own fault for using too much memory, not yours.

Another thing is, should I map in the whole stack or should I do this with the help of the paging exception?

Allocate 0 or 1 pages initially. Use the guard page to detect every time the stack needs to grow.

If I use the exception I could save some mem, but it would be slower.

Only the first time the thread expands its stack into a new page, not all the time. I expect the overhead is negligible.

My question is now, would it be a problem if the heap isn't contiguous?

I think not. It depends on the heap implementation though.

One other thing, why do you want 2 guard pages? Is there an advantage to using 2 rather than 1?

Posted: **Tue Feb 02, 2010 2:09 pm**

FlashBurn wrote:
Owen wrote: I meant that the process allocates the thread's stack and passes it to the create thread syscall, rather than the kernel allocating it
The problem with that is that I want to have 2 guard pages so that the stack can't grow beyond its bounds.

Any reason the user mode thread library can't allocate those?

(Incidentally, I have an app here that regularly allocates ~32 KiB on the stack in one go; it would quite readily jump past any guard pages unintentionally.)

FlashBurn wrote:
Owen wrote: It's desirable to keep the stack as far away from everything else as possible from a safety point of view. Additionally, you want to randomize the locations of the application and the libraries it depends upon as much as possible to avoid return-to-libc and similar attacks. (Fortunately, such attacks are much harder to pull off on x86_64 and other register calling convention platforms; syscall vectors are also thusly often immune)
I have to say that security is a good thing, but I don't think that I will care about these problems at the moment. I mean all one does with such things is protect the programmer from writing working code.

I will have a look at what ways there are to protect my os from such things.

Huh? Address space layout randomization doesn't prevent anyone from writing working code; it just prevents them from writing exploits. Return-to-libc attacks are one of the biggest classes at the moment.

user space stack size
