OSDev.org

Posted: **Tue Jul 06, 2010 12:45 pm**

A more or less theoretical question that can not be searched for commonness of words used, how much stack space per process/thread is enough?

Doing some scaled multithreading tests in my OS and running out of memory on plain spot, i found that for some long-forgotten reason each process or thread got 512 Kb of stack space given to it.

That sounds quite excessive, yet it seems that mswindows allocates 1 Mb per thread by default, and there are values from 4 Kb to 8 Mb mentioned around googleable universe.

Some direct probing and testing, however, revealed that regular programs don't use more than about 256 bytes of stack, and i don't remember having any explicitly recursive implementations of things like quicksort, or local variables of epic sizes.

However, reducing stack to 1024 bytes causes total crashes at startup, and i grew it progressively up to 64 Kb, where things started to act normally. Thus, 256Kb seems to be actually a good figure with about 4 times of redundancy.

So, what kinds of stack sizes make sense and for what reasons, as you do think?

Posted: **Tue Jul 06, 2010 1:58 pm**

I think that it's smartest to use 4K paging for userspace apps. So the stack should be a multiple of 4K. You will need one extra 4K of virtual memory space allocated as a "guard page" to check for page faults. So for virtual memory space efficiency, I think 12K to 16K is wisest, plus the initial guard page.

Then, I think that on-demand allocation according to this algorithm is brilliant (see senaus' post, halfway down):
http://forum.osdev.org/viewtopic.php?f=15&t=14917

But, as you say, any application that is not using a clever recursion scheme but still uses more than 4K of stack space either has terrible design problems, or the userland API is buggy (system calls use too much stack), or the compiler is buggy. For your Windoze case, I'm sure it's the API.

Posted: **Tue Jul 06, 2010 2:11 pm**

I have an app here which easily chews through 80kb in one function call alone. Nothing drastic; just processes a huge data set on the stack. And why not - the data is useless after the function exits, because the parameters change each time through. Allocating the data on the heap would require tracking, growing and potentially shrinking the allocation as required, which is simply not worth it.

As I see it - especially on 64-bit operating systems - there is no reason not to by default allocate 1MB to the stack and potentially more.

Posted: **Tue Jul 06, 2010 3:22 pm**

Interesting.
In my OS case there was a decompression routine in one of the first programs loaded that held the buffers as local variables, consuming 45 Kb of stack.
With that slightly altered, everything works nicely with as little as 7Kb of stack per process.

I see the point, the more stack space there is, the more flexibility the application programmer will have in his ways.
The idea with paging-extensible stack is pretty interesting, but not without some address space shuffling.

I wonder if compiler can estimate the stack space the program will take statically (like local variables sizes summed together along non-recursive call graph for lower bound estimation), to pass down to the executable as some sort of a minimum, with user-defined linker parameters and kernel-defined minimum as other control mechanisms?

But, i guess, wasting a few Mb's out of hundreds is not such a big deal after all, and threads can be started with user-defined stack size.

Posted: **Tue Jul 06, 2010 3:37 pm**

In my case the compiler would have no clue how much stack would be use - that 80kB is all dynamically allocated, and the upper bound is something the compiler doesn't know.

I personally don't see there as being much cost for giving each thread a default of 1MB of address space for the stack (OK, 2kb per thread in page tables assuming PAE paging, but still not much)

Posted: **Tue Jul 06, 2010 10:38 pm**

Artlav wrote:I wonder if compiler can estimate the stack space the program will take statically...

No, it won't be able to.

Code: Select all

int func(size_t len)
{
   char array[len];
}

That's valid C99.

Posted: **Tue Jul 06, 2010 11:08 pm**

I think a decent option is to define a "standard" stack size... say, 1 page. Then, as the stack grows, map in stack pages where needed - demand paging the stack, effectively. You'd probably want to limit its growth eventually, but it does mean you don't have to worry about the stack size you give to application threads.

For kernel threads however I'd be enforcing a strict stack size limit - we use 32 KB for each thread's kernel stack (at each IRQ nesting level for that thread) and haven't had a problem. Any "big" structures, arrays or class instances are put on the heap, where they belong.

Posted: **Wed Jul 07, 2010 7:25 am**

For kernel threads you can possibly do with a fixed sized stack.

However, for user threads, how are you going to allow growable stacks there?

You can easily have a growable stack if you process have one thread, just give another page when it hits the first empty page.

Now with several threads in the same user space, where are you supposed to put the stack? With fixed stack it's easy, just somewhere on the heap. Now if all threads are going to be growable you can reserve a certain max size for each thread and Then just put the stacks after each other (in decrementing order that is) in virtual space.

Posted: **Wed Jul 07, 2010 9:39 am**

You could try the following: When the process starts executing some initial thread, it gets some growing stack at a default location. When it spawns another thread, it has to specify the location and maximum size of the new thread's stack. The kernel doesn't know in advance how many threads the process will run and how much stack space each of them needs, but the application programmer might have had at least a rough estimate and can plug in some useful values.

Posted: **Wed Jul 07, 2010 12:19 pm**

Hi:

The solution is simple: As mentioned before, allocate a certain maximum virtual size for the stack by default. Map in frames to back this as the program touches them. From here, if the program exceeds your default maximum stack size, you either use the technique in the link provided by a previous poster, or you have an system API call which allows a process to spawn a new thread, but with a specifier argument for the new thread's maximum stack size.

You can do both, too: have a dynamic touch-based method, and a thread spawn API call which specified the max stack size for the new thread. For the dynamic touch based method, you may want to allow the thread stack growth to get to a maximum of N times the default max size. Threads which are spawned with a custom stack size will have their stated stack size enforced with no extensions. APIs should do what the programmer asks them to do.

--All the best,
gravaera

Posted: **Thu Jul 08, 2010 12:39 pm**

In the old thread referenced previously, I noticed someone suggesting allocating the stack in 'chunks' allocated in a heap (whether the main heap or a separate one). That would allow large stacks or lots of small ones, but has the obvious problem of splitting variables/frames across stacks. I think that that issue could be solved, albeit with a little compiler cooperation (in fact, this can be done without kernel cooperation!), as follows:

When the program allocates a new structure (frame or alloca call), it should check that the new stack top is still in the current chunk; otherwise, allocate a new chunk and put the structure there; alloca calls require no extra work to unroll correctly (because the function ends with 'mov esp, ebp' anyway), while function calls just need to set ebp to the pre-move esp.

Anyone see any flaws with this plan which I haven't?

Posted: **Thu Jul 08, 2010 1:38 pm**

Allocating chunks will be slow and will happen quite regularly. Objects could exceed the size of a chunk. You need to know how to reclaim chunks. Many functions get by fine without frame pointers but would now need one. Checking if the chunk has been exceeded is going to slow down function calls. Allocator needs to be able to run without any available stack space.

You're almost suggesting the use of Cactus Stacks - and if you're adding the overhead of cactus stacks, use them and reap the benefits too.

Personally, I find allocating 1MB stacks in 2GB or multi-terabyte address spaces quite acceptable.

Posted: **Thu Jul 08, 2010 9:51 pm**

The whole point of senaus' method in the link that I posted is that it handles dynamically grown stacks for an arbitrary number of threads in one address space, and is completely invisible to the application -- as far as objects crossing "chunk" boundaries. The only slighly tricky thing about it is recognizing when to recover a memory chunk from a stack that is unwinding.

Posted: **Fri Jul 09, 2010 2:26 am**

What are the advantages of alloca-style stack management?

I'm just wondering as you need to reserve the full stack (+ 1+ page overflow) in address space anyway and you lose the ability to leave stack-based large buffers unmapped if unused. Plus, you need to explicitly insert alloca calls (or let your compiler do it for you) whereas I see no positive function in them. So, these are three disadvantages of alloca. The only advantage I can see is that when you increase the stack pointer by too much (way out of bounds array access) that it will fault on your stack. I don't see a big advantage in that since most overflows are by 1, N (smallish, like 100 or so) or by uninitialized (which is 2 billion on average). None of those three would've been caught. What are the advantages that I'm not seeing?

Posted: **Fri Jul 09, 2010 3:57 am**

pcmattman wrote:I think a decent option is to define a "standard" stack size... say, 1 page. Then, as the stack grows, map in stack pages where needed - demand paging the stack, effectively. You'd probably want to limit its growth eventually, but it does mean you don't have to worry about the stack size you give to application threads.

For kernel threads however I'd be enforcing a strict stack size limit - we use 32 KB for each thread's kernel stack (at each IRQ nesting level for that thread) and haven't had a problem. Any "big" structures, arrays or class instances are put on the heap, where they belong.

I was just going to suggest the last line you said. If I remember correctly, I have 8 kB of stack space for each (kernel) thread. I don't have user-space apps at this time yet though, so I can't really speak of whether it's been sufficient. I don't think you'd need more than 8 kB for one thread of stack space. The most common situations where it might become too large is either 1) lots of function calls, which call other functions, ... 2) a recursing function (same reasons as 1) or 3) a huge amount of data is allocated on the stack, for example a huge array. In case of 1 and 2, you can add an extra page (I've seen little apps that use that much space by just calling functions) and in case of 3, use the heap you're provided with (which is designed for this kind of thing).

OSDev.org

Stack space?

Stack space?

Re: Stack space?

Re: Stack space?

Re: Stack space?

Re: Stack space?

Re: Stack space?

Re: Stack space?

Re: Stack space?

Re: Stack space?

Re: Stack space?

Re: Stack space?

Re: Stack space?

Re: Stack space?

Re: Stack space?

Re: Stack space?