managing stack

turdus · Post by **turdus** » Wed Mar 07, 2012 2:19 am

bluemoon wrote: Interesting. But would it break if I manually push 0xDEADBEEF onto the stack?

Yes.
http://git.minix3.org/?p=minix.git;a=bl ... priv.h#l61 - defined here
http://git.minix3.org/?p=minix.git;a=bl ... oc.c#l1663 - checked here

The third solution is a structured exception handler, where an unresolved #PF gets a last chance to survive by being passed to a user handler for rescue. You then mmap in the user handler.

Which is - as I wrote before - complicated, and, hm let's say, more than dangerous. There's a good reason why direct calling of ring3 function from ring0 code is prohibited.
I'm still voting for LWP, which does not have this problem at all.

JamesM · Post by **JamesM** » Wed Mar 07, 2012 2:53 am

Or, you could just do what pthreads does and foist all of this on the user, requesting that they pass their own stack into pthread_create.

gerryg400 · Post by **gerryg400** » Wed Mar 07, 2012 3:29 am

The Posix spec. doesn't require that a pthread program create its own thread stacks. In fact, even though Posix defines an interface that allows an application to supply its own stack, the standard points out that there are issues with doing that.

At least some of the Posix implementations that I've seen support kernel allocated thread stacks. I'm not sure what the perceived problem is.

OSwhatever · Post by **OSwhatever** » Wed Mar 07, 2012 5:41 am

I'd say you should support both: kernel-managed stacks and stacks provided by the user program. The advantage of kernel-managed stacks is of course that they are dynamic and adjust themselves to the size needed. User-managed stacks must be fixed and cannot grow. The programmer therefore usually wants kernel-managed stacks, since you can leave all the thinking to the kernel and you usually consume less memory.

However, because kernel-managed stacks have their addresses planned in advance, you only get a limited number of them, especially on 32-bit systems. Each stack has a maximum allowed size, something like 1 MB or 4 MB. If you put all of these in the user address space, you will see how quickly they eat up the virtual address space. That limits kernel-managed stacks to something like 100 to 200; otherwise there wouldn't be any space left for the heap and other memory areas. If you want more threads than that, you have to supply your own stacks.
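To put rough numbers on the address-space argument above, here is a back-of-the-envelope sketch. The 3 GiB user space and the amount set aside for everything else are illustrative assumptions, and `max_stacks` is a hypothetical helper, not part of any kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many fixed-size stack reservations fit in the
 * part of the user address space that is not set aside for the heap,
 * code and data. All sizes are in bytes. */
uint32_t max_stacks(uint64_t user_space_bytes,
                    uint64_t reserved_for_rest_bytes,
                    uint64_t stack_reservation_bytes)
{
    assert(stack_reservation_bytes > 0);

    if (reserved_for_rest_bytes >= user_space_bytes) {
        return 0;   /* nothing left for stacks at all */
    }
    return (uint32_t)((user_space_bytes - reserved_for_rest_bytes)
                      / stack_reservation_bytes);
}
```

With 3 GiB of user space, 2 GiB kept for the heap and other areas, and 4 MiB reserved per stack, `max_stacks(3ULL << 30, 2ULL << 30, 4ULL << 20)` gives 256, the same order of magnitude as the figure quoted above.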

Also, in some rare cases you want to control the amount of stack used. Sometimes 4 kB is more than enough, and perhaps only 256 bytes would do. Supplying your own stack then helps you reduce memory consumption.

I'm not sure about pthreads but I think it's these pitfalls it tries to solve by supplying its own stacks. If you don't supply your own, the kernel should handle it.
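For reference, a minimal sketch of the user-supplied case with the POSIX interface, `pthread_attr_setstack()`. The function and worker names are hypothetical, and the sketch assumes a hosted POSIX system; it is not how any particular kernel or libc manages stacks internally.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Trivial thread body: stores a marker through the pointer it is given. */
static void *small_worker(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}

/* Hypothetical helper: runs one thread on a caller-allocated stack of at
 * least `requested` bytes. Returns 0 on success, nonzero on failure. */
int run_on_own_stack(size_t requested)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t size = requested;
    void *stack = NULL;
    pthread_attr_t attr;
    pthread_t tid;
    int done = 0;
    int rc;

    if (page <= 0) {
        return -1;
    }
    /* The implementation enforces a floor on the stack size. */
    if (size < (size_t)PTHREAD_STACK_MIN) {
        size = (size_t)PTHREAD_STACK_MIN;
    }
    /* Round up to a whole number of pages. */
    size = (size + (size_t)page - 1) / (size_t)page * (size_t)page;
    assert(size % (size_t)page == 0);

    rc = posix_memalign(&stack, (size_t)page, size);
    if (rc != 0) {
        return rc;
    }

    rc = pthread_attr_init(&attr);
    if (rc == 0) {
        rc = pthread_attr_setstack(&attr, stack, size);
        if (rc == 0) {
            rc = pthread_create(&tid, &attr, small_worker, &done);
            if (rc == 0) {
                rc = pthread_join(tid, NULL);
            }
        }
        pthread_attr_destroy(&attr);
    }

    free(stack);   /* safe: the thread has been joined or was never created */
    if (rc == 0 && !done) {
        return -1;
    }
    return rc;
}
```

Note that POSIX does not let you go arbitrarily small: `pthread_attr_setstack()` fails with EINVAL for sizes below `PTHREAD_STACK_MIN`, so a 256-byte stack is not available through this interface.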

Gigasoft · Post by **Gigasoft** » Thu Mar 08, 2012 7:47 am

Not necessarily. That's what the break syscall is good for (known as "sbrk"). If it happens, you can move the whole stack upwards and modify esp in the TSS. You can implement it without a specific syscall too; the trick is that you have to check heap_end==stack_end in the PF handler before a page is allocated. The app would not notice at all, and continues without a problem.

That would break just about every program in existence. Don't ever do that.

turdus wrote: So threads of the same process will share code and data, but will have their individual stacks. This is what Solaris calls a light weight process. In this case the kernel is aware of threads, and switches among them by invalidating only the stack (as opposed to invalidating the full address space on a process switch).

That sounds like an unnecessarily complicated thing to do just to conserve virtual addresses. It's better to just share the same address space between all threads, not to mention less of a surprise to the programmer. People have expectations of what a thread is. If some pointers are always valid, while others sometimes mean this and sometimes that, it does not rhyme with that expectation. That is why you name your thread creation function pthread_create and not pfrankenstein_monster_eatmybrain, for example.

turdus wrote: Which is - as I wrote before - complicated, and, hm let's say, more than dangerous. There's a good reason why direct calling of ring3 function from ring0 code is prohibited.

That is not how SEH usually works. Once the exception handler starts executing, the kernel is done and has placed all information necessary to process the exception on the user's stack. In the case of a stack fault, it should switch to a special stack for handling stack faults, either set aside for each thread or for the entire process (with suspension of all the other threads in the same process). A user-mode layer should take care of everything that happens after execution returns to user mode, to avoid problems with nested exceptions exhausting the kernel stack.
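On POSIX systems, the closest user-visible analogue of "a special stack for handling stack faults" is an alternate signal stack installed with `sigaltstack()`. This is not Windows SEH, only a sketch of the same idea; the buffer size and the function names are assumptions made for illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

#define ALT_STACK_BYTES (64 * 1024)   /* assumed size, not from the thread */

static char alt_stack[ALT_STACK_BYTES];
static volatile sig_atomic_t ran_on_alt_stack = 0;

/* Handler: records whether its own locals live inside alt_stack. */
static void on_fault(int signo)
{
    char marker;
    uintptr_t here = (uintptr_t)&marker;
    uintptr_t base = (uintptr_t)alt_stack;

    (void)signo;
    if (here >= base && here < base + ALT_STACK_BYTES) {
        ran_on_alt_stack = 1;
    }
    /* A handler for a real stack overflow could not simply return here;
     * it would have to grow the stack or terminate the thread. */
}

/* Registers alt_stack and a SIGSEGV handler that runs on it.
 * Returns 0 on success, -1 on failure. */
int install_fault_stack(void)
{
    stack_t ss;
    struct sigaction sa;

    assert(ALT_STACK_BYTES >= MINSIGSTKSZ);

    ss.ss_sp = alt_stack;
    ss.ss_size = ALT_STACK_BYTES;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0) {
        return -1;
    }

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sa.sa_flags = SA_ONSTACK;          /* deliver on the alternate stack */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Returns 1 once the handler has run on the alternate stack. */
int handler_used_alt_stack(void)
{
    return ran_on_alt_stack != 0;
}
```

Calling `install_fault_stack()` and then `raise(SIGSEGV)` is enough to observe the switch without triggering a real fault, since a handler may return normally from a signal generated by `raise()`.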

JAAman · Post by **JAAman** » Thu Mar 08, 2012 9:39 am

Gigasoft wrote:
turdus wrote: So threads of the same process will share code and data, but will have their individual stacks. This is what Solaris calls a light weight process. In this case the kernel is aware of threads, and switches among them by invalidating only the stack (as opposed to invalidating the full address space on a process switch).
That sounds like an unnecessarily complicated thing to do just to conserve virtual addresses. It's better to just share the same address space between all threads, not to mention less of a surprise to the programmer. People have expectations of what a thread is. If some pointers are always valid, while others sometimes mean this and sometimes that, it does not rhyme with that expectation. That is why you name your thread creation function pthread_create and not pfrankenstein_monster_eatmybrain, for example.

that is simply not true... passing pointers to data existing on the stack to another thread is always a bad idea -- if you are doing that, your program is seriously messed up without any help from anyone else... if it works at all, it's simply a fluke...

now i am certainly not in any position to claim any kind of programming skill or experience, but, imho, if you are giving another thread a pointer to a value on your local stack, you are already in a really really bad place

and it is not done to conserve address space, it's done to protect the program from bugs in itself (personally, i consider passing pointers to data existing on your local stack to another thread to be a bug, but apparently you don't), as well as to simplify stack management -- rather than being "unnecessarily complicated" it is actually much simpler than any other method (albeit with a slight performance penalty -- though not really much of one, and if the threads are running on different CPUs, it could actually improve performance)

Gigasoft · Post by **Gigasoft** » Thu Mar 08, 2012 10:42 am

A bug is a failure of a program to operate in the intended way. How does changing the correct behaviour of a correct program into an incorrect one protect anyone from bugs? Wouldn't it be more natural to say that the bug is with the thread implementation, which does not work like it claims to do?

IEEE Std 1003.1 wrote:Thread

A single flow of control within a process. Each thread has its own thread ID, scheduling priority and policy, errno value, thread-specific key/value bindings, and the required system resources to support a flow of control. Anything whose address may be determined by a thread, including but not limited to static variables, storage obtained via malloc(), directly addressable storage obtained through implementation-defined functions, and automatic variables, are accessible to all threads in the same process.

Accessing a stack variable belonging to another thread is a perfectly legal and normal behaviour. If it works, it is most certainly not a fluke, it means that the implementor bothered to read the specification that defined what he claims to have implemented. One is of course free to make a thread implementation where thread stacks are only accessible from the same thread, but then it wouldn't be a POSIX thread implementation.

One should try not to design a system around the assumption that the programmer does not know what he is doing. If someone does not know how to program, chances are that he will be doing something he knows how to do instead, such as stopping suspicious drivers, flipping hamburgers, flying an airplane, or selling real estate. Someone who passed a stack variable pointer to another thread probably meant to do just that, so one should let him expect that to work like the standard says it should.

Reference: http://pubs.opengroup.org/onlinepubs/00 ... tag_03_393

Edit: Looking back at the first page, I see that the mention of POSIX threads didn't come from the OP. However, even so, most people would expect stack variables to be a shared resource, since that's how threads work on virtually every other system.

gerryg400 · Post by **gerryg400** » Thu Mar 08, 2012 12:00 pm

Gigasoft wrote:Accessing a stack variable belonging to another thread is a perfectly legal and normal behaviour.

It's extremely common. An example (note I haven't even compiled this code but I think it shows Gigasoft's point)

Code:

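#include <pthread.h>
#include <stdio.h>

#define TOTAL_DATA      (10000)
#define NCORES          (4)
#define DATA_PER_CORE   (TOTAL_DATA/NCORES)

/* Placeholder for the real per-element computation. */
static int something(void) {
    return 42;
}

/* Each worker fills its own slice of the caller's stack array. */
static void *work_func(void *arg) {

    int *data = arg;
    int i;

    for (i=0; i<DATA_PER_CORE; ++i) {
        data[i] = something();
    }

    return NULL;
}

int control_func(void) {

    int         data[TOTAL_DATA];   /* lives on this thread's stack */
    pthread_t   tid[NCORES];
    int         i;

    for (i=0; i<NCORES; ++i) {
        pthread_create(&tid[i], NULL, work_func, &data[i*DATA_PER_CORE]);
    }

    /* Joining synchronizes with the workers before data is read. */
    for (i=0; i<NCORES; ++i) {
        pthread_join(tid[i], NULL);
    }

    for (i=0; i<TOTAL_DATA; ++i) {
        printf("Data %d: %d\n", i, data[i]);
    }

    return 0;
}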

turdus · Post by **turdus** » Thu Mar 08, 2012 2:05 pm

Gigasoft wrote:That sounds like an unnecessarily complicated thing to do just to conserve virtual addresses.

That sounds like you know nothing about threads. It has _nothing_ to do with conserving addresses.

Gigasoft wrote:Accessing a stack variable belonging to another thread is a perfectly legal and normal behaviour

No. From POSIX.1-2008: http://pubs.opengroup.org/onlinepubs/96 ... g_15_09_08

An "application-managed thread stack" is a region of memory allocated by the application - for example, memory returned by the malloc() or mmap() functions - and designated as a stack through the act of passing the address and size of the stack, respectively, as the stackaddr and stacksize arguments to pthread_attr_setstack().
...
The application may thereafter utilize the memory within the stack only within the normal context of stack usage within or properly synchronized with a thread that has been scheduled by the implementation with stack pointer value(s) that are within the range of that stack.

First, you must record your stacks (so bound checks or guard item checks can be and should be done); and second, you are only allowed to refer to another thread's stack under special circumstances, which can hardly be called "normal". What's more, every process must have its own stack-aware scheduler implementation. Now that is what I call "unnecessarily complicated".

It's much easier, clearer and safer to have the kernel do the scheduling and synchronization. True, there'll be a time penalty on switches, but the stability is worth it (if one of your pthreads makes a boo-boo, all threads of the same process will be killed. On the other hand, if the same happens with an LWP, the other threads run happily ever after).

You should do a little more research on the topic.

turdus · Post by **turdus** » Thu Mar 08, 2012 2:24 pm

gerryg400 wrote:The Posix spec. doesn't require that a pthread program create its own thread stacks.

Well, yes and no. See "man 7 pthreads" for example. Each thread _MUST_ have its own stack:

As well as the stack, POSIX.1 specifies that various other attributes are distinct for each thread

It's just that the POSIX spec doesn't require you to call pthread_attr_setstack() directly if you don't care about the address and size. pthread_create() will do it silently in the background if you do not supply it with a stack attribute object.
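One way to see what the implementation picks when no stack attribute is supplied is to query a default-initialized attribute object. This is only a sketch; `query_default_stack_size` is a hypothetical name, and the value reported is implementation-defined.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: stores the stack size carried by a
 * default-initialized attribute object in *out.
 * Returns 0 on success or an error number on failure. */
int query_default_stack_size(size_t *out)
{
    pthread_attr_t attr;
    int rc;

    assert(out != NULL);

    rc = pthread_attr_init(&attr);
    if (rc != 0) {
        return rc;
    }
    rc = pthread_attr_getstacksize(&attr, out);
    pthread_attr_destroy(&attr);
    return rc;
}
```

A thread created with a NULL attribute argument gets these default attributes, so the reported size is what pthread_create() would set up on the caller's behalf.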

bluemoon · Post by **bluemoon** » Thu Mar 08, 2012 2:28 pm

turdus wrote:
Code:
---------- 4 GB
Kernel                           mapped globally
---------- 3 GB
v Stack v                        mapped per thread
[empty]
^ Heap  ^                        mapped per process
Data                             mapped per process
Code                             mapped per process
---------- 0 GB
So threads of the same process will share code and data, but will have their individual stacks. This is what Solaris calls a light weight process.

I actually checked some doc from here http://www.princeton.edu/~unix/Solaris/ ... ocess.html
And on definition of wiki http://en.wikipedia.org/wiki/Light-weight_process

Neither suggests that LWPs map a private stack outside the shared address space. I doubt they do that, as the extra invalidation would slow down thread context switching, and back in the old days 4 GiB seemed like plenty, so they may simply have scattered the thread stacks across the address space.

gerryg400 · Post by **gerryg400** » Thu Mar 08, 2012 2:46 pm

Turdus, the Posix spec is very clear about this; there should be no confusion. The code I posted, which has threads accessing the stack of other threads, is Posix conforming. The Posix spec. gives examples just like it.

Your threading model is not Posix conforming.

Now, that is not a bad thing. Posix is not the be all and end all and your model does have some advantages.

turdus wrote: First, you must record your stacks (so bound checks or guard item checks can be and should be done); and second, you are only allowed to refer to another thread's stack under special circumstances, which can hardly be called "normal". What's more, every process must have its own stack-aware scheduler implementation. Now that is what I call "unnecessarily complicated".

That's just not true. I can't see how you came to those conclusions from reading the Posix spec.

The application may thereafter utilize the memory within the stack only within the normal context of stack usage within or properly synchronized with a thread that has been scheduled by the implementation with stack pointer value(s) that are within the range of that stack.

Properly synchronised, in the Posix meaning, can mean between a pthread_create and a pthread_join.

turdus · Post by **turdus** » Fri Mar 09, 2012 2:27 am

@bluemoon: "Neither suggests that LWPs map a private stack outside the shared address space". But of course, it's an implementation detail. It's not a question of address space, but of who is responsible for the thread's stack, and whether the kernel knows about threads or not. LWP: kernel, pthread: userspace.
@gerryg400: "Your threading model is not Posix conforming." Hell no! What's the point of repeating the same mistakes over again?!?

Rudster816 · Post by **Rudster816** » Sat Mar 10, 2012 8:56 pm

So long as there are a couple of guard pages at the bottom of the stack, there really shouldn't be any problems with stack space colliding with the heap.

The only way I can foresee a legitimate application overwriting the heap by using too much stack space is if a function tried to allocate an excessively large amount of local variables on the stack.

e.g.

Code:

push ebp
mov ebp, esp
sub esp, 32768
...
push foo
call bar
...

In which case, if the stack was already very close to its limit AND the heap + code + data was 2.9 GB+, then esp might actually point to a valid address that's in the heap, so the following push would not incur a page fault by touching a guard page. I'm not sure how GCC behaves when you declare very large fixed arrays locally, but I assume there is some kind of threshold before it will just stick it in the .bss section.

I say, so long as there is a reasonable amount of non-present pages (I'd say ~64 KB worth would be sufficient), then any stack/heap collisions would be programming/compiler errors, not a kernel bug or design flaw.
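In user-space terms, a guard region of this kind can be sketched with mmap() and mprotect(). The struct and function names are hypothetical, the 64 KB guard is just the figure suggested above, and a kernel would do the equivalent by leaving the pages non-present in the page tables rather than through these calls.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical descriptor for a stack with an inaccessible guard below it. */
struct guarded_stack {
    void  *mapping;     /* start of the whole mapping (guard comes first) */
    size_t total;       /* guard bytes + usable bytes */
    void  *usable;      /* lowest usable stack address */
    size_t usable_len;  /* usable bytes above the guard */
};

static size_t round_up(size_t n, size_t page)
{
    return (n + page - 1) / page * page;
}

/* Maps usable_len bytes of stack with guard_len bytes of PROT_NONE
 * memory below it. Returns 0 on success, -1 on failure. */
int guarded_stack_create(struct guarded_stack *gs,
                         size_t usable_len, size_t guard_len)
{
    long page = sysconf(_SC_PAGESIZE);
    void *p;

    assert(gs != NULL);
    if (page <= 0) {
        return -1;
    }
    usable_len = round_up(usable_len, (size_t)page);
    guard_len = round_up(guard_len, (size_t)page);

    p = mmap(NULL, guard_len + usable_len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }
    /* Any access to the lowest guard_len bytes now faults. */
    if (mprotect(p, guard_len, PROT_NONE) != 0) {
        munmap(p, guard_len + usable_len);
        return -1;
    }

    gs->mapping = p;
    gs->total = guard_len + usable_len;
    gs->usable = (char *)p + guard_len;
    gs->usable_len = usable_len;
    return 0;
}

void guarded_stack_destroy(struct guarded_stack *gs)
{
    assert(gs != NULL);
    munmap(gs->mapping, gs->total);
}
```

A stack that grows down from `usable + usable_len` hits the guard before it can reach whatever is mapped below, as long as no single step is larger than the guard itself.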

Of course this is really only an issue with x86 non-PAE paging. Hell, with x64 paging you could just dedicate an entire PML4E (512 GB region) to a process's stack(s). It would be an extra 8 KB or so worth of overhead to map, but it would make it really easy to differentiate between stack and heap usage.
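The "one PML4E for stacks" idea makes the stack-or-heap test a matter of comparing nine address bits. A small sketch, assuming 4-level paging with 48-bit virtual addresses; the slot number passed in is an arbitrary choice by the caller, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PML4_SHIFT       39u                      /* bits 47..39 select the PML4 entry */
#define PML4_ENTRY_SPAN  (1ULL << PML4_SHIFT)     /* 512 GiB covered per entry */

/* Index of the PML4 entry that translates vaddr. */
unsigned pml4_index(uint64_t vaddr)
{
    return (unsigned)((vaddr >> PML4_SHIFT) & 0x1FFu);
}

/* Nonzero if vaddr lies in the 512 GiB region reserved for stacks. */
int is_stack_address(uint64_t vaddr, unsigned stack_slot)
{
    assert(stack_slot < 512);
    return pml4_index(vaddr) == stack_slot;
}
```

If, say, slot 255 (the top of the lower canonical half) were reserved for stacks, a page-fault handler could classify a faulting address with a single call to `is_stack_address(addr, 255)`.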

Yoda · Post by **Yoda** » Sun Mar 11, 2012 5:01 am

Yes, the stack in 32-bit apps is a philosophical problem for me too. Guard pages are a simple solution, but what if the application allocates too much memory on the stack for local variables? For example, this small, semantically correct test program

Code:

#include <stdio.h>

int main (void) {
  int  arr[1000000];   /* about 4 MB of locals, assuming 4-byte int */
  printf ("Hello, World!\n");
  return 0;
}

compiles successfully, but at execution it jumps over the guard pages on XP, and the task is killed by an exception at the function call.
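The usual countermeasure is stack probing: before a large frame is used, one byte in every page of it is touched from the top down, so the guard page is always hit before anything below it (MSVC's `__chkstk` and GCC's `-fstack-clash-protection` work along these lines). The sketch below only models that access pattern on an ordinary buffer; `probe_downward` is a hypothetical name and does not adjust any real stack pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Touch one byte in every page-sized step of [top - bytes, top),
 * highest step first, which is the order a downward-growing stack
 * must be extended in. Returns the number of touches made. */
size_t probe_downward(volatile char *top, size_t bytes, size_t page)
{
    size_t touched = 0;
    size_t offset = 0;

    assert(page > 0);

    while (offset < bytes) {
        size_t left = bytes - offset;
        size_t step = left < page ? left : page;

        offset += step;
        top[-(ptrdiff_t)offset] = 0;   /* never more than one page below the last touch */
        touched++;
    }
    return touched;
}
```

Because consecutive touches are at most one page apart, a guard region of even a single page cannot be skipped, no matter how large the frame is.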

OSDev.org
