Page 2 of 2

Posted: Tue Sep 25, 2007 3:18 am
by JamesM
The kernel would then probe blocked threads when it needs the memory, checking whether the thread's stack pointer has left the newer region.
senaus: Could you not also check the page table entries for that stack, and see if the "accessed" bit is set? If the process has been run for a timeslice and hasn't accessed the stack, odds are (is it 100% certain?) that the stack is no longer in use.

Posted: Tue Sep 25, 2007 8:00 pm
by Avarok
Well, what I see happening is that we have:

2 address pages mapped onto 1 real hardware page. One of those address pages has lots of room to grow downwards and one has none. Hence why we did this (so we could allow the stack to keep growing and still keep the old pointers into the old data if there were any)

The problem that I was describing comes from the fact that as the stack unwinds, it will now only unwind on the new virtual addresses, leaving the old address page in tact even if the new one is completely free.

So while the effects will be visible on both pages and correctly be reflected on the pointers that might be in the stack; the old address page won't be invalidated automatically?

Posted: Wed Nov 28, 2007 9:22 am
by didroe
I know this thread is a couple of months out of date but I have something to add for anyone else browsing this thread.

Due to design decisions elsewhere in my OS I need to be able to have the stack extend indefinitely. I've come to the conclusion that there are 2 ways to do this:
  • Only ever use relative addressing for stack based variables, the compiler should be able to handle this as it should know how big the stack is. You do have to make sure you don't put dynamic stuff on there though. Then the stack can be moved in its entirety.

    One issue is that you are then limited to how much contiguous virtual space you can allocate (ok so there is a LOT of space in 64bits).
  • Alternatively, and I think this is the way I'm headed, you can allocate it in chunks. If you have a guard page above and below your stack; when you expand too far, you get a page fault and try allocating the next page. If that is already allocated to something else then you start a new stack 'piece' somewhere else with the address of the last one stored somewhere (maybe in the physical address of the top guard page's PT entry). When unwinding, you hit the top guard and then reload the stack pointer from the last piece.

    This means that you don't have the issue of references to the stack needing to be changed, and also you don't have the problem of having to reserve as much virtual space as you're ever going to need. Drawbacks are 2 lost pages per stack piece and the overhead of shuffling it about. The amount allocated per piece can be dynamic though and could take into account how much space the current process had allocated in the past, etc.

Posted: Tue Dec 04, 2007 9:55 am
by jal
didroe wrote:Alternatively, and I think this is the way I'm headed, you can allocate it in chunks. (...) Drawbacks are 2 lost pages per stack piece and the overhead of shuffling it about.
The lost pages are only lost in address space, not in real memory, so I wouldn't call it a big drawback. Something that you should be careful with though, is that you can get a situation of winding and unwinding around the chunk boundaries, so that you get a page fault with nearly every push or pop. That'll slow your applications to an unusable speed. So you must not remove a chunk until after its previous chunk has been unwinded.


JAL

Posted: Tue Dec 04, 2007 5:49 pm
by didroe
Yeah, there is that. Even keeping things allocated for longer, you would still have the issue of hitting guard pages when dealing with different pieces which would result in the same behavior. Not sure how to go about dealing with that yet. I guess you could detect so many faults to the guard page in a certain amount of time and then make a new piece containing the current boundary. Thus removing the hotspot.

Another issue that occurred to me after posting is you also have to be careful about what gets put on there so it doesn't cross a stack piece boundary. That should just be an case of being careful with alignment of local variables.

I'm thinking the best approach would be to have the compiler infer some kind of stack usage value from the code, this would set the initial stack size, piece size, and determine how far from other stacks it is in the address space (ie. how much room to expand before needing a new piece). Hopefully meaning it's less likely to need to allocate extra space in the first place.

I'm still unsure how all of this will play out in reality, it might be horribly inefficient. I really want to be able to have an arbitrary stack size though, and I'm using a single address space so even at 64 (48 usable) bits you don't have enough space to pad them far enough away from each other that they will never meet.