OSDev.org

Posted: **Sun Apr 22, 2018 3:50 pm**

Is there any programming language that allows dynamic allocation on the stack?

I often find that objects needs to allocate memory using the normal heap allocation but the objects are often limited to the function they are being used. What's happening is that you create the object but the object allocates memory and then when exiting the function the memory is returned and his goes for a lot of programming languages. This kind of behaviour has a certain overhead unless you would use the stack directly. In practice if you were using assembler you could directly just use the stack by subtracting the stack pointer with how much you want. The total amount is stored and the stack is restored when exiting the function.

This allocation option could greatly reduce the alloc/free time, reduce locking and is totally thread safe. C/C++ kind of supports this but then you have reserve the amount at compile time. C/C++ are widely used in system programming because they can use the stack a lot, but stack allocation would take it a bit further.

Why haven't I seen any language that offers this? Any disadvantages by allowing this option?

Posted: **Sun Apr 22, 2018 5:21 pm**

C11 (AFAIK) supports this by not requiring the size of an array to be known at compile time. So this:

Code: Select all

size_t size = calculateSize();
char data[size];

is valid C11

(GCC supports this even without C11, while Microsoft compilers don't support this at all though)

Posted: **Sun Apr 22, 2018 6:06 pm**

Allocating large portions of the stack and poorly controlled depth of recursion are obviously risky. OTOH, for small on-stack allocations it’s often OK to overallocate a little.
If you want to avoid unnecessary synchronization costs, you can implement per-thread/CPU heaps.

Posted: **Sun Apr 22, 2018 6:57 pm**

Uhm, I think you are a bit confused, or at least, misspoke a bit.

OK, a quick review of memory allocation is in order, not because I think you need it necessarily, but because I want everyone on the same page, as it were.

So: there are, broadly speaking, four types of application memory allocation:

read-only, which is both sized and initialized at compile time and (as the name implies) is immutable. This is used for things like code and constants.
global, which encompasses all mutable elements which are sized at compile time (and may or may not be initialized then as well), such as global variables and static variables.
function local, which is allocated as part of the function's activation record; while it is possible for this to be of variable size on a per call basis (such as with varadic arguments, or in some languages, dynamically sized arrays), it cannot be resized once the function has been entered, and when the function exits, the data automatically goes out of scope (becomes inaccessible), and is (usually) freed automatically.
heap memory, which is allocated from a general, global memory pool called the heap, and has to be managed either manually by the programmer, or by an automatic mechanism such as reference counting or garbage collection. Heap memory can persist past the scope of function that creates it, but can only remain accessible so long as there is a reference (e.g., a pointer) to it, otherwise it becomes garbage (or, if there is no garbage collection, a memory leak).

Note that I spoke of the activation record without specifically mentioning the hardware stack; this is because while the hardware stack is the obvious and easy way to allocate an activation record - so much so that activation records are often termed 'stack frames' - it isn't the only way. I mention this to try to separate the idea of the activation record (or function environment - it's one of those things where there are several names for it, each with different nuances) from the specific implementation via a stack frame.

Having said that, let's talk stack frames.

No, wait, first let's talk function calls, because you'll need this to understand those. When we talk about a function call, we are really talking about two operations: saving the position we are calling the function from, so we can return to it later, and jumping to the start of the function we're calling. In most modern systems, this is done in one instruction, but how that instruction works on different systems is illuminating.

I am assuming that you are familiar with the CALL instruction of the x86, given what you've already said, but just to be thorough, it is an instruction that

Pushes the address of the instruction itself onto the hardware stack; and
Jumps to the called address.

OK, I am being overly pedantic, but bear with me. The key thing here is, the x86 has a dedicated register for the stack pointer, and another for the frame base pointer, and the PUSH/POP, CALL/RET instructions automatically increment and decrement the stack pointer.

Thing is, not every ISA that uses a stack has a dedicated stack pointer. Most RISC designs don't, instead having a huge pile of general registers and some kind of convention as to which one to use for the stack, which to use for the frame pointer, and so forth. Manipulating the stack means explicitly subtracting from, or adding to, that register. This isn't as big a deal as it might sound, because by convention, they also use registers for most or all of the function arguments and return values, and is optimized for keeping local variables in registers as well - it is all designed to avoid accessing memory unless absolutely necessary. Most functions never create a stack frame to hold their local values, unless they are calling another function which might overwrite their locals or arguments - and when they do create a stack frame, they will generally move the stack pointer and frame pointer once, at the start of the function, rather than on every inner call.

Some RISC systems go so far as to have really ginormous register files (sets of registers) and use 'windows' of 32 to 64 registers out of, say, 1024, meaning that most functions never hit memory at all.

This is why I made such a point about describing 'activation records' as separate from 'stack frames'.

This also means that the instruction(s) for function calls and returns can't automatically use the stack the way those on the x86 does.

On the MIPS, for example, the instruction JAL (Jump And Link) stores the return value in register 31 ($ra, which is something of an exception to the 'general purpose only' rule, but not a huge one), rather than on the stack itself, and the stack pointer doesn't get touched. The JR instruction can later be applied to $ra to return; making sure that any calls that function makes itself don't scribble on $ra is left to the callee, usually by including it as an offset in its own stack frame if it needs to.

Now, back to stack frames on the x86.

You will no doubt have noticed that none of these function call methods have explained arguments yet. This is because, in both x86 and most RISC systems, arguments are always a matter of convention (there are some ISAs where they aren't, but unless the Mill gains traction, or someone decides to revive one of the LispMs, you probably won't run across one). On the x86, you would push the arguments onto the stack before the CALL instruction, leaving the arguments in a fixed set of positions relative to the return address. There is some leeway in this, and in the case of C, the arguments are pushed in reverse order to facilitate varadic functions, meaning that for

Code: Select all

printf("Hello, %s, what a fine %s it is!", "Donkey Breath", diurnal_period_str);

it would push the address of the variable diurnal_period_str, then a pointer to the string constant "Donkey Breath", then a pointer to the format string, meaning that the format string - which is the one which tells how many additional arguments there are - is in a constant location of fixed size.

As you can imagine, pushing each of these using the PUSH command would be slow and tedious, so most compilers will opt to decrement the ESP by the appropriate amount once and then handle placing the arguments, just as is done with the RISC designs, but that's not really a relevant detail here. The point is, the arguments are now on the stack, in a known (or knowable) order, and each can be found just by an offset from the current stack pointer.

You might have to allocate a space on the stack for a return value, in some calling conventions (there are several), but in most of the commonly used ones, the EAX and EDX registers are used to pass the return instead. This isn't too important either, but it is something to take note of.

Then we make the CALL. And the stack gets incremented. All good, so far, right? We know where the return value is - it's at the top of the stack. We still know where the arguments are, they are at addresses ESP+4, ESP+8, etc., going back up the stack.

Except now we need to keep a copy of the current stack pointer, or we'll lose track of them, and lose track of the return value. What do we do now?

Well, first off, we need someplace to put the current stack pointer. Where? the Base Pointer, of course! But first, we need to save that, because it was already used when the calling function was itself called.

So we PUSH EBP. This gives us a stack with the previous frame base, then the return, then the arguments. We can now MOV EBP, ESP and we've got the pivot of our stack frame. The offsets of our arguments are now EBP+8, EBP+12 etc.

But we still need to save the other registers (for which the PUSHALL instruction helps, though it isn't always the most efficient). Then we need to make space for our locals, which can be done with PUSHes or just by adding a certain amount to ESP.

All this just to get to the start of the function itself, but again, this is usually the compiler's job, not yours.

To return, we need to restore any saved registers we used (except EAX and EDX, if they are used for returns), then reset the stack to where the EBP is by MOV ESP, EBP or something similar.

This also has the effect of moving the locals out of scope - the values are still there in memory for now, but woe betide you if you try to access them, and in any case, you would need a pointer somewhere other than the stack frame to do so. This is why you don't return a pointer to a function-local variable, BTW - the variable is now in memory that is 'officially' freed, and could get overwritten at any time from then on out.

We then restore the old EBP, bringing the top of the stack back to where the return value lies. We can finally use RET to bring us back where we were when all this started (though the caller may need to clean the arguments off of the stack after we've returned to it).

Right.

In a language like C or C++, this is the standard way of working with variables of all kinds. This does leave you stuck if you need a data structure of variable size, or need to create a composite structure in a function that persists past the function's scope. This means heap allocation, which means using pointers to hold and track them, and keeping track of where you created them and when and where you free them. You can have a garbage collector doing it for you, as you would see in a language like Rust (or even as a library for C or C++, such as the Boehm collector, but you would then need to use that collector's functions for everything instead of malloc()free() or new/delete).

A lot of languages (e.g., Java, C#, Python. etc) don't give you let you manipulate the pointers directly at all; variables of object types are always involve allocating the data structures on the heap, and will silently manage garbage for you. They may use stack allocation for some kinds of variables and not others (e.g., value types' in Java and C# are local types, but all object types are in heap), others avoid the stack entirely and treat everything as a heap reference, and still others treat everything as heap references but might optimize certain cases by using the stack for things which the compiler or interpreter can determine have a fixed lifetime. They don't even necessarily use the CALL instruction at all - just because it is there doesn't mean it is required, and if it doesn't fit the language's programming model, it usually won't.

At this point, if you are still reading this at all, you are probably wondering what my point is, so:

TL;DR It is heap allocation that is persistent, not stack allocation. But, if you don't free a heap object in the function it was allocated in, you need to pass a pointer or reference to it back to the caller, and if you lose that you are SOL (your ability to access the data gets lost, and the object is either garbage collected or becomes a memory leak depending on the language and how it was allocated).

Posted: **Mon Apr 23, 2018 12:56 am**

OSwhatever wrote:Is there any programming language that allows dynamic allocation on the stack?

alloca() offers this for C, and by extension, C++.

You could even write up a C++ allocator object that does its allocations on the stack, and pass that to your containers. But be aware that this would be several flavors of surprising to the casual programmer, because you're making your container context-sensitive where it previously wasn't. I see all kinds of funny bugs springing up on this path.

As with every kind of "optimization", the guideline is measure, optimize, measure. I think you're expecting significant performance increases where only rather little ones are to be had. [1]

----

[1]: Especially with C++ and its allocator mechanism, which allows you to handle the memory management of a container yourself, potentially reducing heap allocation / deallocation by a significant amount. Again, make sure to measure, optimize, measure...

Posted: **Mon Apr 23, 2018 12:12 pm**

I would like to add a few remarks as well.

First, the platform ABI usually requires alignment of the call frames. This means that the stack pointer must be masked after a dynamic allocation, whereas static frames can be automatically padded to the necessary boundary at compile time. In languages with type alignment constraints, the allocation itself would have to start on aligned address. This adds up to the otherwise simple cost of decrementing the stack pointer.

Also, in multi-threaded programs, the stack allocation is limited, because it doesn't complement the heap space anymore. The physical memory gets committed on-demand through the guard page mechanism, but the virtual region for the stack is bounded, and thus it is not dynamic in the same sense anymore. The stack virtual address space is not shared and cannot be re-purposed to other uses.

The guard page has to be handled carefully. If the activation record or stack allocation exceed a single page, the page can be hopped over. Thus a stack allocation in user space either needs to check the amount allocated and possibly force access to the guard page consecutively, or alternatively, the size of the guard page has to be extended by the kernel (which is the preferable approach, but is not available on many platforms). This again adds to the simple pointer decrement cost. Guard pages are not a problem for kernel and non-MMU embedded programming, but the language has to address the lowest common denominator and these problems detract from the value of introducing the feature.

Also, allocations are subject to exploits. To counter this, the heap has been hardened with mitigations - safe unlinking, magic words at the end of the allocation, sanity checks of neighboring allocations, etc. Stack allocations would be less attractive with the performance cost of such mitigations, but are too vulnerable without them.

Posted: **Mon Apr 23, 2018 2:20 pm**

Aaaaand now I feel like quite the fool for misinterpreting the original question. My apologies.

Posted: **Mon Apr 23, 2018 2:34 pm**

Schol-R-LEA wrote:Some RISC systems go so far as to have really ginormous register files (sets of registers) and use 'windows' of 32 to 64 registers out of, say, 1024, meaning that most functions never hit memory at all.

I haven't studied these architectures in detail but do they have a separate stack for storage (as opposed to only storing register frames) or do they use the register file for storage as well? It is not that unusual that you choose to put arrays on the stack of let's say 256 bytes. Will that one end up in the register file in that case?

Posted: **Mon Apr 23, 2018 2:47 pm**

simeonz wrote:I would like to add a few remarks as well.

First, the platform ABI usually requires alignment of the call frames. This means that the stack pointer must be masked after a dynamic allocation, whereas static frames can be automatically padded to the necessary boundary at compile time. In languages with type alignment constraints, the allocation itself would have to start on aligned address. This adds up to the otherwise simple cost of decrementing the stack pointer.

You can easily align the stack pointer for every allocation.

simeonz wrote:Also, in multi-threaded programs, the stack allocation is limited, because it doesn't complement the heap space anymore. The physical memory gets committed on-demand through the guard page mechanism, but the virtual region for the stack is bounded, and thus it is not dynamic in the same sense anymore. The stack virtual address space is not shared and cannot be re-purposed to other uses.

Stack allocation serves the best in demand paged systems where the stack virtual region will be grown as it hit unused pages. You can find methods of shrinking the stack virtual regions for example when the thread ends.

simeonz wrote:The guard page has to be handled carefully. If the activation record or stack allocation exceed a single page, the page can be hopped over. Thus a stack allocation in user space either needs to check the amount allocated and possibly force access to the guard page consecutively, or alternatively, the size of the guard page has to be extended by the kernel (which is the preferable approach, but is not available on many platforms). This again adds to the simple pointer decrement cost. Guard pages are not a problem for kernel and non-MMU embedded programming, but the language has to address the lowest common denominator and these problems detract from the value of introducing the feature.

I think the allocator needs to check that it doesn't go past the end and it shouldn't be difficult. I see no problems with spanning several pages in the stack virtual regions as pages will be added as they are needed.

Posted: **Mon Apr 23, 2018 7:17 pm**

OSwhatever wrote:You can easily align the stack pointer for every allocation.

OSwhatever wrote:I think the allocator needs to check that it doesn't go past the end and it shouldn't be difficult. I see no problems with spanning several pages in the stack virtual regions as pages will be added as they are needed.

True, but the cost does add up compared to the simple stack pointer manipulation. Granted, it will be cheaper than a heap/pool allocation, even with lookahead/caching, but is less attractive after other considerations. Particularly, that stack allocations are comparatively vulnerable. As I already pointed out, and will annoyingly reiterate, heap allocators perform many sanity checks even in release builds and the platform allocator, whereas a stack allocator would have to become very bloated to justify its existence if it decides to support those as well.

Posted: **Tue Apr 24, 2018 3:17 am**

simeonz wrote:True, but the cost does add up compared to the simple stack pointer manipulation. Granted, it will be cheaper than a heap/pool allocation, even with lookahead/caching, but is less attractive after other considerations. Particularly, that stack allocations are comparatively vulnerable. As I already pointed out, and will annoyingly reiterate, heap allocators perform many sanity checks even in release builds and the platform allocator, whereas a stack allocator would have to become very bloated to justify its existence if it decides to support those as well.

You can have sanity checks with stack allocators too, what I'm thinking is having meta data around the allocated area. This is like MSVC having guard data around all stack variables in debug builds which is very helpful for debugging. However, this has nothing to do with vulnerability as the damage is already done when you detect it but it is certainly good for detecting buffer overflows.

Posted: **Tue Apr 24, 2018 4:53 am**

Sanity checks aside, stack memory is more vulnerable than (randomized) heap because its location is (more easily) known. That might or might not be a thing to consider, depending your system's approach to security.

Posted: **Tue Apr 24, 2018 9:02 am**

OSwhatever wrote:
Schol-R-LEA wrote:Some RISC systems go so far as to have really ginormous register files (sets of registers) and use 'windows' of 32 to 64 registers out of, say, 1024, meaning that most functions never hit memory at all.
I haven't studied these architectures in detail but do they have a separate stack for storage (as opposed to only storing register frames) or do they use the register file for storage as well? It is not that unusual that you choose to put arrays on the stack of let's say 256 bytes. Will that one end up in the register file in that case?

Well, I can't say I have used one of the Berkeley-type RISC systems (such as SPARC) which use register windows, I expect that there would be a shared 'global' register window, and the register conventions would call for one of those to be used as a global stack pointer. I am not sure, however, so if anyone can clarify this I would appreciate it.

Also, like with Stanford-style RISC systems (which includes the majority of RISC ISAs such as MIPS, ARM, POWER/PowerPC, etc.), anything that exceeded the available registers would be spilled to memory. While there was a goal to limit memory spills, it was still something that was expected to happen. I am not sure how this worked on those, however.

On a Stanford design, the register convention would define two registers as a stack pointer and frame pointer, as I already said. They would usually also define some registers as argument registers - on the MIPS, the older 'O32' convention had $a0 through $a3, while the later 'N32' and 'N64' conventions redefined four of the temporary registers ($t0-$t3) as $a4 through $a7 ($t4-$t9 still have their old names, and the convention allowed the older names to still be used to allow some interop between the two conventions, but you have to be careful about how you use them; oh, and just for added confusion, $t8 and $t9 are sometimes allocated to some specific use under all these conventions, with different names, but I forget what those uses and names are off the top of my head).

Posted: **Wed Apr 25, 2018 2:20 pm**

Another spin of this discussion is we can get the compiler to use the stack for memory when you go down in the stack call hierarchy (or up depending how to see it, closer to the base function).

Let's take this example of pseudo C something:

Code: Select all

Stuff *fillListWithStuff(int *numItems)
{
    Stuff list[howMuchStuffWeHave]; // We assume that we are allowed to "dynamically" allocate this on the stack

    foreach(Stuff s, SomeStuffList)
    {
          // Fill the stuff list with here with how many items we have, our copy is one the stack
    }

    *numItems = howMuchStuffWeHave;

    return list;
}


void getStuffList()
{
    int numItems;
    Stuff *stuffList = fillListWithStuff(&numItems);

    // Here we use stuffList

    CallAnotherFunction();

    // We cannot use stuffList here
}

So the point is that we can use the stack frames of prior function calls given that we haven't destroyed the memory by other function calls. The above example will of course work in C and the compiler will allow it but the user of fillListWithStuff must know how to use this function otherwise the user will run into problems.

To track this kind of behaviour for the compiler shouldn't be that hard so that it know if stuffList is ok to use or not. Does anybody use Rust here? Would the Rust compiler allow the above example?

I haven't seen the code like above example in my entire life in any production code and that's probably for good reason. More likely the caller provide the list to be filled or it is allocated dynamically. However, what I showed you was a valid example that could have worked and requires no dynamic allocation and works basically any number of reasonable items.

My point is again how the compiler could let us maximize the stack usage as temporary memory.

EDIT: This could be a bad fit in operating system like Linux when the signal handler runs on top of the stack at any time.

Posted: **Fri Jul 20, 2018 4:36 am**

In C# too does exists stackalloc but until Net Core 2.0 it needed unsafe code and it use was not raccomended by "normal" developers.

You would use it as this:

Code: Select all

unsafe {
     int* block = stackalloc int[100];
}

in Net Core 2.0 they have added a new type called Span... a Span of byte (Span<byte>) is really usable how in C you will use a "char *" to say a "bunch of bytes whatever they are" and now stackalloc is valid on safe code but only with Span.

My opinion is that the right solution is not to focus too much in this little details a good language should have fast heap allocation, a garbage collector and do stack escape analysis automatically (Java does this already, C# is working on it).

In Cosmos (that is a OS with a .Net runtime embedded) as we will have to write our CLR compiler we will try to use "Stack Escape Analysis" aggressively.

A note using preferably stack a C/C++ does means that on function call the objects needs to be copied any time (indeed C++ "objects" have a Copy operator) if the "object" has a lot of fields all this memcpy() will have a cost that can be bigger that simply allocate the object in the heap and pass the pointer to it to the function.

OSDev.org

Languages that support stack allocation?

Languages that support stack allocation?

Re: Languages that support stack allocation?

Re: Languages that support stack allocation?

Re: Languages that support stack allocation?

Re: Languages that support stack allocation?

Re: Languages that support stack allocation?

Re: Languages that support stack allocation?

Re: Languages that support stack allocation?

Re: Languages that support stack allocation?

Re: Languages that support stack allocation?

Re: Languages that support stack allocation?

Re: Languages that support stack allocation?

Re: Languages that support stack allocation?

Re: Languages that support stack allocation?

Re: Languages that support stack allocation?