Uhm, I think you are a bit confused, or at least, misspoke a bit.
OK, a quick review of memory allocation is in order, not because I think you need it necessarily, but because I want everyone on the same page, as it were.
So: there are, broadly speaking, four types of application memory allocation:
- read-only, which is both sized and initialized at compile time and (as the name implies) is immutable. This is used for things like code and constants.
- global, which encompasses all mutable elements which are sized at compile time (and may or may not be initialized then as well), such as global variables and static variables.
- function local, which is allocated as part of the function's activation record; while it is possible for this to be of variable size on a per call basis (such as with varadic arguments, or in some languages, dynamically sized arrays), it cannot be resized once the function has been entered, and when the function exits, the data automatically goes out of scope (becomes inaccessible), and is (usually) freed automatically.
- heap memory, which is allocated from a general, global memory pool called the heap, and has to be managed either manually by the programmer, or by an automatic mechanism such as reference counting or garbage collection. Heap memory can persist past the scope of function that creates it, but can only remain accessible so long as there is a reference (e.g., a pointer) to it, otherwise it becomes garbage (or, if there is no garbage collection, a memory leak).
Note that I spoke of the activation record without specifically mentioning the hardware stack; this is because while the hardware stack is the obvious and easy way to allocate an activation record - so much so that activation records are often termed 'stack frames' - it isn't the
only way. I mention this to try to separate the idea of the activation record (or function environment - it's one of those things where there are several names for it, each with different nuances) from the specific implementation via a stack frame.
Having said that, let's talk stack frames.
No, wait, first let's talk function calls, because you'll need this to understand those. When we talk about a function call, we are really talking about two operations: saving the position we are calling the function from, so we can return to it later, and jumping to the start of the function we're calling. In most modern systems, this is done in one instruction, but how that instruction works on different systems is illuminating.
I am assuming that you are familiar with the
CALL instruction of the x86, given what you've already said, but just to be thorough, it is an instruction that
- Pushes the address of the instruction itself onto the hardware stack; and
- Jumps to the called address.
OK, I am being overly pedantic, but bear with me. The key thing here is, the x86 has a dedicated register for the stack pointer, and another for the frame base pointer, and the
PUSH/POP,
CALL/
RET instructions automatically increment and decrement the stack pointer.
Thing is, not every ISA that uses a stack has a
dedicated stack pointer. Most RISC designs don't, instead having a huge pile of general registers and some kind of convention as to which one to use for the stack, which to use for the frame pointer, and so forth. Manipulating the stack means explicitly subtracting from, or adding to, that register. This isn't as big a deal as it might sound, because by convention, they also use registers for most or all of the function arguments and return values, and is optimized for keeping local variables in registers as well - it is all designed to avoid accessing memory unless absolutely necessary. Most functions never create a stack frame to hold their local values, unless they are calling another function which might overwrite their locals or arguments - and when they do create a stack frame, they will generally move the stack pointer and frame pointer once, at the start of the function, rather than on every inner call.
Some RISC systems go so far as to have really ginormous register files (sets of registers) and use 'windows' of 32 to 64 registers out of, say, 1024, meaning that most functions never hit memory at all.
This is why I made such a point about describing 'activation records' as separate from 'stack frames'.
This also means that the instruction(s) for function calls and returns can't automatically use the stack the way those on the x86 does.
On the MIPS, for example, the instruction
JAL (Jump And Link) stores the return value in register 31 (
$ra, which is something of an exception to the 'general purpose only' rule, but not a huge one), rather than on the stack itself, and the stack pointer doesn't get touched. The
JR instruction can later be applied to
$ra to return; making sure that any calls that function makes itself don't scribble on
$ra is left to the callee, usually by including it as an offset in its own stack frame if it needs to.
Now, back to stack frames on the x86.
You will no doubt have noticed that none of these function call methods have explained arguments yet. This is because, in both x86 and most RISC systems, arguments are always a matter of convention (there are some ISAs where they aren't, but unless the Mill gains traction, or someone decides to revive one of the LispMs, you probably won't run across one). On the x86, you would push the arguments onto the stack before the
CALL instruction, leaving the arguments in a fixed set of positions relative to the return address. There is some leeway in this, and in the case of C, the arguments are pushed in reverse order to facilitate varadic functions, meaning that for
Code: Select all
printf("Hello, %s, what a fine %s it is!", "Donkey Breath", diurnal_period_str);
it would push the address of the variable
diurnal_period_str, then a pointer to the string constant "Donkey Breath", then a pointer to the format string, meaning that the format string - which is the one which tells how many additional arguments there are - is in a constant location of fixed size.
As you can imagine, pushing each of these using the
PUSH command would be slow and tedious, so most compilers will opt to decrement the
ESP by the appropriate amount once and then handle placing the arguments, just as is done with the RISC designs, but that's not really a relevant detail here. The point is, the arguments are now on the stack, in a known (or knowable) order, and each can be found just by an offset from the current stack pointer.
You might have to allocate a space on the stack for a return value, in some calling conventions (there are several), but in most of the commonly used ones, the
EAX and
EDX registers are used to pass the return instead. This isn't too important either, but it is something to take note of.
Then we make the
CALL. And the stack gets incremented. All good, so far, right? We know where the return value is - it's at the top of the stack. We still know where the arguments are, they are at addresses
ESP+4,
ESP+8, etc., going back up the stack.
Except now we need to keep a copy of the current stack pointer, or we'll lose track of them, and lose track of the return value. What do we do now?
Well, first off, we need someplace to put the current stack pointer. Where? the Base Pointer, of course! But first, we need to save
that, because it was already used when the calling function was itself called.
So we
PUSH EBP. This gives us a stack with the previous frame base, then the return, then the arguments. We can now
MOV EBP, ESP and we've got the pivot of our stack frame. The offsets of our arguments are now
EBP+8,
EBP+12 etc.
But we still need to save the other registers (for which the
PUSHALL instruction helps, though it isn't always the most efficient). Then we need to make space for our locals, which can be done with
PUSHes or just by adding a certain amount to ESP.
All this just to get to the start of the function itself, but again, this is usually the compiler's job, not yours.
To return, we need to restore any saved registers we used (except
EAX and
EDX, if they are used for returns), then reset the stack to where the EBP is by
MOV ESP, EBP or something similar.
This also has the effect of moving the locals out of scope - the values are still there in memory for now, but woe betide you if you try to access them, and in any case, you would need a pointer somewhere
other than the stack frame to do so. This is why you don't return a pointer to a function-local variable, BTW - the variable is now in memory that is 'officially' freed, and could get overwritten at any time from then on out.
We then restore the old
EBP, bringing the top of the stack back to where the return value lies. We can finally use
RET to bring us back where we were when all this started (though the caller may need to clean the arguments off of the stack after we've returned to it).
Right.
In a language like C or C++, this is the standard way of working with variables of all kinds. This does leave you stuck if you need a data structure of variable size, or need to create a composite structure in a function that persists past the function's scope. This means heap allocation, which means using pointers to hold and track them, and keeping track of where you created them and when and where you free them. You can have a garbage collector doing it for you, as you would see in a language like Rust (or even as a library for C or C++, such as the Boehm collector, but you would then need to use that collector's functions for everything instead of
malloc()free() or
new/
delete).
A lot of languages (e.g., Java, C#, Python. etc) don't give you let you manipulate the pointers directly at all; variables of object types are always involve allocating the data structures on the heap, and will silently manage garbage for you. They may use stack allocation for some kinds of variables and not others (e.g., value types' in Java and C# are local types, but all object types are in heap), others avoid the stack entirely and treat everything as a heap reference, and still others treat everything as heap references but might optimize certain cases by using the stack for things which the compiler or interpreter can determine have a fixed lifetime. They don't even necessarily use the
CALL instruction at all - just because it is there doesn't mean it is required, and if it doesn't fit the language's programming model, it usually won't.
At this point, if you are still reading this at all, you are probably wondering what my point is, so:
TL;DR It is heap allocation that is persistent, not stack allocation. But, if you don't free a heap object in the function it was allocated in, you need to pass a pointer or reference to it back to the caller, and if you lose that you are SOL (your ability to access the data gets lost, and the object is either garbage collected or becomes a memory leak depending on the language and how it was allocated).