Muazzam wrote: Yeah, I knew that. But what's the point of garbage collection at all!
Unfortunately, no one here has properly answered this question, probably because it is not the question Muazzam really wanted to ask - or perhaps no one here wanted to believe that someone writing an OS would ask the question he was intending, which is, "what is dynamic memory management, and why would you want to use it in an application?"
To be fair, heap allocation rarely comes up in assembly language, at least today, as most assembly code sticks to a combination of stack allocation and static allocation. Using dynamically resized data structures is mainly a high-level language technique - it can be done in assembly language, of course, assuming that the OS exposes an allocate/free system call (e.g., brk/sbrk on Linux) or provides a library with the equivalent (malloc(), HeapAlloc()), but few have the patience to write such code.
Let me use an explanation I wrote in another forum, with some modifications specific to this topic. I am going to be as detailed and clear as possible, perhaps insultingly so, but I don't have a clear sense of how much you already know, so there isn't really much of an alternative. I am aware that you probably know most of this already, but without knowing what you don't know, I can't afford to skip anything.
The first thing you need to know is that there are four different ways memory can be allocated to an application program. The first is read-only memory, which is where things like constants and string literals are stored; on protected/long mode systems, trying to change something in read-only memory will result in a protection fault.
The next kind of memory is global memory, where global variables are set up. This is a fixed amount of space given to the program when it starts, and it cannot be increased or decreased. Some global memory may be initialized when the program starts, using values that are part of the executable image, but there may also be a separate area of global memory (the so-called .bss section) that is simply cleared by the loader when the program starts - the executable file just stores the size of the .bss section, to avoid having a large blob of useless empty memory in the file.
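To make these first two kinds concrete, here is a minimal C sketch; the section names .rodata and .data in the comments are the usual ones on ELF systems, and are an assumption, since the exact names vary by platform:

[code]
#include <stdio.h>

const char greeting[] = "hello";  /* read-only memory: typically .rodata; writes fault */
int start_count = 42;             /* initialized global: stored in .data in the image */
int scratch[4096];                /* uninitialized global: .bss, zeroed by the loader */

int main(void)
{
    printf("%s %d %d\n", greeting, start_count, scratch[0]); /* scratch[0] is 0 */
    return 0;
}
[/code]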
The third kind of memory is local (or automatic) memory, which is allocated when a function begins and deallocated when it returns. While there are a few different ways that this could be implemented, it is almost invariably done using a stack, a data structure so ubiquitous that x86 processors have built-in hardware support for it, and most other designs have something similar, either as a dedicated register or as a register usage convention.
So, what is the stack, anyway? It is a region of memory set aside for temporary values, combined with a stack pointer (a register specifically used for holding the address of the current top item), which together act as a last-in, first-out data structure, conceptually resembling a stack of dishes or some other small, easily piled items. The top of the stack is the memory address where the most recent item has been put; on x86 CPUs this is usually the lowest used address in the stack, because stacks on PCs grow downward, for historical reasons. When you add something to the stack - referred to as pushing the item onto the stack - the stack pointer is moved to the next free address (on a downward-growing stack, decremented by the size of the item), and the item being pushed is copied to the newly-allocated stack location. When an item is removed - popped off of the stack - the stack pointer is simply moved back the other way, and the used memory is left as it is, ready to be reused.
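Here is the same push/pop mechanism sketched in C, using an array and a pointer that grows downward the way an x86 stack does (overflow and underflow checks omitted for brevity):

[code]
#include <stdio.h>

#define STACK_SIZE 16

static int  stack[STACK_SIZE];
static int *sp = stack + STACK_SIZE;  /* starts one past the end; grows downward */

void push(int value)
{
    --sp;           /* move the stack pointer down first, like x86 PUSH... */
    *sp = value;    /* ...then store the item at the new top */
}

int pop(void)
{
    return *sp++;   /* read the top, move the pointer back up; the old slot
                       is simply left as-is, ready to be reused */
}

int main(void)
{
    push(1);
    push(2);
    int a = pop();
    int b = pop();
    printf("%d %d\n", a, b);  /* prints "2 1": last in, first out */
    return 0;
}
[/code]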
Why is this so important that it has special hardware support? First, because it makes it possible to call functions or subroutines without having to hard-code the return address into the called function: when a function call is made, the address of the instruction following the call is saved - pushed onto the stack implicitly by the CALL instruction on the x86, while many RISC CPUs instead place it in a link register, and the callee pushes it onto the stack itself only if it needs to make further calls - so that when the callee exits, the return operation (RET on the x86) can recover the return address and jump back to the next instruction in the caller.
Second, it allows the caller to pass arguments to the callee by pushing them onto the stack, with the result that the callee can manipulate the arguments without worrying about changing the caller's values - the copies pushed onto the stack are treated as being 'local' to the callee. Whether the caller or the callee has to clean up the stack afterwards is one of the questions that a calling convention has to answer, though most today leave it to the caller.
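A quick C illustration of that second point: the callee receives its own copy of each argument, so nothing it does to them affects the caller's variables.

[code]
#include <stdio.h>

/* 'n' is the callee's private copy (conceptually, the value the caller
   pushed onto the stack), so modifying it cannot disturb the caller. */
int double_it(int n)
{
    n = n * 2;
    return n;
}

int main(void)
{
    int x = 21;
    int y = double_it(x);
    printf("x=%d y=%d\n", x, y);  /* prints "x=21 y=42": x is untouched */
    return 0;
}
[/code]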
Third, when combined with a base pointer (also called a frame pointer), the stack makes for an easy way to hold a group of local variables for the callee function, something called an activation record or stack frame. Basically, the idea is that you push the current base pointer onto the stack, then copy the current stack pointer into the base pointer; this now becomes the base address of the activation record (hence the name 'base pointer'). Then the local values are implicitly 'pushed' onto the stack by decrementing the stack pointer far enough to give them all room, and each local is accessed at a fixed offset from the base pointer. When the function exits, you copy the base pointer back to the stack pointer (implicitly popping off the locals), then pop off the old base pointer, restoring the state to the point of the call - if done correctly, the top of the stack should now hold the return address pushed by the calling instruction. A sketch of this layout follows below.
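As a rough illustration, here is how a small C function might map onto such a frame under the classic 32-bit x86 cdecl convention. The instructions and offsets in the comments reflect typical unoptimized compiler output and are an assumption, not a guarantee - compilers and conventions vary:

[code]
int sum_squares(int a, int b)   /* after the prologue: a at [ebp+8], b at [ebp+12] */
{
    /* prologue (implicit):  push ebp / mov ebp, esp / sub esp, 8 */
    int a2 = a * a;             /* local at [ebp-4] */
    int b2 = b * b;             /* local at [ebp-8] */
    return a2 + b2;             /* result returned in eax */
    /* epilogue (implicit):  mov esp, ebp / pop ebp / ret */
}
[/code]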
It is a complicated dance, and one which confuses almost everyone when they first encounter it. Stack discipline in assembly code is fairly tricky, but you probably already knew that.
This brings us to the last kind of memory, the heap or dynamic memory. This is a region of memory that can be allocated to programs in pieces, at run time, without being automatically reclaimed when the function that allocated it ends (some languages have a special means of reclaiming dynamic memory after it is no longer in use, called garbage collection - more on that below). Most operating systems have a system call to allow the program to request that the system allocate enough memory to hold one or more values of a given type; it returns a pointer to the newly allocated memory space. There is usually a corresponding system call which allows the program to return the allocated memory to the system. In between allocation and release, the memory is available for use, so long as you keep at least one pointer to the block; you can pass the address of that memory around from one pointer to another, treating it as if it were a pointer to any other non-constant memory. However, once that memory is released, all pointers to it become invalid and should not be used - preferably, you should null them after releasing the memory they point to. When the process exits, the operating system should release the memory even if it was never deallocated, but as long as the program is running, the memory stays allocated until freed.
There are just three rules to follow with dynamic memory: don't lose the starting address of the memory block, or you won't be able to deallocate it later; always remember to deallocate it after you are done with it; and don't try to access the memory after it has been deallocated. These rules are trickier to follow than they sound, however, as it is easy to lose track of where the memory is and how you are using it at any given time. Failing to release memory when you are finished with it is a memory leak, while a pointer that points to a freed block (or one that is invalid for some other reason) is a dangling or wild pointer.
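In C, following those three rules with the standard malloc()/free() pair looks something like this minimal sketch:

[code]
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Rule 1: keep the starting address - we hold it in 'values'. */
    int *values = malloc(100 * sizeof *values);
    if (values == NULL)
        return 1;               /* allocation can fail; always check */

    for (int i = 0; i < 100; i++)
        values[i] = i * i;
    printf("%d\n", values[99]); /* use the block while we own it */

    /* Rule 2: release the block once we are done with it. */
    free(values);

    /* Rule 3: don't touch it afterwards - nulling the pointer turns an
       accidental reuse into an obvious crash instead of silent corruption. */
    values = NULL;
    return 0;
}
[/code]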
So, if it is so complicated, why bother with dynamic memory? Because you can make arrays of arbitrary size without knowing how big they have to be beforehand. Also, you can use it to allocate structures with pointers in them, which can then point to other dynamically allocated memory, letting you chain the allocated memory into more complex data structures such as lists and trees. It simply wouldn't be practical to make these structures without dynamic allocation.
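The classic example of such chaining is a singly linked list; here is a minimal C sketch:

[code]
#include <stdio.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;          /* another heap-allocated node, or NULL */
};

/* Allocate a new node and link it in front of the existing list. */
struct node *push_front(struct node *head, int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return head;            /* out of memory; leave the list unchanged */
    n->value = value;
    n->next = head;
    return n;
}

int main(void)
{
    struct node *head = NULL;
    for (int i = 1; i <= 3; i++)
        head = push_front(head, i);

    for (struct node *p = head; p != NULL; p = p->next)
        printf("%d ", p->value);    /* prints "3 2 1" */
    printf("\n");

    while (head != NULL) {          /* free every node we allocated */
        struct node *next = head->next;
        free(head);
        head = next;
    }
    return 0;
}
[/code]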
This brings us to application memory management. Most of the time, programs do not fetch a new block of memory from the system each and every time they need one; the overhead of system calls is too high, and it can potentially slow the whole system down as the OS manages the allocations. So the usual practice is to allocate a larger block of memory than is actually needed and use that as a kind of scratch space to allocate from. The allocations and frees are usually wrapped up in a pair of library functions corresponding to the system calls, and the program only calls the OS again if it needs more. The local allocator has to keep track of the local heap memory, but usually this isn't too complicated.
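The simplest version of this idea is a bump (or arena) allocator: grab one big block up front, then hand out pieces of it with nothing but pointer arithmetic. A minimal sketch, using malloc() to stand in for the underlying system call, and leaving out the ability to free individual pieces or grow the arena:

[code]
#include <stddef.h>
#include <stdlib.h>

#define ARENA_SIZE (64 * 1024)

static char  *arena;            /* one large block obtained from the system */
static size_t arena_used;       /* how much of it has been handed out */

/* Hand out the next 'size' bytes of the arena. */
void *arena_alloc(size_t size)
{
    size = (size + 15) & ~(size_t)15;     /* keep allocations 16-byte aligned */
    if (arena == NULL) {
        arena = malloc(ARENA_SIZE);       /* the one big call to the system */
        if (arena == NULL)
            return NULL;
        arena_used = 0;
    }
    if (arena_used + size > ARENA_SIZE)
        return NULL;                      /* a real allocator would ask the OS for more */
    void *p = arena + arena_used;
    arena_used += size;
    return p;
}
[/code]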
However, this leaves the programmer to keep track of the actual pointers to the memory, which can be very complicated indeed. For this reason, various approaches to automating memory management have been devised, with garbage collection being the most common (a related form called 'reference counting' is often used as well, and is often confused with it). Basically, in garbage collection, the program gets all of its dynamically allocated memory through the collector's allocator, and a part of the program - the garbage collector - periodically checks whether the allocated blocks still have any pointers pointing at them. When a block has no pointers to it, the collector frees it, without any action on the part of the programmer. Older garbage collectors had to stop the program in order to do this, but some modern multithreaded ones can run in parallel with the program's other threads.
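Tracing garbage collection is too big to sketch here, but its simpler cousin, reference counting, fits in a few lines of C: each block carries a count of the pointers that refer to it, and the block is freed automatically when the count drops to zero. A minimal sketch (note that plain reference counting cannot reclaim cycles, which is one reason full collectors exist):

[code]
#include <stdlib.h>

struct refcounted {
    int refs;                   /* how many owners this block currently has */
    /* ... the actual payload would follow here ... */
};

struct refcounted *rc_new(void)
{
    struct refcounted *p = malloc(sizeof *p);
    if (p != NULL)
        p->refs = 1;            /* the creator holds the first reference */
    return p;
}

void rc_retain(struct refcounted *p)
{
    p->refs++;                  /* a new pointer to the block now exists */
}

void rc_release(struct refcounted *p)
{
    if (--p->refs == 0)         /* last pointer gone: nothing can reach it, */
        free(p);                /* so reclaim the block automatically */
}
[/code]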
While GC is most associated with interpreters, it is applicable to compiled code as well - it goes back to the earliest Lisp implementations, around 1960. It is possible to use a garbage collector as a library and require the programmer to follow the GC discipline, but it is far more common for it to be built into the language itself. It has a number of other advantages for both security and simplicity, especially in scripting languages - it allows the language translator to abstract away the allocation and deallocation of variable memory entirely - but there are distinct costs as well. There are endless variations on garbage collection meant to reduce the overhead of doing the collection, but a certain amount of overhead is inevitable when garbage collection is used.
Needless to say, many of these issues are relevant to memory allocation at the OS level, as well, but there it becomes even more complicated as the OS-devver has to address issues of the specific memory hardware.
I hope that this has made things clearer for you, rather than more obscure. Comments and corrections welcome.