Jack wrote:
okay maybe something about this is obvious because I haven't seen it talked about much ... If you have program information loaded into memory (I'm assuming this is just a list of opcodes) how does the operating system go about executing it? It seems to run an operation you'd have to interpret the opcode or something into an assembler instruction but I couldn't find any kind of interpretation like this in the source I've been reading over.
I'm not quite sure I'm following you. Do you mean, how does a program get loaded into memory, and is there any special interpretation the system has to do beforehand? Well, the answer to the last part is yes and no. The answer to the first part should clarify that.
For an executable program (as opposed to one that runs through an interpreter), this really begins at compile or assembly time. Let's assume assembly since it's simpler to handle, and many compilers are really generating assembly code anyway, which is fed to an assembler automagically. Anyway, when the assembler generates the opcode stream from the assembly instructions (you seem to have the terminology a bit backwards from the usual usage), it does not, in most cases, have all of the information it needs to generate an executable binary - it usually has at least a few external references to libraries or OS system calls, and it may not know where in memory the system will load it. Thus, instead of a complete binary, it produces what is called an object file, which consists of the assembled code with each unresolved address replaced by a placeholder that is recorded in a special table in the file.
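To make that concrete, here is a minimal sketch of an "assembler" for a made-up two-instruction machine. Everything in it is my own invention for illustration (the opcodes 0x01 and 0x02, the 16-bit absolute address operand, the names `ObjectFile` and `assemble`); real object formats are far richer, but the shape is the same: code bytes, a table of symbols the file defines, and a table of holes still to be filled.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectFile:
    code: bytearray = field(default_factory=bytearray)
    # symbol name -> offset in code where the symbol is defined
    defined: dict = field(default_factory=dict)
    # (offset of a 2-byte placeholder, symbol name) pairs not yet resolved
    unresolved: list = field(default_factory=list)

def assemble(lines):
    """Assemble a toy language: 'name:' labels, 'call name' and 'ret'."""
    obj = ObjectFile()
    for line in lines:
        parts = line.split()
        if parts[0].endswith(":"):           # label definition
            obj.defined[parts[0][:-1]] = len(obj.code)
        elif parts[0] == "call":             # hypothetical opcode 0x01 + 16-bit address
            obj.code.append(0x01)
            obj.unresolved.append((len(obj.code), parts[1]))
            obj.code += b"\x00\x00"          # placeholder, filled in later
        elif parts[0] == "ret":              # hypothetical opcode 0x02, no operand
            obj.code.append(0x02)
        else:
            raise ValueError(f"unknown instruction: {line}")
    return obj

obj = assemble(["main:", "call print_msg", "ret"])
print(obj.code.hex(), obj.defined, obj.unresolved)
```

The assembler has no idea where `print_msg` lives, so it emits two zero bytes and notes the offset; somebody later in the chain has to come back and fill them in.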
The first group of these references, the external references to libraries or separately assembled parts of the program, are handled by the linker, which combines all of the object code files and the libraries into a single formatted executable - the program file, in other words. However, the linker still cannot determine ahead of time where the code will be loaded, and may not be able to resolve links to dynamic libraries completely. Thus the program file still has to keep a placeholder for any addresses it references, and for system calls and dynamic library calls. The layout of this file is what is referred to as the executable format - typical examples are .EXE and .COM under DOS and Windows, or COFF, a.out and ELF for various Unices.
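A rough sketch of the linker's job, using the same kind of toy object files as plain dicts. The dict layout, the `link` function and the 16-bit little-endian address encoding are all assumptions of mine for the example, not any real format. It concatenates the code, works out where every symbol ended up, fills in the placeholders, and remembers which offsets hold addresses so the loader can adjust them later.

```python
def link(objects):
    """Concatenate toy object files and resolve references between them.

    Each object is a dict with 'code' (bytes), 'defined' (name -> offset)
    and 'unresolved' (list of (offset, name)). Returns (image, relocs),
    where relocs lists the image offsets that hold a 16-bit address.
    """
    image = bytearray()
    symbols = {}     # symbol name -> offset in the combined image
    pending = []     # (offset in the combined image, symbol name)
    for obj in objects:
        base = len(image)
        for name, off in obj["defined"].items():
            symbols[name] = base + off
        for off, name in obj["unresolved"]:
            pending.append((base + off, name))
        image += obj["code"]

    relocs = []
    for off, name in pending:
        if name not in symbols:
            raise KeyError(f"undefined symbol: {name}")
        # addresses are still relative to the start of the image here
        image[off:off + 2] = symbols[name].to_bytes(2, "little")
        relocs.append(off)
    return bytes(image), relocs

main_o = {"code": b"\x01\x00\x00\x02", "defined": {"main": 0},
          "unresolved": [(1, "print_msg")]}
lib_o = {"code": b"\x02", "defined": {"print_msg": 0}, "unresolved": []}

image, relocs = link([main_o, lib_o])
print(image.hex(), relocs)
```

Note that the patched address is only correct if the image is loaded at address 0, which is exactly why the list of relocation offsets has to travel along with the program file.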
The simplest of these is the DOS .COM format - it is a flat binary image, exactly as run by the CPU. Since DOS always runs .COM files the same way, and the DOS calls are all handled by fixed interrupt vectors, nothing has to be added to the code, and DOS can run it as it is. The DOS loader simply copies the contents of the file to location CS:0100 (the first 256 bytes of the segment are used by DOS for a header, the Program Segment Prefix) and starts execution from there.
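Just to show how little a .COM loader has to do, here is a sketch that "loads" an image into a simulated 64 KiB segment. The names `load_com`, `COM_LOAD_OFFSET` and `SEGMENT_SIZE` are mine, and the size check is simplified (real DOS also needs room for the stack at the top of the segment, and it fills in the header area, which is left as zeros here).

```python
COM_LOAD_OFFSET = 0x100    # the first 256 bytes of the segment belong to DOS
SEGMENT_SIZE = 0x10000     # one 64 KiB real-mode segment

def load_com(image: bytes):
    """Copy a flat .COM image into a simulated segment.

    Returns (memory, entry), where entry is the offset at which
    execution would begin.
    """
    if len(image) > SEGMENT_SIZE - COM_LOAD_OFFSET:
        raise ValueError("image does not fit in one segment")
    memory = bytearray(SEGMENT_SIZE)
    memory[COM_LOAD_OFFSET:COM_LOAD_OFFSET + len(image)] = image
    return memory, COM_LOAD_OFFSET

# mov ah, 4Ch / int 21h: ask DOS to terminate the program
program = bytes([0xB4, 0x4C, 0xCD, 0x21])

memory, entry = load_com(program)
print(hex(entry), memory[entry:entry + len(program)].hex())
```

No table lookups and no patching: the bytes in the file are the bytes the CPU fetches, and the only "system call" mechanism is the `int 21h` instruction going through a fixed interrupt vector.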
Most other formats require the loader to patch the remaining unresolved references before starting the program. How this is done varies from format to format.
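As one simple illustration of that patching step, the sketch below rebases a toy image: every offset listed in a relocation table holds a 16-bit little-endian address that was computed as if the image started at 0, and the loader adds the real load address to each one. The function name `load`, the table layout and the example load address 0x2000 are assumptions for the example; real formats each have their own way of recording this.

```python
def load(image: bytes, relocs, base: int) -> bytearray:
    """Return a copy of image with each relocated 16-bit address rebased."""
    loaded = bytearray(image)
    for off in relocs:
        addr = int.from_bytes(loaded[off:off + 2], "little")
        # wrap to 16 bits, since the toy machine only has 16-bit addresses
        loaded[off:off + 2] = ((addr + base) & 0xFFFF).to_bytes(2, "little")
    return loaded

# toy image: opcode 0x01 followed by the address 0x0004, then two more bytes
image = b"\x01\x04\x00\x02\x02"
loaded = load(image, relocs=[1], base=0x2000)
print(loaded.hex())
```

After this the address operand reads 0x2004 instead of 0x0004, and the code can run from wherever the system happened to put it.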
This is just a quick overview of the issue. I recommend reading Linkers and Loaders by John R. Levine (ISBN 1-55860-496-0) for more detail (parts of the book are available online at http://www.iecc.com/linker/).