octacone wrote:Now lets talk about system calls:
Did I understand them correctly:
1. A function of mine gets called for some reason
2. That function utilizes some forbidden code
3. That function goes through a system call handler
4. That handler has the power to run the code it received
Not necessarily "forbidden", just "provided by the kernel". System calls are used to communicate with the kernel for a number of reasons. Many of those reasons involve privileged instructions, or a service that the kernel is isolating from the application for reasons of security and/or stability, but many system calls are just requests for something that happens to reside in the kernel (such as the IPC primitives).
System calls are the interface to the kernel, and several functions in the C standard library are primarily portable wrappers around system calls, though they usually also clean up the result in some way for the client-programmer; this is even more true of the system-specific libraries. For example, in Unix/Linux, the sbrk() function is usually a thin wrapper that invokes the underlying system call (brk on Linux) and returns the result more or less directly. The standard malloc(), in turn, uses a call such as sbrk() (or mmap()) to get a block of memory from the system, usually significantly larger than the one actually needed, so it can do some process-local memory management rather than have to repeat the system call each time. While it is more than just a wrapper, the heart of its behavior is calling the kernel memory manager at need. Similar statements apply to all of the I/O functions, at least in a monolithic system (microkernels run most if not all drivers in user mode, so the kernel's involvement is limited to managing IPC and queuing, and hybrids usually do the same for a subset of drivers).
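To make the malloc() point concrete, here is a minimal sketch of an allocator that asks "the system" for a big chunk and then hands out small pieces locally. Everything in it is an assumption for illustration: fake_sbrk() is a hypothetical stand-in for the real system call (it just carves up a static array and counts how often it is called), tiny_alloc() is a made-up name, and there is no free() at all. A real malloc() is far more involved.

```c
#include <assert.h>
#include <stddef.h>

#define ARENA_SIZE (64 * 1024)  /* memory the simulated "kernel" owns */
#define CHUNK_SIZE 4096         /* how much we request per "system call" */

/* Hypothetical stand-in for the kernel: a fixed arena plus a call counter.
   A real allocator would call sbrk() or mmap() here instead. */
_Alignas(16) unsigned char arena[ARENA_SIZE];
size_t arena_break = 0;         /* simulated program break */
unsigned fake_syscalls = 0;     /* how many times we "entered the kernel" */

/* Simulated sbrk(): extend the break by `bytes` and return the old break,
   or NULL if the arena is exhausted. */
void *fake_sbrk(size_t bytes)
{
    fake_syscalls++;
    if (bytes > ARENA_SIZE - arena_break)
        return NULL;
    void *old_break = &arena[arena_break];
    arena_break += bytes;
    return old_break;
}

unsigned char *pool = NULL;     /* start of the unused part of our local pool */
size_t pool_left = 0;           /* bytes still available in that pool */

/* Bump allocator: serve requests from the local pool, and only go back to
   the "kernel" when the pool cannot satisfy the request. */
void *tiny_alloc(size_t size)
{
    size = (size + 15) & ~(size_t)15;   /* round up to a multiple of 16 */
    if (size > pool_left) {
        size_t request = size > CHUNK_SIZE ? size : CHUNK_SIZE;
        void *fresh = fake_sbrk(request);
        if (fresh == NULL)
            return NULL;
        pool = fresh;           /* leftover of the old pool is simply abandoned */
        pool_left = request;
    }
    void *result = pool;
    pool += size;
    pool_left -= size;
    return result;
}
```

With these numbers, a hundred calls to tiny_alloc(32) fit inside one 4096-byte chunk, so fake_sbrk() is only entered once; the next call happens only when the pool runs dry.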
And, of course, anything that is a service of the kernel itself requires a system call too, such as when an application voluntarily surrenders the CPU to wait on something (via something like sleep() or wait()), even if no privileged instructions are actually used in the scheduler.
octacone wrote:
5. That handler messes up something called ESP (optional: and my imaginary multitasking system stops working)
ESP is the Extended Stack Pointer, the 32-bit version of the stack pointer, SP (the long mode equivalent is RSP). It keeps track of the top of the current stack.
Now, I am assuming you know this, but just to be clear: the hardware stack is a region of memory that is used for storing temporary values in last in, first out order, meaning that the stack pointer holds the address of the most recent element added to the stack (the "top" of the stack, though because x86 stacks grow downward, it is actually the lowest address in the stack that is currently in use). When an item is 'pushed' onto the stack, the stack pointer is moved by one system word (decremented, since the stack grows downward), and the new value is stored in that location (the actual order in which this is done is not particularly relevant for most purposes, and may even differ from model to model of the CPU). To 'pop' a value off of the stack, you copy the value and then roll back (increment, in this case) the stack pointer.
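As a rough illustration of the push and pop behavior just described, here is a small C simulation of a downward-growing stack. The names (stack_mem, sp, push, pop) are made up for this sketch, and sp is an array index standing in for ESP rather than a real address.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

uint32_t stack_mem[STACK_WORDS];  /* the simulated stack region */

/* Index of the most recently pushed word; STACK_WORDS means "empty".
   This plays the role of ESP, counted in words instead of bytes. */
unsigned sp = STACK_WORDS;

/* Push: move the stack pointer down one word, then store the value there. */
void push(uint32_t value)
{
    assert(sp > 0);               /* would be a stack overflow */
    sp -= 1;
    stack_mem[sp] = value;
}

/* Pop: copy the value at the top, then move the stack pointer back up. */
uint32_t pop(void)
{
    assert(sp < STACK_WORDS);     /* would be a pop from an empty stack */
    uint32_t value = stack_mem[sp];
    sp += 1;
    return value;
}
```

Pushing 1 and then 2 leaves sp two words below where it started, and the pops come back in the reverse order, 2 first and then 1.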
In the x86, this is used mainly for three purposes: to store the return address of a function call (the address of the instruction immediately after the CALL instruction), to store a 'frame' or 'activation record' holding the local variables of each function, and to hold some or all of the arguments to a called function. The reason that the CALL instruction stores the return address is so that the RET doesn't have to be hard-coded with a return address, making it possible to call the same function from several places over the course of the process; RET implicitly pops the top of the stack and uses that value as the return address.
The reason that the stack frame is used to hold arguments and local variables is similar: to give the function a temporary location for its values which can be automatically cleared when the function returns just by resetting the stack pointer. If a function requires any arguments, most calling conventions require that the caller push the arguments onto the stack in a specified order before the CALL instruction, which puts them in a place where the callee can find them. If the function requires any local variables, space for them is reserved on the stack at the start of the function.
Each function has its own frame or base pointer, which indicates where its arguments and locals are to be found. When a function starts, the base pointer of the caller (EBP in this case, the Extended Base Pointer) needs to be pushed onto the stack first, putting it just below the return address. Then, the stack pointer is copied to the base pointer, which sets the base of the stack frame for the current function. The frame pointer is then used as a reference point for the arguments above it and the local variables below it. By using an offset up into the stack for the arguments, and down into the frame for the locals, it lets the function have a temporary location for its values which can be automatically cleared when the function returns, just by resetting the stack pointer back to the frame pointer's value and popping the old frame pointer back into EBP.
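The frame setup and teardown described here can be sketched as a simulation in C. This is only an illustration under some assumptions: esp and ebp are word indexes into a made-up mem array rather than real registers, the calling convention imitated is cdecl-style (arguments pushed right to left, caller cleans up), and 0xDEADBEEF is just a placeholder for the return address that a real CALL would push.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64

uint32_t mem[MEM_WORDS];   /* simulated stack memory, one slot per 32-bit word */
uint32_t esp = MEM_WORDS;  /* simulated ESP (a word index, not a byte address) */
uint32_t ebp = 0;          /* simulated EBP */

void push32(uint32_t v) { assert(esp > 0); mem[--esp] = v; }
uint32_t pop32(void)    { assert(esp < MEM_WORDS); return mem[esp++]; }

/* Callee: computes a + b using one local variable.
   Frame layout in words: [ebp-1] local, [ebp] saved EBP,
   [ebp+1] return address, [ebp+2] first argument, [ebp+3] second argument. */
uint32_t callee_add(void)
{
    /* prologue */
    push32(ebp);           /* save the caller's frame pointer */
    ebp = esp;             /* establish this function's frame base */
    esp -= 1;              /* reserve space for one local variable */

    /* body: arguments are above the frame pointer, locals below it */
    mem[ebp - 1] = mem[ebp + 2] + mem[ebp + 3];
    uint32_t result = mem[ebp - 1];   /* on real hardware, returned in EAX */

    /* epilogue */
    esp = ebp;             /* discard the locals in one step */
    ebp = pop32();         /* restore the caller's frame pointer */
    return result;
}

/* Caller: pushes the arguments, "calls", then cleans up the arguments. */
uint32_t caller(uint32_t a, uint32_t b)
{
    push32(b);             /* arguments pushed right to left */
    push32(a);
    push32(0xDEADBEEFu);   /* placeholder for the return address CALL would push */
    uint32_t result = callee_add();
    (void)pop32();         /* what RET does: pop the return address */
    esp += 2;              /* caller removes the two arguments */
    return result;
}
```

After caller(2, 3) returns 5, both esp and ebp hold exactly the values they had before the call, which is the whole point of the prologue and epilogue.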
I said all of this just to make sure we are on the same page in this regard. As I said, I expect you knew this, but I wanted to make sure we were in agreement on the terminology and so forth.
Now, one of the things that happens in a context switch (using the term loosely, regardless of whether it is a system call or an ordinary function call) is that the context (the state of the registers at the time of the call) has to be saved before the called operation runs, and restored after it is finished (modulo whatever the return value of the operation is), so that the called code can use the registers without trashing the caller's values. In a function call, only the registers the callee actually uses need to be saved, and if a return value is passed through a register (as is the case in most x86 calling conventions), then that register doesn't need to be saved regardless.
In either a system call or an interrupt, the state of the running process (usually including all of its registers, regardless of how they are used) needs to be saved, not to the stack, but to a process record, and the process's scheduler state needs to be updated to something like "paused", "waiting", or "sleeping" (in many cases, it will have to wait on some long operation such as a disk read, but if it is a request that the kernel can service immediately, it might be marked as something like "paused" instead, to show that it was the last running process). This includes the stack pointer and frame pointer, so if the system call doesn't save ESP, then the process will go off the rails once the system call returns, hence the problem you mention.
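A minimal sketch of the "process record" idea, in C. The struct layouts, the state names, and the enter_kernel() and resume_process() functions are all hypothetical, invented for this example; in a real kernel the register save and restore is done in assembly in the interrupt or system call entry stub, and the global cpu variable here merely simulates the live registers.

```c
#include <assert.h>
#include <stdint.h>

enum proc_state { PROC_RUNNING, PROC_PAUSED, PROC_WAITING, PROC_SLEEPING };

/* The registers that make up a 32-bit x86 context (segment registers omitted). */
struct cpu_context {
    uint32_t eax, ebx, ecx, edx;
    uint32_t esi, edi, ebp, esp;
    uint32_t eip, eflags;
};

/* Hypothetical process record: the kernel's bookkeeping for one process. */
struct process {
    int pid;
    enum proc_state state;
    struct cpu_context saved;   /* register state at the moment it stopped */
};

struct cpu_context cpu;         /* simulated live CPU registers */

/* On entry to the kernel: copy every register into the process record and
   update the scheduler state. */
void enter_kernel(struct process *p, enum proc_state new_state)
{
    p->saved = cpu;
    p->state = new_state;
}

/* On return to the process: reload every register, ESP and EBP included,
   so the process continues on its own stack exactly where it left off. */
void resume_process(struct process *p)
{
    cpu = p->saved;
    p->state = PROC_RUNNING;
}
```

If enter_kernel() skipped the esp field, resume_process() would hand the process whatever stack pointer the kernel happened to leave behind, which is exactly the kind of failure described above.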