System Calls and User Mode

Octacone · Post by **Octacone** » Tue Nov 15, 2016 9:24 am

Okay first of all: how important are they? Can I just skip this section?
How many things are going to need to be modified? Like GDT, bootloader, IRQ, ISR etc???
How does it generally work? Something like:
1. GDT needs 2 more gates
2. TSS needs to be flushed
3. I need some assembly magic
4. What do I do after all this? Are my functions still going to work as they used to, or is this the point where I need to disable all of my functions and start working on system calls?

Now let's talk about system calls:
Did I understand them correctly:
1. A function of mine gets called for some reason
2. That function utilizes some forbidden code
3. That function goes through a system call handler
4. That handler has the power to run the code it received
5. That handler messes up something called ESP (optional: and my imaginary multitasking system stops working)

Any help appreciated!

That is how I see those two "structures".

iansjack · Post by **iansjack** » Tue Nov 15, 2016 10:09 am

A system call is just a request, from a user program, for the operating system to do some work for it. I prefer to use syscall and sysret, rather than call gates, to implement system calls.

Ycep · Post by **Ycep** » Tue Nov 15, 2016 11:14 am

System calls are interrupts which programs use to communicate with the kernel.
For example, if your syscall interrupt is 0x81, you could implement it in such a way that each AX value selects a different function, and then if AX=17 your kernel may output the character stored in BX to the terminal.
User mode is a processor ring which has far fewer instruction privileges and is therefore much safer. Most (if not all) operating systems assign that ring to applications.
For example, if your processor is executing unprivileged/faulty code, it can only affect its own protection level and the less privileged (higher-numbered) ones.
The only problem with protection rings is that switching between them is really CPU intensive.
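
As a rough illustration of where the ring actually lives: on x86 it is encoded in the DPL field of each segment descriptor and in the low two bits (RPL) of each selector. The sketch below packs flat 4 GiB descriptors for ring 0 and ring 3. The helper names (gdt_encode, gdt_dpl, selector_rpl) are made up for this example, and the selector value 0x1B in the usage note assumes user code sits in GDT entry 3, which is only a common convention.

```c
#include <stdint.h>

/* Pack a legacy 8-byte segment descriptor.
   access: present bit, DPL, type bits; flags: granularity and size bits. */
uint64_t gdt_encode(uint32_t base, uint32_t limit, uint8_t access, uint8_t flags) {
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);              /* limit bits 0..15  */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;       /* base bits 0..23   */
    d |= (uint64_t)access << 40;                   /* access byte       */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;   /* limit bits 16..19 */
    d |= (uint64_t)(flags & 0xFu) << 52;           /* G, D/B, L, AVL    */
    d |= (uint64_t)((base >> 24) & 0xFFu) << 56;   /* base bits 24..31  */
    return d;
}

/* Descriptor privilege level: bits 5..6 of the access byte. */
unsigned gdt_dpl(uint64_t descriptor) {
    return (unsigned)((descriptor >> 45) & 3u);
}

/* Requested privilege level: low two bits of a segment selector. */
unsigned selector_rpl(uint16_t selector) {
    return selector & 3u;
}
```

With base 0, limit 0xFFFFF and flags 0xC, access byte 0x9A gives ring 0 code, while 0xFA and 0xF2 give ring 3 code and data. Those last two are the extra GDT entries user mode needs.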

Ch4ozz · Post by **Ch4ozz** » Tue Nov 15, 2016 11:17 am

Without usermode and syscalls your OS will run everything in kernelmode, which is a pretty bad idea lol.
A syscall is simply a software interrupt which will jump from Ring 3 to Ring 0 into your interrupt handler.
You usually pass the number of the syscall in eax and the rest of the params in the other registers (or on the stack if there aren't enough registers).
Syscalls are needed to get information from the OS without making any compromises in terms of security.
In my OS a usermode process can't access any memory except its own, and that includes the memory of other processes.
Paging without implementing usermode seems pretty useless to me.

BTW: Usermode (aka Ring 3) can't execute privileged instructions like mov cr3, eax; cli; hlt; and so on.
So usermode can't alter the system on its own.
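
To make the "number in eax, params in the other registers" idea concrete, here is a minimal sketch of the dispatch step in plain C. Everything in it is hypothetical: regs_t, the two example calls, and their numbers are invented for illustration, and a real handler would receive the register values from its interrupt stub rather than from a struct filled in by hand.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical snapshot of the registers an interrupt stub would hand over. */
typedef struct {
    uint32_t eax, ebx, ecx, edx;
} regs_t;

typedef int32_t (*syscall_fn)(uint32_t a, uint32_t b, uint32_t c);

/* Stand-in for a terminal so the example stays self-contained. */
char console[64];
size_t console_len;

/* Syscall 0: append one character to the fake console. */
int32_t sys_putchar(uint32_t ch, uint32_t b, uint32_t c) {
    (void)b; (void)c;
    if (console_len + 1 >= sizeof console) return -1;
    console[console_len++] = (char)ch;
    console[console_len] = '\0';
    return 0;
}

/* Syscall 1: add two numbers, just to show a second table entry. */
int32_t sys_add(uint32_t a, uint32_t b, uint32_t c) {
    (void)c;
    return (int32_t)(a + b);
}

static const syscall_fn syscall_table[] = { sys_putchar, sys_add };
#define SYSCALL_COUNT (sizeof syscall_table / sizeof syscall_table[0])

/* eax selects the call, ebx/ecx/edx carry arguments, result goes back in eax. */
void syscall_dispatch(regs_t *r) {
    if (r->eax >= SYSCALL_COUNT) {
        r->eax = (uint32_t)-1;   /* unknown syscall number */
        return;
    }
    r->eax = (uint32_t)syscall_table[r->eax](r->ebx, r->ecx, r->edx);
}
```

The bounds check matters: the number comes from user code, so the kernel has to treat it as untrusted input before indexing the table.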

iansjack · Post by **iansjack** » Tue Nov 15, 2016 12:33 pm

System calls do not have to be implemented as interrupts. And they aren't in, for example, 64-bit Linux.

Schol-R-LEA · Post by **Schol-R-LEA** » Tue Nov 15, 2016 12:40 pm

octacone wrote:Now let's talk about system calls:
Did I understand them correctly:
1. A function of mine gets called for some reason
2. That function utilizes some forbidden code
3. That function goes through a system call handler
4. That handler has the power to run the code it received

Not necessarily "forbidden", just "provided by the kernel". System calls are used to communicate with the kernel for a number of reasons, and while many of those reasons are because it requires privileged instructions, or because the kernel is isolating a service from the application for reasons of security and/or stability, many system calls are just requests for something that happens to reside in the kernel (such as the IPC primitives).

System calls are the interface to the kernel, and several functions in the C standard library are primarily portable wrappers around system calls, though they usually also clean up the result in some way for the client-programmer; this is even more true of the system-specific libraries. For example, in Unix/Linux, the sbrk() function usually just invokes the system call of the same name and returns the result directly. The standard malloc(), in turn, uses a system call such as sbrk() to get a block of memory from the system, usually significantly larger than the one actually needed so it can do some process-local memory management rather than have to repeat the system call each time, and while it is more than just a wrapper, the heart of its behavior is calling the kernel memory manager as needed. Similar statements apply to all of the I/O functions, at least in a monolithic system (microkernels run most if not all drivers in user mode, so the kernel's involvement is limited to managing IPC and queuing, and hybrids usually do the same for a subset of drivers).
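
A toy version of that malloc-over-sbrk relationship might look like the following. This is only a sketch under loud assumptions: fake_sbrk is a stand-in that carves up a static array instead of asking a real kernel, tiny_malloc never frees anything, and whatever is left of the old chunk is simply abandoned when a new one is fetched. All names and sizes are invented for the example.

```c
#include <stddef.h>

#define ARENA_SIZE 65536   /* memory the pretend kernel owns             */
#define CHUNK_SIZE 4096    /* how much the allocator asks for per "call" */

static _Alignas(16) unsigned char arena[ARENA_SIZE];
static size_t arena_brk;
unsigned sbrk_calls;       /* counts simulated system calls */

/* Stand-in for the sbrk system call: grow the break, return the old one. */
void *fake_sbrk(size_t increment) {
    if (increment > ARENA_SIZE - arena_brk) return NULL;
    void *old = &arena[arena_brk];
    arena_brk += increment;
    sbrk_calls++;
    return old;
}

static unsigned char *pool;   /* unused part of the current chunk */
static size_t pool_left;

/* Bump allocator: serve requests from the local pool and only go to
   the "kernel" when the pool cannot satisfy the request. */
void *tiny_malloc(size_t size) {
    if (size == 0 || size > ARENA_SIZE) return NULL;
    size = (size + 15) & ~(size_t)15;            /* keep 16-byte alignment */
    if (size > pool_left) {
        size_t want = size > CHUNK_SIZE ? size : CHUNK_SIZE;
        void *chunk = fake_sbrk(want);
        if (chunk == NULL) return NULL;
        pool = chunk;
        pool_left = want;
    }
    void *p = pool;
    pool += size;
    pool_left -= size;
    return p;
}
```

Two small allocations in a row cost one simulated system call between them, which is the whole point of asking for more than is needed.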

And, of course, anything that is a service of the kernel itself, such as when an application voluntarily surrenders the CPU to wait on something (via something like sleep() or wait()), that requires a system call too, even if no privileged instructions are actually used in the scheduler.

octacone wrote: 5. That handler messes up something called ESP (optional: and my imaginary multitasking system stops working)

ESP is the Extended Stack Pointer, the 32-bit version of the stack pointer, SP (the long mode equivalent is RSP). It keeps track of the top of the current stack.

Now, I am assuming you know this, but just to be clear: the hardware stack is a region of memory that is used for storing temporary values in last in, first out order, meaning that the stack pointer holds the address of the most recent element added to the stack (the "top" of the stack, though because x86 stacks grow downward, it is actually the lowest address in the stack that is currently in use). When an item is 'pushed' onto the stack, the stack pointer is moved by one system word (decremented, since the stack grows downward), and the new value is stored in that location (the actual order in which this is done is not particularly relevant for most purposes, and may even differ from model to model of the CPU). To 'pop' a value off of the stack, you copy the value and then roll back (increment, in this case) the stack pointer.
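
If it helps, the push and pop behaviour can be modelled in a few lines of C. This is a simulation only: stack_mem and sp stand in for the real stack region and ESP, and the function names are invented for the example.

```c
#include <stdint.h>

#define STACK_WORDS 16

static uint32_t stack_mem[STACK_WORDS];
/* Simulated ESP. An empty stack points one past the highest word,
   because the stack grows toward lower addresses. */
static uint32_t *sp = stack_mem + STACK_WORDS;

/* Push: move the pointer down one word, then store. */
int sim_push(uint32_t value) {
    if (sp == stack_mem) return -1;                 /* overflow  */
    --sp;
    *sp = value;
    return 0;
}

/* Pop: copy the top word out, then move the pointer back up. */
int sim_pop(uint32_t *out) {
    if (sp == stack_mem + STACK_WORDS) return -1;   /* underflow */
    *out = *sp;
    ++sp;
    return 0;
}

/* Expose the simulated stack pointer for inspection. */
uint32_t *sim_sp(void) {
    return sp;
}
```

After two pushes the pointer sits two words below where it started, and popping returns the values in reverse order.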

In the x86, this is used mainly for three purposes: to store the return address (the instruction one after the CALL instruction) of a function call, to store a 'frame' or 'activation record' holding the local variables of each function, and to hold some or all of the arguments to a called function. The reason that the CALL instruction stores the return address is so that the RET doesn't have to be hard-coded with a return address, making it possible to call the same function from several places over the course of the process; RET implicitly pops the top of the stack and uses that value as the return address.

The reason that the stack frame is used to hold arguments and local variables is similar: to give the function a temporary location for its values which can be automatically cleared when the function returns just by resetting the stack pointer. If a function requires any arguments, most calling conventions require that the caller push the arguments onto the stack in a specified order before the CALL instruction, which puts them in a place where the callee can find them. If the function requires any local variables, they are pushed onto the stack at the start of the function.

Each function has its own frame or base pointer, which indicates where its arguments and locals are to be found. When a function starts, the base pointer (EBP in this case, Extended Base Pointer) of the caller needs to be pushed onto the stack first, putting it just below the return address. Then, the stack pointer is copied to the base pointer, which sets the base of the stack frame for the current function. The frame pointer is then used as a reference point for the arguments above it and the local variables below it. By using an offset up into the stack for the arguments, and down into the frame for the locals, it lets the function clean everything up when it returns just by resetting the stack pointer back to the frame pointer's value and popping the old frame pointer back into EBP.

I said all of this just to make sure we are on the same page in this regard. As I said, I expect you knew this, but I wanted to make sure we were in agreement on the terminology and so forth.

Now, one of the things that happens in a context switch (regardless of whether it is a system call or an ordinary function call) is that the context (the state of the registers at the time of the call) has to be saved before the called operation runs, and restored after it is finished (modulo whatever the return of the operation is) so that the caller can use the registers without trashing them. In a function call, it only needs to save the registers it actually uses, and if a return value is passed through a register (as is the case in most x86 calling conventions) then those don't need to be saved regardless.

In either a system call or an interrupt trap, the state of the running process - usually including all of its registers, regardless of how they are used - needs to be saved, not to the stack, but to a process record, and the process's scheduler state needs to be updated to something like "paused", "waiting", or "sleeping" (in many cases, it will have to wait on some long operation such as a disk read, but if it is a request that the kernel can service immediately it might be marked as something like "paused" to show that it was the last running process, instead). This includes the stack and stack frame, so if the system call doesn't save ESP, then the process will go off the rails once the system call returns, hence the problem you mention.
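
A minimal sketch of that save-to-a-process-record step, again as a plain C model rather than real kernel code. The struct layouts, the state names, and the two functions are all assumptions made for illustration; in a real kernel the register values come from the entry stub, and many kernels keep part of this state on a per-process kernel stack instead.

```c
#include <stdint.h>

/* Hypothetical register snapshot for a 32-bit process. */
typedef struct {
    uint32_t eax, ebx, ecx, edx, esi, edi;
    uint32_t ebp, esp, eip, eflags;
} cpu_state_t;

typedef enum {
    PROC_RUNNING, PROC_PAUSED, PROC_WAITING, PROC_SLEEPING
} proc_state_t;

/* Hypothetical process record: saved context plus scheduler state. */
typedef struct {
    int          pid;
    proc_state_t state;
    cpu_state_t  saved;
} process_t;

/* On kernel entry: copy every register, ESP and EBP included,
   into the record and update the scheduler state. */
void save_context(process_t *p, const cpu_state_t *cpu, proc_state_t new_state) {
    p->saved = *cpu;
    p->state = new_state;
}

/* Before returning to the process: put the registers back and
   mark it as running again. */
void restore_context(process_t *p, cpu_state_t *cpu) {
    *cpu = p->saved;
    p->state = PROC_RUNNING;
}
```

If the handler changes ESP while it works, the restored copy is what the process sees when it resumes, which is why forgetting to save it breaks multitasking.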

Octacone · Post by **Octacone** » Tue Nov 15, 2016 12:52 pm

iansjack wrote:A system call is just a request, from a user program, for the operating system to do some work for it. I prefer to use syscall and sysret, rather than call gates, to implement system calls.

I am confused. It says on the wiki that they are not CPU independent? Does this other way utilize GDT and TSS?

Lukand wrote:System calls are interrupts which programs use to communicate with the kernel.
For example, if your syscall interrupt is 0x81, you could implement it in such a way that each AX value selects a different function, and then if AX=17 your kernel may output the character stored in BX to the terminal.
User mode is a processor ring which has far fewer instruction privileges and is therefore much safer. Most (if not all) operating systems assign that ring to applications.
For example, if your processor is executing unprivileged/faulty code, it can only affect its own protection level and the less privileged (higher-numbered) ones.
The only problem with protection rings is that switching between them is really CPU intensive.

I get all that stuff. This is one implementation of it. Why are you mentioning AX/BX registers?

Ch4ozz wrote:Without usermode and syscalls your OS will run everything in kernelmode, which is a pretty bad idea lol.
A syscall is simply a software interrupt which will jump from Ring 3 to Ring 0 into your interrupt handler.
You usually pass the number of the syscall in eax and the rest of the params in the other registers (or on the stack if there aren't enough registers).
Syscalls are needed to get information from the OS without making any compromises in terms of security.
In my OS a usermode process can't access any memory except its own, and that includes the memory of other processes.
Paging without implementing usermode seems pretty useless to me.

omarxx024 would disagree (he says that you don't need to be in user mode), but I agree with you. Imagine somebody playing around with Basic OS and manually probing memory. Ouch!
Okay, so let's just say that the system call handler is a drug dealer, sort of. Your OS is years ahead of mine. Can I treat EAX values as switches, like if eax=14534 then the function equals Print_String...?

BTW: Usermode (aka Ring 3) can't execute privileged instructions like mov cr3, eax; cli; hlt; and so on.
So usermode can't alter the system on its own.

What about them? I need cr3, eax, cli, and hlt. Is there a replacement?

iansjack wrote:System calls do not have to be implemented as interrupts. And they aren't in, for example, 64-bit Linux.

Is there another way? You mean using macros?

iansjack · Post by **iansjack** » Tue Nov 15, 2016 1:35 pm

I explained, in my first post, that the best way (IMO) to implement system calls is via the syscall and sysret instructions. Saves a lot of messing about; and no context switch is necessary.

Octacone · Post by **Octacone** » Tue Nov 15, 2016 1:50 pm

iansjack wrote:I explained, in my first post, that the best way (IMO) to implement system calls is via the syscall and sysret instructions. Saves a lot of messing about; and no context switch is necessary.

Awesome, any code to take a look at? I can't find anything other than interrupt ones.

Roman · Post by **Roman** » Tue Nov 15, 2016 2:03 pm

http://wiki.osdev.org/SYSENTER

Octacone · Post by **Octacone** » Tue Nov 15, 2016 3:42 pm

Roman wrote:http://wiki.osdev.org/SYSENTER

This is very confusing.

Thanks for sharing anyways.

iansjack · Post by **iansjack** » Tue Nov 15, 2016 4:03 pm

Have you read about it in the Intel manuals? It really is a far simpler, and more efficient, way of implementing a system call.

rdos · Post by **rdos** » Tue Nov 22, 2016 2:51 am

Lukand wrote:System calls are interrupts which programs use to communicate with the kernel.
For example, if your syscall interrupt is 0x81, you could implement it in such a way that each AX value selects a different function, and then if AX=17 your kernel may output the character stored in BX to the terminal.
User mode is a processor ring which has far fewer instruction privileges and is therefore much safer. Most (if not all) operating systems assign that ring to applications.
For example, if your processor is executing unprivileged/faulty code, it can only affect its own protection level and the less privileged (higher-numbered) ones.
The only problem with protection rings is that switching between them is really CPU intensive.

They can be implemented in many different ways, like interrupts (that's the legacy way), syscall/sysenter or with call gates (only 32-bit). They can also be invalid instructions that fault and then are processed as syscalls in the kernel's fault handler.

Which one is most efficient depends on the processor. Because Windows and Linux started to use syscall/sysenter, these are now optimized in newer processors (like Intel's Atom), and perform faster than simpler things like call gates. However, from the perspective of efficiency and simplicity, there could be no faster way to get to a certain point in the kernel than call gates, because they don't need secondary decoding in the kernel and don't use up any additional registers. Still, because no major operating system uses them, they perform worse on modern processors.

My OS can use several methods, depending on operating mode and processor. I code them in user space as invalid instructions, and when they are accessed by the program, they are transformed into either call gates or sysenter/syscall instructions by the general protection fault handler. Patching code is somewhat problematic with modern multicore processors, but at least I found a two-step process that works on all modern processors in SMP configurations.

rdos · Post by **rdos** » Tue Nov 22, 2016 3:42 am

iansjack wrote:Have you read about it in the Intel manuals? It really is a far simpler, and more efficient, way of implementing a system call.

No, sysenter isn't more efficient on older processors (if you count the intermediate code to load registers in user space and decode functions in kernel space). It's not simpler either, because it requires loading certain registers in the kernel (MSRs), it requires assigning syscall numbers (otherwise you cannot decode them from a single entry point), and lastly, of course, these new things assume you are using a flat memory model in both kernel and user space. Then, of course, every new syscall needs to be assigned a unique number, it must be added to the decoder in the kernel, and the decoder has to know where the handler procedure is located, all of which are highly unwanted. That's why people have to rebuild their Linux kernel all the time.

Also, in fact, sysenter is just a modern variant of the legacy interrupt cruft. The only difference is that it enters kernel with another instruction. The legacy stuff is still there.
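
For reference, the flat-model assumption shows up in how the selectors are derived. In 32-bit protected mode SYSENTER takes the kernel code selector from the IA32_SYSENTER_CS MSR and expects the kernel stack, user code, and user stack descriptors to follow it in the GDT at fixed offsets. The sketch below only does that arithmetic. The struct and function names are invented for the example, and actually programming the MSRs needs WRMSR in ring 0, which is not shown.

```c
#include <stdint.h>

/* MSR numbers used by SYSENTER/SYSEXIT, per the Intel manuals. */
#define IA32_SYSENTER_CS  0x174u
#define IA32_SYSENTER_ESP 0x175u
#define IA32_SYSENTER_EIP 0x176u

/* Hypothetical helper type: the four selectors implied by one MSR value. */
typedef struct {
    uint16_t kernel_cs, kernel_ss, user_cs, user_ss;
} sysenter_selectors_t;

/* Derive the selectors the CPU loads on SYSENTER and on a 32-bit SYSEXIT. */
sysenter_selectors_t sysenter_selectors(uint16_t sysenter_cs) {
    sysenter_selectors_t s;
    uint16_t base = sysenter_cs & 0xFFFCu;      /* RPL bits are ignored     */
    s.kernel_cs = base;                         /* loaded by SYSENTER       */
    s.kernel_ss = (uint16_t)(base + 8);         /* next GDT entry           */
    s.user_cs   = (uint16_t)((base + 16) | 3u); /* loaded by SYSEXIT, RPL 3 */
    s.user_ss   = (uint16_t)((base + 24) | 3u);
    return s;
}
```

With the MSR set to 0x08 this yields 0x08 and 0x10 for the kernel and 0x1B and 0x23 for user mode, so the GDT has to be laid out in exactly that order.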

Octocontrabass · Post by **Octocontrabass** » Tue Nov 22, 2016 10:48 am

rdos wrote:No, sysenter isn't more efficient on older processors (if you count the intermediate code to load registers in user space and decode functions in kernel space).

Have you done benchmarks? If that turns out to be true, you can detect those processors and use INT instead of SYSENTER.

rdos wrote:It's not simpler either, because it requires loading certain registers in the kernel (MSRs),

You only have to load them once.

rdos wrote:it requires assigning syscall numbers (otherwise you cannot decode them from a single entry point)

How are you identifying system calls in your OS?

rdos wrote:and lastly, of course, these new things assume you are using a flat memory model in both kernel and user space.

It's a safe assumption, considering how much better it is compared to segmentation.

rdos wrote:Then, of course, every new syscall needs to be assigned a unique number, it must be added to the decoder in the kernel, and the decoder has to know where the handler procedure is located, all of which are highly unwanted.

How do you set up system calls without knowing where they are?

rdos wrote:That's why people have to rebuild their Linux kernel all the time.

I've never heard of anyone rebuilding Linux for new system calls, just bug fixes or new drivers.

OSDev.org
