System Call

gaiety · Post by **gaiety** » Tue Mar 22, 2005 10:22 pm

An OS provides system calls, such as printf, so that user applications can print messages. But how does the OS actually provide a system call?

Let's say I have built the printf function in the kernel; the code would look like this.

Code: Select all

  void printf(const char *message, ...){
 ....
}

Then how do we write the user application so it can use this printf function? Writing a new compiler isn't practical, so how do you do it?

Thank you for answering my question.

Crazed123 · Post by **Crazed123** » Tue Mar 22, 2005 10:55 pm

There are several ways: some people use IPC and messages, others use software interrupts, and some use call gates.

Brendan · Post by **Brendan** » Tue Mar 22, 2005 11:37 pm

Hi,

gaiety wrote:An OS provides system calls, such as printf, so that user applications can print messages. But how does the OS actually provide a system call?

Let's say I have built the printf function in the kernel; the code would look like this.
Code: Select all
  void printf(const char *message, ...){
 ....
}
Then how do we write the user application so it can use this printf function? Writing a new compiler isn't practical, so how do you do it?

Thank you for answering my question.

Most OSes use a calling mechanism that's capable of changing from the CPL=3 user code to CPL=0 kernel code (options are: software interrupts, call gates, SYSENTER, SYSCALL and exception handlers). On top of the basic calling mechanism there's parameter passing (options are: use the stack where the caller cleans the stack, use the stack where the called code cleans the stack, or use registers). Then there's returned data (options are: return one value in a register or return multiple values in multiple registers) - either of these may additionally return and/or modify data pointed to by an address passed as an input parameter.

Using the stack to pass parameters is slower and more complicated because the CPL=3 user level code uses a different stack to the kernel's CPL=0 stack. This can mean parameters need to be transferred from one stack to another before use, or the kernel has to figure out where the old stack was. It's usually better to use the CPU's registers (like my OS, Linux, DOS, BIOS, etc). This doesn't create many problems for C libraries because they tend to wrap kernel functions in library code anyway.

Now, a "printf()" type of function has further problems because it uses a variable number of input parameters, and therefore can't use registers. I'd be very tempted to only provide a simpler "print_string()" function in the kernel. For example, the C library's "printf" function would use "sprintf" to convert the data into a single string and then use the kernel's "print_string()" function to display the resulting string.
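As a rough illustration of that split, here is a minimal C sketch in which the library does all the variable-argument formatting and hands the kernel a single string. The names `lib_printf`, `sys_print_string` and `fake_console` are hypothetical, the "kernel" side is just a buffer so the sketch can run in an ordinary process, and it uses the bounded `vsnprintf` rather than plain `sprintf`.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the kernel's output device (assumption: a real kernel
   would write to the screen or a log instead of this buffer). */
char fake_console[256];

/* Hypothetical kernel entry point: takes one pointer, returns an error code.
   In a real OS this would be reached through an interrupt or call gate. */
int sys_print_string(const char *s)
{
    size_t used = strlen(fake_console);
    size_t len = strlen(s);
    if (used + len >= sizeof fake_console)
        return -1;                       /* no room left: report an error */
    memcpy(fake_console + used, s, len + 1);
    return 0;
}

/* User-space library function: all the "..." handling stays out of the kernel. */
int lib_printf(const char *fmt, ...)
{
    char buf[128];
    va_list ap;

    va_start(ap, fmt);
    int n = vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);

    if (n < 0 || (size_t)n >= sizeof buf)
        return -1;                       /* formatting failed or was truncated */
    if (sys_print_string(buf) != 0)
        return -1;
    return n;                            /* number of characters printed */
}
```

With this shape the system call itself only ever needs one register for the string address, whatever the format string looks like.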

This is mostly how software adds data to the kernel's log in my OS - an "add_string_to_log(char *string)" function that passes the address of the string in a register and returns an error code in EAX. For normal video output (e.g. "STDOUT" in C) the kernel isn't really involved because the OS is modular.

As for the actual system call I have a call table that's used by several different mechanisms. The kernel function number is passed in EAX, while other parameters are passed in EBX, ESI and EDI. Data is also returned in EBX, ESI, and EDI (ECX and EDX aren't used for input or output due to the way SYSCALL and SYSENTER work). For example:

Code: Select all

kernelAPItable:
   dd allocPage        ;0x00000000
   dd freePage         ;0x00000001
   dd addStringToLog   ;0x00000002
   dd spawnThread      ;0x00000003

%define MAXAPIfunction  3

kernelAPIinterrupt:
    cmp eax,MAXAPIfunction
    ja .l1
    call [kernelAPItable + eax * 4]
    iretd
.l1:
    mov eax,errFunctionNotDefined
    iretd


kernelAPIcallGate:
    cmp eax,MAXAPIfunction
    ja .l1
    call [kernelAPItable + eax * 4]
    retf
.l1:
    mov eax,errFunctionNotDefined
    retf

This means code can do something like:

Code: Select all

    mov ebx,linear_address
    mov eax,0
    int 0x20             ;Allocate a page at EBX
    test eax,eax         ;Was there an error?
    je .error            ; yes
                        ; no, new page allocated at EBX
    mov eax,1
    call 0x30:0x00000000 ;Free a page at EBX
    test eax,eax         ;Was there an error?
    je .error            ; yes

Of course this is hidden by macros in assembly (e.g. "API_INT 0x00000001", "API_CALL 0x00000001") and library functions in C.
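For readers more comfortable with C, the call-table dispatch above might be sketched as follows. This is only an illustration: the function bodies are placeholders that return made-up tag values, `ERR_FUNCTION_NOT_DEFINED` is an assumed error value, and the three parameters simply stand in for EBX, ESI and EDI.

```c
#include <stddef.h>
#include <stdint.h>

#define ERR_FUNCTION_NOT_DEFINED 0xFFFFFFFFu   /* assumed error value */

/* Every kernel API function takes the three parameter registers. */
typedef uint32_t (*api_fn)(uint32_t ebx, uint32_t esi, uint32_t edi);

/* Placeholder kernel functions: each returns a distinct tag so the
   dispatch can be observed. Real ones would do actual work. */
uint32_t alloc_page(uint32_t ebx, uint32_t esi, uint32_t edi)
{ (void)ebx; (void)esi; (void)edi; return 0x10; }
uint32_t free_page(uint32_t ebx, uint32_t esi, uint32_t edi)
{ (void)ebx; (void)esi; (void)edi; return 0x11; }
uint32_t add_string_to_log(uint32_t ebx, uint32_t esi, uint32_t edi)
{ (void)ebx; (void)esi; (void)edi; return 0x12; }
uint32_t spawn_thread(uint32_t ebx, uint32_t esi, uint32_t edi)
{ (void)ebx; (void)esi; (void)edi; return 0x13; }

/* Same order as kernelAPItable: the index is the function number. */
const api_fn kernel_api_table[] = {
    alloc_page, free_page, add_string_to_log, spawn_thread
};
#define API_COUNT (sizeof kernel_api_table / sizeof kernel_api_table[0])

/* Common part of every entry mechanism: range-check EAX, then call
   through the table. The unsigned compare also rejects "negative" numbers. */
uint32_t kernel_api_dispatch(uint32_t eax, uint32_t ebx, uint32_t esi, uint32_t edi)
{
    if (eax >= API_COUNT)
        return ERR_FUNCTION_NOT_DEFINED;
    return kernel_api_table[eax](ebx, esi, edi);
}
```

Deriving the limit from the table size, instead of a separate constant like MAXAPIfunction, is a design choice of this sketch so the bound cannot drift when entries are added.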

Cheers,

Brendan

slash · Post by **slash** » Wed Mar 23, 2005 8:39 am

Assuming you have converted the printf args into a simple string, the simplest way of calling it is to give it a number, like:

#define printf 55

Then push the number of args for printf (in this case 1), then its args, onto the kernel stack, then its number (in this case "55"), and trigger a syscall interrupt that you have registered with the PIC (Programmable Interrupt Controller). The interrupt handler should look for the number on top of the stack and jump to the function that is defined as that number.

Pype.Clicker · Post by **Pype.Clicker** » Wed Mar 23, 2005 8:49 am

slash wrote:Assuming you have converted the printf args into a simple string, the simplest way of calling it is to give it a number, like:

#define printf 55

AaaaaAAaaaaaAaaaaaarrRrrRgh !

*please*
#define SYSCALL_PRINTF 55

slash wrote:Then push the number of args for printf (in this case 1), then its args, onto the kernel stack, then its number (in this case "55"), and trigger a syscall interrupt that you have registered with the PIC (Programmable Interrupt Controller).

The PIC has *nothing* to do with system calls. It only handles hardware interrupts.

slash wrote:The interrupt handler should look for the number on top of the stack and jump to the function that is defined as that number.

That's what was said above (any chance you read what Brendan said?), except that you seem to ignore that the handler will be running on its *own* stack, so it'll have to do pointer juggling to retrieve the arguments from the user process's stack. (That *can* be achieved; it's just more convenient to use registers, most of the time.)
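To make the "pointer juggling" concrete, here is a small C sketch of a handler reading stack-passed arguments through the user ESP that the CPU saved when it switched stacks. It is a simulation under stated assumptions: `user_mem` is a hypothetical array standing in for the user address space, `fetch_user_arg` is an invented helper, and the stack layout (call number on top, then the argument, then the argument count) follows the scheme described in the post being replied to.

```c
#include <stddef.h>
#include <stdint.h>

/* What the CPU leaves on the kernel stack for an interrupt taken at CPL=3
   (lowest address first, no error code). */
struct trap_frame {
    uint32_t eip, cs, eflags, user_esp, user_ss;
};

/* Simulated user address space: byte address A lives in user_mem[A / 4]. */
#define USER_MEM_WORDS 64
uint32_t user_mem[USER_MEM_WORDS];

/* Read the index-th 32-bit word above the saved user stack pointer.
   Returns 0 on success, -1 if the address is misaligned or out of range;
   a real kernel must validate user pointers in the same spirit. */
int fetch_user_arg(const struct trap_frame *tf, unsigned index, uint32_t *out)
{
    uint64_t addr = (uint64_t)tf->user_esp + 4ull * index;  /* no 32-bit wrap */

    if (addr % 4 != 0 || addr / 4 >= USER_MEM_WORDS)
        return -1;
    *out = user_mem[addr / 4];
    return 0;
}
```

The handler never touches its own stack to find the arguments; everything comes from `tf->user_esp`, which is why passing the same values in registers is usually less work.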

gaiety · Post by **gaiety** » Thu Mar 24, 2005 9:10 pm

I think I understand 80% of your idea.

Brendan wrote:(ECX and EDX aren't used for input or output due to the way SYSCALL and SYSENTER work).

Can you explain a little about SYSCALL, SYSENTER and SYSEXIT, and how ECX and EDX are used? If you can, a little code would help a lot.

I still have some questions, but I will ask them in the next few days, as I need to think about how the stack works during a system call. Put it like this: when my application calls print_string, let's say the code looks like this.

Code: Select all

in my user application

%define systemcall 0x00000001
call systemcall

This will push the return IP onto the user stack, then jump to the kernel code:

Code: Select all

systemcall:
      load the kernel stack and segment
      .....      use the kernel's stack
      load the user application's stack and segment
      return

Does everybody save the user stack and set up the kernel stack, continue on the kernel stack, and then, after finishing, reload the user application's stack and return?

If I do it like this, will the system call run to completion, or have I missed something?

Thank you for answering my question.

AR · Post by **AR** » Fri Mar 25, 2005 12:34 am

No, you cannot call the kernel code directly, although you can use a call gate and a far call if you want. A typical system call (low-end, without SYSENTER/SYSRETURN or SYSCALL/SYSEXIT) looks like:

Code: Select all

mov EAX, syscallnumber
mov EBX, param1
mov ECX, param2
mov EDX, param3
mov ESI, param4
mov EDI, param5
int 80h      ;Trigger interrupt to trap to the kernel

Interrupt 0x80 has to be in your IDT as a ring 3 interrupt (a gate with DPL=3) for this to work. The program enters the kernel via the IDT entry (which loads CS from the descriptor and the stack from the TSS). [Note: The use of each register is up to you; the CPU doesn't care, but the kernel has to interpret the register contents to figure out which call was requested.]

SYSENTER/SYSRETURN are faster than interrupts, and AMD's SYSCALL/SYSEXIT are even faster than SYSENTER but are only supported on AMD CPUs (or AMD64 CPUs, which should include Intel's IA-64). The calling code looks similar, except that the call is SYSENTER instead of "int 80h". To set these up you need to program the CPU's MSRs (Model Specific Registers). These instructions also require a flat memory model, although that is rarely a concern since most people use a flat model anyway.

Colonel Kernel · Post by **Colonel Kernel** » Fri Mar 25, 2005 12:58 am

AR wrote:SYSENTER/SYSRETURN are faster than interrupts, and AMD's SYSCALL/SYSEXIT are even faster than SYSENTER but are only supported on AMD CPUs (or AMD64 CPUs, which should include Intel's IA-64).

You mean EM64T, right? IA-64 is a completely different beast altogether...

AR · Post by **AR** » Fri Mar 25, 2005 1:03 am

Yeah, I'm not up on Intel's naming conventions; I always use AMD CPUs if possible.

Candy · Post by **Candy** » Fri Mar 25, 2005 4:06 am

Actually you messed up the combos. It's syscall/sysret or sysenter/sysexit.

The difference is that syscall never switches stacks and sysenter always switches stacks. IMO, if you use syscall you can almost compile a plain call and then backpatch it or something.

Brendan · Post by **Brendan** » Fri Mar 25, 2005 4:13 pm

Hi,

AR wrote:SYSENTER/SYSRETURN are faster than interrupts, and AMD's SYSCALL/SYSEXIT are even faster than SYSENTER but are only supported on AMD CPUs (or AMD64 CPUs, which should include Intel's IA-64). The calling code looks similar, except that the call is SYSENTER instead of "int 80h". To set these up you need to program the CPU's MSRs (Model Specific Registers). These instructions also require a flat memory model, although that is rarely a concern since most people use a flat model anyway.

I'm largely unconvinced here - the actual SYSENTER/SYSEXIT and SYSCALL/SYSRET instructions probably are faster by themselves, but they don't do things "properly" so extra cycles are needed that might actually make them slower than the alternative software interrupt or call gate methods.

To illustrate, here's the "kernel system call" macros I'm using:

Code: Select all

%macro APICALL 1
   mov eax,%1
   call SELECTORAPI:0
%endmacro

%macro APIINT 1
   mov eax,%1
   int 0x20
%endmacro

%macro APISYSENTER 1
   CPU 686
   push edx
   push ecx
   mov edx,%%l1
   mov ecx,esp
   sysenter
%%l1:   CPU 486
   pop ecx
   pop edx
%endmacro

%macro APISYSCALL 1
   CPU 686
   push ecx
   mov eax,%1
   syscall
   CPU 486
   pop ecx
%endmacro

Then on the kernel side I've got the following code:

Code: Select all

processAPIsoftInt:
   cmp eax,0x200            ;Is function number out of range?
   jae .badFunction         ; yes, error
   call [APItable+eax*4]
   iretd
.badFunction:
   mov eax,errUndefined
   iretd

processAPIcallGate:
   cmp eax,0x200            ;Is function number out of range?
   jae .badFunction         ; yes, error
   call [APItable+eax*4]
   retf
.badFunction:
   mov eax,errUndefined
   retf

processAPIsysEnter:
   CPU 686
   cmp eax,0x200            ;Is function number out of range?
   jae .badFunction         ; yes, error
   call [APItable+eax*4]
   sysexit
.badFunction:
   mov eax,errUndefined
   sysexit
   CPU 486

processAPIsysCall:
   CPU 686
   cmp eax,0x200            ;Is function number out of range?
   jae .badFunction         ; yes, error
   push ebp
   mov ebp,esp
   mov esp,[gs:CPULISTentryStruct.TSSesp0]
   sti
   call [APItable+eax*4]
   cli
   mov esp,ebp
   pop ebp
   sysret
.badFunction:
   mov eax,errUndefined
   sysret
   CPU 486

As you can see, the amount of extra code required increases depending on how "fast" the basic system call instruction is meant to be.

Note: I haven't bothered testing this SYSCALL or SYSENTER code yet, or testing how quick each method is on each CPU. I have compared software interrupts and call gates to find that call gates are a bit faster.

In general I use software interrupts when the code isn't used often and call gates where the code is used often (this is because I'd rather use 2 bytes of code rather than 7 unless performance matters more).

The SYSENTER and SYSCALL instructions will be emulated on computers that don't support them (code to handle them built into the "undefined opcode" exception handler). This means they'll always work, but may be slow.

The idea here is that when code is compiled for a specific CPU the compiler will be able to generate the fastest code for that CPU (but the code will still work for all CPUs).

Cheers,

Brendan

AR · Post by **AR** » Fri Mar 25, 2005 9:01 pm

Brendan wrote:The SYSENTER and SYSCALL instructions will be emulated on computers that don't support them (code to handle them built into the "undefined opcode" exception handler). This means they'll always work, but may be slow.

The idea here is that when code is compiled for a specific CPU the compiler will be able to generate the fastest code for that CPU (but the code will still work for all CPUs).

On Windows, the choice between SYSENTER, SYSCALL and int 2Eh is made by the kernel at boot. It then stores the chosen instruction on a page of memory that is mapped into every address space, and the system DLLs (i.e. kernel32.dll/ntdll.dll) call through that page to enter the kernel. This means that apps don't need to care about the CPU's capabilities but still benefit from them.
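A boot-time selection of that kind could be sketched in C roughly as below. This is not how Windows implements it; `choose_mech` and `write_stub` are hypothetical names, the preference order (SYSCALL, then SYSENTER, then a software interrupt) is an assumption, and real feature detection has more corner cases than two CPUID bits. The stub is also simplified to the entry instruction followed by a near return.

```c
#include <stddef.h>
#include <stdint.h>

enum syscall_mech { MECH_INT, MECH_SYSENTER, MECH_SYSCALL };

#define CPUID_1_EDX_SEP        (1u << 11)  /* leaf 1, EDX bit 11: SYSENTER/SYSEXIT */
#define CPUID_EXT1_EDX_SYSCALL (1u << 11)  /* leaf 0x80000001, EDX bit 11: SYSCALL/SYSRET */

/* Pick a mechanism from the CPUID feature words the boot code has read. */
enum syscall_mech choose_mech(uint32_t leaf1_edx, uint32_t ext1_edx)
{
    if (ext1_edx & CPUID_EXT1_EDX_SYSCALL)
        return MECH_SYSCALL;
    if (leaf1_edx & CPUID_1_EDX_SEP)
        return MECH_SYSENTER;
    return MECH_INT;                     /* always available */
}

/* Write the entry stub into the shared page; libraries would then
   near-call this address. Returns the number of bytes written. */
size_t write_stub(enum syscall_mech m, uint8_t int_vector, uint8_t *page)
{
    size_t n = 0;

    switch (m) {
    case MECH_SYSCALL:  page[n++] = 0x0F; page[n++] = 0x05;       break; /* syscall  */
    case MECH_SYSENTER: page[n++] = 0x0F; page[n++] = 0x34;       break; /* sysenter */
    case MECH_INT:      page[n++] = 0xCD; page[n++] = int_vector; break; /* int n    */
    }
    page[n++] = 0xC3;                    /* near ret back to the library caller */
    return n;
}
```

Applications linked against the library only ever contain a near call to the stub, so the same binary runs on any CPU the kernel supports.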

The kernel entry/exit code may get longer but you would still need to benchmark it. I think I read the clocks for each were something like 20 for int, 8 for SYSENTER and 5 for SYSCALL.

gaiety · Post by **gaiety** » Sat Mar 26, 2005 3:08 am

I am really a newbie; I don't understand how the pieces of Brendan's code link together.

Code: Select all

%macro APICALL 1
   mov eax,%1
   call SELECTORAPI:0
%endmacro

%macro APIINT 1
   mov eax,%1
   int 0x20
%endmacro

%macro APISYSENTER 1
   CPU 686
   push edx
   push ecx
   mov edx,%%l1
   mov ecx,esp
   sysenter
%%l1:   CPU 486
   pop ecx
   pop edx
%endmacro

%macro APISYSCALL 1
   CPU 686
   push ecx
   mov eax,%1
   syscall
   CPU 486
   pop ecx
%endmacro

So we would invoke APICALL with this code:

Code: Select all

APICALL 0x01

But I don't know how to use

Code: Select all

%macro APISYSCALL 1

Where would we invoke this macro (APISYSCALL 1)? Besides, there is a

Code: Select all

syscall

in the macro, so we need to set up syscall so that it jumps to here

Code: Select all

processAPIcallGate:
   cmp eax,0x200            ;Is function number out of range?
   jae .badFunction         ; yes, error
   call [APItable+eax*4]
   retf
.badFunction:
   mov eax,errUndefined
   retf

or here

Code: Select all

processAPIsysEnter:
   CPU 686
   cmp eax,0x200            ;Is function number out of range?
   jae .badFunction         ; yes, error
   call [APItable+eax*4]
   sysexit
.badFunction:
   mov eax,errUndefined
   sysexit
   CPU 486

I am confused now. I can't see how the code works.

Thank you for answering my question.

AR · Post by **AR** » Sat Mar 26, 2005 4:08 am

Ok, take everything I say as IIRC as I haven't implemented anything beyond a basic system call interrupt.

Brendan's APICALL uses a call gate (which is an entry in the GDT/LDT). APISYSCALL executes the AMD SYSCALL instruction to do the same thing as APICALL, except that it should (theoretically) be faster. SYSCALL is an instruction; it gets the value for CS/EIP (it may or may not switch the stack as well) out of the CPU's MSRs (Model Specific Registers).

The GDT call gate will contain the CS/EIP for processAPIcallGate, and SYSENTER's MSRs will contain the CS/EIP for processAPIsysEnter. SYSENTER does the same thing as SYSCALL but uses different MSRs, so it would be possible to have both side by side, although I can't really see a benefit unless the OS is AMD-only with only 2 system calls.

Brendan · Post by **Brendan** » Sat Mar 26, 2005 2:53 pm

Hi,

gaiety wrote: I am really a newbie; I don't understand how the pieces of Brendan's code link together.

The OS has "function numbers", not unlike DOS's "int 0x21" or Linux's "int 0x80". The function number is always passed in the EAX register. There are 4 different "system call mechanisms" supported by the kernel, each with different size/speed tradeoffs.

Each of the 4 mechanisms uses the same call table inside the kernel, called "APItable". This table is created like this:

Code: Select all

APItable:
   dd function0x000
   dd function0x001
   dd function0x002
   dd function0x003
   ..etc

A single bit of software could call the same kernel function 4 times using a different mechanism/macro each time. For example:

Code: Select all

   APIINT 0x0123
   APICALL 0x0123
   APISYSENTER 0x0123
   APISYSCALL 0x0123

For the software interrupt, a programmer would write "APIINT 0x0123" to use kernel function number 0x0123. This macro gets expanded into "mov eax,0x0123" followed by "int 0x20". The software interrupt is handled by code in the kernel that looks like:

Code: Select all

processAPIsoftInt:
   cmp eax,0x200            ;Is function number out of range?
   jae .badFunction         ; yes, error
   call [APItable+eax*4]
   iretd
.badFunction:
   mov eax,errUndefined
   iretd

Basically, if the function number is too high for the table it returns an error. Otherwise it does the "call [APItable + eax*4]" to call the correct routine for the function number.

The "call gate mechanism" is almost identical, except it uses a call gate instead of a software interrupt. While they might look complicated, the "syscall" and "sysenter" mechanisms also work the same.

Unfortunately, the SYSCALL and SYSENTER instructions skip some important steps (like storing a return address), so they need extra code to make up for the stuff they skip. That's why they look complicated (and why I'm not convinced they are faster once you add the overhead of the extra code they need).

For example, if software used "APISYSENTER 0x0123" (instead of "APIINT 0x0123") then the macro would expand to:

Code: Select all

   push edx
   push ecx
   mov eax,0x0123
   mov edx,.here
   mov ecx,esp
   sysenter
.here:
   pop ecx
   pop edx

This is because the SYSENTER instruction doesn't store any return address or return ESP, and it would be impossible to return to the caller properly without them (the SYSEXIT instruction expects them to be in EDX and ECX, but the SYSENTER instruction doesn't put them in these registers).
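The register hand-off described here can be followed in a toy C model. It is only a sketch: `struct cpu_model` and the three functions are invented for illustration, segment registers and privilege levels are ignored, and the two pushes in the macro are left out so that only the EDX/ECX convention is visible.

```c
#include <stdint.h>

/* Just enough CPU state to follow the return path. */
struct cpu_model {
    uint32_t eip, esp, ecx, edx;
    uint32_t sysenter_eip_msr;   /* kernel entry point (IA32_SYSENTER_EIP) */
    uint32_t sysenter_esp_msr;   /* kernel stack top   (IA32_SYSENTER_ESP) */
};

/* What the user-side macro does just before SYSENTER:
   "mov edx,return_label" and "mov ecx,esp". */
void user_prepare(struct cpu_model *c, uint32_t return_label)
{
    c->edx = return_label;
    c->ecx = c->esp;
}

/* SYSENTER: EIP and ESP are loaded from MSRs. The old values are not
   saved anywhere by the CPU, so they are lost unless software kept them. */
void do_sysenter(struct cpu_model *c)
{
    c->eip = c->sysenter_eip_msr;
    c->esp = c->sysenter_esp_msr;
}

/* SYSEXIT: the return EIP comes from EDX and the return ESP from ECX,
   so the kernel must leave those two registers intact (or restore them). */
void do_sysexit(struct cpu_model *c)
{
    c->eip = c->edx;
    c->esp = c->ecx;
}
```

Running `user_prepare`, `do_sysenter` and `do_sysexit` in sequence brings the model back to the chosen return label with the original stack pointer, which is the whole job of the extra instructions in the macro.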

The SYSCALL and SYSRET instructions are worse because SYSCALL doesn't switch to the kernel's stack (which complicates memory management code), it disables interrupts (where my kernel is fully interruptible and re-entrant), and it also messes up CPL=3 segment limits (I haven't found a good work-around for this yet).
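The stack issue can be seen in a similar toy C model of 32-bit SYSCALL. Again this is a hedged sketch with invented names (`struct syscall_model`, `model_syscall`, `model_kernel_entry`, `model_sysret`); it ignores segments and the segment-limit problem entirely, and it only mirrors the stack and interrupt-flag steps that the processAPIsysCall handler performs by hand.

```c
#include <stdint.h>

struct syscall_model {
    uint32_t eip, esp, ecx;
    int interrupts_enabled;
    uint32_t star_target_eip;    /* kernel entry point taken from the STAR MSR */
};

/* SYSCALL: the return address goes into ECX and interrupts are disabled,
   but ESP is left alone, so the kernel starts on the user's stack. */
void model_syscall(struct syscall_model *c, uint32_t next_eip)
{
    c->ecx = next_eip;
    c->eip = c->star_target_eip;
    c->interrupts_enabled = 0;
}

/* What the kernel handler has to add around the real work. */
void model_kernel_entry(struct syscall_model *c, uint32_t kernel_stack_top)
{
    uint32_t saved_user_esp = c->esp;   /* like "mov ebp,esp" */

    c->esp = kernel_stack_top;          /* like "mov esp,[...TSSesp0]" */
    c->interrupts_enabled = 1;          /* sti */
    /* ... the API function would be called here ... */
    c->interrupts_enabled = 0;          /* cli */
    c->esp = saved_user_esp;            /* like "mov esp,ebp" */
}

/* SYSRET: execution resumes at the address held in ECX. */
void model_sysret(struct syscall_model *c)
{
    c->eip = c->ecx;
}
```

Until the handler switches stacks itself, `esp` still holds the user value, which is the window that makes a fully interruptible kernel awkward with this instruction.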

Anyway, the idea is that software can detect what is supported and use the fastest method where practical (and/or software can be compiled specifically for a specific CPU). For example, if you're optimizing you might find that the smaller software interrupt method actually works faster on a specific CPU because it's not thrashing the CPU's trace cache, or that the call gate method is faster than sysenter and/or syscall because of the extra instructions sysenter/syscall need, or that sysenter and/or syscall actually is faster. Of course for initialization code that is only ever run once, I'd recommend the software interrupt method anyway (smaller code where performance is irrelevant).

There was also a bug in the code I posted - I missed an instruction in the macro for SYSENTER. It should've been:

Code: Select all

%macro APISYSENTER 1
  CPU 686
  push edx
  push ecx
  mov eax,%1
  mov edx,%%l1
  mov ecx,esp
  sysenter
%%l1:  CPU 486
  pop ecx
  pop edx
%endmacro

Hope this makes it a little clearer...

This is only one possible way to do things though - you could design your OS to use newer AMD chips only (i.e. SYSCALL/SYSRET only, which would make it much faster), or use software interrupts only (e.g. DOS), or do what Windows does (software does a near call to a fixed address that contains the actual system call instructions followed by a near return).

The problem with Windows' method is that the near call is going to cost more than you'd gain - 4 cycles for the near call plus 5 cycles for syscall (assuming you can avoid the extra baggage, which I doubt) plus 4 cycles for the near return adds up to about the same as a call gate anyway [note: I don't know exact cycle counts, this is just an estimate].

Cheers,

Brendan

OSDev.org
