The inconveniences of sysenter/sysexit

CPLH · Post by **CPLH** » Mon Jan 12, 2009 8:35 pm

Recently, while trying to implement system calls with sysenter and sysexit, I found that they are quite restrictive and inconveniently designed purely for speed. When you use sysenter, the CPU doesn't read or write anything on the stack. I guess this is an optimization so that the system doesn't do extra memory accesses.

As a result, returning to the original caller becomes more difficult. Basically, sysexit returns using the values in two registers: it loads EIP from EDX and ESP from ECX. Unfortunately the kernel handler that answers the sysenter doesn't know where to return to, so it cannot place anything into ECX or EDX without some external information. The CPU automatically changes SS and ESP on sysenter, so pushing anything onto the stack beforehand won't work.
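To make the register traffic explicit, here is a minimal C model of what the two instructions do to the registers in 32-bit protected mode. It is only a sketch: the struct and function names are made up for illustration, and it ignores EFLAGS changes, descriptor caches and fault checks. The point it shows is that SYSENTER takes every new value from an MSR and saves nothing, while SYSEXIT takes EIP from EDX and ESP from ECX.

```c
#include <stdint.h>

/* Simplified model of the registers SYSENTER/SYSEXIT touch (32-bit mode).
 * All names here are hypothetical; this is not real kernel code. */
struct cpu {
    uint16_t cs, ss;
    uint32_t eip, esp, ecx, edx;
    /* IA32_SYSENTER_CS / _ESP / _EIP MSRs, set up once by the kernel. */
    uint32_t msr_sysenter_cs, msr_sysenter_esp, msr_sysenter_eip;
};

/* SYSENTER: every new value comes from an MSR. The old EIP and ESP are
 * not saved anywhere, neither on a stack nor in a register. */
void model_sysenter(struct cpu *c)
{
    c->cs  = (uint16_t)(c->msr_sysenter_cs & ~3u);
    c->ss  = (uint16_t)(c->cs + 8);
    c->esp = c->msr_sysenter_esp;
    c->eip = c->msr_sysenter_eip;
}

/* SYSEXIT: return EIP comes from EDX and return ESP from ECX.
 * The user selectors are derived from the same MSR, with RPL 3. */
void model_sysexit(struct cpu *c)
{
    c->cs  = (uint16_t)((c->msr_sysenter_cs + 16) | 3);
    c->ss  = (uint16_t)((c->msr_sysenter_cs + 24) | 3);
    c->eip = c->edx;
    c->esp = c->ecx;
}
```

If the caller loads EDX and ECX before `model_sysenter` and the kernel leaves them alone (or saves and restores them), `model_sysexit` lands back in the caller with its own stack.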

The most efficient solution I can think of is setting the registers to the required values beforehand. So:

Code: Select all

mov edx, next ;Where to return to.
mov ecx, esp ;Stack to set when we return.
sysenter
next:
...


sysenterHandler:
push edx
push ecx

; Handle system functions here...

pop ecx
pop edx
sysexit

Unfortunately with this method, even if you change the initial registers to something less popular, you will still be using up more registers than with conventional software interrupts.
If you don't change the initial registers and you are careful with your assembly, you can get away without pushing and popping ECX and EDX. I suggest not doing this, however.
A less optimized solution would be to store the registers in memory, pass one register containing the memory location, and have the kernel read the registers from memory. But once again, we are trying not to use memory.

I have heard that the Linux people figured out how to preserve compatibility and still allow all the registers to be used as before. I read it described as something like an "ugly hack", but I couldn't find out how they did it. Maybe their "ugly hack" had more to do with backward compatibility than with extending the number of usable registers.

Does anybody know a better method of getting around this inconvenience?

Veniamin

Edit: I am not sure if this is the correct place to post. If it is not, can you please move it to the correct place?

Troy Martin · Post by **Troy Martin** » Mon Jan 12, 2009 8:42 pm

Push the return address onto the stack and use it to sysexit? You'd have to push ECX and EDX before the sysexit and pop them (after the sysexit is called) on the app side.

Or I could just be spewing random garbage ideas into this thread, but it's an idea (and probably a Bad Idea(tm) as well.)

JohnnyTheDon · Post by **JohnnyTheDon** » Mon Jan 12, 2009 9:07 pm

Troy Martin wrote:Push the return address onto the stack and use it to sysexit?

Nope. sysenter overwrites esp and doesn't save it. One way to do it would be to save as many parameters as possible in registers, and then push the rest on the stack, and send ESP and EIP using registers (Not ESP and EIP themselves, but general purpose registers).

However, I'm not sure about allowing user programs to ask for a return to any location they please. If the processor faults on the user side during a sysexit to an invalid address, you'll be fine. If it faults in the kernel, I guess you'll have to validate the return address.

xyzzy · Post by **xyzzy** » Tue Jan 13, 2009 2:03 am

What you've described is pretty much how I plan to do it when I get round to implementing SYSENTER/SYSEXIT support for my OS, and AFAIK it's what a lot of implementations do.

Love4Boobies · Post by **Love4Boobies** » Tue Jan 13, 2009 2:59 am

Troy Martin wrote:You'd have to push ECX and EDX before the sysexit and pop them (after the sysexit is called) on the app side.

His problem isn't preserving the two registers for the caller. Since nothing is pushed onto the stack, ECX and EDX have to carry the return ESP and EIP. Therefore, in order to use the two registers inside the system call, they need to be saved somewhere (e.g. on the kernel stack).

Combuster · Post by **Combuster** » Tue Jan 13, 2009 4:27 am

You can also tell the kernel to use a predefined return address and stackpointer. That means you always return to the same place with the same stack. Which of course you can hack around

Code: Select all

...
; set registers here
call sysenter
...

sysenter:
   MOV [tempstack], ESP
   SYSENTER

return_from_kernel:
   PUSH ESP
   RET

kernel side

Code: Select all

sysenter:
    CALL do_syscall
  
    MOV EDX, [stored_eip]
    MOV ECX, [stored_esp]
    SYSEXIT

bewing · Post by **bewing** » Tue Jan 13, 2009 5:57 am

Interesting method. In "return_from_kernel" don't you mean "POP ESP"?
(ie. to recover the "good" esp value from the tempstack?)

Brendan · Post by **Brendan** » Tue Jan 13, 2009 5:58 am

Hi,

For a new API, I'd reserve ECX and EDX for both SYSENTER and SYSCALL, and expect the caller to put the return EIP (in EDX) and return ESP (in ECX) into these registers before using SYSENTER.

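Note: SYSCALL works differently from SYSENTER - the SYSCALL instruction itself overwrites ECX with the return EIP, and it doesn't save or change ESP at all (the kernel has to switch stacks itself). So for SYSCALL you have no choice and can't use ECX for parameters, and keeping EDX reserved as well means the register usage is the same as for SYSENTER.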

Please note that (at least for my OS) it's important to have the exact same kernel API (same function numbers, same parameters in the same registers, etc) that can be used by any method (e.g. a software interrupt, a call gate, SYSENTER, SYSCALL, emulated SYSENTER or emulated SYSCALL). This means that you can't use ECX or EDX for software interrupts or call gates, because you can't use them for SYSCALL and you want the API to be the same.

For an old API (e.g. a kernel API that existed before SYSENTER and SYSCALL were introduced) where ECX and EDX already have established uses, you would need ugly hacks to support SYSENTER and/or SYSCALL properly (otherwise you risk breaking compatibility with existing software), but I doubt this matters for anyone here (unless they didn't do much research before designing their kernel API - in this case I'd suggest learning from your mistakes and trying again).

JohnnyTheDon wrote:However, I'm not sure about allowing user programs to ask for a return to any location they please. If the processor faults on the user side during a sysexit to an invalid address, you'll be fine. If it faults in the kernel, I guess you'll have to validate the return address.

The caller could use "jmp dodgyAddress" or "mov esp, dodgyAddress" instead of supplying a dodgy return EIP/ESP address for SYSENTER, so your kernel needs to protect against this anyway and therefore won't need to validate the caller's return EIP/ESP if the kernel only uses these addresses for SYSEXIT. If the kernel does need to use the caller's return EIP or the caller's return ESP for anything, then it will need to validate these addresses first. Of course this means that for SYSENTER it's faster to pass parameters in registers instead (to avoid the need to validate the caller's return ESP).
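For the case where the kernel does read through the caller's return ESP (for example to fetch stack parameters), the check could look roughly like the sketch below. It assumes a layout where user space is everything below 0xC0000000; that boundary and the function name are assumptions for illustration, not something from this thread. It only checks the address range, not whether the pages are actually mapped, so the kernel would still need to cope with a page fault.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: user space is [0, USER_SPACE_END). */
#define USER_SPACE_END 0xC0000000u

/* True if [addr, addr + len) lies entirely below the kernel boundary. */
bool user_range_ok(uint32_t addr, uint32_t len)
{
    if (addr >= USER_SPACE_END)
        return false;
    /* Written as a subtraction so that addr + len cannot wrap around. */
    return len <= USER_SPACE_END - addr;
}
```

A kernel built this way would call something like `user_range_ok(user_esp, 16)` before reading four stack parameters, and would skip the check entirely when ECX is only handed back to SYSEXIT.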

Cheers,

Brendan

Brendan · Post by **Brendan** » Tue Jan 13, 2009 6:02 am

Hi,

Combuster wrote:You can also tell the kernel to use a predefined return address and stackpointer. That means you always return to the same place with the same stack. Which of course you can hack around

That can work too; unless you've got multiple threads sharing the same address space (where you'd need re-entrancy locks so that only one thread can use the kernel API at a time)...

Cheers,

Brendan

CPLH · Post by **CPLH** » Tue Jan 13, 2009 7:54 am

Let me remind everyone that pushing something onto the stack before using sysenter will not work, because SS and ESP change automatically. Some people did not seem to notice that.

Combuster, your method does indeed free up the two registers, but it slows down every system call with an extra memory reference to "tempstack" plus extra "call" and "ret" instructions. I do not know how significant this overhead is compared to the time the system call itself takes. My guess is that it doesn't add much run time, especially when compared to my two pushes and pops.
Now that I think of it, the memory variable "tempstack" isn't much of a problem to handle. It would have to be a variable located at some predefined spot in each process's data space. That way you can get around problems with multitasking.
However, when leaving you would have to reference the process's variable either through the kernel data segment or through the process's data segment. Although that is relatively simple, it may take a few more instructions, depending on how your system is written.
Still, I like your method as an alternative.

JohnnyTheDon, I have thought about letting the process return to other places, and I don't see anything wrong with it. The kernel will only be able to return to a privilege level 3 address, so it would be similar to the process simply jumping to that location itself. The programmer could even use this as an optimization to jump somewhere else once the syscall is done.

Brendan, wow.. you posted right before I wanted to post.

I didn't look into the details syscall functionality, as I heard it only works in 64 bits. If it automatically sets ecx and edx, then it is more convenient than sysenter as you don't have to mess around trying to figure out efficient methods to get ecx and edx to work. Unfortunately this only works in 64 bits.

These companies keep on creating so many unfortunate trade-offs.
In case you are wondering where I found that syscall and sysret are for 64 bits: "IA-32 Intel Architecture Software Developer's Manual Volume 3A", page 4-30, section 4.8.8 says:

The instructions, along with SYSENTER and SYSEXIT, are suited for IA-32e mode operation. SYSCALL and SYSRET, however, are not supported in compatibility mode.

"Compatibility mode" is the 32-bit sub-mode of IA-32e (long) mode, i.e. 32-bit code running under a 64-bit OS.
Note however that SYSENTER and SYSEXIT work in both 32-bit and 64-bit code.

I hate providing reverse compatibility for anything.. it provides longer more complicated code.
What do you do, Brendan? You seem to know this subject quite well.

Veniamin

LMN · Post by **LMN** » Tue Jan 13, 2009 8:10 am

CPLH wrote:Does anybody know a better method of getting around this inconvenience?

Not me. Having fewer available registers with SYSENTER/SYSEXIT and SYSCALL/SYSRET is the price for faster privilege level transitions. For me it's OK, especially since x86_64 gives you even more general purpose registers (not to mention all the others).

jal · Post by **jal** » Tue Jan 13, 2009 8:50 am

CPLH wrote:I hate providing reverse compatibility for anything.. it provides longer more complicated code.

IIRC, SYSENTER/SYSEXIT is an Intel thing, and SYSCALL/SYSRET is the AMD thing. Intel added the latter when copying AMD's AMD64 (x86-64) stuff.

JAL

Brendan · Post by **Brendan** » Tue Jan 13, 2009 9:11 am

Hi,

CPLH wrote:I didn't look into the details syscall functionality, as I heard it only works in 64 bits. If it automatically sets ecx and edx, then it is more convenient than sysenter as you don't have to mess around trying to figure out efficient methods to get ecx and edx to work.

No - SYSCALL first existed on 32-bit AMD CPUs, and I don't think any 32-bit AMD CPU ever supported SYSENTER. 32-bit Intel CPUs had SYSENTER instead and didn't support SYSCALL.

Then AMD introduced 64-bit 80x86 CPUs, and (IIRC) in their early documentation stated that if a CPU supports long mode then software can assume it also supports other features, including PAE and SYSCALL and some other stuff (I really wish I could find this list again now).

This put Intel in an awkward situation - they wanted to sell Itanium, but had to produce a 64-bit 80x86 to maintain market share (I'd assume AMD really didn't like the idea of everyone shifting to Itanium, because they have patent agreements with Intel for 80x86 but don't have similar agreements with Intel for Itanium, so AMD probably couldn't compete if everyone shifted to Itanium). Anyway, Intel had to support SYSCALL to support long mode, because software developers expected SYSCALL to work in long mode.

However, for 32-bit code software developers didn't really use SYSCALL (but did use SYSENTER), so eventually AMD did introduce support for SYSENTER, but only for 32-bit code. This means that an AMD CPU running 64-bit code won't support SYSENTER, but the exact same CPU running 32-bit code in protected mode does support SYSENTER. AFAIK AMD treats SYSENTER as an invalid opcode anywhere in long mode, so 32-bit code in compatibility mode can't rely on it either.

To make this more confusing there's only 1 feature flag returned by CPUID to tell the OS if SYSENTER is supported, and only 1 feature flag returned by CPUID to tell the OS if SYSCALL is supported - there's no easy way to tell which instructions are supported in which operating mode.

So, here's a summary!

AMD CPUs:

If CPUID says SYSCALL is supported, then SYSCALL is supported for 32-bit code, and if the CPU supports long mode SYSCALL is also supported in 64-bit code.
If CPUID says SYSENTER is supported, then SYSENTER is only supported for 32-bit code, and *not* supported in 64-bit code.

Intel CPUs:

If CPUID says SYSCALL is supported, then SYSCALL is supported for 32-bit code (but AFAIK there aren't any Intel CPUs that support 32-bit SYSCALL, so the feature flag in CPUID will always be clear). However, if the CPU supports long mode then SYSCALL is supported in 64-bit code (even though CPUID says it isn't supported).
If CPUID says SYSENTER is supported, then SYSENTER is supported for 32-bit code, and if the CPU supports long mode then SYSENTER is also supported in 64-bit code.

Also note that for my OS, boot code examines CPU features, etc and builds its own set of feature flags in RAM (and does some other stuff - brand strings, errata, etc), and the rest of the OS never uses CPUID but uses the feature flags in RAM instead. This makes it easy to have a SYSCALL32 flag, a SYSCALL64 flag, a SYSENTER32 flag and a SYSENTER64 flag. Note: I honestly wish there was a "disable CPUID for CPL=3 code" flag in CR4 (like the "disable RDTSC for CPL=3 code" flag) so that I can force applications to use the kernel's standardized/unambiguous "CPU information" functions instead of the dodgy mess that CPUID has become, but I'm starting to get off-topic...
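As a rough illustration, the four flags could be derived from the raw CPUID facts like this. The sketch follows the AMD/Intel summary above literally; the struct, field and function names are hypothetical, and the inputs are plain booleans so the logic can be read (and tested) without executing CPUID. On real hardware the SYSENTER bit is CPUID.01H:EDX bit 11 and the SYSCALL bit is CPUID.80000001H:EDX bit 11.

```c
#include <stdbool.h>

enum cpu_vendor { VENDOR_AMD, VENDOR_INTEL };

/* Raw facts gathered once at boot (hypothetical field names). */
struct cpuid_facts {
    enum cpu_vendor vendor;
    bool cpuid_sep;      /* CPUID says SYSENTER is supported */
    bool cpuid_syscall;  /* CPUID says SYSCALL is supported  */
    bool long_mode;      /* CPU supports long mode           */
};

/* The unambiguous per-mode flags the rest of the OS would consult. */
struct syscall_flags {
    bool syscall32, syscall64, sysenter32, sysenter64;
};

struct syscall_flags derive_syscall_flags(struct cpuid_facts f)
{
    struct syscall_flags out = { false, false, false, false };

    /* For 32-bit code both vendors are described by the CPUID bits. */
    out.sysenter32 = f.cpuid_sep;
    out.syscall32  = f.cpuid_syscall;

    if (f.vendor == VENDOR_AMD) {
        out.syscall64  = f.cpuid_syscall && f.long_mode;
        out.sysenter64 = false;        /* never in 64-bit code on AMD */
    } else {
        out.syscall64  = f.long_mode;  /* regardless of the CPUID bit */
        out.sysenter64 = f.cpuid_sep && f.long_mode;
    }
    return out;
}
```

Keeping the vendor-specific rules in one function means the rest of the kernel only ever tests one flag per instruction and mode.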

If the CPU doesn't support SYSCALL or SYSENTER, then any code that uses the unsupported instruction will generate an invalid opcode exception. Your invalid opcode exception handler can use the return EIP on the exception handler's stack to figure out which instruction was being used, and if it was SYSCALL or SYSENTER it can emulate the instruction. This is what I meant earlier by "emulated SYSCALL/SYSENTER".
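The "figure out which instruction was being used" step could be sketched as below. SYSCALL encodes as 0F 05 and SYSENTER as 0F 34, both two bytes long. The enum and function names are made up for this example, and it assumes the handler has already copied the bytes at the return EIP out of user space safely; prefixes are ignored.

```c
#include <stddef.h>
#include <stdint.h>

enum fast_call_kind { FAST_NONE, FAST_SYSCALL, FAST_SYSENTER };

/* Classify the instruction bytes found at the faulting EIP.
 * Anything that is not SYSCALL or SYSENTER is a genuine invalid opcode. */
enum fast_call_kind classify_opcode(const uint8_t *code, size_t len)
{
    if (len < 2 || code[0] != 0x0F)
        return FAST_NONE;
    switch (code[1]) {
    case 0x05: return FAST_SYSCALL;   /* 0F 05 */
    case 0x34: return FAST_SYSENTER;  /* 0F 34 */
    default:   return FAST_NONE;
    }
}
```

For `FAST_NONE` the handler would fall through to its normal invalid opcode handling; for the other two it would run the kernel API function and then return in a way that mimics the real instruction.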

Finally, for my OS, the kernel has a large table of function pointers for the 32-bit kernel API, and for 64-bit versions of the OS the kernel has a second large table of function pointers for the 64-bit kernel API. The 64-bit kernel API is only used by SYSCALL, and the SYSCALL code just does "call [kernel_API_table_64 + rax * 8]" before doing SYSRET. The 32-bit kernel API is used by a software interrupt, a call gate, SYSCALL, SYSENTER, and the invalid opcode handler (emulated SYSCALL and emulated SYSENTER); where all of these things do "call [kernel_API_table_32 + eax * 4]" before returning using whatever method is appropriate (IRETD, RETF, SYSRET, SYSEXIT or IRETD).
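In C terms, the table dispatch amounts to something like the following sketch. The register struct, the two example functions and the error value are hypothetical. The bounds check on EAX is an addition of this sketch; the one-line "call [kernel_API_table_32 + eax * 4]" above doesn't show one.

```c
#include <stddef.h>
#include <stdint.h>

/* Register snapshot handed to every kernel API function (hypothetical). */
struct api_regs { uint32_t eax, ebx, esi, edi; };

typedef int32_t (*api_fn)(struct api_regs *);

#define API_ERR_BAD_FUNCTION (-1)

/* Two placeholder API functions, just to populate the table. */
static int32_t api_get_version(struct api_regs *r) { (void)r; return 0x0100; }
static int32_t api_add(struct api_regs *r) { return (int32_t)(r->ebx + r->esi); }

static const api_fn kernel_api_table_32[] = { api_get_version, api_add };

/* Shared by every entry path (software interrupt, call gate, SYSENTER,
 * SYSCALL, emulation): EAX selects the function, only the return differs. */
int32_t kernel_api_dispatch(struct api_regs *r)
{
    size_t count = sizeof kernel_api_table_32 / sizeof kernel_api_table_32[0];

    if (r->eax >= count)
        return API_ERR_BAD_FUNCTION;
    return kernel_api_table_32[r->eax](r);
}
```

Because every entry path funnels into the same dispatcher, the function numbers and parameter registers stay identical no matter how the caller got into the kernel.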

This means that for 32-bit code I can do something like:

Code: Select all

%macro CALL_KERNEL 1
    mov eax,%1
%ifdef USE_SYSENTER
    push ecx
    push edx
    mov ecx,esp             ; return ESP for SYSEXIT
    mov edx,%%return        ; return EIP for SYSEXIT
    sysenter
%%return:                   ; SYSEXIT resumes here, so the pops below run
    pop edx
    pop ecx
%elifdef USE_SYSCALL
    syscall
%elifdef USE_CALL_GATE
    call KERNEL_API_GATE:0x00000000
%else
    int KERNEL_API_TRAP
%endif
%endmacro

Code: Select all

    mov ebx, foo
    mov esi, bar
    CALL_KERNEL function_number

This means that regardless of which method is actually used (and regardless of whether SYSCALL/SYSENTER are used when the CPU doesn't support them), everything involved with the 32-bit kernel API behaves exactly the same...

I should also mention that (AFAIK) for most OSs (e.g. Windows, Linux) the application doesn't really use the kernel API directly - the application uses a shared library or DLL, and the shared library or DLL uses the kernel API. This allows a different library to be used to suit the situation (e.g. if SYSCALL is supported then use the library that uses SYSCALL, if SYSENTER is supported then use the library that uses SYSENTER, else use the library that uses the software interrupt). This is a valid way of doing things (if your OS supports shared libraries or DLLs and the potential for "dependency hell"), but at a minimum it also adds the cost of a near call/ret to each kernel API call.
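A variation on the same idea is a single library that picks its entry stub once at load time through a function pointer, instead of shipping three separate libraries. That is a different technique from the one described above, sketched here only to show the selection order; all names are hypothetical, and the three stubs are stand-ins for what would really be a few lines of assembly each.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int32_t (*kernel_call_fn)(uint32_t function, uint32_t arg);

/* Stand-ins for the real stubs. Here they only report which path ran. */
int32_t call_via_int(uint32_t fn, uint32_t arg)      { (void)fn; (void)arg; return 0; }
int32_t call_via_sysenter(uint32_t fn, uint32_t arg) { (void)fn; (void)arg; return 1; }
int32_t call_via_syscall(uint32_t fn, uint32_t arg)  { (void)fn; (void)arg; return 2; }

/* Same preference order as above: SYSCALL, then SYSENTER, then the
 * software interrupt as the fallback that always works. */
kernel_call_fn select_kernel_call(bool have_syscall, bool have_sysenter)
{
    if (have_syscall)
        return call_via_syscall;
    if (have_sysenter)
        return call_via_sysenter;
    return call_via_int;
}
```

The indirect call through the returned pointer is the "near call/ret" cost mentioned above, paid on every kernel API call.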

Cheers,

Brendan

CPLH · Post by **CPLH** » Tue Jan 13, 2009 9:32 am

Wow! That makes sense!
Thanks for sharing.

What are the minimal requirements for your OS, Brendan?

Brendan · Post by **Brendan** » Tue Jan 13, 2009 10:10 am

Hi,

CPLH wrote:Wow! That makes sense!
Thanks for sharing.

What are the minimal requirements for your OS, Brendan?

Hehe - currently the main requirement is patience (I'm back doing boot loaders).

The requirements for the previous version (and the target requirements for the current version) are an 80486 with 4 MiB or more RAM (single-CPU only - Pentium or later required for SMP); plus either a serial port or a video card that supports at least one 8-bpp, 15-bpp, 16-bpp, 24-bpp or 32-bpp video mode.

A terminal (if you're using a serial port) or a keyboard (if you're using a video card) might also be needed for installing the OS (but they're not required for running the OS). Of course a computer without a terminal or a keyboard doesn't make much sense unless it does have networking, but I didn't get that far.

Cheers,

Brendan

OSDev.org
