inline system calls?

mariuszp
Member
Posts: 587
Joined: Sat Oct 16, 2010 3:38 pm

inline system calls?

Post by mariuszp »

I was thinking about making my libc system call wrappers into inline functions, so that instead of making a call, the compiler could just emit the "syscall" instruction directly, which may also allow the compiler to optimise out the move from %rcx to %r10.

However, other operating systems don't do this; even Linux, which has all system call numbers, and the calling method, standardized and kept compatible.

The point of this is to boost performance. Are there any advantages/disadvantages I may have missed?
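For illustration, an inlined wrapper of the kind described above might look like the sketch below. It is not taken from any libc: the function names are made up, and it borrows the Linux x86-64 register convention (number in %rax, arguments in %rdi, %rsi, %rdx, %r10; SYSCALL clobbers %rcx and %r11) purely as a stand-in for whatever a custom kernel defines. On any other target it falls back to libc's syscall() so that it still builds.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* exposes syscall() in <unistd.h> for the fallback */
#endif
#include <unistd.h>
#include <sys/syscall.h>     /* SYS_getpid, used in the usage note below */

#if defined(__x86_64__) && defined(__linux__)
/* No-argument call: only the syscall number has to be loaded. */
static inline long inline_syscall0(long nr)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr)
                      : "rcx", "r11", "memory");
    return ret;
}

/* Four-argument call: because the wrapper is inlined, the compiler can
   place the fourth argument directly in %r10, so no %rcx -> %r10 move
   is needed at the call site. */
static inline long inline_syscall4(long nr, long a1, long a2, long a3, long a4)
{
    long ret;
    register long r10 __asm__("r10") = a4;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3), "r"(r10)
                      : "rcx", "r11", "memory");
    return ret;
}
#else
/* Assumption: on other targets, defer to libc's generic syscall(). */
static inline long inline_syscall0(long nr)
{
    return syscall(nr);
}

static inline long inline_syscall4(long nr, long a1, long a2, long a3, long a4)
{
    return syscall(nr, a1, a2, a3, a4);
}
#endif
```

On Linux, inline_syscall0(SYS_getpid) should return the same value as getpid(). The same sketch also shows the cost being discussed in this thread: the mechanism is baked into every binary that includes the header.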
bluemoon
Member
Posts: 1761
Joined: Wed Dec 01, 2010 3:41 am
Location: Hong Kong

Re: inline system calls?

Post by bluemoon »

Those OSes support multiple ways to invoke a system call (INT, SYSENTER, SYSCALL, even non-x86 mechanisms), and the actual glue code is decided at install or boot time. Yes, you might patch libc inline, but dynamic linking is much safer for this.
LtG
Member
Posts: 384
Joined: Thu Aug 13, 2015 4:57 pm

Re: inline system calls?

Post by LtG »

bluemoon wrote:dynamic link is much safer on this.
In what way? Or do you mean more future-proof?

Personally I haven't fully decided yet, but the current plan is not to conflate x86_32 and x86_64 into the same ELF format, but rather to treat different archs as different. As such, the SYSCALL instruction is guaranteed to exist and can be used directly (inlined).

However, if you want to be able to compile C code into binaries and have those binaries be usable on every x86 arch, then you don't know whether SYSCALL/SYSENTER is available; only INT is guaranteed at this point (though theoretically even that could change in future). So if you inline SYSCALL into binaries (instead of a system-local DLL) it wouldn't work on Intel x86_32.

Note also that intuitively having SYSCALL inlined would have better performance but with these "smart" CPUs you never know, so always benchmark. Though in this case I can't think of any reason why inlined SYSCALL would be slower so I'd just do it..
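A minimal sketch of what "always benchmark" could look like follows. Everything in it is an assumption for illustration: called_op() and inlined_op() are trivial stand-ins for an out-of-line wrapper and an inlined one, and in a real test their bodies would be replaced by the actual kernel entry code. The numbers it reports are only meaningful relative to each other on the same machine.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() */
#endif
#include <time.h>

/* Stand-in for an out-of-line libc wrapper; noinline forces a real call. */
__attribute__((noinline)) long called_op(long x) { return x + 1; }

/* Stand-in for an inlined wrapper. */
static inline long inlined_op(long x) { return x + 1; }

/* Nanoseconds elapsed between two timestamps. */
static double elapsed_ns(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec) * 1e9 + (double)(b.tv_nsec - a.tv_nsec);
}

/* Average nanoseconds per iteration using the out-of-line wrapper. */
double bench_called(long iterations)
{
    struct timespec start, end;
    volatile long sink = 0;   /* keeps the loop from being optimised away */
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        sink += called_op(i);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return elapsed_ns(start, end) / (double)iterations;
}

/* Average nanoseconds per iteration using the inlined wrapper. */
double bench_inlined(long iterations)
{
    struct timespec start, end;
    volatile long sink = 0;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        sink += inlined_op(i);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return elapsed_ns(start, end) / (double)iterations;
}
```

Comparing bench_called(n) with bench_inlined(n) for a large n gives a rough idea of the call overhead alone, which can then be weighed against the cost of the kernel entry itself.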
xenos
Member
Posts: 1118
Joined: Thu Aug 11, 2005 11:00 pm
Libera.chat IRC: xenos1984
Location: Tartu, Estonia

Re: inline system calls?

Post by xenos »

I think the difference is rather small:
  • If you inline syscalls, you have to decide at compile time which mechanism you use (software interrupt, sysenter/sysexit, syscall/sysret, ...). Kernel and C library must be compiled with the same convention, so that they fit together.
  • If you have the syscalls as separate functions in some stub that you keep together with the kernel, then the C library doesn't know anything about the mechanism. Only the kernel and the syscall stub must fit together and be compiled with the same convention. You can change the mechanism, recompile the kernel and the syscall stub, and don't need to recompile the C library.
But remember, even using the second approach, there is still a point where you have to decide on a common function calling convention that you use when you call the functions in the stub (such as the System V ABI), and if you want to change this, you have to recompile the C library, too. But normally this is something you fix once and for all, while there might be different syscall mechanisms you want to support depending on the hardware capabilities (so that you can run the same kernel on older and newer hardware, implement different mechanisms in the kernel, and don't have to change the C library, but only the stub, which then decides which mechanism to use).
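The second approach can be sketched roughly as follows. This is a simulation under stated assumptions, not real kernel entry code: entry_int() and entry_syscall() only tag their result so the caller can see which path ran, and stub_init(), stub_kernel_call() and CPU_HAS_SYSCALL are hypothetical names. The point is that the C library depends only on the signature of stub_kernel_call().

```c
/* Function calling convention shared by libc and the stub; fixed once. */
typedef long (*kernel_entry_fn)(long nr, long a1, long a2);

/* Simulated entry mechanisms. A real stub would execute INT or SYSCALL
   here; these return a tagged value instead. */
static long entry_int(long nr, long a1, long a2)     { return 1000 + nr + a1 + a2; }
static long entry_syscall(long nr, long a1, long a2) { return 2000 + nr + a1 + a2; }

/* Hypothetical feature bit reported by the kernel or by CPUID. */
enum cpu_feature { CPU_HAS_SYSCALL = 1 };

/* The software interrupt is the safe default. */
static kernel_entry_fn active_entry = entry_int;

/* Run once by the stub at start-up to pick the mechanism. */
void stub_init(unsigned cpu_features)
{
    active_entry = (cpu_features & CPU_HAS_SYSCALL) ? entry_syscall : entry_int;
}

/* The only symbol the C library links against. */
long stub_kernel_call(long nr, long a1, long a2)
{
    return active_entry(nr, a1, a2);
}
```

Changing the mechanism then means replacing the stub (and the kernel side), while code that calls stub_kernel_call() stays as it is.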
Programmers' Hardware Database // GitHub user: xenos1984; OS project: NOS
mariuszp
Member
Posts: 587
Joined: Sat Oct 16, 2010 3:38 pm

Re: inline system calls?

Post by mariuszp »

I am only targeting x86_64 - syscall is guaranteed to exist.

Furthermore, the problems of different archs could be solved by having macros in the C library which decide which inline asm to use.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: inline system calls?

Post by Brendan »

Hi,
mariuszp wrote:I am only targetting x86_64 - syscall is guaranteed to exist.
Imagine if:
  • Intel create a CPU in 5 years time that has a minor flaw in the way it implements SYSCALL and you want to avoid that flaw
  • Intel create a new faster method (or already has an old method they make faster - e.g. SYSENTER) and you want to upgrade to that and don't want to use SYSCALL on some CPUs
  • you notice that using 4 call gates for 4 frequently used functions is faster for your OS (because it avoids a "switch(functionNumber)" and some other mess)
  • you want processes to use a different library sometimes (e.g. when debugging or profiling, maybe you want the process being debugged/profiled to use a special library that supports "syscall tracing/logging")
  • you have a "doThing(foo)" kernel function and want to change it to "doThing(foo, bar)" and want existing code to use a default value for "bar"
  • you have a "getTimestamp()" kernel function and (for some CPUs and not others, depending on how the CPU implements TSC) you want to shift the entire function into the user-space library
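To make the "doThing(foo)" item concrete, here is a rough sketch of how a library layer could absorb that kind of API change. The kernel side is simulated, and DOTHING_DEFAULT_BAR, kernel_doThing_v2() and doThing2() are invented names; only doThing(foo) and doThing(foo, bar) come from the list above (C has no overloading, hence the second name).

```c
/* Hypothetical default used when old callers do not supply "bar". */
#define DOTHING_DEFAULT_BAR 42

/* Simulated kernel function with the new two-argument signature. */
static long kernel_doThing_v2(long foo, long bar)
{
    return foo * 100 + bar;
}

/* New library entry point: passes both arguments through. */
long doThing2(long foo, long bar)
{
    return kernel_doThing_v2(foo, bar);
}

/* Old library entry point, kept so existing callers keep working:
   it fills in the default for the argument they do not know about. */
long doThing(long foo)
{
    return kernel_doThing_v2(foo, DOTHING_DEFAULT_BAR);
}
```

With the wrapper living in a shared library, existing executables pick up the new behaviour without being recompiled; with the call inlined into each executable, they would keep issuing the old one-argument request.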
mariuszp wrote:Furthermore, the problems of different archs could be solved by having macros in the C library which decide which inline asm to use.
This requires some form of "compile before use". For C, that probably means that you have to force all software developers to provide source code (which means you can mostly forget about any commercial software, and forget about normal users because "compile before use" is too painful/annoying/fragile for languages like C). Of course there are other forms of "compile before use", like having software compiled into some kind of byte-code by the developer and then compiling that byte-code to native either ahead of time (e.g. when the end user installs the software) or while it's being executed (some sort of JIT). I'm not sure that C can be used like that (pre-processing can ruin portability before the compiler does anything). Of course there are other languages that don't have that problem (Java, C#, ...).

Note: I don't know what your project's goals are, and don't know if any of the things I mentioned will matter for your project.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
mariuszp
Member
Posts: 587
Joined: Sat Oct 16, 2010 3:38 pm

Re: inline system calls?

Post by mariuszp »

Hmmm, I see that perhaps it is indeed better to have the system call method changeable in libc by having the functions implemented in the shared library.

However, what you said does not require any "compile before use". Clearly, every application must be recompiled for each arch anyway, and for each arch the C header would say something like:

Code:

#ifdef __x86_64__
#  define __SYSCALL1(par) {/* x86_64 implementation of syscall */}
#elif defined(__arm__)
#  define __SYSCALL1(par) {/* ARM implementation of syscall */}
#elif ...
/* ... */
#endif
All code compiled onto a specific arch would use a specific syscall convention, but there aren't any more recompilations necessary than those that are required anyway (i.e. one compile per arch).
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: inline system calls?

Post by Brendan »

Hi,
mariuszp wrote:Hmmm, I see that perhaps it is indeed better to have the system call method changeable in libc by having the functions implemented in the shared library.

However, what you said does not require any "compile before use". Clearly, every application must be recompiled for each arch anyway, and for each arch the C header would say something like:

Code:

#ifdef __x86_64__
#  define __SYSCALL1(par) {/* x86_64 implementation of syscall */}
#elif defined(__arm__)
#  define __SYSCALL1(par) {/* ARM implementation of syscall */}
#elif ...
/* ... */
#endif
All code compiled onto a specific arch would use a specific syscall convention, but there aren't any more recompilations necessary than those that are required anyway (i.e. one compile per arch).
I think you mean something more like:

Code: Select all

#ifndef DEBUGGING
# ifndef PROFILING
#  ifdef __x86_64__
#   ifdef KERNEL_API_VERSION_0_1
#    ifdef CPU_VENDOR_INTEL
#     ifndef CPUFAMILY123     // This one is buggy
#      ifndef SYSENTER_IS_FASTER
#       ifndef ENABLE_CALLGATE_FOR_FOO_AND_BAR
#        ifndef OMG_WE_FORGOT_ABOUT_THIS
After 10 years this header file is going to grow to about 20 MiB just so that new applications work on old kernels and old CPUs, and (for "closed source") developers are going to have 1234 different executables to handle all the permutations for 80x86 alone (different CPU family/models, different kernel versions, different other options). Older applications that aren't recompiled will probably just become buggy messes that blow up in the user's face without warning. ;)


Cheers,

Brendan
zaval
Member
Posts: 656
Joined: Fri Feb 17, 2017 4:01 pm
Location: Ukraine, Bachmut

Re: inline system calls?

Post by zaval »

mariuszp wrote: I was thinking about making my libc system call wrappers into inline functions, so that instead of making a call, the compiler could just emit the "syscall" instruction directly, which may also allow the compiler to optimise out the move from %rcx to %r10.

However, other operating systems don't do this; even Linux, which has all system call numbers, and the calling method, standardized and kept compatible.

The point of this is to boost performance. Are there any advantages/disadvantages I may have missed?
But system service functions (syscalls) are not just bare sysenter instructions. You might want to do a lot of internal processing before passing the request downwards, such as parameter checking, and maybe even avoid making a syscall at all. Do you want to inline all of this?
And how are you going to be "POSIX" compliant this way? What, for example, would the fork() implementation look like? Macros? It sounds like statically linking the system services into applications. You'll end up with the same code duplicated across your apps' binaries, instead of having system-supplied dynamic link libraries. And that is leaving aside all the complexities already touched on by previous posters, and probably many not yet mentioned. All of that buys you a negligible optimisation: removing a few calls along the code path.
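As a rough illustration of the "internal processing" point, the sketch below shows a library wrapper that validates its arguments and sometimes skips the kernel entirely. The kernel is simulated with a counter, and all names (lib_getpid(), lib_write(), kernel_entry_count()) are hypothetical; the caching shown also ignores cases such as fork(), where a cached value would go stale.

```c
#include <errno.h>
#include <stddef.h>

/* Counts how many times the simulated kernel was entered. */
static int kernel_entries;

/* Simulated kernel services. */
static long kernel_getpid(void)
{
    kernel_entries++;
    return 1234;
}

static long kernel_write(int fd, const void *buf, size_t len)
{
    (void)fd;
    (void)buf;
    kernel_entries++;
    return (long)len;
}

int kernel_entry_count(void)
{
    return kernel_entries;
}

/* Wrapper that avoids repeated kernel entries by caching the result. */
long lib_getpid(void)
{
    static long cached_pid;          /* 0 means "not fetched yet" */
    if (cached_pid == 0)
        cached_pid = kernel_getpid();
    return cached_pid;
}

/* Wrapper that checks parameters before entering the kernel. */
long lib_write(int fd, const void *buf, size_t len)
{
    if (buf == NULL && len != 0) {
        errno = EINVAL;              /* rejected in user space */
        return -1;
    }
    if (len == 0)
        return 0;                    /* nothing to do, no kernel entry */
    return kernel_write(fd, buf, len);
}
```

Inlining wrappers like these would copy all of that logic, not just one instruction, into every call site.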
ANT - NT-like OS for x64 and arm64.
efify - UEFI for a couple of boards (mips and arm). suspended due to lost of all the target park boards (russians destroyed our town).
LtG
Member
Posts: 384
Joined: Thu Aug 13, 2015 4:57 pm

Re: inline system calls?

Post by LtG »

Brendan wrote:Intel create a CPU in 5 years time that has a minor flaw in the way it implements SYSCALL and you want to avoid that flaw
Intel creating a CPU with minor flaw in CALL is about as likely. Difference is SYSCALL can be disabled and then you can presumably (haven't tested) catch the #UD. But is it really good practice to prepare for future bugs?

I thought about replying to the rest but it just seemed like splitting hairs.. So instead I thought I'd mention that I'm thinking about using byte-code ultimately and as such this is (for me) mostly a moot point. Which brings me to my actual point: it really depends on the circumstances, and I wouldn't worry about these low-level details for now at all (and personally haven't code-wise, though I've thought about them), so use whichever you prefer and is easier to implement. With DLL you might save minimal re-compilation time (a lot if you keep changing your syscall lib, but I'd expect that to be rare even during development).
Brendan wrote:forget about normal users because "compile before use" is too painful/annoying/fragile for languages like C).
Nonsense, it's a tooling problem, nothing more. Since it's your OS you can fix that (not that it's OS specific to begin with), if your OS toolset works perfectly then everyone else has to follow suit (or cease to exist).
Brendan wrote:I'm not sure that C can be used like that (pre-processing can ruin portability before the compiler does anything). Of course there are other languages that don't have that problem (Java, C#, ...).
Language doesn't have that kind of effect, it's a compiler/tooling problem. A language is just a language and doesn't have an impact on that.

Note, I'm not a fan of C or C++, but do use both, and both are pretty well tooled for osdev though both have massive issues for me as well..

PS. As Brendan also said, it all depends on your goals. The key is to think short, mid and long term at the same time. For me, even if I end up not using byte-code it won't matter, I have alternative solutions, so I'm going with what's easier and has better performance for now, which is to "hard code" the syscalls into applications. Some of the points raised for choosing DLL instead are valid and can make life easier if you make the same mistakes (which are easy to make) as Linux or Windows (ELF and PE).. Basically, if you conflate things you are gonna get screwed, then it's better to stay with dynamic solutions, it works as a lubricant =)
bluemoon
Member
Posts: 1761
Joined: Wed Dec 01, 2010 3:41 am
Location: Hong Kong

Re: inline system calls?

Post by bluemoon »

LtG wrote:
bluemoon wrote:dynamic link is much safer on this.
In what way? Or do you mean more future-proof?
So as there are multiple mechanisms to invoke a kernel call, you can't inline it at compile time. It can however be done at link time. AFAIK there are three approaches:
1. Traditional dynamic linking. You provide a "glue library" which does the actual kernel call. Simple.

2. Address patch. libc (or any program) calls a kernel_call() function which is resolved (linked) at load time. The actual "CALL kernel_call" instruction is in the form of an R_X86_64_JMP_SLOT that is visible in the ELF header (similar things apply to other formats). At load-time linking, you just replace the JMP instruction with whatever you want to do. Note that you want to make sure there is enough room for the patch.

3. Inline/in-place patch. Similar to (2), but you actually patch the caller. This might be done by providing a "slow" and a "fast" kernel call path; anyone calling the "slow" path triggers a patch to the caller, which is found by looking up the return address on the stack.

IIRC Windows uses (2) to patch DLLs. (3) might seem fastest, but run-time self-modifying code is a no-no for many people.
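The "slow path patches itself out" idea can be approximated without self-modifying code by patching a function pointer slot instead of the caller's instructions, much like lazy binding through a jump slot. The sketch below is only that approximation, with simulated kernel calls and hypothetical names throughout.

```c
typedef long (*kernel_call_fn)(long nr);

static long slow_kernel_call(long nr);
static long fast_kernel_call(long nr);

/* The "slot" every caller goes through; it starts on the slow path. */
static kernel_call_fn kernel_call_slot = slow_kernel_call;

/* Counts how often the slow path (the resolver) actually ran. */
static int resolver_runs;

/* Simulated fast kernel entry. */
static long fast_kernel_call(long nr)
{
    return nr + 1000;
}

/* First call lands here: choose the mechanism, patch the slot so that
   later calls bypass this function, then finish the original request. */
static long slow_kernel_call(long nr)
{
    resolver_runs++;
    kernel_call_slot = fast_kernel_call;
    return fast_kernel_call(nr);
}

/* What applications call. */
long kernel_call(long nr)
{
    return kernel_call_slot(nr);
}

int kernel_call_resolver_runs(void)
{
    return resolver_runs;
}
```

After the first call the resolver is never entered again, at the price of one indirect call per kernel call instead of a patched direct one.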
Korona
Member
Posts: 1000
Joined: Thu May 17, 2007 1:27 pm

Re: inline system calls?

Post by Korona »

bluemoon wrote:2. Address patch. In libc(or any program) they will call a kernel_call() function which is resolved(linked) in load-time. the actual "CALL kernel_call" instruction is in a form of R_X86_64_JMP_SLOT that is visible on the ELF header. (similar things apply on other format). Upon load time linking, you just replace the JMP instruction with whatever you want to do. Note that you want to make sure there is enough room for the patch.
The GCC toolchain supports __attribute__((ifunc("resolver_function"))), which accomplishes this in a portable manner.

(Btw, JMP_SLOT is not placed in the .text segment; it is used for PLT entries.)
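A minimal sketch of the ifunc attribute mentioned above, assuming GCC (or Clang) targeting ELF with a C library that supports indirect functions, such as glibc. The two entry functions are simulated and cpu_has_syscall() is a placeholder for a real CPUID probe; all names are hypothetical.

```c
/* Function type of the symbol being resolved. */
typedef long kernel_call_fn(long nr);

/* Simulated implementations; real ones would use INT or SYSCALL. */
static long kernel_call_via_int(long nr)     { return 1000 + nr; }
static long kernel_call_via_syscall(long nr) { return 2000 + nr; }

/* Placeholder for a CPUID check; always reports SYSCALL as available. */
static int cpu_has_syscall(void)
{
    return 1;
}

/* Resolver: run once by the dynamic loader when the symbol is bound.
   It should avoid depending on anything that is not relocated yet. */
static kernel_call_fn *resolve_kernel_call(void)
{
    return cpu_has_syscall() ? kernel_call_via_syscall : kernel_call_via_int;
}

/* Callers see an ordinary function; the loader binds it to whichever
   implementation the resolver returned. */
long kernel_call(long nr) __attribute__((ifunc("resolve_kernel_call")));
```

With the placeholder probe above, kernel_call(7) ends up in kernel_call_via_syscall().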
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].
Boris
Member
Posts: 145
Joined: Sat Nov 07, 2015 3:12 pm

Re: inline system calls?

Post by Boris »

Or you can just create big LLVM IR LTO blobs for your syscall library + libc and let software creators do the inlining themselves, for each CPU family.

Even if it means compiling a simple hello world in one hour.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: inline system calls?

Post by Brendan »

Hi,
LtG wrote:
Brendan wrote:Intel create a CPU in 5 years time that has a minor flaw in the way it implements SYSCALL and you want to avoid that flaw
Intel creating a CPU with minor flaw in CALL is about as likely.
The CALL instruction is relatively simple. SYSCALL is more complex and has multiple edge cases, where it might be fine most of the time but on rare occasions several conditions occur at the same time and you end up with something like this or this.
LtG wrote:Difference is SYSCALL can be disabled and then you can presumably (haven't tested) catch the #UD.
Yes, but that'd give far worse performance.
LtG wrote:But is it really good practice to prepare for future bugs?
For software that hopes to be around for a while; it's good practice to design it with some flexibility, so that if things change in future you have a way to deal with those changes (that doesn't break backward compatibility). It doesn't matter if things change because you want to add new features to your kernel, or if you discover a way to optimise something, or if you need to fix a bug in your software, or if Intel added something better to the CPU, or if you need to work around a bug in a CPU.
LtG wrote:I thought about replying to the rest but it just seemed like splitting hairs.. So instead I thought I'd mention that I'm thinking about using byte-code ultimately and as such this is (for me) mostly a moot point. Which brings me to my actual point, it really depends on the circumstances, and that I wouldn't worry about this low level details for now at all (and personally haven't code wise, though I've thought about them), so use which ever you prefer and is easier to implement. With DLL you might save minimal re-compilation time (a lot if you keep changing your syscall lib, but I'd expect that to be rare even during development).
forget about normal users because "compile before use" is too painful/annoying/fragile for languages like C).
Nonsense, it's a tooling problem, nothing more. Since it's your OS you can fix that (not that it's OS specific to begin with), if your OS toolset works perfectly then everyone else has to follow suit (or cease to exist).
In theory it's possible to drive half way across a bridge; but in practice nobody ever does - they either don't drive across the bridge at all, or they drive all the way to the other side.

In theory it's possible to create tools that fix all the "compile before use is too painful/annoying/fragile for C" problems; but in practice nobody ever will - they either won't be willing to write their own tools at all, or they'll go all the way and abandon C itself (e.g. create their own language).
LtG wrote:
I'm not sure that C can be used like that (pre-processing can ruin portability before the compiler does anything). Of course there are other languages that don't have that problem (Java, C#, ...).
Language doesn't have that kind of effect, it's compiler/tooling problem. A language is just a language and doesn't have impact on that.
No; "pre-processing to work around portability problems" is a language problem with multiple causes (implementation defined behaviours, lack of standardisation for things like networking and GUI, poor primitive types, multiple versions of the language itself, etc). You can invent a "C like" language that doesn't need a pre-processor, but that language will not be C anymore.


Cheers,

Brendan
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: inline system calls?

Post by Brendan »

Hi,
bluemoon wrote:So as there are multiple mechanism to invoke kernel call, you can't inline it at compile time.
Korona wrote:The GCC toolchain supports __attribute__((ifunc("resolver_function"))), which accomplishes this in a portable manner.
Boris wrote:Or you can just create big LLVM IR LTO blobs for your syscall library + libc and let software creators do the inlining themselves, for each CPU family.
All of this is focusing on the tip of the iceberg (the specific instruction/s used to transfer control to the kernel) and completely ignores the majority of the iceberg (the entire kernel API).

For a simple example; assume you are implementing the "get_time()" function in a C library; and you know that there are older kernels where "function 0x1234" returns "seconds and microseconds since January 1970 as a pair of signed 32-bit integers", and newer kernels where "function 0x4321" returns "nanoseconds since January 2000 as 96-bit unsigned integer" and that (for compatibility) you need to support both somehow. Also assume that 5 years after you've written your C library (and after hundreds of closed source executables depend on it and their developers go bankrupt and disappear) you add a "function 0x3333" that returns "TSC multiplier and offset, or zeros if the CPU doesn't support an invariant TSC" (so that software can get these values when the process is started, and then do "time = RDTSC * TSCmultiplier + TSCbase" without calling the kernel API at all).
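One way a library could hide the first two kernel functions behind a single call is sketched below. Only the two formats come from the example above; the struct layouts, the kernel_api table, the simulated sim_fn_* functions and the choice of "signed 64-bit nanoseconds since January 2000" as the common result are all assumptions made for illustration. The constant 946684800 is the number of seconds from 1 January 1970 to 1 January 2000 (UTC).

```c
#include <stdint.h>

#define UNIX_SECONDS_AT_2000 946684800LL

/* "function 0x1234": seconds and microseconds since January 1970. */
struct old_time { int32_t seconds; int32_t microseconds; };

/* "function 0x4321": 96-bit unsigned nanoseconds since January 2000. */
struct new_time { uint64_t low; uint32_t high; };

/* Hypothetical description of what the running kernel offers. */
struct kernel_api {
    int has_fn_4321;                      /* nonzero on newer kernels */
    struct old_time (*fn_1234)(void);
    struct new_time (*fn_4321)(void);
};

/* Simulated kernel functions; both report 1.5 s after 1 January 2000. */
struct old_time sim_fn_1234(void)
{
    struct old_time t = { 946684801, 500000 };
    return t;
}

struct new_time sim_fn_4321(void)
{
    struct new_time t = { 1500000000ULL, 0 };
    return t;
}

/* Library get_time(): stores nanoseconds since January 2000 and returns 0,
   or returns -1 if the kernel's value does not fit in 64 signed bits. */
int lib_get_time(const struct kernel_api *k, int64_t *ns_since_2000)
{
    if (k->has_fn_4321) {
        struct new_time t = k->fn_4321();
        if (t.high != 0 || t.low > (uint64_t)INT64_MAX)
            return -1;
        *ns_since_2000 = (int64_t)t.low;
        return 0;
    }

    /* Older kernel: rebase from 1970 to 2000 and scale to nanoseconds. */
    struct old_time t = k->fn_1234();
    *ns_since_2000 = ((int64_t)t.seconds - UNIX_SECONDS_AT_2000) * 1000000000LL
                   + (int64_t)t.microseconds * 1000LL;
    return 0;
}
```

An application calling lib_get_time() gets the same answer on either kernel, and a later "function 0x3333" path could be added inside the library without touching the application.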


Cheers,

Brendan