inline system calls?
I was thinking about making my libc system call wrappers into inline functions, so that instead of making a call, the compiler could just emit the "syscall" instruction directly, which may also allow the compiler to optimise out the move from %rcx to %r10.
However, other operating systems don't do this, not even Linux, which has standardized all of its system call numbers and the calling method and keeps them compatible.
The point of this is to boost performance. Are there any advantages/disadvantages I may have missed?
Re: inline system calls?
Those OSes support multiple ways to invoke a system call (INT, SYSENTER, SYSCALL, even non-x86 mechanisms), and the actual glue code is decided at install or boot time. Yes, you might patch the inlined libc code, but dynamic linking is much safer for this.
Re: inline system calls?
bluemoon wrote: dynamic linking is much safer for this.
In what way? Or do you mean more future-proof?
Personally I haven't fully decided yet, but my current plan is to not conflate x86_32 and x86_64 into the same ELF format, and instead treat different archs as different targets. As such, the SYSCALL instruction is guaranteed to exist and can be used directly (inlined).
However, if you want to be able to compile C code into binaries and have those binaries be usable on every x86 arch, then you don't know whether SYSCALL/SYSENTER is available; only INT is guaranteed at this point (though theoretically even that could change in future). So if you inline SYSCALL into binaries (instead of using a system-local DLL), they won't work on Intel x86_32.
Note also that intuitively having SYSCALL inlined would have better performance but with these "smart" CPUs you never know, so always benchmark. Though in this case I can't think of any reason why inlined SYSCALL would be slower so I'd just do it..
- xenos
Re: inline system calls?
I think the difference is rather small:
- If you inline syscalls, you have to decide at compile time which mechanism you use (software interrupt, sysenter/sysexit, syscall/sysret, ...). Kernel and C library must be compiled with the same convention, so that they fit together.
- If you have the syscalls as separate functions in some stub that you keep together with the kernel, then the C library doesn't know anything about the mechanism. Only the kernel and the syscall stub must match each other and be compiled with the same convention. You can change the mechanism, recompile the kernel and the syscall stub, and you don't need to recompile the C library.
Re: inline system calls?
I am only targeting x86_64 - syscall is guaranteed to exist.
Furthermore, the problems of different archs could be solved by having macros in the C library which decide which inline asm to use.
Re: inline system calls?
Hi,
mariuszp wrote: I am only targeting x86_64 - syscall is guaranteed to exist.
Imagine if:
- Intel create a CPU in 5 years time that has a minor flaw in the way it implements SYSCALL and you want to avoid that flaw
- Intel create a new faster method (or already has an old method they make faster - e.g. SYSENTER) and you want to upgrade to that and don't want to use SYSCALL on some CPUs
- you notice that using 4 call gates for 4 frequently used functions is faster for your OS (because it avoids a "switch(functionNumber)" and some other mess)
- you want processes to use a different library sometimes (e.g. when debugging or profiling, maybe you want the process being debugged/profiled to use a special library that supports "syscall tracing/logging")
- you have a "doThing(foo)" kernel function and want to change it to "doThing(foo, bar)" and want existing code to use a default value for "bar"
- you have a "getTimestamp()" kernel function and (for some CPUs and not others, depending on how the CPU implements TSC) you want to shift the entire function into the user-space library
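To make the "doThing(foo, bar)" case concrete: if applications only ever call the library, the library can keep the old entry point and supply the default itself. Everything below is hypothetical (the function numbers, the default value, and mock_kernel_call(), which merely stands in for a real system call).

```c
#include <errno.h>

/* Hypothetical kernel function numbers. */
enum { KFN_DO_THING_V1 = 10, KFN_DO_THING_V2 = 11 };

/* Mock kernel: after the API change it only implements the 2-argument form. */
static long mock_kernel_call(int fn, long a, long b)
{
    switch (fn) {
    case KFN_DO_THING_V2:
        return a * 100 + b;  /* placeholder "work" so results are checkable */
    default:
        return -ENOSYS;      /* the old function number no longer exists */
    }
}

#define DO_THING_DEFAULT_BAR 7L  /* hypothetical default for the new parameter */

/* Old library entry point: existing binaries keep calling this unchanged,
 * and the library fills in the default for the new argument. */
long doThing(long foo)
{
    return mock_kernel_call(KFN_DO_THING_V2, foo, DO_THING_DEFAULT_BAR);
}

/* New entry point for code written against the new API. */
long doThing2(long foo, long bar)
{
    return mock_kernel_call(KFN_DO_THING_V2, foo, bar);
}
```

Had the call been inlined into applications, old binaries would contain KFN_DO_THING_V1 directly and would start getting -ENOSYS after the kernel change.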
mariuszp wrote: Furthermore, the problems of different archs could be solved by having macros in the C library which decide which inline asm to use.
This requires some form of "compile before use". For C, that probably means that you have to force all software developers to provide source code (which means you can mostly forget about any commercial software, and forget about normal users because "compile before use" is too painful/annoying/fragile for languages like C). Of course there are other forms of "compile before use", like having software compiled into some kind of byte-code by the developer and then compiling that byte-code to native either ahead of time (e.g. when the end user installs the software) or while it's being executed (some sort of JIT). I'm not sure that C can be used like that (pre-processing can ruin portability before the compiler does anything). Of course there are other languages that don't have that problem (Java, C#, ...).
Note: I don't know what your project's goals are, and don't know if any of the things I mentioned will matter for your project.
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: inline system calls?
Hmm, I see that perhaps it is indeed better to have the system call method changeable in libc by having the functions implemented in the shared library.
However, what you said does not require any "compile before use". Clearly, every application must be recompiled for each arch anyway, and for each arch the C header would say something like:
```c
#ifdef __X86_64__
# define __SYSCALL1(par) {/* x86_64 implementation of syscall */}
#elif defined(__ARM__)
# define __SYSCALL1(par) {/* ARM implementation of syscall */}
#elif ...
/* ... */
#endif
```
All code compiled for a specific arch would use a specific syscall convention, but no more recompilations are necessary than those that are required anyway (i.e. one compile per arch).
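A compilable version of that header idea might look like the following. It is a sketch under assumptions that are not in the post: it uses the GCC/Clang predefined macros __x86_64__ and __aarch64__, Linux's register conventions (x86_64: number in rax, first argument in rdi; AArch64: number in x8, first argument and result in x0), and it passes the call number to __SYSCALL1 as an extra parameter.

```c
#include <errno.h>        /* EBADF, used in the usage example */
#include <sys/syscall.h>  /* SYS_* numbers (Linux numbering assumed) */

#if defined(__x86_64__)
static inline long arch_syscall1(long nr, long a0)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a0)
                      : "rcx", "r11", "memory");  /* SYSCALL clobbers rcx/r11 */
    return ret;
}
#elif defined(__aarch64__)
static inline long arch_syscall1(long nr, long a0)
{
    register long x8 __asm__("x8") = nr;
    register long x0 __asm__("x0") = a0;
    __asm__ volatile ("svc #0" : "+r"(x0) : "r"(x8) : "memory", "cc");
    return x0;
}
#else
#error "no system call implementation for this architecture"
#endif

/* Raw result: negative errno on failure, as the Linux kernel returns it. */
#define __SYSCALL1(nr, par) arch_syscall1((nr), (long)(par))
```

For example, __SYSCALL1(SYS_close, -1) returns -EBADF without touching errno.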
Re: inline system calls?
Hi,
mariuszp wrote: Hmm, I see that perhaps it is indeed better to have the system call method changeable in libc by having the functions implemented in the shared library.
However, what you said does not require any "compile before use". Clearly, every application must be recompiled for each arch anyway, and for each arch the C header would say something like:
```c
#ifdef __X86_64__
# define __SYSCALL1(par) {/* x86_64 implementation of syscall */}
#elif defined(__ARM__)
# define __SYSCALL1(par) {/* ARM implementation of syscall */}
#elif ...
/* ... */
#endif
```
All code compiled for a specific arch would use a specific syscall convention, but no more recompilations are necessary than those that are required anyway (i.e. one compile per arch).
I think you mean something more like:
```c
#ifndef DEBUGGING
# ifndef PROFILING
#  ifdef __X86_64__
#   ifdef KERNEL_API_VERSION_0_1
#    ifdef CPU_VENDOR_INTEL
#     ifndef CPUFAMILY123 // This one is buggy
#      ifndef SYSENTER_IS_FASTER
#       ifndef ENABLE_CALLGATE_FOR_FOO_AND_BAR
#        ifndef OMG_WE_FORGOT_ABOUT_THIS
/* ... */
```
After 10 years this header file is going to grow to about 20 MiB just so that new applications work on old kernels and old CPUs, and (for "closed source") developers are going to have 1234 different executables to handle all the permutations for 80x86 alone (different CPU family/models, different kernel versions, different other options). Older applications that aren't recompiled will probably just become buggy messes that blow up in the user's face without warning.
Cheers,
Brendan
Re: inline system calls?
mariuszp wrote: I was thinking about making my libc system call wrappers into inline functions, so that instead of making a call, the compiler could just emit the "syscall" instruction directly, which may also allow the compiler to optimise out the move from %rcx to %r10.
However, other operating systems don't do this, not even Linux, which has standardized all of its system call numbers and the calling method and keeps them compatible.
The point of this is to boost performance. Are there any advantages/disadvantages I may have missed?
But system service functions (syscalls) are not just bare SYSENTER instructions. You might want to do a lot of internal processing (parameter checking, etc.) before passing a request down, maybe even avoiding the syscall entirely. Do you want to inline all of that?
And how are you going to be POSIX compliant this way? For example, what would the fork() implementation look like? Macros? It sounds like static linking of the system services into applications. You'll end up with a lot of duplicated code in your apps' binaries, instead of having system-supplied dynamic link libraries. This neglects all the complexities already touched on by previous posters, and probably many that haven't been mentioned. And all of that gives you a negligible optimisation: removing a few calls along the code path.
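As an illustration of the "internal processing" point, here is a hypothetical wrapper (not from the thread) that checks its arguments in user space and only enters the kernel when the request is valid. It assumes Linux and glibc's syscall(2) for the actual transition; a real libc would do more here (for instance, nanosleep is a thread cancellation point).

```c
#define _GNU_SOURCE         /* for the syscall() declaration */
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical libc-level nanosleep(). */
int lib_nanosleep(const struct timespec *req, struct timespec *rem)
{
    if (req == NULL) {
        errno = EFAULT;     /* what the kernel would report for a bad pointer */
        return -1;
    }
    /* Out-of-range values are rejected without any kernel transition. */
    if (req->tv_sec < 0 || req->tv_nsec < 0 || req->tv_nsec >= 1000000000L) {
        errno = EINVAL;
        return -1;
    }
    /* Valid request: now it is worth entering the kernel.
     * glibc's syscall() returns -1 and sets errno on failure. */
    return (int)syscall(SYS_nanosleep, req, rem);
}
```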
Re: inline system calls?
Brendan wrote: Intel create a CPU in 5 years time that has a minor flaw in the way it implements SYSCALL and you want to avoid that flaw
Intel creating a CPU with a minor flaw in CALL is about as likely. The difference is that SYSCALL can be disabled, and then you can presumably (I haven't tested it) catch the #UD. But is it really good practice to prepare for future bugs?
I thought about replying to the rest, but it just seemed like splitting hairs. So instead I thought I'd mention that I'm thinking about ultimately using byte-code, and as such this is (for me) mostly a moot point. Which brings me to my actual point: it really depends on the circumstances, and I wouldn't worry about these low-level details for now at all (personally I haven't, code-wise, though I've thought about them), so use whichever you prefer and is easier to implement. With a DLL you might save minimal recompilation time (a lot if you keep changing your syscall lib, but I'd expect that to be rare even during development).
Brendan wrote: forget about normal users because "compile before use" is too painful/annoying/fragile for languages like C).
Nonsense, it's a tooling problem, nothing more. Since it's your OS you can fix that (not that it's OS specific to begin with); if your OS toolset works perfectly then everyone else has to follow suit (or cease to exist).
Brendan wrote: I'm not sure that C can be used like that (pre-processing can ruin portability before the compiler does anything). Of course there are other languages that don't have that problem (Java, C#, ...).
Language doesn't have that kind of effect; it's a compiler/tooling problem. A language is just a language and has no impact on that.
Note, I'm not a fan of C or C++, but do use both, and both are pretty well tooled for osdev though both have massive issues for me as well..
PS. As Brendan also said, it all depends on your goals. The key is to think short, mid and long term at the same time. For me, even if I end up not using byte-code it won't matter, I have alternative solutions, so I'm going with what's easier and has better performance for now, which is to "hard code" the syscalls into applications. Some of the points raised for choosing DLL instead are valid and can make life easier if you make the same mistakes (which are easy to make) as Linux or Windows (ELF and PE).. Basically, if you conflate things you are gonna get screwed, then it's better to stay with dynamic solutions, it works as a lubricant =)
Re: inline system calls?
bluemoon wrote: dynamic linking is much safer for this.
LtG wrote: In what way? Or do you mean more future-proof?
Since there are multiple mechanisms to invoke a kernel call, you can't inline it at compile time. It can, however, be done at link time; AFAIK there are three approaches:
1. Traditional dynamic linking: you provide a "glue library" which does the actual kernel call. Simple.
2. Address patch: libc (or any program) calls a kernel_call() function which is resolved (linked) at load time. The actual "CALL kernel_call" instruction is in the form of an R_X86_64_JMP_SLOT relocation that is visible in the ELF headers (similar things apply to other formats). During load-time linking, you just replace the JMP instruction with whatever you want to do. Note that you need to make sure there is enough room for the patch.
3. Inline/in-place patch: similar to (2), but you actually patch the caller. This might be done by providing a "slow" and a "fast" kernel call path, where every call to the "slow" path triggers a patch of the caller, found by looking up the return address on the stack.
IIRC Windows uses (2) to patch DLLs. (3) might seem fastest, but run-time self-modifying code is a no-no for many people.
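Approach (2), combined with the "slow path resolves, then gets out of the way" idea from (3), can be imitated in plain C with a function-pointer slot, which is roughly what a GOT entry is. A minimal sketch, assuming Linux and glibc's syscall(2) as the only backend (a real resolver would choose between mechanisms at that point). It patches a data pointer rather than code, so no self-modifying code is involved, and it is not thread-safe as written.

```c
#define _GNU_SOURCE          /* for the syscall() declaration */
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

typedef long kcall_fn(long nr, long a0);

/* Backend picked by the resolver: glibc's syscall(2), normalised to
 * return -errno like a raw kernel call. */
static long kcall_backend(long nr, long a0)
{
    long r = syscall(nr, a0);
    return r == -1 ? -errno : r;
}

static long kcall_resolve(long nr, long a0);

/* The patchable slot: callers always go through it, like a GOT entry. */
static kcall_fn *kcall_slot = kcall_resolve;
static int kcall_resolutions;  /* how many times the slow path ran */

/* Slow path: runs on first use, picks the backend and patches the slot. */
static long kcall_resolve(long nr, long a0)
{
    kcall_resolutions++;
    kcall_slot = kcall_backend;  /* a real resolver would check CPU/kernel here */
    return kcall_slot(nr, a0);
}

/* Public entry point used by the rest of the library. */
long kcall(long nr, long a0)
{
    return kcall_slot(nr, a0);
}
```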
Re: inline system calls?
bluemoon wrote: 2. Address patch: libc (or any program) calls a kernel_call() function which is resolved (linked) at load time. The actual "CALL kernel_call" instruction is in the form of an R_X86_64_JMP_SLOT relocation that is visible in the ELF headers (similar things apply to other formats). During load-time linking, you just replace the JMP instruction with whatever you want to do. Note that you need to make sure there is enough room for the patch.
The GCC toolchain supports __attribute__((ifunc("resolver_function"))), which accomplishes this in a portable manner.
(Btw, JMP_SLOT is not placed in the .text segment; it is used for PLT entries.)
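A minimal sketch of the ifunc approach. Assumptions: GCC or Clang on an ELF target with glibc (musl, for example, does not support ifunc), Linux system call numbers and register conventions, and a hypothetical cpu_needs_workaround() standing in for whatever check a real OS would make. The resolver runs once, when kernel_call is resolved at load time; after that, calls go straight to the chosen implementation.

```c
#define _GNU_SOURCE          /* for the syscall() declaration */
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

typedef long kernel_call_fn(long nr, long a0);

/* Generic backend: glibc's syscall(2), normalised to return -errno. */
static long kernel_call_generic(long nr, long a0)
{
    long r = syscall(nr, a0);
    return r == -1 ? -errno : r;
}

#if defined(__x86_64__)
/* Direct SYSCALL backend (Linux x86_64 register convention assumed). */
static long kernel_call_syscall_insn(long nr, long a0)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a0)
                      : "rcx", "r11", "memory");
    return ret;
}

/* Hypothetical policy hook, e.g. "this CPU model has a SYSCALL erratum". */
static int cpu_needs_workaround(void) { return 0; }
#endif

/* Resolver: runs once at load time and returns the implementation to bind. */
static kernel_call_fn *resolve_kernel_call(void)
{
#if defined(__x86_64__)
    if (!cpu_needs_workaround())
        return kernel_call_syscall_insn;
#endif
    return kernel_call_generic;
}

long kernel_call(long nr, long a0) __attribute__((ifunc("resolve_kernel_call")));
```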
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].
Re: inline system calls?
Or you can just create big LLVM IR LTO blobs for your syscall library + libc and let software creators do the inlining themselves, for each CPU family.
Even if it means compiling a simple hello world in one hour.
Re: inline system calls?
Hi,
Brendan wrote: Intel create a CPU in 5 years time that has a minor flaw in the way it implements SYSCALL and you want to avoid that flaw
LtG wrote: Intel creating a CPU with a minor flaw in CALL is about as likely.
The CALL instruction is relatively simple. SYSCALL is more complex and has multiple edge cases, where it might be fine most of the time but on rare occasions several conditions occur at the same time and you end up with something like this or this.
LtG wrote: The difference is that SYSCALL can be disabled, and then you can presumably (I haven't tested it) catch the #UD.
Yes, but that'd give far worse performance.
LtG wrote: But is it really good practice to prepare for future bugs?
For software that hopes to be around for a while, it's good practice to design it with some flexibility, so that if things change in future you have a way to deal with those changes (that doesn't break backward compatibility). It doesn't matter if things change because you want to add new features to your kernel, or if you discover a way to optimise something, or if you need to fix a bug in your software, or if Intel added something better to the CPU, or if you need to work around a bug in a CPU.
LtG wrote: I thought about replying to the rest, but it just seemed like splitting hairs. So instead I thought I'd mention that I'm thinking about ultimately using byte-code, and as such this is (for me) mostly a moot point. Which brings me to my actual point: it really depends on the circumstances, and I wouldn't worry about these low-level details for now at all (personally I haven't, code-wise, though I've thought about them), so use whichever you prefer and is easier to implement. With a DLL you might save minimal recompilation time (a lot if you keep changing your syscall lib, but I'd expect that to be rare even during development).
In theory it's possible to drive half way across a bridge; but in practice nobody ever does - they either don't drive across the bridge at all, or they drive all the way to the other side.
Brendan wrote: forget about normal users because "compile before use" is too painful/annoying/fragile for languages like C).
LtG wrote: Nonsense, it's a tooling problem, nothing more. Since it's your OS you can fix that (not that it's OS specific to begin with); if your OS toolset works perfectly then everyone else has to follow suit (or cease to exist).
In theory it's possible to create tools that fix all the "compile before use is too painful/annoying/fragile for C" problems; but in practice nobody ever will - they either won't be willing to write their own tools at all, or they'll go all the way and abandon C itself (e.g. create their own language).
Brendan wrote: I'm not sure that C can be used like that (pre-processing can ruin portability before the compiler does anything). Of course there are other languages that don't have that problem (Java, C#, ...).
LtG wrote: Language doesn't have that kind of effect; it's a compiler/tooling problem. A language is just a language and has no impact on that.
No; "pre-processing to work around portability problems" is a language problem with multiple causes (implementation defined behaviours, lack of standardisation for things like networking and GUI, poor primitive types, multiple versions of the language itself, etc). You can invent a "C like" language that doesn't need a pre-processor, but that language will not be C anymore.
Cheers,
Brendan
Re: inline system calls?
Hi,
bluemoon wrote: Since there are multiple mechanisms to invoke a kernel call, you can't inline it at compile time.
Korona wrote: The GCC toolchain supports __attribute__((ifunc("resolver_function"))), which accomplishes this in a portable manner.
Boris wrote: Or you can just create big LLVM IR LTO blobs for your syscall library + libc and let software creators do the inlining themselves, for each CPU family.
All of this is focusing on the tip of the iceberg (the specific instruction/s used to transfer control to the kernel) and completely ignores the majority of the iceberg (the entire kernel API).
For a simple example, assume you are implementing the "get_time()" function in a C library; and you know that there are older kernels where "function 0x1234" returns "seconds and microseconds since January 1970 as a pair of signed 32-bit integers", and newer kernels where "function 0x4321" returns "nanoseconds since January 2000 as a 96-bit unsigned integer" and that (for compatibility) you need to support both somehow. Also assume that 5 years after you've written your C library (and after hundreds of pieces of closed source executables depend on it and their developers go bankrupt and disappear) you add a "function 0x3333" that returns "TSC multiplier and offset, or zeros if the CPU doesn't support an invariant TSC" (so that software can get these values when the process is started, and then do "time = RDTSC * TSCmultiplier + TSCbase" without calling the kernel API at all).
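A sketch of how a shared C library could hide all three generations behind one get_time(). Everything here is hypothetical: the function numbers come from the example above, mock_kernel_call() and read_tsc() stand in for a real kernel call and RDTSC, the result is nanoseconds since January 2000 held in 64 bits rather than 96, and the TSC formula uses a plain integer multiplier where a real one would need fixed-point scaling.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical kernel function numbers from the example. */
enum {
    KFN_TIME_OLD = 0x1234, /* seconds + microseconds since 1970, two int32_t */
    KFN_TIME_NEW = 0x4321, /* nanoseconds since 2000 (96-bit in the example) */
    KFN_TSC_INFO = 0x3333  /* TSC multiplier and offset, or zeros */
};

#define UNIX_TIME_OF_Y2K 946684800LL  /* 2000-01-01T00:00:00Z as Unix time */

/* --- Mock kernel and TSC, standing in for the real ones --- */
static int kernel_version = 1;  /* 1: only 0x1234, 2: +0x4321, 3: +0x3333 */
static uint64_t read_tsc(void) { return 500000; }  /* stand-in for RDTSC */

static int mock_kernel_call(unsigned fn, void *out)
{
    switch (fn) {
    case KFN_TIME_OLD: {
        int32_t r[2] = { 946684801, 500 };  /* 1 s + 500 us after Y2K */
        memcpy(out, r, sizeof r);
        return 0;
    }
    case KFN_TIME_NEW:
        if (kernel_version < 2) return -1;
        *(int64_t *)out = 1000500000LL;
        return 0;
    case KFN_TSC_INFO:
        if (kernel_version < 3) return -1;
        ((uint64_t *)out)[0] = 1;           /* multiplier */
        ((uint64_t *)out)[1] = 1000000000;  /* base */
        return 0;
    }
    return -1;
}

/* --- Library side --- */
static int tsc_checked;
static uint64_t tsc_mult, tsc_base;

/* Nanoseconds since 2000-01-01, whatever the kernel underneath supports. */
int64_t get_time(void)
{
    if (!tsc_checked) {  /* e.g. once at process start */
        uint64_t info[2] = { 0, 0 };
        (void)mock_kernel_call(KFN_TSC_INFO, info);  /* stays zero if unsupported */
        tsc_mult = info[0];
        tsc_base = info[1];
        tsc_checked = 1;
    }
    if (tsc_mult != 0)   /* fast path: no kernel call at all */
        return (int64_t)(read_tsc() * tsc_mult + tsc_base);

    int64_t ns;
    if (mock_kernel_call(KFN_TIME_NEW, &ns) == 0)
        return ns;

    int32_t old[2];      /* oldest kernels: convert 1970-based s/us */
    (void)mock_kernel_call(KFN_TIME_OLD, old);
    return ((int64_t)old[0] - UNIX_TIME_OF_Y2K) * 1000000000LL + (int64_t)old[1] * 1000;
}
```

Because the selection logic lives in the library, old executables pick up the TSC fast path just by running against the updated library, without being recompiled.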
Cheers,
Brendan