Hi all,
I am in the process of transitioning out of my boot loader and into my kernel, and I'm looking to make some decisions about how to structure the code going forward. There are a number of different calling conventions, both for calling ASM code and for interfacing C with ASM, and I want to settle on one early in my development cycle for consistency.
I am torn between implementing the cdecl standard (as it's just that, standard and generally ubiquitous) and one of the more recent x64 variants (System V X86_64 being the most attractive), since I would like to start in 64-bit long mode. Each calling convention seems to have its own distinct advantages and disadvantages, and I have some concerns.
1) Is Cdecl the "correct" way to go? What happens with arrays, structs, or varargs?
2) Cdecl uses stack-based parameter passing, which I presume is slower than passing parameters directly in registers. Other calling conventions pass the first parameters in registers and put the rest on the stack, yet this method seems more complicated to implement and maintain. Are these presumptions correct?
3) In System V X86_64, there is a note mentioning the "Red Zone" in http://wiki.osdev.org/Calling_Conventions#System_V_ABI. I am slightly confused as to the purpose of this space, and the same confusion arises in other calling conventions that have similar constructs (the Microsoft x64 calling convention with its "Shadow Space"). Can someone explain the purpose of that reserved space?
4) Why in System V X86_64 does the stack need to be 16-byte aligned? That's 128 bits, correct? Double the word size?
I am leaning toward following the cdecl standard, unless there are compelling reasons to use another calling convention.
Developing a Calling Convention
Re: Developing a Calling Convention
I think you may have read some outdated documentation from the Windows platform. Please try to read up on x64 calling conventions - because, basically, all of them are a form of "cdecl".
Pure cdecl is just: push every argument on the stack, back to front, caller cleans up. The Pascal calling convention pushes every argument front to back with callee cleanup; stdcall keeps cdecl's back-to-front order but also uses callee cleanup. Then there's half a dozen variants for 32-bit Windows (and related systems), such as fastcall, thiscall, some unnamed variants that put even more in registers, a fast thiscall variant that has 3 things in registers, and so on. They all boil down to "put the first N primitives in registers, and then do whatever stdcall/pascal says".
Then x64 happened, and we finally got a clean break from the old crap. Basically, callee cleanup causes a lot of trouble: your callee may not know how much space you reserved on the stack (1), it may have changed its function definition (2, in particular for C functions), you may have used the wrong prototype (3), and it then needs to know how to clean up all those things (4). Also, front-to-back ordering makes printf() and any other variadic function impossible to implement, so those are always cdecl, and all compilers support it.
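To make the variadic point concrete, here is a minimal sketch of a hypothetical variadic function, sum_ints(). The callee only finds out at run time how many arguments follow (from its fixed first parameter), so it cannot know at compile time how many bytes to pop; only the caller knows, which is why variadic calls need caller cleanup. Pushing back to front also keeps `count` at a fixed offset just above the return address no matter how many arguments follow; pushed front to back, its position would depend on the very count it is supposed to provide.

```c
#include <stdarg.h>

/* Hypothetical variadic function: the number of extra int arguments is
 * only known at run time, from the fixed first parameter `count`. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* fetch the next int argument */
    va_end(ap);
    return total;
}
```

Two call sites can pass different amounts of data to the same function: sum_ints(2, 10, 20) returns 30 and sum_ints(4, 1, 2, 3, 4) returns 10, and each caller releases whatever stack space it used.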
If you're going to have to support cdecl anyway, and there's no real downside to it, why not use it all the time, every time, always? That's what 64-bit did: Windows 64-bit is now always cdecl-style (caller cleanup), as are all other systems (... mostly because they already were). It gave you more registers to work with, so "fastcall" is the new standard, except now with 6 integer arguments in registers by default (4 on Microsoft x64). And because we can now assume an SSE unit is present (where cdecl assumed you might still have a 486SX without an FPU), you can do the same for floating-point and vector arguments. That means nearly all your stack use is gone and you put everything in registers instead. Some other insights were also used to make the stack more usable, such as the red zone, which gives a default spill location without stack pointer adjustments (with IST use in ISRs being what keeps it from breaking on kernel-mode stacks). All in all this makes for a much cleaner calling convention.
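As a small reference table, the sketch below lists the integer/pointer argument registers of the two 64-bit conventions mentioned here; the helper names are hypothetical. It assumes every argument is an integer or pointer: System V counts floating-point arguments separately (they go in XMM0 to XMM7), whereas Microsoft x64 assigns slots by position, so a double in the second position uses XMM1 and leaves RDX unused.

```c
#include <stddef.h>

/* Integer/pointer argument registers, in order, for the two 64-bit ABIs. */
static const char *const sysv_int_regs[]  = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
static const char *const win64_int_regs[] = { "rcx", "rdx", "r8", "r9" };

/* Register carrying argument `index` (0-based) when all arguments are
 * integers or pointers, or NULL if that argument is passed on the stack. */
const char *sysv_int_arg_reg(size_t index)
{
    return index < sizeof sysv_int_regs / sizeof sysv_int_regs[0]
        ? sysv_int_regs[index] : NULL;
}

const char *win64_int_arg_reg(size_t index)
{
    return index < sizeof win64_int_regs / sizeof win64_int_regs[0]
        ? win64_int_regs[index] : NULL;
}
```

For example, the seventh argument (index 6) under System V and the fifth (index 4) under Microsoft x64 both get NULL, meaning they go on the stack.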
So which cdecl-style calling convention are you considering? You've basically got the Microsoft-x64 variant and the everybody-else-in-the-entire-planet variant.
[edit] One open question still left:
4) Why in System V X86_64 does the stack need to be 16-Byte aligned? This is 128-bits (correct)? Double the word size?
If you want to spill an XMM register to the stack, or pass in a SIMD vector, your stack needs to be at least XMM-size aligned. Hence the requirement to make it XMM-size aligned, which is 4 floats: 16 bytes (128 bits, twice the 64-bit register width).
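The alignment rule mostly shows up in function prologues. A sketch, assuming the System V rule that RSP is 16-byte aligned just before each CALL (so RSP % 16 == 8 on entry, once the return address has been pushed); frame_adjust() is a hypothetical helper, and it ignores both the red zone and Microsoft x64's shadow space.

```c
#include <stdint.h>

/* Round n up to the next multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t n, uint64_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Hypothetical helper: bytes a function must subtract from RSP to hold
 * `locals` bytes of locals, after its prologue pushed `pushes` 8-byte
 * registers, so that RSP is 16-byte aligned again before it calls anything.
 * On entry RSP % 16 == 8 because CALL pushed an 8-byte return address
 * onto a 16-byte-aligned stack. */
uint64_t frame_adjust(uint64_t locals, unsigned pushes)
{
    uint64_t used = 8 + 8 * (uint64_t)pushes;   /* return address + pushes */
    return align_up(used + locals, 16) - used;  /* pad locals to realign   */
}
```

frame_adjust(0, 1) is 0, which is why a prologue that only does push rbp is already aligned, while frame_adjust(0, 0) is 8: the familiar sub rsp, 8 in functions that push nothing but still make calls.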
Re: Developing a Calling Convention
Thanks! I'll go over all of this and respond when I get a moment off of work.
For reference, I am using the following page as a quick guide to some of the things I brought up here: https://en.wikipedia.org/wiki/X86_calli ... onventions
Re: Developing a Calling Convention
lesniakbj wrote: 1) Is Cdecl the "correct" way to go? What happens with arrays, structs, or varargs?
It doesn't matter which calling convention you choose, and you can even use different calling conventions for different functions. How arrays, structs, and varargs are handled depends on which calling convention you choose.
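A short C sketch of the array and struct cases (the function and struct names are made up for illustration). An array parameter is adjusted to a pointer, so every convention just passes an address; a struct passed by value is a copy. How that copy travels is ABI-specific: roughly, System V passes aggregates of up to 16 bytes in registers and larger ones in memory, while Microsoft x64 passes 1-, 2-, 4- or 8-byte structs in a register and anything else by reference to a caller-made copy.

```c
/* Looks like an array parameter, but C adjusts it to `int *a`: the caller
 * passes only the address of the first element. */
void zero_first(int a[4])
{
    a[0] = 0;                   /* writes into the caller's array */
}

struct point { int x, y; };     /* 8 bytes: small enough for one register */

/* Passed by value: the callee works on its own copy. */
int bump_x(struct point p)
{
    p.x += 1;                   /* the caller's struct is unchanged */
    return p.x;
}
```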
lesniakbj wrote: 2) Cdecl uses stack-based parameter passing, which, I presume, is slower than passing parameters directly in registers. Other calling conventions pass parameters directly in registers first, followed by others put on the stack; yet this method seems more complicated to implement and maintain, is that presumption correct?
On a modern x86 CPU, the speed difference between calling conventions is likely to be negligible. However, if there is a speed difference, the one that accesses the stack more will probably be slower.
Of course, if you use a more complicated calling convention, writing the assembly code that uses it will be more complicated. You, the developer, get to decide if you're willing to deal with the extra complexity.
lesniakbj wrote: 3) In System V X86_64, there is a note mentioning the "Red Zone" in http://wiki.osdev.org/Calling_Conventions#System_V_ABI. I am slightly confused as to the purpose of this space, and the same confusion arises in other calling conventions that have similar constructs (the Microsoft x64 calling convention with its "Shadow Space"). Can someone explain the purpose of that reserved space?
The System V red zone is the 128 bytes just below the stack pointer, which are considered always reserved, so leaf functions can store data on the stack without needing to adjust the stack pointer.
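The catch for kernel code can be modeled in a few lines. In 64-bit mode, an interrupt that doesn't switch stacks (CPL 0, no IST entry) aligns RSP down to 16 bytes and pushes its frame right below the interrupted code's stack pointer, which is exactly where the red zone lives. This is a hypothetical model, not kernel code, and it assumes an interrupt that pushes no error code. It is why kernels are commonly built with GCC/Clang's -mno-red-zone, or route interrupts to IST stacks.

```c
#include <stdbool.h>
#include <stdint.h>

#define RED_ZONE_SIZE  128u  /* System V x86-64: bytes below RSP a leaf may use */
#define INT_FRAME_SIZE  40u  /* SS, RSP, RFLAGS, CS, RIP: 5 x 8 bytes, no error code */

/* Do the half-open ranges [a_lo, a_hi) and [b_lo, b_hi) share a byte? */
static bool overlaps(uint64_t a_lo, uint64_t a_hi, uint64_t b_lo, uint64_t b_hi)
{
    return a_lo < b_hi && b_lo < a_hi;
}

/* Would the interrupt frame overwrite the red zone of code running with
 * stack pointer `rsp`?  `handler_rsp` is where the CPU starts pushing:
 * equal to `rsp` when no stack switch happens, or a separate stack top
 * when the IDT entry selects an IST stack. */
bool frame_clobbers_red_zone(uint64_t rsp, uint64_t handler_rsp)
{
    uint64_t top = handler_rsp & ~(uint64_t)0xF;   /* CPU aligns to 16 first */
    return overlaps(rsp - RED_ZONE_SIZE, rsp, top - INT_FRAME_SIZE, top);
}
```

With no stack switch the result is always true, since the frame starts at most 15 bytes below RSP and the red zone spans 128 bytes; with a separate IST stack it is false.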
The Win64 shadow space is 32 bytes that the caller reserves on the stack, intended for spilling the four register parameters (RCX, RDX, R8, R9) if those registers need to be freed for some other purpose, although the callee can really use it for anything.
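A sketch of the arithmetic behind the shadow space, assuming every argument fits in 8 bytes; both helper names are hypothetical. Because the four home slots sit directly below any stack-passed arguments, spilling RCX, RDX, R8 and R9 into them gives the callee one contiguous run of argument slots, which is handy for variadic functions and debugging.

```c
#include <stddef.h>

/* Microsoft x64: the caller always reserves at least 32 bytes (one 8-byte
 * home slot each for RCX, RDX, R8, R9), followed by any stack-passed
 * arguments.  Size of that outgoing-argument area for `nargs` arguments. */
size_t win64_outgoing_area(size_t nargs)
{
    return 8 * (nargs < 4 ? 4 : nargs);
}

/* Offset from RSP, at the callee's first instruction, of argument `index`'s
 * slot: [rsp] holds the return address, then the 4 home slots, then the
 * stack arguments, so every argument has a slot even if it arrived in a
 * register. */
size_t win64_arg_slot(size_t index)
{
    return 8 + 8 * index;
}
```

At the callee's first instruction the fifth argument (index 4) is at [rsp+40], just past the 32-byte shadow space. The caller still has to keep RSP 16-byte aligned at the call, so it may round its total allocation up.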
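lesniakbj wrote: 4) Why in System V X86_64 does the stack need to be 16-Byte aligned? This is 128-bits (correct)? Double the word size?
SSE often requires data to be aligned in memory: MOVAPS, and most SSE instructions that take a 128-bit memory operand, fault if that operand isn't 16-byte aligned. Forcing the stack to be aligned simplifies SSE code.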
lesniakbj wrote: It seems I am leaning in the way of following the cdecl standard, unless there are compelling reasons I should use another calling convention.
If you're writing 64-bit code, there are only two calling conventions: the one used in System V and the one used in Windows. (Okay, Windows also has vectorcall, but you probably aren't concerned about that.) System V is the obvious choice here.
Re: Developing a Calling Convention
Octocontrabass wrote: System V is the obvious choice here.
In general, you can live without "System V" and totally ignore it if you want. However, you will need to know "Microsoft x64" in some way or another if you want to use UEFI services. Just a thought that may be important when making choices.
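If a kernel that uses System V as its main convention needs to call UEFI services, one common approach with GCC or Clang on x86-64 is the ms_abi function attribute; UEFI headers typically hide it behind an EFIAPI-style macro. The function names below are hypothetical stand-ins, not real firmware services, and the fallback branch simply drops the attribute on other targets so the sketch still compiles.

```c
#include <stdint.h>

/* GCC and Clang on x86-64 accept __attribute__((ms_abi)) to give a single
 * function the Microsoft x64 convention inside System V code. */
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define MS_ABI __attribute__((ms_abi))
#else
#define MS_ABI   /* other targets: keep the default convention */
#endif

/* Hypothetical stand-in for a firmware service; on x86-64 its arguments
 * arrive in RCX and RDX. */
uint64_t MS_ABI fake_firmware_add(uint64_t a, uint64_t b)
{
    return a + b;
}

/* Ordinary (System V) caller: on x86-64 the compiler emits the ms_abi call
 * sequence, including the 32-byte shadow space, automatically. */
uint64_t call_fake_firmware(uint64_t a, uint64_t b)
{
    return fake_firmware_add(a, b);
}
```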
Re: Developing a Calling Convention
Hi,
Octocontrabass wrote: If you're writing 64-bit code, there are only two calling conventions: the one used in System V and the one used in Windows.
Um; for 64-bit code, there is an almost infinite number of calling conventions: 2 that are likely to be supported by existing compilers (and are both relatively crappy/awkward), one or 2 (not sure) that must be used for kernels (where the red zone gets trashed by IRQ handlers and can't be used), plus many billions of other possibilities that aren't supported by existing compilers.
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.