Calling-Conventions

Owen · Post by **Owen** » Fri Jan 15, 2010 2:05 pm

ErikVikinger wrote:
Owen wrote:I just have the this parameter behave as a hidden first parameter to a function.
Where is the position of the this pointer exactly?

The function "SomeClass::aMethod(int param, short param2)" is called as if it were "aFunction(SomeClass* thisp, int param, short param2)"
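
A minimal sketch of that equivalence in plain C. The struct layout and the names `SomeClass_aMethod`, `method_fn` and `call_method` are hypothetical, chosen only for illustration; the point is that under this convention a "method pointer" is just an ordinary function pointer whose first parameter is the object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a C++ class with one data member. */
typedef struct SomeClass {
    int base;
} SomeClass;

/* Plays the role of SomeClass::aMethod(int, short): the object pointer
 * is simply the first ordinary parameter. */
static int SomeClass_aMethod(SomeClass *thisp, int param, short param2)
{
    return thisp->base + param + param2;
}

/* Under this convention a method can be called through a plain
 * function pointer, so C code can masquerade as a class member. */
typedef int (*method_fn)(SomeClass *, int, short);

static int call_method(method_fn fn, SomeClass *obj, int a, short b)
{
    assert(fn != NULL && obj != NULL);
    return fn(obj, a, b);
}
```

A bridge to a dynamic language would typically store such `method_fn` pointers in a table and never need to know whether the target was written as a method or as a free function.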

ErikVikinger wrote:
Owen wrote:This has a few advantages:

C style functions can masquerade as class functions (This kind of thing is used a lot for dynamic bridging of C++ to dynamic languages)

This becomes a valid argument for the va_start C builtin
Please describe this a little bit more in detail.

I have some code which, on supporting ABIs (e.g. not x86_64, where a vararg call requires extra work by the caller), masquerades as a function and then walks its parameters based on some tables, for things like serializing across a network (for RPC, for example). On architectures with differing varargs methods, this gets more complex (and generally requires assembly).

Of course, this is pretty rare, but it's useful.

ErikVikinger wrote:
Owen wrote:Additionally, classes call their own members relatively rarely;
You are sure? I have looked into vector.h and there are many method-calls with this.
I will try to discover it.

std::vector & co are somewhat unusual examples in that they're simple container classes rather than application ones which tend to be more complex. While optimizing for such classes, because of how often they are used, would seem like a good idea, they're often inlined anyway making it moot.

Any method which calls external classes is going to have to copy the this pointer somewhere anyway, probably into the callee save registers, which is something it would have to do anyway if the this register was not callee save. However, it now also has the expense of copying it back before returning. I'm inclined to say that it would end up cheaper to just treat this as any other parameter.

ErikVikinger wrote:
Owen wrote:What kind of exception are we talking about here - processor or high level language?
Processor exceptions, for example on memory access (a multi-register memory read that crosses a page boundary) or during mathematical calculations. If these instructions are destructive, they must be careful and must detect all possible exceptions before they write their results into the registers. Alternatively you have register renaming and can allocate new virtual registers for the results; I will not implement register renaming in my CPU. Non-destructive instructions can raise an exception and, after it is handled, simply be restarted. High-level language exceptions are not my problem; the compiler can do this job if needed.

I suppose the difference here is that I don't have any instructions which can fault midway through execution which don't at least maintain a shadow state in registers. For example:

Code: Select all

memcpy r1, r2, r3 ; same as C memcpy(r1, r2, r3)

will decrement the count (r3) each time it runs through a cycle, and can take an exception at any point safely (in fact, it will also allow interrupts). The fact that few of my instructions mutate many registers helps a lot here.

ErikVikinger wrote:
Owen wrote:I suspect our instruction encodings look radically different - I don't know about yours, but mine is pretty esoteric;
Yes, our instructions look totally different, but mine are esoteric too. I have 4 complete flag-sets, and every instruction that modifies flags must specify which one is used; this avoids a single point of dependency and helps with loop unrolling. I do not have indirect accesses: each use of a register or flag-set must be specified in the assembly instruction. A PUSHM.F2:Z R10,10 is expanded to STRMIA.W,F2:Z [R62!],R10,10, which means: STRM = store multiple; I = increment address; A = after each register; W = word size (32 bit) for all registers, so R62 is incremented by 4 after each register; F2 = execution depends on flag-set 2; Z = the instruction is executed if Z is set (in flag-set 2); R62 is the stack pointer; ! = the new value of R62 is written back to R62 after all memory writes; R10 is the first register written to memory; 10 = ten registers, R10...R19, are written. In short: if the instruction is executed, the registers R10...R19 (40 bytes) are written to [R62] and R62 += 40. Another example is ADD.H,F1:GE R7,R6,R5,F0, which means: if the instruction is executed (the GE condition in flag-set 1 is true), it performs the half-word (16 bit) operation R7 = R6 + R5 and sets the flags in flag-set 0 for the 16-bit result.

Aah, 64-bit instructions I guess? No way I can see for you to squeeze that into 32. I wonder what the speed tradeoff is like w.r.t. memory bandwidth vs. expressiveness.

I'll mention a few other esoteric things:

Fused multiply-shift and shift-divide instructions, for fast fixed point: (Edit: I should add that they use a 64-bit intermediate)

Code: Select all

; X prefix for fixed point ;-)
XMULS r1, r2, 8, r3 ; r3 = (r1 * r2) >> 8. signed
XMULU r4, r5, 16, r6 ; r6 = (r4 * r5) >> 16. unsigned
XDIV r3, 8, r6, r7 ; r7 = (r3 << 8) / r6
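
The semantics in those comments can be sketched in C as follows. This is only a model under stated assumptions: the function names are hypothetical, XDIV is assumed to be signed, results are assumed to be truncated to 32 bits, and the signed variant relies on `>>` being an arithmetic shift for negative values (true on common compilers, but implementation-defined in C).

```c
#include <assert.h>
#include <stdint.h>

/* XMULS: r = (a * b) >> shift, signed, using a 64-bit intermediate
 * so the product cannot overflow before the shift. */
int32_t xmuls(int32_t a, int32_t b, unsigned shift)
{
    assert(shift < 32);
    int64_t wide = (int64_t)a * (int64_t)b;
    return (int32_t)(wide >> shift);   /* arithmetic shift assumed */
}

/* XMULU: same as above, unsigned. */
uint32_t xmulu(uint32_t a, uint32_t b, unsigned shift)
{
    assert(shift < 32);
    uint64_t wide = (uint64_t)a * (uint64_t)b;
    return (uint32_t)(wide >> shift);
}

/* XDIV: r = (a << shift) / b, with the pre-shifted dividend held in
 * 64 bits. Multiplication is used instead of << so that negative
 * dividends stay well defined in C. */
int32_t xdiv(int32_t a, unsigned shift, int32_t b)
{
    assert(shift < 32 && b != 0);
    int64_t wide = (int64_t)a * ((int64_t)1 << shift);
    return (int32_t)(wide / b);
}
```

With 8 fractional bits, 1.5 is 384 and 2.0 is 512, so `xmuls(384, 512, 8)` gives 768 (3.0) and `xdiv(768, 8, 512)` gives back 384 (1.5).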

Fast procedure entry/exit instructions for non-leafs:

Code: Select all

ENTER r31, r30, 8, 4
; Pseudocode:
; StartSP = SP
; SP -= 8 words
; for(int i = 0; i < 4; i++) *(StartSP - 1 - i) = r[31 - i]; (Save top 4 registers to 4 stack slots of the 8 we just reserved)
; r31 = LR (Link register - an SFR)
; r30 = StartSP (Create a frame pointer)

EXIT r31, r30, 4
; LR = r31
; SP = r30
; for(int i = 0; i < 4; i++) r[31 - i] = *(SP - 1 - i); (Restore top 4 registers off stack)
; JMP LR ; Return!
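
The pseudocode above can be modelled in C roughly like this. It is a sketch, not the real microarchitecture: the `Cpu` struct, the word-indexed stack and the function names are assumptions made for illustration, and r31/r30 are hard-wired as the link/frame registers as in the example.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64

typedef struct Cpu {
    uint32_t r[32];               /* general-purpose registers */
    uint32_t lr;                  /* link register (an SFR) */
    uint32_t sp;                  /* stack pointer as a word index; grows down */
    uint32_t stack[STACK_WORDS];
} Cpu;

/* ENTER r31, r30, reserve, nsave */
void cpu_enter(Cpu *c, unsigned reserve, unsigned nsave)
{
    assert(nsave <= reserve && c->sp >= reserve && c->sp <= STACK_WORDS);
    uint32_t start_sp = c->sp;
    c->sp -= reserve;                            /* reserve the frame */
    for (unsigned i = 0; i < nsave; i++)         /* save the top registers */
        c->stack[start_sp - 1 - i] = c->r[31 - i];
    c->r[31] = c->lr;                            /* keep the return address */
    c->r[30] = start_sp;                         /* frame pointer */
}

/* EXIT r31, r30, nsave; returns the address the core would jump to. */
uint32_t cpu_exit(Cpu *c, unsigned nsave)
{
    c->lr = c->r[31];
    c->sp = c->r[30];
    assert(c->sp >= nsave && c->sp <= STACK_WORDS);
    for (unsigned i = 0; i < nsave; i++)         /* restore the top registers */
        c->r[31 - i] = c->stack[c->sp - 1 - i];
    return c->lr;
}
```

Note that EXIT reads r31 and r30 before the restore loop overwrites them, which is why the same two registers can serve both as saved callee-save registers and as the link/frame pair.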

There's also JRL, or Jump relink, which is used for tail calls from non-leafs. It's like EXIT, except it takes an extra parameter in the form of the register containing the function to call. In fact EXIT is encoded as JRL rX, rX, N, r0, as you probably have no reason to jump to a hardcoded zero (and I only have 64 4-operand instructions!).

ErikVikinger · Post by **ErikVikinger** » Fri Jan 15, 2010 6:01 pm

Hello,

Owen wrote:I have some code which, on supporting ABIs (e.g. not x86_64, where a vararg call requires extra work by the caller), masquerades as a function and then walks its parameters based on some tables, for things like serializing across a network (for RPC, for example). On architectures with differing varargs methods, this gets more complex (and generally requires assembly).

Okay, that is interesting.
About varargs I must think again. My present idea is that the varargs are always passed on the stack and the caller must free them after the call. The callee is not allowed to write into the varargs because it does not really know their exact size.

Owen wrote:While optimizing for such classes, because of how often they are used, would seem like a good idea, they're often inlined anyway making it moot. ..... I'm inclined to say that it would end up cheaper to just treat this as any other parameter.

Okay, I see; I must think again.

Owen wrote:I suppose the difference here is that I don't have any instructions which can fault midway through execution which don't at least maintain a shadow state in registers. For example:
Code: Select all
memcpy r1, r2, r3 ; same as C memcpy(r1, r2, r3)
will decrement the count (r3) each time it runs through a cycle, and can take an exception at any point safely (in fact, it will also allow interrupts). The fact that few of my instructions mutate many registers helps a lot here.

This looks like CISC; I will use RISC, and in this case the r3 is not a register but an immediate. ARM corrected the internal operation of the STM/LDM instructions with the switch from v2 to v3 (or another generation change, I do not remember exactly at the moment) so that the pointer register is written with the new value only after all memory accesses have completely finished and no more exceptions can be raised; if an exception is raised at any point before that, the STM/LDM instruction is restartable because it has not trashed the pointer. One additional problem exists with the LDM instruction: if the pointer register is not written back after the memory access but is itself one of the loaded registers, the possible accident on restart is still present. On my CPU I will forbid an LDRM instruction from loading its own pointer register; I do not know of a situation where this special feature is useful.

Owen wrote:Aah, 64-bit instructions

Yes and no. I use 128-bit instruction packets, like Itanium, but I can insert from 2 up to 6 instructions into one packet. An instruction can be 20, 39 or 60 bits long, and I allow all possible combinations of them inside a packet. I have seen the small code size and the slight loss of performance of the 16-bit Thumb instructions compared with the 32-bit instructions on ARM (3 or more operands are not always needed). My idea is to combine both.

Owen wrote:No way I can see for you to squeeze that into 32.

Both examples from me can be a 39 bit instruction in certain circumstances (without these circumstances it must be 60 bit versions).

I have a little bit more : ADD.H,F1:GE R7,R6,R5 #ROL R4,F0 : R7 = R6 + (R5 <<< R4). This instruction can also be (in certain circumstances) a 39 bit version. My instruction set is mostly inspired by ARM, the shifter-operand is cool in my opinion.

Owen wrote:I wonder what the speed tradeoff is like w.r.t memory bandwidth v.s. expressiveness.

I hope my solution is a good one for maximum processing performance (with powerful and flexible instructions) and the smallest possible instruction stream. AVR32 goes a similar way with flexible mixing of 16-bit and 32-bit instructions. AVR32 is my second inspiration.

Owen wrote:Fused multiply-shift and shift-divide instructions, for fast fixed point

This is interesting, i must add this idea to my wish-list for the next version.

Owen wrote:Fast procedure entry/exit instructions for non-leafs

I am not sure, but I think I need only 2 or 3 instructions (all of them usually 20-bit ones) in the function header and footer. I hope this is fast enough.

Owen wrote:And I only have 64 4-operand instructions!

Respect! Is it a real 64 bit platform?

Greetings
Erik

ErikVikinger · Post by **ErikVikinger** » Sat Jan 16, 2010 9:03 am

Hello,

js wrote:If you want to know which, I can tell you once

Yes, please.

Owen wrote:(Edit: I should add that they use a 64-bit intermediate)

You mean before the shift and after the multiplication? Nice. Does any documentation about the instruction set exist?

Thanks
Erik

Owen · Post by **Owen** » Sat Jan 16, 2010 10:41 am

At present, I'm specifying a 32-bit architecture, but it's explicitly designed for 64-bit extensibility. For example, wherever I specify a bit index (or shift count), I reserve an extra instruction which can be used for bigger shifts (An operand slot is 5 bits). All of my instructions are 32-bit, though I'm also designing a 16-bit Thumb2 like mode; of course, as this has to encode into 16 bits, it's less expressive (Only has two operand slots, unpredicted, condition code updates controlled by instruction).

At present I don't have any documentation that's in any way complete; I'm still working on it.

For varargs, I have them called as a normal function is; extra arguments are spilled onto the stack from right to left. The function can spill the arguments onto the stack if it wants (and it probably does) by reserving space for them at the start of its call frame; in doing so, va_arg & co can just iterate over them in order (as there is no gap between the caller and callee spilled arguments). Cleanup is always done by the function which allocated that bit of storage on the stack. A function is free to do whatever it wants to the argument slots on the stack.

I'm probably going to change ENTER's order around a bit to help facilitate this

ErikVikinger · Post by **ErikVikinger** » Sat Jan 16, 2010 12:47 pm

Hello,

Owen wrote:At present, I'm specifying a 32-bit architecture, but it's explicitly designed for 64-bit extensibility. For example, wherever I specify a bit index (or shift count), I reserve an extra instruction which can be used for bigger shifts (An operand slot is 5 bits).

For me, 64 bit was a design goal right from the start. All operand slots are 6 bits (one reason for the 64 base registers), and most instructions have a 2-bit field for the data size used (8 bit, 16 bit, 32 bit or 64 bit).

Owen wrote:All of my instructions are 32-bit, though I'm also designing a 16-bit Thumb2 like mode; of course, as this has to encode into 16 bits, it's less expressive (Only has two operand slots, unpredicted, condition code updates controlled by instruction).

16 bits was too little for me, so I created the packet format with a minimum instruction size of 20 bits. The 39-bit instructions are a superset of the 20-bit instructions with more operands, bigger immediates and packet-controlled predication or only conditional jumps. The 60-bit instructions are a superset of the 39-bit instructions with bigger immediates and full predication (each instruction can be conditionally executed independently of other instructions).

Owen wrote:At present I don't have any documentation that's in any way complete; I'm still working on it.

Me too.

Owen wrote:For varargs, I have them called as a normal function is; extra arguments are spilled onto the stack from right to left.

I will do the same. The known parameters (before the ", ...") are passed in registers and all varargs are pushed onto the stack from right to left (the first vararg is nearest to the top of the stack). The callee does not copy the varargs (or the stack-passed parameters) into its stack frame; the varargs are used directly (but read-only) because their positions relative to the SP are always known.

Owen wrote:The function can spill the arguments onto the stack if it wants (and it probably does) by reserving space for them at the start of its call frame;

The callee must know the size of the varargs for this. Is this information available inside the function header? In my implementation this information is not available; the function code must analyze the normal parameters to extract this information manually.

Owen wrote:A function is free to do whatever it wants to the argument slots on the stack.

I think this is only safe if the size of the stack slot is exactly known.

Owen wrote:I'm probably going to change ENTER's order around a bit to help facilitate this

For me, a RET instruction is a simple MOV R63,R61 (R63 (instruction pointer) = R61 (link register)) or a POP R63 (if the link register was pushed onto the stack first). I do not have an ENTER or EXIT instruction, but it could be a nice improvement for my CPU (I will look for a cozy place in the opcode space). The instructions currently needed for my function header and footer all depend on the stack pointer, so each one must wait for the previous one to finish successfully.

Greetings
Erik

Owen · Post by **Owen** » Sat Jan 16, 2010 1:59 pm

I pass all arguments in r1 through r11. A vararg function which so desires can push the arguments down onto the stack in reverse order (from r11 to r1) regardless of how many are actually passed, just wasting cycles in doing so. It is allowed to mutate the area of the stack into which its arguments were spilled; if there is some confusion between caller and callee as to the size of this area, you have bigger problems than stack corruption anyway.

While doing things this way makes vararg functions somewhat slower, they are some of the slowest to call anyway. I prefer consistency to efficiency in this case.
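
To illustrate why the reverse-order spill gives va_arg a gap-free list, here is a small C model. Everything in it is an assumption made for illustration: the `Machine` struct, the word-indexed downward-growing stack and the helper names are hypothetical, and the return address is assumed to live in a link register so nothing sits between the caller's and the callee's spilled words.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_ARG_REGS 11
#define STACK_WORDS  64

typedef struct Machine {
    uint32_t r[32];               /* r1..r11 carry the first 11 arguments */
    uint32_t sp;                  /* word index; the stack grows toward 0 */
    uint32_t stack[STACK_WORDS];
} Machine;

static void push(Machine *m, uint32_t v)
{
    assert(m->sp > 0);
    m->stack[--m->sp] = v;
}

/* Caller: arguments beyond the 11th are pushed right to left,
 * the first 11 go into r1..r11. */
void caller_pass_args(Machine *m, const uint32_t *args, unsigned count)
{
    for (unsigned i = count; i > NUM_ARG_REGS; i--)
        push(m, args[i - 1]);
    for (unsigned i = 0; i < count && i < NUM_ARG_REGS; i++)
        m->r[1 + i] = args[i];
}

/* Callee prologue: spill r11 down to r1, regardless of how many are
 * live. Returns the stack index at which argument 1 now sits. */
uint32_t callee_spill_regs(Machine *m)
{
    for (unsigned reg = NUM_ARG_REGS; reg >= 1; reg--)
        push(m, m->r[reg]);
    return m->sp;
}

/* What va_arg boils down to in this model: a plain indexed read. */
uint32_t nth_arg(const Machine *m, uint32_t base, unsigned n)
{
    assert(base + n < STACK_WORDS);
    return m->stack[base + n];
}
```

After the spill, argument n is simply at `base + n`, whether it originally travelled in a register or on the stack.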

My IP (Or PC, but I prefer x86's term in this case) isn't mapped into the usual register file; it can only be used (implicitly) by a few instructions:

Code: Select all

JMP disp22 (IP relative)
CALL disp22 (IP relative)
CALL reg, lit (Jump to reg, if lit is not zero then store IPnext into LR, aliased to plain CALL & JMP)
CALL sfr, lit (As above except source is SFR. Common use is CALL LR, 0, aka RET ;) )
EXIT 
LD disp17, reg
LEA disp17, reg

You can see a very incomplete working draft here. The cycle counts are for the P32A core design I'm working on, a single-issue, pipelined, in-order core. (It implements two register read ports and one write port, and one of each (plus miscellaneous direct routes to the registers) for SFR space, thus influencing the instruction set design.) It will be implemented using relatively narrow microcode (which for most instructions is little more complex than a traditional decoder).

ErikVikinger · Post by **ErikVikinger** » Sun Jan 17, 2010 9:52 am

Hello,

Owen wrote:I pass all arguments in r1 through r11.

That is 11 registers, a prime number; are you sure?

Owen wrote:A vararg function which so desires can push the arguments down on to the stack in reverse order (from r11 to r1)

For a working va_list you must always push the varargs to the stack, first thing in the function header (before you save the callee-save registers you use); va_arg wants a contiguous list.

Owen wrote:It is allowed to mutate the area of the stack into which its arguments were spilled;

If the space is allocated by the callee or its size is exactly known, then it is safe. In all other situations I prefer safety first.

Owen wrote:if there is some confusion between caller and callee as to the size of this area, you have bigger problems than stack corruption anyway.

With varargs, the callee by default never knows the size.
For example:

Code: Select all

printf("Hello %i and %u or %c\n",123);

the callee tries to read into the caller's stack frame for %u and %c, but there are no problems (okay, in theory this can cause a stack underflow resulting in a segmentation fault). Anyway, the call shown is valid; all compilers accept it (okay, some with warnings, because printf is a well-known function).

Owen wrote:I prefer consistency to efficiency in this case.

I cannot see any additional consistency in your solution.
I will pass known arguments in registers by default and, if there are more parameters, the rest on the stack (controlled by strict rules). After the known parameters, I will always pass varargs on the stack, so there is one contiguous list (without gaps) of variable-length parameters. I think that is consistent enough, and it is fast with minimal stack usage.

Owen wrote:You can see a very incomplete working draft here. ....

Nice.

Greetings
Erik

Owen · Post by **Owen** » Sun Jan 17, 2010 12:02 pm

ErikVikinger wrote:
Owen wrote:I pass all arguments in r1 through r11.
That is 11 registers, a prime number; are you sure?

I have 31 GPRs (r0 being a hard zero); one of my categories is going to end up with an extra register. I chose the parameters option; in the case of fewer parameters (likely), it provides extra caller-save registers anyway.

ErikVikinger wrote:
Owen wrote:A vararg function which so desires can push the arguments down on to the stack in reverse order (from r11 to r1)
For a working va_list you must always push the varargs to the stack, first thing in the function header (before you save the callee-save registers you use); va_arg wants a contiguous list.

Pushing them down in reverse order means that the parameter indexes on the stack end up in the right order. Also, not all uses of va_arg (though admittedly most) require stacking the parameters; for those cases registers are quicker.

ErikVikinger wrote:
Owen wrote:It is allowed to mutate the area of the stack into which its arguments were spilled;
If the space is allocated by the callee or its size is exactly known, then it is safe. In all other situations I prefer safety first.

Hmm, I think I may say that it's not permitted to mutate the varargs section of the stack. It's unlikely to occur anyway, so I see no major issues there

ErikVikinger wrote:
Owen wrote:if there is some confusion between caller and callee as to the size of this area, you have bigger problems than stack corruption anyway.
With varargs, the callee by default never knows the size.
For example:
Code: Select all
printf("Hello %i and %u or %c\n",123);
the callee tries to read into the caller's stack frame for %u and %c, but there are no problems (okay, in theory this can cause a stack underflow resulting in a segmentation fault). Anyway, the call shown is valid; all compilers accept it (okay, some with warnings, because printf is a well-known function).

I know that the size is unknown to the callee; I'm assuming most can get it accurately through their arguments (And would argue that your example is invalid, at least according to the printf API)
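
A minimal sketch of a callee obtaining the size "through its arguments": the hypothetical `sum_ints` below takes the count as its one named parameter, so it never reads past what the caller actually passed, provided the caller tells the truth.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical example: 'count' is the contract between caller and
 * callee about how many int varargs follow. */
int sum_ints(int count, ...)
{
    assert(count >= 0);

    va_list args;
    va_start(args, count);

    int total = 0;
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);   /* one read per promised argument */

    va_end(args);
    return total;
}
```

printf works the same way in principle, except that the count and types are encoded in the format string rather than in an integer.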

ErikVikinger wrote:
Owen wrote:I prefer consistency to efficiency in this case.
I cannot see any additional consistency in your solution.
I will pass known arguments in registers by default and, if there are more parameters, the rest on the stack (controlled by strict rules). After the known parameters, I will always pass varargs on the stack, so there is one contiguous list (without gaps) of variable-length parameters. I think that is consistent enough, and it is fast with minimal stack usage.

I'll pass all parameters equally; in the end, for functions with unknown numbers of parameters, it's pretty much equivalent. I just like the ability of a vararg function to impersonate a non-vararg one

ErikVikinger wrote:
Owen wrote:You can see a very incomplete working draft here. ....
Nice.

I hope to get it more complete shortly.

ErikVikinger · Post by **ErikVikinger** » Sun Jan 17, 2010 2:07 pm

Hello,

Owen wrote:
ErikVikinger wrote:
Owen wrote:I pass all arguments in r1 through r11.
That is 11 registers, a prime number; are you sure?
I have 31 GPRs (r0 being a hard zero); one of my categories is going to end up with an extra register. I chose the parameters option; in the case of fewer parameters (likely), it provides extra caller-save registers anyway.

Sorry, I was only wondering about this odd number.

Owen wrote:Pushing them down in reverse order means that the parameter indexes on the stack end up in the right order.

Yes, i do the same.

Owen wrote:Also, not all uses of va_arg (Though admittedly most) require stacking the parameters;

For iterating through the vararg list, va_arg receives only the va_list struct. Typically the va_list contains only a pointer to the varargs on the stack. Any mixture of stack and registers increases the complexity considerably. The va_list struct, initialized by va_start, is the only way to access the varargs in C/C++.
see http://www.cplusplus.com/reference/clibrary/cstdarg/

Owen wrote:for those cases registers are quicker

I do not think so. For C/C++, the varargs must in all cases be written to memory as a contiguous list (without gaps), and the caller can do this job better (it knows the vararg parameters exactly).

Owen wrote:Hmm, I think I may say that it's not permitted to mutate the varargs section of the stack. It's unlikely to occur anyway, so I see no major issues there

In theory you can reuse the parameter space on the stack, once the parameters are no longer needed, for intermediate values and so save stack space. But usually the callee-created stack frame is used for this. It is the compiler's job to decide where intermediate values and local variables are stored; the fixed space for the parameters that did not fit in registers is a possible option, but the vararg space is not. The reasons are that the callee does not know the size of the varargs, and that you can call va_start a second time and iterate over the varargs again (it could be very complicated for the compiler to determine that part of the varargs is no longer used).

Owen wrote:I'm assuming most can get it accurately through their arguments

I think this is a requirement for using varargs. But the compiler cannot check it, and the calling convention must be independent of this.

Owen wrote:(And would argue that your example is invalid, at least according to the printf API)

If I rename printf to foo, who can say whether it is valid or not?

Owen wrote:I'll pass all parameters equally; in the end, for functions with unknown numbers of parameters, it's pretty much equivalent.

Varargs are an additional feature on top of normal functions (without affecting the normal parameters), and you know the boundary exactly. I cannot see a benefit in concealing this boundary. Furthermore, you typically use the varargs in a different way.

Owen wrote:I just like the ability of a vararg function to impersonate a non-vararg one

When do you use it? The calling conventions are exactly the same, except for the optional varargs after the normal parameters.

Greetings
Erik

Owen · Post by **Owen** » Sun Jan 17, 2010 4:33 pm

ErikVikinger wrote:
Owen wrote:Also, not all uses of va_arg (Though admittedly most) require stacking the parameters;
For iterating through the vararg list, va_arg receives only the va_list struct. Typically the va_list contains only a pointer to the varargs on the stack. Any mixture of stack and registers increases the complexity considerably. The va_list struct, initialized by va_start, is the only way to access the varargs in C/C++.
see http://www.cplusplus.com/reference/clibrary/cstdarg/

A compiler is free to optimize away the va_list structure when possible, and there are some functions in which it is very possible to do so

ErikVikinger wrote:
Owen wrote:for those cases registers are quicker
I do not think so. For C/C++, the varargs must in all cases be written to memory as a contiguous list (without gaps), and the caller can do this job better (it knows the vararg parameters exactly).

Again, as above, there are some functions where stacking the arguments can be optimized away.

An example:

Code: Select all

void aFunction(int op, ...) {
   va_list args;
   va_start(args, op);

   if(op == 0) {
      int arg0 = va_arg(args, int);
      char* arg1 = va_arg(args, char*);
      // ...
   } else {
     char* arg1 = va_arg(args, char*);
     char* arg2 = va_arg(args, char*);
     // ...
   }
}

A common example of this is ioctl.
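
For reference, a compilable variant of that pattern might look like the sketch below. The op codes and the work done in each branch are hypothetical fillers for the `// ...` parts, and `va_end` is added because the standard requires it to pair with `va_start`. Each branch reads a fixed, statically known set of arguments, which is what would let a compiler take them straight from registers under a register-based vararg convention.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical op codes, in the spirit of ioctl request numbers. */
enum { OP_REPEAT_LEN = 0, OP_TOTAL_LEN = 1 };

int aFunction(int op, ...)
{
    va_list args;
    int result;

    va_start(args, op);
    if (op == OP_REPEAT_LEN) {
        /* This branch always consumes exactly (int, char*). */
        int arg0 = va_arg(args, int);
        char *arg1 = va_arg(args, char *);
        assert(arg1 != NULL);
        result = arg0 * (int)strlen(arg1);
    } else {
        /* This branch always consumes exactly (char*, char*). */
        char *arg1 = va_arg(args, char *);
        char *arg2 = va_arg(args, char *);
        assert(arg1 != NULL && arg2 != NULL);
        result = (int)(strlen(arg1) + strlen(arg2));
    }
    va_end(args);

    return result;
}
```

Unlike a printf-style loop, nothing here needs the arguments to be addressable in memory, so no va_list object has to exist at run time.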

ErikVikinger wrote:
Owen wrote:Hmm, I think I may say that it's not permitted to mutate the varargs section of the stack. It's unlikely to occur anyway, so I see no major issues there
In theory you can reuse the parameter space on the stack, once the parameters are no longer needed, for intermediate values and so save stack space. But usually the callee-created stack frame is used for this. It is the compiler's job to decide where intermediate values and local variables are stored; the fixed space for the parameters that did not fit in registers is a possible option, but the vararg space is not. The reasons are that the callee does not know the size of the varargs, and that you can call va_start a second time and iterate over the varargs again (it could be very complicated for the compiler to determine that part of the varargs is no longer used).

I'm allowing mutating any non vararg stack slots on the basis that, when someone does "&arg", it provides a convenient address to take.

ErikVikinger wrote:
Owen wrote:I'm assuming most can get it accurately through their arguments
I think this is a requirement for using varargs. But the compiler cannot check it, and the calling convention must be independent of this.

Indeed, but it's a bug if a function accesses non-arguments (Either with the caller or callee)

ErikVikinger wrote:
Owen wrote:(And would argue that your example is invalid, at least according to the printf API)
If I rename printf to foo, who can say whether it is valid or not?

The designer of the foo API. printf expects to be called with an argument for every option; calling it otherwise results in undefined (and potentially crashing) behaviour

ErikVikinger wrote:
Owen wrote:I'll pass all parameters equally; in the end, for functions with unknown numbers of parameters, it's pretty much equivalent.
Varargs are an additional feature on top of normal functions (without affecting the normal parameters), and you know the boundary exactly. I cannot see a benefit in concealing this boundary. Furthermore, you typically use the varargs in a different way.

Owen wrote:I just like the ability of a vararg function to impersonate a non-vararg one
When do you use it? The calling conventions are exactly the same, except for the optional varargs after the normal parameters.

An example of when it's used is when you need to expand an API to cover cases that weren't covered before.

In fact, IIRC, it's a part of the C standard that, if someone does the following:

Code: Select all

int printf();

void someFunction()
{
    printf("%s %d %f %d", "a string", 1, 1.0, 5);
}

it must work! This is for backwards compatibility with K&R C.

ErikVikinger · Post by **ErikVikinger** » Mon Jan 18, 2010 2:28 pm

Hello,

Owen wrote:A compiler is free to optimize away the va_list structure when possible, and there are some functions in which it is very possible to do so

Yes, that's right. But this is independent of the parameter passing. These optimizations can work with both: varargs passed in registers or on the stack.

Owen wrote:Again, as above, there are some functions where stacking the arguments can be optimized away.
An example:
Code: Select all
void aFunction(int op, ...) {
   va_list args;
   va_start(args, op);

   if(op == 0) {
      int arg0 = va_arg(args, int);
      char* arg1 = va_arg(args, char*);
      // ...
   } else {
     char* arg1 = va_arg(args, char*);
     char* arg2 = va_arg(args, char*);
     // ...
   }
}
A common example of this is ioctl.

Okay, but this works well with varargs on the stack too. Functions with varargs are rare, and functions with varargs and this structure are rarer still. Most functions with varargs iterate through them in a loop; here the compiler can optimize nothing about the vararg access, and a contiguous vararg list on the stack is the fastest and easiest way. The question is, IMHO: "should we optimize for a rare situation at the risk of a performance loss in most of the other situations?" My question about special handling of the this pointer (point 3 of my opening question) was the same; now I think it is not a good idea. In my opinion there are more critical points in a calling convention than this.

Owen wrote:I'm allowing mutating any non vararg stack slots on the basis that

And in this situation

Code: Select all

void foo(int a, long b, char c, double d, short e); //here must be enough parameters for requiring the passing on stack

bar()
{
  foo(1);
}

is a problem. With this in mind, I think write access to the stack should be allowed only for a function's own (self-allocated) stack frame. But I think this example does not conform to ANSI C.

Owen wrote:when someone does "&arg", it provides a convenient address to take.

In this case the callee must save this parameter into its own stack frame and give out the address of the new position (inside the stack frame). And once its pointer has been published (by calling another function or storing it into a global variable), the function must always use the new "instance" of the parameter, because side effects are now possible.

Owen wrote:Indeed, but it's a bug if a function accesses non-arguments (Either with the caller or callee)

Yes, that is right, but not in scope of the calling-convention.

Owen wrote:The designer of the foo API. printf expects to be called with an argument for every option; calling it otherwise results in undefined (and potentially crashing) behaviour

Yes, that is right too. But the software designer is also not in the scope of the calling convention. I think the calling convention must be as safe as possible even if the software designer makes a small error. Okay, the calling convention need not catch all user errors.

Owen wrote:An example of when it's used is when you need to expand an API to cover cases that weren't covered before.

If I expand the API I must create a new function. Optionally you can use default parameters

Code: Select all

int foo(int a, uint b); //old API
int foo(int a, uint b, char c = 'X');  //new API

In this case you cannot link old user code against your new library (a simple recompile is required), thanks to name mangling, and everything is safe.

Owen wrote:In fact, IIRC, it's a part of the C standard that, if someone does the following:
Code: Select all
int printf();

void someFunction()
{
    printf("%s %d %f %d", "a string", 1, 1.0, 5);
}
it must work! This is for backwards compatibility with K&R C.

This is unfair. I think every compiler has the right to know what the programmer intends to do. I cannot believe that this is valid ANSI C.

Greetings
Erik

Owen · Post by **Owen** » Mon Jan 18, 2010 2:52 pm

It's valid ISO C (Though not valid ISO C++)

C99 says that, when a function is declared "type x();", it must be treated as a legacy function reference, which means that the function's actual parameters are unknown. This is to support backwards compatibility with K&R C style libraries.

It's not fair, or clean, but C is neither anyway. It is the cost of supporting 25+ year old code.

ErikVikinger · Post by **ErikVikinger** » Tue Jan 19, 2010 1:22 am

Hello,

Owen wrote:C99 says that, when a function is declared "type x();", it must be treated as a legacy function reference, which means that the function's actual parameters are unknown.

One more reason to say that name mangling should always be mandatory. It makes some assembler things a little bit harder, but overall it is a good idea.

Owen wrote:This is to support backwards compatibility with K&R C style libraries.

This is not a design goal for me.
Do such libraries still exist today?

Greetings
Erik

Owen · Post by **Owen** » Tue Jan 19, 2010 10:04 am

ErikVikinger wrote:Hello,

Owen wrote:C99 says that, when a function is declared "type x();", it must be treated as a legacy function reference, which means that the function's actual parameters are unknown.
One more reason to say that name mangling should always be mandatory. It makes some assembler things a little bit harder, but overall it is a good idea.

OK, but how do you mangle a name provided without parameters? You can't. You can't even mangle at the call site, because there may be subtle mismatches (int vs short vs char, etc.) which will make it mangle incorrectly.
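
One way to see the call-site mismatch: under C's default argument promotions (which apply to variadic arguments and to calls through an unprototyped declaration), char and short arguments are passed as int, so the call site alone does not reveal the declared parameter types. The sketch below uses a variadic function to show the promotion; the names sum_promoted and call_with_mixed_types are made up for this example.

```c
#include <stdarg.h>

/* Hypothetical callee: reads `count` varargs, every one of them as int. */
static int sum_promoted(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++) {
        total += va_arg(ap, int);   /* promoted char/short arrive as int */
    }
    va_end(ap);

    return total;
}

int call_with_mixed_types(void)
{
    char  c = 1;
    short s = 2;
    int   i = 3;

    /* c and s undergo the default argument promotions and are passed as int,
       so the callee cannot tell them apart from a "real" int argument. */
    return sum_promoted(3, c, s, i);
}
```

A mangler working only from the call site would see three int arguments here, whatever the variables were declared as.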

Does the C standard even permit name mangling?

ErikVikinger wrote:
Owen wrote:This is to support backwards compatibility with K&R C style libraries.
This is not a design goal for me.
Do such libraries still exist today?

If you look around enough, yes. The standard code for MD5, for example, is in K&R form.

And do you really want to implement some non-standard variant of C?

Combuster · Post by **Combuster** » Tue Jan 19, 2010 11:09 am

Owen wrote:
ErikVikinger wrote:One more reason to say that name mangling should always be mandatory. It makes some assembler things a little bit harder, but overall it is a good idea.
OK, but how do you mangle a name provided without parameters? You can't. You can't even mangle at the call site, because there may be subtle mismatches (int vs short vs char, etc.) which will make it mangle incorrectly.

However, passing arguments of different types is not required to produce the same machine state prior to the call. In fact, on an 8-bit stack architecture, chars, shorts, and ints take different amounts of space on the stack, so a mismatch will result in broken execution, or at the very least unportable code.

So the idiot who gets a linker error over a K&R function IMO deserves that title. Even my OS wouldn't compile with such a construction (courtesy of the -Werror switch).

_________________

I'm a bit late with explaining the register distribution argument of last week (must have forgotten), and the issue is pretty much moot after your reply but still:
No register reserved as caller-saved. In the worst case, all registers must be preserved, either as callee-saved or as an argument yet to be used. The problem comes from old RISC implementations with a capital R. Normally you'd see a construct like mov r1, sp; add r1, offset into stack; st (r1), r2; to back up a register onto the stack. The problem is that this approach requires a free register, which we don't have. If we resort to pushes and pops, you'll have to do something like push r2; add sp, size_of_stackframe - 4; it works, but it is a convoluted special case for this one situation and thus not desirable, quite apart from the unavoidable pipeline stalls on sp. Your ISA, however, provides enough CISC-y traits that you can move things onto the stack even from tight spots without extra cost. Having the later-needed arguments in memory and some breathing space already helps superscalar execution of instructions without an out-of-order execution engine.

If you have no space reserved for arguments, there is no way you can pass any (you'd need at least one slot to pass a pointer to a place where there is room for arguments). Usually, the sp register serves that purpose.

If you have no callee-saved registers, where is your stack after a function call?

But that's a bit of extreme-corner-case theory. Feel free to throw it away.

OSDev.org
