Calling C functions from MASM (x64)

PhantomR · Post by **PhantomR** » Wed Feb 13, 2019 6:22 am

I'm really confused as to how calling C functions from MASM works (Visual Studio) on x64...

I have a function I've written in C called 'printf'. If I do the following in MASM, my RSP is changed:

Code: Select all

mov rcx, format_str
mov rdx, argument
call printf

I read on the Microsoft website that the caller should allocate space for 4 qwords (shadow space for the 4 param registers). Well then, I assumed this meant I had to do this:

Code: Select all

mov rcx, format_str
mov rdx, argument
sub rsp, 32
call printf

However, this also modifies my RSP. So I thought I might also have to clean up the stack after the call, and this worked:

Code: Select all

mov rcx, format_str
mov rdx, argument
sub rsp, 32
call printf
add rsp, 32

My question would be: is this last code snippet the RIGHT way of calling a C function from assembly in x64??
EDIT: Also, if this is indeed the right way, how would it go for functions with more than 4 parameters? Would I have to allocate the shadow space before pushing the params or after?
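On the "more than 4 parameters" part, here is a minimal sketch of the layout as Microsoft's x64 calling convention describes it, written as plain C arithmetic rather than MASM. The helper names are hypothetical and only integer/pointer arguments are modelled. The point it illustrates: measured from RSP just before the CALL, the four shadow slots sit at the lowest addresses and the fifth argument starts at offset 32. So if you push the extra arguments, you push them first and reserve the 32 bytes afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers, not part of any real API. They model integer and
   pointer arguments only (each one occupies a single 8-byte slot). */

/* Offset from RSP, taken just before the CALL instruction executes, of the
   slot that belongs to the zero-based argument `index`. Slots 0..3 are the
   home (shadow) slots for RCX, RDX, R8 and R9; slot 4 is the fifth argument. */
size_t win64_arg_offset(size_t index)
{
    return index * 8;
}

/* Nonzero when the argument travels in a register rather than on the stack. */
int win64_arg_in_register(size_t index)
{
    return index < 4;
}

/* Bytes of outgoing-argument area the caller must provide for a call with
   `arg_count` arguments: never less than the 32-byte shadow space. */
size_t win64_outgoing_area(size_t arg_count)
{
    size_t slots = arg_count < 4 ? 4 : arg_count;
    return slots * 8;
}

/* Small self-check of the numbers discussed in this thread. */
void win64_layout_selfcheck(void)
{
    assert(win64_arg_in_register(3));      /* 4th argument: R9            */
    assert(!win64_arg_in_register(4));     /* 5th argument: on the stack  */
    assert(win64_arg_offset(4) == 32);     /* ...right above the shadow   */
    assert(win64_outgoing_area(2) == 32);  /* printf(fmt, arg) still 32   */
}
```

For a six-argument call this gives 48 bytes of outgoing area, with arguments five and six at offsets 32 and 40. The 16-byte alignment of RSP at the call is a separate requirement that this sketch does not cover.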

fpissarra · Post by **fpissarra** » Thu Feb 14, 2019 7:12 am

PhantomR wrote:I read on the Microsoft website that the caller should allocate space for 4 qwords (shadow space for the 4 param registers).

I believe you are referring to SEH (Structured Exception Handling). This type of alignment is usually done at the beginning of your code (main()), not at each libc/API function call.

Example of a simple program compiled with mingw64's gcc:

Code: Select all

// test.c
//
// Compile with
//  x86_64-w64-mingw32-gcc -O2 -S -masm=intel test.c
#include <stdio.h>

void main( void )
{
  int i;

  for (i = 0; i < 10; i++)
    printf( "%d\n", i );
}

And we'll get something like this:

Code: Select all

main:
	push	rsi     ; with x86-64 ms-abi we need
	push	rbx    ; to preserve these regs!

	sub	rsp, 40          ; why 40?!

	lea	rsi, .LC0[rip]  ; printf's fmt
	xor	ebx, ebx        ; i=0
	call	__main
.L2:
	mov	edx, ebx
	mov	rcx, rsi
	add	ebx, 1
	call	printf         ; Note: No stack alignment!
	cmp	ebx, 10
	jne	.L2

	add	rsp, 40

	pop	rbx
	pop	rsi

	ret

alexfru · Post by **alexfru** » Thu Feb 14, 2019 8:21 am

fpissarra wrote:
PhantomR wrote:I read on the Microsoft website that the caller should allocate space for 4 qwords (shadow space for the 4 param registers).
I believe you are referring to SEH (Structured Exception Handling).

It may have tangential relation to SEH, but the primary purpose is different.

It's cheaper to do things with registers than with memory (shorter instructions, less cache traffic, etc).
So, when you have plenty of registers, it's reasonable to dedicate some of them to parameter passing.
Hence you have 4 or 6 (depending on the particular x86-64 ABI) regs used for this.

All is fine until you need to call something like "int printf(const char* fmt, ...)".
So, fmt goes into one register, 3 to 5 more optional parameters are in other registers and the rest is on the stack.

Now, think how you'd implement the va_start() and va_arg() macros to get all those optional values.
Those macros must definitely be able to extract the values from the stack when there are many of them.
The values contained in the registers also need to be extractable.
And the simplest is just to spill those 3-5 regs into this shadow/home space and use the exact same code to access things on the stack.
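To make the spill-and-walk idea concrete, here is a toy model in portable C. It is a sketch under stated assumptions, not how any real compiler implements <stdarg.h>: the `toy_*` names and the `toy_call` struct are made up, the four register arguments are plain struct fields, and `frame` stands for the memory starting at the first home slot with the stack-passed arguments following right behind it. Once the registers are spilled into their home slots, a single pointer walks all the arguments with the same code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a Win64-style call (all names here are hypothetical). */
typedef struct {
    uint64_t rcx, rdx, r8, r9; /* register arguments as the callee sees them */
    uint64_t *frame;           /* frame[0..3]: home slots, frame[4..]: stack args */
} toy_call;

typedef uint64_t *toy_va_list; /* in this model a va_list is just a pointer */

/* What the callee's prologue does: store the register arguments into the
   home space the caller reserved, making all arguments contiguous. */
void toy_spill(toy_call *c)
{
    c->frame[0] = c->rcx;
    c->frame[1] = c->rdx;
    c->frame[2] = c->r8;
    c->frame[3] = c->r9;
}

/* va_start: point just past the last named argument. */
toy_va_list toy_va_start(toy_call *c, size_t named_args)
{
    return c->frame + named_args;
}

/* va_arg: read one 8-byte slot and advance. Register-passed and
   stack-passed arguments are handled identically. */
uint64_t toy_va_arg(toy_va_list *ap)
{
    return *(*ap)++;
}

/* Models sum(n, ...): n is the only named argument. */
uint64_t toy_sum(toy_call *c)
{
    toy_spill(c);
    toy_va_list ap = toy_va_start(c, 1);
    uint64_t n = c->rcx, r = 0;
    while (n--)
        r += toy_va_arg(&ap);
    return r;
}

/* Models sum(5, 10, 20, 30, 40, 50): three values arrive in RDX, R8, R9
   and two on the stack. */
uint64_t toy_sum_demo(void)
{
    uint64_t frame[6] = {0, 0, 0, 0, 40, 50};
    toy_call c = {5, 10, 20, 30, frame};
    uint64_t total = toy_sum(&c);
    assert(frame[1] == 10); /* RDX landed in its home slot */
    return total;
}
```

In the demo the loop crosses from the spilled registers into the stack-passed values without any special case, which is the convenience the home space buys.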

fpissarra · Post by **fpissarra** » Thu Feb 14, 2019 11:33 am

It may have tangential relation to SEH, but the primary purpose is different.

You are probably right! But I also know that GCC and MSVC generate VERY strange code!!! Take a look at this example using variadics:

Code: Select all

// test.c
// Compile with:
//   x86_64-w64-mingw32-gcc -O2 -S -masm=intel test.c
//
#include <stdio.h>
#include <stdarg.h>

__attribute__ ( ( noinline ) ) int f ( int x ) { return 2 * x; }

__attribute__ ( ( noinline ) ) int g ( int x, ... )
{
  int y;
  va_list ap;

  va_start ( ap, x );
  y = va_arg ( ap, int );
  va_end ( ap );

  return x * y;
}

void dosomething ( int x, int y )
{
  printf ( "%d\n", f ( x ) );
  printf ( "%d\n", g ( x, y ) );
}

Which generates strange code like this:

Code: Select all

; ECX = x
f:
	; OK! very straightforward!
	lea	eax, [rcx+rcx]
	ret

; ECX = x, EDX = y (NOT on stack, as you will see later!)
g:
	; Notice: at this point the structure of stack should be:
	;           <caller stkframe>
	; RSP-> retaddr

	sub	rsp, 24   ; reserve 3 QWORDS on stack (WHY?)

	; The stack now is (in qwords):
	;           <caller stkframe>
	;            retaddr
	;            ?
	;            ?
	; RSP->  ?

	mov	eax, edx  ; EAX=y

	mov	[rsp+40], rdx  ; saves y on stack,
	                              ; as if the arguments were stacked
	                              ; before the call (they aren't!).
	                              ; OBS: The hypothetical stacked x is [rsp+32] and
	                              ; never pushed!

	lea	rdx, [rsp+40]  ; RDX now POINTS TO hypothetical stacked y.

	imul	eax, ecx    ; EAX=y*x

	; RSP+48 and RSP+56 point to the hypothetical 3rd and 4th arguments
	; (There are none!).
	mov	[rsp+48], r8 ; saves after the last argument?
	mov	[rsp+56], r9 ; again? even further?

	mov	[rsp+8], rdx ; saves ptr to y on stack on a local reserved space (WHY?).
	                           ; why 24 bytes were allocated if not used?

	; Notice that these four qwords, stored above retaddr, overwrite the caller's
	; stack frame...
  
	;            R9
	;            R8
	;            y
	;            ?
	;            retaddr
	;            ?
	;            RDX
	; RSP-> ?

	add	rsp, 24  ; reclaim reserved space.
	ret

dosomething:
	push	rsi
	push	rbx

	sub	rsp, 40  ; WHY?

	mov	ebx, ecx

	mov	esi, edx
	call	f

	lea	rcx, .LC0[rip]
	mov	edx, eax
	call	printf

	; See? A simple msabi calling convention call, no stack used for the arguments.
	mov	edx, esi
	mov	ecx, ebx
	call	g

	lea	rcx, .LC0[rip]

	mov	edx, eax

	add	rsp, 40   ; WHY?

	pop	rbx
	pop	rsi
	jmp	printf

Notice that g() isn't reading its arguments from the stack (x and y are taken directly from ECX and EDX). It writes y, R8 and R9 as if they had been pushed on the stack before the call, and stores the POINTER to y in a local var (reserved on the stack) only to discard it right after. It NEVER writes x and never uses R8 and R9...

The dosomething() function is less strange, but it still reserves 40 bytes (5 QWORDS)... WHY? We're not using local vars or SEH here! This is OK if you consider the 4 qwords saved by g(): dosomething() reserves this space so that g() doesn't overwrite dosomething()'s stack frame. But why 40? Why not 32?

To me this is very, very strange code.
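On "why 40 and not 32": a likely explanation is the Win64 rule that RSP must be 16-byte aligned at every call. The sketch below is only arithmetic, with a hypothetical helper name, and it assumes a non-leaf function that pushes its saved registers before a single SUB. On entry RSP is 8 past a 16-byte boundary because CALL pushed the return address, and two pushes leave it in the same state, so 32 bytes of shadow space need 8 bytes of padding on top.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes to subtract from RSP in the prologue of a
   non-leaf Win64 function so that RSP is 16-byte aligned at each call.
     pushes        - number of 8-byte register pushes done before the SUB
     local_bytes   - space wanted for local variables
     max_call_args - largest argument count of any function it calls */
size_t win64_frame_adjust(size_t pushes, size_t local_bytes,
                          size_t max_call_args)
{
    /* Outgoing area: at least the four shadow slots. */
    size_t slots = max_call_args < 4 ? 4 : max_call_args;
    size_t need = slots * 8 + local_bytes;

    /* Bytes already on the stack below the caller's aligned RSP:
       the return address plus the pushed registers. */
    size_t used = 8 + pushes * 8;

    /* Pad until the total is a multiple of 16. */
    size_t pad = (16 - (used + need) % 16) % 16;
    return need + pad;
}

/* The cases that show up in this thread. */
void win64_frame_selfcheck(void)
{
    /* dosomething(): push rsi, push rbx, no locals, calls with <= 4 args. */
    assert(win64_frame_adjust(2, 0, 2) == 40);
    /* No pushes at all: 8 (retaddr) + 32 is misaligned, so 40 again. */
    assert(win64_frame_adjust(0, 0, 2) == 40);
    /* One push: 8 + 8 + 32 = 48 is already aligned, so 32 is enough. */
    assert(win64_frame_adjust(1, 0, 2) == 32);
}
```

This would also explain why the bare `sub rsp, 32` / `add rsp, 32` pair from the first post works in some places and crashes in others: whether it is sufficient depends on how RSP was aligned before the SUB.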

nullplan · Post by **nullplan** » Thu Feb 14, 2019 2:23 pm

I think what's happening here is that va_list isn't actually all that magic. It is some sort of structure that contains all the registers containing parameters, and a pointer to the remaining stack arguments. So, since you declared a va_list as local variable, the compiler reserves space for it, and initializes it. In the Win64 ABI, there are 4 argument registers, and in your code, at most 3 of those can be filled, so they get spilled.

Somehow the optimizer doesn't see that those stores are dead. I presume that's because va_start() is a little bit too magical for the optimizer, or so. So the va_list is initialized entirely, even though only a single member of it is ever used. But the instruction scheduler somehow manages to push the multiplication instruction further up. Because why not?

Mind you, va_lists are usually meant for a variable number of arguments. Here's a somewhat more complete example (compiled on Linux; sorry, no Windows compiler available):

Code: Select all

#include <stddef.h>
#include <stdarg.h>

size_t sum(size_t n, ...)
{
    va_list ap;
    size_t r = 0;
    va_start(ap, n);
    while (n--)
        r += va_arg(ap, size_t);
    va_end(ap);
    return r;
}

Compiled with -Os:

Code: Select all

	.globl	sum
	.type	sum, @function
sum:
.LFB0:
	.cfi_startproc
	leaq	8(%rsp), %rax
	movq	%rsi, -40(%rsp)
	movq	%r9, -8(%rsp)
	movl	$8, -72(%rsp)
	movq	%rax, -64(%rsp)
	leaq	-48(%rsp), %rax
	movq	%rdx, -32(%rsp)
	leaq	8(%rsp), %rdx
	movq	%rcx, -24(%rsp)
	movl	$8, %ecx
	movq	%r8, -16(%rsp)
	movq	%rax, %r8
	movq	%rax, -56(%rsp)
	xorl	%eax, %eax
.L2:
	decq	%rdi
	cmpq	$-1, %rdi
	je	.L7
	leaq	8(%rdx), %rsi
	cmpl	$47, %ecx
	ja	.L4
	movl	%ecx, %r9d
	movq	%rdx, %rsi
	addl	$8, %ecx
	leaq	(%r8,%r9), %rdx
.L4:
	addq	(%rdx), %rax
	movq	%rsi, %rdx
	jmp	.L2
.L7:
	ret
	.cfi_endproc
.LFE0:
	.size	sum, .-sum

So you see, it spills all the argument registers in order. But only God knows why it does so in a random order.
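For reading the listing above, here is a toy C model of the va_list record that the System V AMD64 ABI describes, restricted to integer arguments. The `toy_*` names are made up and the register save area is modelled as an ordinary array, so treat it as a sketch rather than the real <stdarg.h>. It lines up with the assembly: the `movl $8` store is gp_offset (skipping the named argument in RDI), and `cmpl $47` / `ja` is the "any general-purpose registers left?" test against the 48-byte save area.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field order follows the System V AMD64 ABI; the toy_ prefix marks this
   as a model. Floating-point arguments are not modelled. */
typedef struct {
    unsigned gp_offset;          /* byte offset of the next unread GP register */
    unsigned fp_offset;          /* same for vector registers (unused here)    */
    uint64_t *overflow_arg_area; /* next stack-passed argument                 */
    uint64_t *reg_save_area;     /* RDI, RSI, RDX, RCX, R8, R9 spilled in order */
} toy_sysv_va_list;

/* va_arg for one 8-byte integer: take it from the register save area while
   any of the 6 GP slots (48 bytes) remain, otherwise from the stack. */
uint64_t toy_sysv_va_arg(toy_sysv_va_list *ap)
{
    if (ap->gp_offset < 48) {
        uint64_t v = ap->reg_save_area[ap->gp_offset / 8];
        ap->gp_offset += 8;
        return v;
    }
    return *ap->overflow_arg_area++;
}

/* Models sum(7, 1, 2, 3, 4, 5, 6, 7): n plus five values fill the six GP
   registers, the last two values go on the stack. */
uint64_t toy_sysv_sum_demo(void)
{
    uint64_t regs[6] = {7, 1, 2, 3, 4, 5}; /* RDI holds n */
    uint64_t stack[2] = {6, 7};
    toy_sysv_va_list ap = {8, 48, stack, regs}; /* gp_offset 8: skip n */

    uint64_t n = regs[0], r = 0;
    while (n--)
        r += toy_sysv_va_arg(&ap);

    assert(ap.gp_offset == 48); /* all register slots consumed */
    return r;
}
```

The demo reads five values from the save area and two from the overflow area. As for the order of the stores in the listing, instruction scheduling is a plausible cause, since the stores are independent of one another.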

OSDev.org
