I think what's happening here is that va_list isn't actually all that magic. It is some sort of structure that holds the register-passed parameters plus a pointer to the remaining stack arguments. (Strictly speaking, that describes the Linux x86-64 ABI. On Win64 a va_list is just a plain pointer and the register arguments get stored to their home slots on the stack, but the effect is the same.) Since you declared a va_list as a local variable, the compiler reserves space for it and initializes it. The Win64 ABI has 4 argument registers, and in your code at most 3 of those can be filled, so they get spilled.
Somehow the optimizer doesn't see that those stores are dead. I presume that's because va_start() is a little bit too magical for the optimizer. So the va_list is initialized in full, even though only a single member of it is ever used. Meanwhile, the instruction scheduler somehow manages to hoist the multiplication instruction further up. Because why not?
Mind you, va_lists are usually meant for a variable number of arguments. Here's a somewhat more complete example (compiled on Linux; sorry, no Windows compiler available):
#include <stddef.h>
#include <stdarg.h>

size_t sum(size_t n, ...)
{
    va_list ap;
    size_t r = 0;

    va_start(ap, n);
    while (n--)
        r += va_arg(ap, size_t);
    va_end(ap);
    return r;
}
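For what it's worth, here is a minimal sketch of how a caller would have to use this. The function is repeated so the snippet compiles on its own; sum_demo() is a made-up name for illustration only. The one thing to watch is that every variadic argument really must be a size_t: a bare literal like 10 is passed as an int, and reading it back with va_arg(ap, size_t) is undefined behavior.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Same function as above, repeated so this compiles standalone. */
size_t sum(size_t n, ...)
{
    va_list ap;
    size_t r = 0;

    va_start(ap, n);
    while (n--)
        r += va_arg(ap, size_t);
    va_end(ap);
    return r;
}

/* Hypothetical caller: the casts make each argument a size_t,
 * matching the type that va_arg() reads back. */
size_t sum_demo(void)
{
    size_t total = sum(3, (size_t)10, (size_t)20, (size_t)30);

    assert(total == 60);
    return total;
}
```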
Compiled with -Os, sum() comes out as:
        .globl  sum
        .type   sum, @function
sum:
.LFB0:
        .cfi_startproc
        leaq    8(%rsp), %rax
        movq    %rsi, -40(%rsp)         # spill %rsi into the register save area
        movq    %r9, -8(%rsp)           # spill %r9
        movl    $8, -72(%rsp)           # ap.gp_offset = 8 (%rdi already holds n)
        movq    %rax, -64(%rsp)         # ap.overflow_arg_area = first stack argument
        leaq    -48(%rsp), %rax
        movq    %rdx, -32(%rsp)         # spill %rdx
        leaq    8(%rsp), %rdx
        movq    %rcx, -24(%rsp)         # spill %rcx
        movl    $8, %ecx
        movq    %r8, -16(%rsp)          # spill %r8
        movq    %rax, %r8
        movq    %rax, -56(%rsp)         # ap.reg_save_area
        xorl    %eax, %eax              # r = 0
.L2:
        decq    %rdi
        cmpq    $-1, %rdi
        je      .L7
        leaq    8(%rdx), %rsi
        cmpl    $47, %ecx               # any register slots left?
        ja      .L4
        movl    %ecx, %r9d
        movq    %rdx, %rsi
        addl    $8, %ecx
        leaq    (%r8,%r9), %rdx
.L4:
        addq    (%rdx), %rax            # r += next argument
        movq    %rsi, %rdx
        jmp     .L2
.L7:
        ret
        .cfi_endproc
.LFE0:
        .size   sum, .-sum
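So you see, it spills every argument register that could hold a variadic argument (%rsi through %r9) into consecutive stack slots. But only God knows why it emits the stores in such a scrambled order. Presumably that's the instruction scheduler again.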
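To make the listing a bit easier to follow, here is a rough C sketch of what the va_list and the va_arg(ap, size_t) step look like on Linux x86-64. The field names follow the System V ABI document, but struct sketch_va_list, sketch_va_arg_size_t() and sketch_demo() are names I made up for illustration; the real type is compiler-internal, and this only covers integer arguments. It also assumes an 8-byte size_t, as on x86-64.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the va_list element in the System V x86-64 ABI. */
struct sketch_va_list {
    unsigned int gp_offset;     /* byte offset of the next integer register slot */
    unsigned int fp_offset;     /* same for vector registers; unused in this sketch */
    void *overflow_arg_area;    /* next argument that was passed on the stack */
    void *reg_save_area;        /* where the prologue spilled the registers */
};

/* Roughly what va_arg(ap, size_t) does in the loop: take the next
 * register slot while any are left (the cmpl $47 / ja pair), otherwise
 * fall back to the stack arguments. */
size_t sketch_va_arg_size_t(struct sketch_va_list *ap)
{
    const size_t *p;

    if (ap->gp_offset < 6 * 8) {
        p = (const size_t *)((char *)ap->reg_save_area + ap->gp_offset);
        ap->gp_offset += 8;
    } else {
        p = (const size_t *)ap->overflow_arg_area;
        ap->overflow_arg_area = (char *)ap->overflow_arg_area + 8;
    }
    return *p;
}

/* Simulates sum(7, 1, 2, 3, 4, 5, 6, 7) with hand-built save areas:
 * five values arrive in registers, the last two on the stack. */
size_t sketch_demo(void)
{
    size_t regs[6]  = { 7, 1, 2, 3, 4, 5 };  /* rdi (n), rsi, rdx, rcx, r8, r9 */
    size_t stack[2] = { 6, 7 };
    /* gp_offset starts at 8 because the first slot belongs to n. */
    struct sketch_va_list ap = { 8, 48, stack, regs };
    size_t n = regs[0];
    size_t r = 0;

    while (n--)
        r += sketch_va_arg_size_t(&ap);

    assert(ap.gp_offset == 48);  /* all register slots were consumed */
    return r;
}
```

This is why the whole save area has to be filled in up front: the loop indexes into it at run time, so the compiler can't easily tell which of the spilled registers will actually be read.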