The problem with memcpy is that it does not return the value: it returns the destination pointer, so it needs a memory address to write to, and that can't always be optimized away. Using a pointer cast tells the compiler to produce a value, no matter which compiler it is and what optimizations it can do. With memcpy you're relying entirely on the hope that the compiler will handle it. I don't like relying on implementation-specific compiler optimization features; I prefer to optimize my code by hand.
I mean, you can do this and it will always compile perfectly into a single MOV, no matter the compiler and its optimizer capabilities (okay, two MOVs if the compiler is dumb: get the address, then dereference):
Code: Select all
printf("%d", *((int*)someaddress));
but you cannot do the same with memcpy:
Code: Select all
printf("%d", memcpy(?, someaddress, 4));
This means that you must use a temporary variable in memory. (Read: your code will unnecessarily copy the data twice.)
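To make the temporary concrete, here is a minimal sketch of what the memcpy route looks like once it's wrapped so it can be used inside an expression. The helper name read_int_at is hypothetical (it is not from any library); the point is only that the value has to pass through tmp before anybody can use it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, for illustration only: reads an int from an
   arbitrary (possibly unaligned) address via memcpy. */
static int read_int_at(const void *addr)
{
    int tmp;                          /* the temporary variable in question */

    assert(addr != NULL);             /* precondition: a valid source address */
    memcpy(&tmp, addr, sizeof tmp);   /* memcpy hands back &tmp, not the value */
    return tmp;                       /* the value is only reachable through tmp */
}
```

With that you could write printf("%d", read_int_at(someaddress)), but whether tmp ever leaves the stack is entirely up to the optimizer.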
And no, gcc at its default settings (no optimization flags) isn't smart enough to optimize the temporary away. Here's a simple example you can try:
Code: Select all
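#include <stdio.h>
#include <string.h>

static char buf[4096];

int main(void)
{
    int i;  /* temporary that memcpy needs as its destination */

    /* read an int from an unaligned offset inside buf */
    memcpy(&i, &buf[3], sizeof(int));
    printf("%d\n", i);
    return 0;
}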
Compile and check the result:
Code: Select all
gcc test.c -o test
0000000000001149 <main>:
1149: 55 push %rbp
114a: 48 89 e5 mov %rsp,%rbp
114d: 48 83 ec 10 sub $0x10,%rsp
1151: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
1158: 00 00
115a: 48 89 45 f8 mov %rax,-0x8(%rbp)
115e: 31 c0 xor %eax,%eax
1160: 8b 05 fd 2e 00 00 mov 0x2efd(%rip),%eax # 4063 <buf+0x3>
1166: 89 45 f4 mov %eax,-0xc(%rbp)
1169: 8b 45 f4 mov -0xc(%rbp),%eax
116c: 89 c6 mov %eax,%esi
116e: 48 8d 3d 8f 0e 00 00 lea 0xe8f(%rip),%rdi # 2004 <_IO_stdin_used+0x4>
1175: b8 00 00 00 00 mov $0x0,%eax
117a: e8 c1 fe ff ff call 1040 <printf@plt>
117f: b8 00 00 00 00 mov $0x0,%eax
1184: 48 8b 55 f8 mov -0x8(%rbp),%rdx
1188: 64 48 2b 14 25 28 00 sub %fs:0x28,%rdx
118f: 00 00
1191: 74 05 je 1198 <main+0x4f>
1193: e8 98 fe ff ff call 1030 <__stack_chk_fail@plt>
1198: c9 leave
1199: c3 ret
As you can see, even though memcpy was replaced by a MOV, the temporary variable remained, meaning four MOVs and an additional variable on the stack instead of a single MOV into a register:
1160 moves from buf to eax,
1166 moves from eax to the stack,
1169 and 116c move from the stack to the printf argument.
That's more than four times the overhead. It's even worse than what a dumb compiler would generate for the cast, because it accesses the stack too. Twice.
You'll have to enable certain compiler-specific optimizations to make it go away, but there's absolutely no guarantee that a given compiler can do this at all. Plus, an stb-style header-only library most certainly can't specify optimization flags in a portable way (unless it has an ifdef maze to detect the compiler and use the appropriate pragma, which might or might not be supported by the actual compiler...)
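For the record, this is roughly what such an ifdef maze would look like. It's only a sketch: mylib_read_int is a made-up name, and the two branches cover just GCC's optimize pragma and MSVC's optimize pragma, assuming those compilers identify themselves via __GNUC__ and _MSC_VER. Every other compiler falls through and gets nothing, and nothing here guarantees that the temporary actually disappears.

```c
#include <assert.h>
#include <string.h>

/* Request optimization for the code below, one branch per toolchain.
   clang also defines __GNUC__ but does not implement this GCC pragma,
   so it is excluded explicitly. */
#if defined(__GNUC__) && !defined(__clang__)
#  pragma GCC push_options
#  pragma GCC optimize ("O2")
#elif defined(_MSC_VER) && !defined(__clang__)
#  pragma optimize("t", on)      /* favor fast code */
#endif
/* any other compiler: no pragma at all, so no request is made */

/* Hypothetical library function: int read through a memcpy temporary. */
static int mylib_read_int(const void *addr)
{
    int tmp;

    assert(addr != NULL);
    memcpy(&tmp, addr, sizeof tmp);
    return tmp;
}

/* Restore whatever the user asked for on the command line. */
#if defined(__GNUC__) && !defined(__clang__)
#  pragma GCC pop_options
#elif defined(_MSC_VER) && !defined(__clang__)
#  pragma optimize("", on)
#endif
```

And that's two compilers only; each additional one needs its own branch, if it has such a pragma at all.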
So I'm not sold. I'll go with pointer casting, as it doesn't require compiler-specific features and it results in much better code (even on dumb compilers).
Best case: a single MOV from buf + 3 straight into the register.
Worst case:
Code: Select all
mov $buf + 3, %eax
mov (%eax), %esi
That's still much better than what gcc produced with memcpy (excuse me for using "buf + 3" to keep the example readable; a rip-relative address would work exactly the same way).
Cheers,
bzt