Code: Select all
MOV EAX, 0xbddf40
MOV EBX, 0xe040
CVTSI2SD XMM0, EAX
CVTSI2SD XMM1, EBX
ADDSD XMM0, XMM1
CVTSD2SI EAX, XMM0
Code: Select all
sqrt(2.0+5.0+449201938193.500039218);
Code: Select all
MOV RAX, 0x120c667a1255a42
CVTSI2SD XMM0, RAX
SQRTSD XMM0, XMM0
Ethin wrote: On an unrelated note, since we're talking about assembly language, is there a reason that (from my experience) compilers are bad at optimizing certain mathematical operations down to their assembly instruction versions?

C library functions aren't identical to the assembly instructions. In most cases, the only difference is that the C library function sets errno when the parameter is outside the function's domain, so a lot of the optimizations you're looking for will appear if you add "-fno-math-errno" to your compiler flags. There is a whole set of floating-point math optimization flags for nonstandard behavior like this.
Ethin wrote:
Code: Select all
MOV RAX, 0x120c667a1255a42
CVTSI2SD XMM0, RAX

Is that supposed to be a floating-point constant? If it is, you've got the bytes backwards, and you should use MOVQ instead of CVTSI2SD because it's already a scalar double and doesn't require conversion from signed integer.
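The byte-order point can be checked from C. The sketch below assumes IEEE 754 binary64 doubles and the default round-to-nearest mode, which is what x86-64 compilers normally use. All three function names are invented for illustration. `bits_to_double` copies the bits unchanged, which is roughly what MOVQ XMM0, RAX does, whereas a cast from integer to double corresponds to CVTSI2SD.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 64-bit pattern as a double without converting it. */
double bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Inverse direction: expose the bit pattern of a double. */
uint64_t double_to_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Reverse the byte order of a 64-bit value. */
uint64_t reverse_bytes(uint64_t v)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (v & 0xff);  /* move the lowest byte of v to the end of r */
        v >>= 8;
    }
    return r;
}
```

Under those assumptions, `double_to_bits(2.0+5.0+449201938193.500039218)` should come out as 0x425a25a167c62001, which is `reverse_bytes(0x0120c667a1255a42)`. Converting the posted integer with a cast instead gives a value on the order of 8e16, nowhere near the intended argument.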
Ethin wrote: (VSQRTSD would probably be faster, but it needs 3 operands, not two.)

Any half-baked assembler will allow you to specify AVX instructions with two operands as a shorthand for the AVX equivalent of an SSE instruction. GNU AS even has a flag to directly translate SSE mnemonics into AVX instructions. I can't promise it would actually be faster, though - that depends on the rest of your program (and maybe your CPU microarchitecture).
Ethin wrote: For example, if I write this code:
Code: Select all
sqrt(2.0+5.0+449201938193.500039218);
The fastest optimization I could imagine would be the compiler transforming that into:
Code: Select all
MOV RAX, 0x120c667a1255a42
CVTSI2SD XMM0, RAX
SQRTSD XMM0, XMM0

Well no, the fastest for this particular call would be to hardcode the result into the output. But it is entirely possible that the additions or any of the calculations inside of sqrt() would have observable side effects in the floating-point environment, and so the compiler cannot remove either of those.