SIMD programming
Posted: Thu Nov 13, 2008 9:02 pm
Hey guys, I'm trying to use vectorization and the XMM registers (SSE) to speed up my program, whose job is to multiply floating-point numbers.
I understand how I can copy bits 0-63 into bits 64-127 of my register. My problem is, how can I load two different floats into bits 0-63? Can I load one float into bits 0-31, shift it left by 32 bits, then load another? If so, how would I do this?
Here is what I have so far:
Code:
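movaps (%r13), %xmm1 # load A into xmm1 (16 bytes; the address must be 16-byte aligned)
pslldq $4, %xmm1 # byte-shift xmm1 left by 4 bytes, so A[0:31] moves to A[32:63] (shl only works on general-purpose registers)
movaps (%r13), %xmm1 # reload A (this overwrites the shifted value above)
movlhps %xmm1, %xmm1 # copy A[0:63] into A[64:127]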
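For reference, here is a rough sketch of one way the packing step could be done, written with C SSE intrinsics rather than raw assembly so it is easy to compile and check. The helper name `pack_pair` and its two pointer parameters are made up for illustration; they are not from my program. The idea is to load each float separately with `movss`, interleave the two with `unpcklps`, then mirror the low 64 bits into the high 64 bits with `movlhps`, so no bit shift is needed.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_load_ss, ... */

/* Hypothetical helper (name and signature are assumptions):
   returns a register holding {*a, *b, *a, *b}, lane 0 first. */
static __m128 pack_pair(const float *a, const float *b)
{
    __m128 lo   = _mm_load_ss(a);            /* movss:    {*a, 0, 0, 0}     */
    __m128 hi   = _mm_load_ss(b);            /* movss:    {*b, 0, 0, 0}     */
    __m128 pair = _mm_unpacklo_ps(lo, hi);   /* unpcklps: {*a, *b, 0, 0}    */
    return _mm_movelh_ps(pair, pair);        /* movlhps:  {*a, *b, *a, *b}  */
}
```

Storing the result with `_mm_storeu_ps` should give the two floats followed by the same two floats again in memory order. If the two floats already sit next to each other in memory, a single 64-bit load (`movlps` or `movsd`) into the low half would probably be simpler than two `movss` loads.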