multiplication to shift internally?
Posted: Sun Jun 28, 2009 4:07 am
by blackoil
Hi,
For modern CPUs, will a multiplication like "IMUL EAX,2" be executed as "SHL EAX,1" internally?
Thanks!
Re: multiplication to shift internally?
Posted: Sun Jun 28, 2009 4:12 am
by rootnode
No, although it would be an optimization.
The CPU blindly executes the opcodes you pass to it.
IMUL and SHL also don't update the flags in the same way: IMUL sets CF/OF when the result overflows and leaves SF/ZF undefined, while SHL puts the last bit shifted out into CF and sets SF/ZF from the result.
So these optimizations aren't done by the CPU; they are handled by the compiler.
So, if you program in assembler, you have to optimize it yourself.
Re: multiplication to shift internally?
Posted: Sun Jun 28, 2009 5:49 am
by Combuster
No. Multiplications have, however, become really fast and tend to execute at speeds similar to shifts (on some chips, shifts are even slower in some cases).
Re: multiplication to shift internally?
Posted: Sun Jun 28, 2009 6:41 am
by JamesM
Hi,
blackoil wrote:Hi,
For modern CPUs, will a multiplication like "IMUL EAX,2" be executed as "SHL EAX,1" internally?
Thanks!
It depends on what level you look at. On the lowest level, all multiplications are implemented as shifts and adds. So yes, multiplying EAX by 2 would result in EAX being shifted left by one bit, but this happens inside the multiplier unit - the processor won't turn the instruction into a shift and send it to the ALU instead.
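The "shifts and adds" idea can be modelled with a textbook shift-and-add multiplier. This is a minimal C sketch assuming unsigned 32-bit operands; the names mul_shift_add and check_mul are hypothetical, and real hardware multipliers sum the partial products in parallel rather than looping bit by bit.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-add multiply: for every set bit i of the multiplier, add
   (multiplicand << i) to the product. The result wraps modulo 2^32,
   like the low 32 bits of MUL/IMUL. */
uint32_t mul_shift_add(uint32_t multiplicand, uint32_t multiplier)
{
    uint32_t product = 0;
    while (multiplier != 0) {
        if (multiplier & 1u)
            product += multiplicand;   /* add the current partial product */
        multiplicand <<= 1;            /* the next bit is worth twice as much */
        multiplier >>= 1;
    }
    return product;
}

/* Self-check: compare against the compiler's own multiply. */
void check_mul(uint32_t a, uint32_t b)
{
    assert(mul_shift_add(a, b) == a * b);
}
```

With a multiplier of 2 only bit 1 is set, so the loop adds exactly one partial product, x << 1: the "multiply by 2 is a shift" case happening inside the multiplier.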
It's best to leave that sort of thing to the compiler - if, as Combuster says, shifts are more costly than multiplications on some architectures, the compiler will know that and work around it.
Re: multiplication to shift internally?
Posted: Sun Jun 28, 2009 10:34 am
by Love4Boobies
I read somewhere, although I can't remember where, that Intel (at least) optimizes MUL instructions when you multiply by a power of two. It didn't say exactly how, but I suspect it's shifting, since that's the fastest way.
Re: multiplication to shift internally?
Posted: Sun Jun 28, 2009 7:03 pm
by blackoil
To multiply a variable by 2, I can use either
imul eax,[var],2
or
mov eax,[var]
shl eax,1
It's a bit difficult to determine which one is faster.
And I saw that Visual C++ 2008 Express uses the imul instruction for array indexing.
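Array indexing needs a multiply because &arr[i] is the base address plus i times the element size. When the size is 1, 2, 4 or 8 bytes, x86 scaled-index addressing can absorb it; other sizes have to be computed explicitly, which is presumably where an IMUL (or an LEA/shift combination) can come from. A minimal C sketch, assuming a hypothetical 12-byte struct item and hypothetical helpers item_at and check_item_at:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 12-byte element: 12 is not 1, 2, 4 or 8, so the
   scaled-index addressing modes can't do the multiply for free. */
struct item {
    int32_t a, b, c;
};

/* &arr[i] spelled out: base address + i * element size. For a size
   like 12 the compiler must compute i * 12 explicitly, typically with
   IMUL or an LEA/shift combination. */
struct item *item_at(struct item *arr, size_t i)
{
    return (struct item *)((char *)arr + i * sizeof(struct item));
}

/* Self-check: the hand-computed address matches the compiler's &arr[i]. */
void check_item_at(struct item *arr, size_t n)
{
    for (size_t i = 0; i < n; i++)
        assert(item_at(arr, i) == &arr[i]);
}
```

Which instruction actually ends up in the binary depends on the compiler, its version and the optimization level.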
Re: multiplication to shift internally?
Posted: Sun Jun 28, 2009 9:10 pm
by Brendan
Hi,
blackoil wrote:To multiply a variable by 2, I can use either
imul eax,[var],2
or
mov eax,[var]
shl eax,1
It's a bit difficult to determine which one is faster.
In this case, the fastest way is probably "mov eax,[var]; add [var],eax", especially if there's other code you can place in between these instructions (so the CPU has something to do while waiting for the fetch from cache/RAM).
On some CPUs (e.g. early Pentium 4/Netburst) using several ADD instructions can be faster than using one instruction - e.g. "add eax,eax; add eax,eax" can be faster than "shl eax,2" or "lea eax,[eax*4]" or "imul eax,4".
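The reason these forms are interchangeable is that they compute the same 32-bit result, including wraparound. A minimal C sketch with hypothetical function names; note that C can't pin down the instruction (and LEA has no direct C spelling), so an optimizing compiler may well emit the same code for all three.

```c
#include <assert.h>
#include <stdint.h>

/* add eax,eax ; add eax,eax : double twice */
uint32_t times4_add_add(uint32_t x)
{
    x += x;
    x += x;
    return x;
}

/* shl eax,2 */
uint32_t times4_shl(uint32_t x)
{
    return x << 2;
}

/* imul eax,4 (and, in effect, lea eax,[eax*4]) */
uint32_t times4_mul(uint32_t x)
{
    return x * 4u;
}

/* All three must agree for every 32-bit unsigned input, wraparound included. */
void check_times4(uint32_t x)
{
    assert(times4_add_add(x) == times4_shl(x));
    assert(times4_shl(x) == times4_mul(x));
}
```

Since the results are identical, picking one is purely a question of which the target CPU executes fastest.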
Cheers,
Brendan
Re: multiplication to shift internally?
Posted: Wed Jul 01, 2009 2:42 pm
by Owen
Brendan wrote:Hi,
In this case, the fastest way is probably "mov eax,[var]; add [var],eax", especially if there's other code you can place in between these instructions (so the CPU has something to do while waiting for the fetch from cache/RAM).
On some CPUs (e.g. early Pentium 4/Netburst) using several ADD instructions can be faster than using one instruction - e.g. "add eax,eax; add eax,eax" can be faster than "shl eax,2" or "lea eax,[eax*4]" or "imul eax,4".
Cheers,
Brendan
Aah, the wonders of CPUs without barrel shifters. I doubt the two instructions together would be faster if they immediately followed each other, however - in fact I imagine they would be much slower considering the ridiculous length of NetBurst's pipeline.
Edit: I've just realised the irony: NetBurst does the exact opposite of what this thread was about.
Re: multiplication to shift internally?
Posted: Wed Jul 01, 2009 3:41 pm
by Combuster
From what I gathered, optimising for the Pentium 4 generally has negative effects on all other processors. I had a go at the cycle sheets to get some concrete details on the case mentioned. Might be interesting:
Consider the shift/lea/mul/add-add methods of computing reg * 4
On a 486 / Pentium 1 / Athlon, a shift takes one cycle. On netburst, it takes 4.
On a 486 / Pentium 1, a LEA takes one cycle; on an Athlon, a "complex" LEA takes two cycles. The Intel document does not give exact timings for NetBurst, but suggests that equivalent adds are faster given enough decoder space (which implies that the LEA would take more than 2 cycles).
An imul has a 4-cycle latency on an Athlon (and post-NetBurst), 10 on NetBurst, and something similarly awful on older processors. (Why again was everybody buying Athlon XPs at the time?)
A sequence of adds to do a multiplication by four would take 2 clocks on all processors except netburst, which does that in one cycle (two half-clock operations).
Summarized:
P1/486: depending on the situation, use lea (watch out for AGIs) or shifts (watch out for U/V pipe pairing)
Athlon: always use shifts
NetBurst: use adds, and expect execution times on all other platforms to double.
The other conclusion: there is no conversion; the CPU does not turn a multiplication into a shift.
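To make the summary concrete, the reg * 4 figures above can be put in a small C table, taken exactly as quoted in this post (not measured), with UNKNOWN where no number is given (IMUL on 486/P1; LEA on NetBurst, which is only implied to be over 2 cycles). The hypothetical cheapest() helper picks the lowest known count and reproduces the summary; it ignores pairing, AGIs and latency versus throughput, which matter in real code.

```c
#include <assert.h>

/* Methods of computing reg * 4 discussed above. */
enum method { SHIFT, LEA, IMUL, ADD_ADD, N_METHODS };
enum cpu { CPU_486_P1, CPU_ATHLON, CPU_NETBURST, N_CPUS };

#define UNKNOWN (-1)  /* no figure given in the post */

/* Cycle counts as quoted in the post, not measured. */
static const int cycles[N_CPUS][N_METHODS] = {
    /*               SHIFT  LEA      IMUL     ADD_ADD */
    /* 486/P1   */ { 1,     1,       UNKNOWN, 2 },
    /* Athlon   */ { 1,     2,       4,       2 },
    /* NetBurst */ { 4,     UNKNOWN, 10,      1 },
};

/* Return the method with the lowest known cycle count (first one on a tie). */
enum method cheapest(enum cpu c)
{
    assert((int)c >= 0 && (int)c < N_CPUS);
    enum method best = N_METHODS;
    for (int m = 0; m < N_METHODS; m++) {
        int t = cycles[c][m];
        if (t == UNKNOWN)
            continue;
        if (best == N_METHODS || t < cycles[c][best])
            best = (enum method)m;
    }
    return best;
}
```

On the 486/P1 row, shift and LEA tie at one cycle; cheapest() returns the first of them, which is why the summary says the choice there depends on the situation.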
Re: multiplication to shift internally?
Posted: Wed Jul 01, 2009 5:57 pm
by Troy Martin
Combuster wrote:P1/486: depending on situation, use lea (take care of AGIs) or shifts (take care of the u-v schedule)
Athlon: always use shifts
Netburst: use adds and expect execution times for all other platforms to double.
The nth Law of Optimization: if it's efficient somewhere, it's slow as molasses everywhere else.