Multiplication to shift internally?

blackoil
Member
Posts: 146
Joined: Mon Feb 12, 2007 4:45 am

Post by blackoil

Hi,

On a modern CPU, will a multiplication like "IMUL EAX,2" be executed internally as "SHL EAX,1"?

Thanks!
rootnode
Member
Posts: 42
Joined: Fri Feb 29, 2008 11:21 am
Location: Aachen, Germany

Post by rootnode

No, although this would be an optimization.
The CPU blindly executes the opcodes you pass to it.

IMUL and SHL also don't set the status flags in the same way (after IMUL the carry flag signals signed overflow, after SHL it holds the last bit shifted out), so swapping one for the other isn't a transparent change.
Optimizations like this are handled by the compiler instead.

So if you program in assembly, you have to do them yourself.
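A minimal sketch of the compiler-side rewrite described above, written in Python purely for illustration (the helper names are hypothetical). When the constant is a power of two, `x * k` can be replaced by `x << n`, and on a 32-bit register the low 32 bits agree even when bits overflow out of the top. It models only the register value, not the flags, which is exactly where IMUL and SHL differ.

```python
MASK32 = 0xFFFFFFFF  # keep 32 bits, like EAX


def shift_for(k: int):
    """Return n with k == 2**n if k is a power of two, else None."""
    if k > 0 and k & (k - 1) == 0:
        return k.bit_length() - 1
    return None


def imul_32(x: int, k: int) -> int:
    """Low 32 bits of x * k (the part a two/three-operand IMUL keeps)."""
    return (x * k) & MASK32


def shl_32(x: int, n: int) -> int:
    """x shifted left by n, truncated to 32 bits (what SHL leaves in the register)."""
    return (x << n) & MASK32


# The rewrite is only valid for power-of-two constants.
for k in (2, 4, 8, 1024):
    n = shift_for(k)
    for x in (0, 1, 7, 0x40000000, 0x80000000, 0xFFFFFFFF):
        assert imul_32(x, k) == shl_32(x, n)
```

For a constant like 10, `shift_for` returns `None`, and a compiler would have to fall back to IMUL or a combination of shifts and adds.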
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Post by Combuster

No. However, multiplications have become really fast and tend to execute at speeds similar to shifts (on some chips, shifts are even slower in some cases).
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
JamesM
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom

Post by JamesM

Hi,
blackoil wrote:
Hi,

On a modern CPU, will a multiplication like "IMUL EAX,2" be executed internally as "SHL EAX,1"?

Thanks!
It depends on what level you look at. At the lowest level, all multiplications are implemented as shifts and adds. So yes, multiplying EAX by 2 would result in EAX being shifted left by one bit, but this happens inside the multiplier unit; the processor won't turn the instruction into a shift instruction and send it to the ALU instead.
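To illustrate the "shifts and adds" point, here is a textbook shift-and-add multiplier in Python. It is an illustrative sketch with hypothetical names, not how any particular CPU is wired: real hardware multipliers form all partial products at once and sum them with an adder tree rather than looping. Each set bit of the multiplier contributes one shifted copy of the multiplicand, so multiplying by 2 yields a single partial product, `a << 1`.

```python
MASK32 = 0xFFFFFFFF  # keep 32 bits, like a 32-bit register


def partial_products(a: int, b: int) -> list[int]:
    """One shifted copy of a for every set bit of b (bit i contributes a << i)."""
    a &= MASK32
    b &= MASK32
    return [a << i for i in range(b.bit_length()) if (b >> i) & 1]


def shift_add_multiply(a: int, b: int) -> int:
    """Add up the partial products and keep the low 32 bits."""
    return sum(partial_products(a, b)) & MASK32


print(partial_products(5, 2))   # [10]: multiplying by 2 is a single shifted copy
print(partial_products(5, 10))  # [10, 40]: 10 = 0b1010 has two set bits
```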

It's best to leave that sort of thing to the compiler: if, as Combuster says, shifts are more costly than multiplications on some architectures, the compiler will know that and work around it.
Love4Boobies
Member
Posts: 2111
Joined: Fri Mar 07, 2008 5:36 pm
Location: Bucharest, Romania

Post by Love4Boobies

I read somewhere, although I can't remember where, that (at least) Intel optimizes MUL instructions when you multiply by a number that is a power of two. It didn't say exactly how, but I suspect it's done with a shift, since that's the fastest way.
"Computers in the future may weigh no more than 1.5 tons.", Popular Mechanics (1949)
[ Project UDI ]
blackoil
Member
Posts: 146
Joined: Mon Feb 12, 2007 4:45 am

Post by blackoil

To multiply a var by 2, I can use either

imul eax,[var],2

or

mov eax,[var]
shl eax,1

It's a bit difficult to determine which one is faster.

I also saw that Visual C++ 2008 Express uses the imul instruction for array indexing.
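For context on the array-indexing observation: an element's address is `base + index * element_size`. An x86 memory operand can scale the index by 1, 2, 4 or 8 directly, so one plausible reason for seeing IMUL (an assumption; the post does not say which element type was involved) is an element size outside that set. The helper names below are hypothetical.

```python
SIB_SCALES = (1, 2, 4, 8)  # index scale factors an x86 memory operand can encode


def element_address(base: int, index: int, elem_size: int) -> int:
    """Byte address of array[index] when each element is elem_size bytes."""
    return base + index * elem_size


def scale_fits_addressing_mode(elem_size: int) -> bool:
    """True if index * elem_size can be folded into [base + index*scale].

    Assumption: other sizes need a separate scaling step (IMUL, or shifts/LEAs).
    """
    return elem_size in SIB_SCALES


# A 4-byte int array can use [base + index*4]; a 12-byte struct cannot.
print(hex(element_address(0x1000, 3, 4)))   # 0x100c
print(hex(element_address(0x1000, 3, 12)))  # 0x1024
```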
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Post by Brendan

Hi,
blackoil wrote:
To multiply a var by 2, I can use either

imul eax,[var],2

or

mov eax,[var]
shl eax,1

It's a bit difficult to determine which one is faster.
In this case, the fastest way is probably "mov eax,[var]; add [var],eax", especially if there's other code you can place in between these instructions (so the CPU has something to do while waiting for the fetch from cache/RAM).

On some CPUs (e.g. early Pentium 4/Netburst) using several ADD instructions can be faster than using one instruction - e.g. "add eax,eax; add eax,eax" can be faster than "shl eax,2" or "lea eax,[eax*4]" or "imul eax,4".
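The four sequences listed above all compute the same 32-bit value; they differ only in speed on a given CPU. A small Python model of that (function names are illustrative; it tracks the register value only, not flags or timing):

```python
MASK32 = 0xFFFFFFFF  # EAX is 32 bits wide


def times4_shl(x: int) -> int:      # shl eax,2
    return (x << 2) & MASK32


def times4_lea(x: int) -> int:      # lea eax,[eax*4]  (no base, index scaled by 4)
    return (x * 4) & MASK32


def times4_imul(x: int) -> int:     # imul eax,4  (low 32 bits of the product)
    return (x * 4) & MASK32


def times4_add_add(x: int) -> int:  # add eax,eax ; add eax,eax
    x = (x + x) & MASK32            # first add doubles
    return (x + x) & MASK32         # second add doubles again


for x in (0, 1, 3, 0x3FFFFFFF, 0x40000000, 0xFFFFFFFF):
    results = {f(x) for f in (times4_shl, times4_lea, times4_imul, times4_add_add)}
    assert len(results) == 1  # all four agree
```

Since the result is identical, the choice between them is purely a scheduling and latency question for the target CPU.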


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Owen
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom

Post by Owen

Brendan wrote:
Hi,

In this case, the fastest way is probably "mov eax,[var]; add [var],eax", especially if there's other code you can place in between these instructions (so the CPU has something to do while waiting for the fetch from cache/RAM).

On some CPUs (e.g. early Pentium 4/Netburst) using several ADD instructions can be faster than using one instruction - e.g. "add eax,eax; add eax,eax" can be faster than "shl eax,2" or "lea eax,[eax*4]" or "imul eax,4".

Cheers,

Brendan
Aah, the wonders of CPUs without barrel shifters. I doubt the two instructions together would be faster if they immediately followed each other, however - in fact I imagine they would be much slower considering the ridiculous length of NetBurst's pipeline.

Edit: I've just realised the irony: NetBurst does the exact opposite of what this thread was about :P
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Post by Combuster

From what I gathered, optimising for the Pentium 4 generally has negative effects on all other processors. I went through the cycle tables to get some concrete details on the case mentioned above. Might be interesting:

Consider the shift/lea/mul/add-add methods of computing reg * 4:

On a 486 / Pentium 1 / Athlon, a shift takes one cycle. On NetBurst, it takes 4.
On a 486 / Pentium 1, an LEA takes one cycle; on an Athlon, a "complex" LEA takes two cycles. The Intel document does not give exact timings, but suggests that equivalent adds are faster given enough decoder space (which implies that the LEA would take more than 2 cycles).
An IMUL has a 4-cycle latency on an Athlon (and post-NetBurst), 10 on NetBurst, and something similarly awful on older processors. (Why again was everybody buying Athlon XPs back then? ;))
A sequence of adds to multiply by four takes 2 clocks on all processors except NetBurst, which does it in one cycle (two half-clock operations).

Summarized:
P1/486: depending on situation, use lea (take care of AGIs) or shifts (take care of the u-v schedule)
Athlon: always use shifts
Netburst: use adds and expect execution times for all other platforms to double.

The other conclusion: the CPU does no such conversion.
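The summary can be checked against the cycle figures quoted in this post by picking the lowest-latency method per CPU. A rough Python sketch: the numbers are the ones from the post (the Athlon LEA figure is for a "complex" LEA), `None` marks a figure the post doesn't give, and latency alone ignores pairing, AGIs and decoder limits, which is why the 486/P1 comes out as a tie.

```python
from typing import Optional

# Cycles to compute reg*4, as quoted in the post; None = not given there.
LATENCY: dict[str, dict[str, Optional[int]]] = {
    "486/P1":   {"shl": 1, "lea": 1,    "imul": None, "add-add": 2},
    "Athlon":   {"shl": 1, "lea": 2,    "imul": 4,    "add-add": 2},
    "NetBurst": {"shl": 4, "lea": None, "imul": 10,   "add-add": 1},
}


def fastest(cpu: str) -> list[str]:
    """Method(s) with the lowest known latency on the given CPU."""
    known = {m: c for m, c in LATENCY[cpu].items() if c is not None}
    best = min(known.values())
    return sorted(m for m, c in known.items() if c == best)


for cpu in LATENCY:
    print(cpu, fastest(cpu))
```

This prints `['lea', 'shl']` for the 486/P1, `['shl']` for the Athlon and `['add-add']` for NetBurst, which lines up with the per-CPU recommendations.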
Troy Martin
Member
Posts: 1686
Joined: Fri Apr 18, 2008 4:40 pm
Location: Langley, Vancouver, BC, Canada

Post by Troy Martin

Combuster wrote:
P1/486: depending on situation, use lea (take care of AGIs) or shifts (take care of the u-v schedule)
Athlon: always use shifts
Netburst: use adds and expect execution times for all other platforms to double.
The nth Law of Optimization: if it's efficient somewhere, it's slow as molasses everywhere else.
Solar wrote:It keeps stunning me how friendly we - as a community - are towards people who start programming "their first OS" who don't even have a solid understanding of pointers, their compiler, or how a OS is structured.
I wish I could add more tex