OSDev.org

Posted: **Wed Jun 08, 2005 4:27 am**

Hi,

DennisCGc wrote: I think you meant "Yes - if I assembled to 32 bit it'd be 3 bytes instead of 5."

No - I actually did mean 7 bytes for "lea (,%eax,2),%eax" when assembled to 32 bit by GAS, as this is exactly what GAS generated (as posted previously):

```
8d 04 45 00 00 00 00    lea 0x0(,%eax,2),%eax
```

This is:
```
0x8D                   = LEA
0x04                   = ModR/M byte = [Mod = 0, Reg (nnn) = 0, R/M = 4] = EAX is destination and SIB byte needed
0x45                   = SIB byte = [Scale = 1 (x2), Index = 0 (EAX), Base = 5 (no base, because Mod = 0)] = EAX * 2 + disp32
0x00, 0x00, 0x00, 0x00 = 32 bit displacement
```

When a SIB byte is present (which is needed for the *2 scale), it's impossible to avoid a displacement (or to have an 8 bit displacement) without using a base register: SIB Base = 5 with Mod = 0 means "no base register, disp32 follows".
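To make the SIB rule concrete, here's a minimal Python sketch of a 32-bit ModR/M + SIB decoder. The function name `decode_modrm_sib` is made up for illustration, and it deliberately ignores prefixes, 16-bit addressing and the opcode itself. It shows why `8d 04 45 ...` needs a 4-byte displacement while `8d 04 00` (`lea eax,[eax+eax]`) doesn't:

```python
def decode_modrm_sib(code: bytes, pos: int) -> dict:
    """Decode the 32-bit ModR/M byte at code[pos] plus any SIB byte and
    displacement (illustrative only: no prefixes, no 16-bit addressing).
    'length' counts ModR/M + SIB + displacement, not the opcode."""
    modrm = code[pos]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    info = {"mod": mod, "reg": reg, "rm": rm}
    length = 1
    # Displacement size implied by Mod alone: none, disp8, disp32, register-direct.
    disp_size = (0, 1, 4, 0)[mod]
    if mod == 0 and rm == 5:
        disp_size = 4  # plain [disp32]: no base register and no SIB
    if mod != 3 and rm == 4:
        # R/M = 4 means "a SIB byte follows"
        sib = code[pos + 1]
        length += 1
        scale, index, base = sib >> 6, (sib >> 3) & 7, sib & 7
        info["scale"] = 1 << scale
        info["index"] = None if index == 4 else index  # index 4 (ESP) = no index
        info["base"] = base
        if mod == 0 and base == 5:
            # No base register at all: the CPU then always reads a disp32
            info["base"] = None
            disp_size = 4
    info["disp_size"] = disp_size
    info["length"] = length + disp_size
    return info


gas = bytes.fromhex("8d 04 45 00 00 00 00")   # lea 0x0(,%eax,2),%eax
nasm = bytes.fromhex("8d 04 00")              # lea eax,[eax+eax]
for name, code in (("gas", gas), ("nasm", nasm)):
    info = decode_modrm_sib(code, 1)          # byte 0 is the LEA opcode (0x8D)
    print(name, info, "total bytes:", 1 + info["length"])
```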

NASM is better at optimizing than GAS, but it still isn't the best. The shortest/fastest possible encoding for "lea (,%eax,2),%eax" would be "0x01, 0xC0" (i.e. "add eax,eax"), but I don't think the NASM developers want this level of optimization...
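For comparison, a rough Python sketch that builds the three candidate encodings by hand. The helper names are hypothetical and only the register-only forms used here are handled. Keep in mind that the 2-byte `add eax,eax` also updates the flags, which `lea` never does:

```python
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}
SCALE_BITS = {1: 0, 2: 1, 4: 2, 8: 3}


def lea_index_only(dst: str, index: str, scale: int) -> bytes:
    """lea dst,[index*scale+0]: Mod=0, R/M=4, SIB Base=5 -> mandatory disp32."""
    if index == "esp":
        raise ValueError("ESP can't be used as an index register")
    modrm = (REG32[dst] << 3) | 4
    sib = (SCALE_BITS[scale] << 6) | (REG32[index] << 3) | 5
    return bytes([0x8D, modrm, sib]) + (0).to_bytes(4, "little")


def lea_base_index(dst: str, base: str, index: str) -> bytes:
    """lea dst,[base+index]: Mod=0, R/M=4, SIB with scale 1, no displacement."""
    if base == "ebp" or index == "esp":
        raise ValueError("EBP as base needs a disp8; ESP can't be an index")
    modrm = (REG32[dst] << 3) | 4
    sib = (REG32[index] << 3) | REG32[base]
    return bytes([0x8D, modrm, sib])


def add_reg_reg(dst: str, src: str) -> bytes:
    """add dst,src using opcode 01 /r with register-direct ModR/M (Mod=3)."""
    return bytes([0x01, 0xC0 | (REG32[src] << 3) | REG32[dst]])


for text, code in (("lea eax,[eax*2]", lea_index_only("eax", "eax", 2)),
                   ("lea eax,[eax+eax]", lea_base_index("eax", "eax", "eax")),
                   ("add eax,eax", add_reg_reg("eax", "eax"))):
    print(f"{text:20} {code.hex(' ')}  ({len(code)} bytes)")
```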

Cheers,

Brendan

Posted: **Wed Jun 08, 2005 5:11 am**

It could be argued that developers don't want optimization at the Assembler level, but rather prefer the Assembler to output exactly what was in the source.

Not me, just a thought. I tend to avoid Assembler like the plague.

Posted: **Wed Jun 08, 2005 5:42 am**

Brendan wrote:
No - I actually did mean 7 bytes for "lea (,%eax,2),%eax" when assembled to 32 bit by GAS, as this is exactly what GAS generated (as posted previously):

[tt]8d 04 45 00 00 00 00 lea 0x0(,%eax,2),%eax[/tt]

Oops, sorry, I should've read more carefully. I thought you meant this, because you quoted:

```
00000000  66678D0400        lea eax,[eax+eax]
```

Posted: **Wed Jun 08, 2005 8:03 am**

Hi,

Solar wrote: It could be argued that developers don't want optimization at the Assembler level, but rather prefer the Assembler to output exactly what was in the source.

If the assembler can find an instruction with identical behavior that's smaller or faster (on all CPUs), then IMHO it should generally use it. The optimization I suggested above doesn't have identical behavior: "add eax, eax" modifies the flags and "lea" doesn't, so they aren't strictly equivalent. If a programmer writes something like:

```
   cmp ebx,3
   lea eax,[eax*2]
   je .ebx_was_three
```

Then my optimization would break the code, because "je" would test the flags set by "add eax,eax" instead of the ones set by "cmp ebx,3". IMHO NASM does simple optimizations that won't break code and don't need a complex optimizer.
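A tiny Python model of the snippet above shows the breakage. It only tracks ZF (the flag "je" tests), and the function name is made up for illustration:

```python
MASK32 = 0xFFFFFFFF


def je_taken(eax: int, ebx: int, double_with_add: bool) -> bool:
    """Model 'cmp ebx,3 / <double eax> / je .ebx_was_three', tracking only ZF."""
    zf = ((ebx - 3) & MASK32) == 0          # cmp ebx,3 sets ZF if ebx == 3
    if double_with_add:
        eax = (eax + eax) & MASK32          # add eax,eax ...
        zf = eax == 0                       # ... overwrites ZF from its own result
    else:
        eax = (eax * 2) & MASK32            # lea eax,[eax*2] leaves ZF alone
    return zf                               # je jumps when ZF is set


print(je_taken(5, 3, double_with_add=False))  # True: lea keeps cmp's ZF
print(je_taken(5, 3, double_with_add=True))   # False: add clobbered ZF
```

The reverse failure also happens: with eax = 0 and ebx != 3, the "add" version jumps even though ebx wasn't 3.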

In my experience, assembly programmers write two different types of code. The first is highly optimized, cycle-counted code that can only be written in assembly with painstaking attention to detail. For this type of code no optimization should be permitted, and if the assembler thinks an instruction could be better it should generate a warning (that can be disabled).

The second type of code is the general "need to get it working" code that is often done with a compiler instead. For this type of code full optimization should be possible, but in almost all cases anything above "intelligent opcode selection" is too complex to implement within an assembler. For example, the assembler could look ahead to see if flags matter or not, and then use "add eax,eax" instead of "lea eax,[eax*2]" if the flags are irrelevant.

Cheers,

Brendan

Posted: **Wed Jun 08, 2005 9:00 am**

Brendan wrote: The second type of code is the general "need to get it working" code that is often done with a compiler instead. For this type of code full optimization should be possible, but in almost all cases anything above "intelligent opcode selection" is too complex to implement within an assembler.

Exactly - for the "need to get it working" part, you should use a compiler - which is in a vastly better position to make optimizations as it knows the higher-level-tree.

How in the world should the assembler know whether a flag being set or not is an intended side effect or an overlooked programming error? That's low-level-language programming for you...

But we've strayed from the topic quite a bit, eh?

Posted: **Wed Jun 08, 2005 7:19 pm**

Hi,

Solar wrote: Exactly - for the "need to get it working" part, you should use a compiler - which is in a vastly better position to make optimizations as it knows the higher-level-tree.

I really dislike compilers - I can't find one that never uses EBP for its stack frame, that allows a function to return multiple values without ugly hacks, that doesn't need make or a linker, etc. There's a difference between an assembly programmer that knows C and a C programmer that knows assembly.

Besides this, most of the time my "need to get it working" code beats the compiler's optimizer, and I know it's easy to go back later and deliberately optimize the code.

To be honest, I doubt I'll ever like any compiler until I write my own compiler for my own language. I've designed most of it, but it needs an interpreter built into the compiler, and I want a good OS to develop it on before I start something so complex. This is actually my grand plan: write an OS with the correct design and API/s that works well enough, develop the language, convert most of the OS to the language, then recompile for different architectures.

Solar wrote: How in the world should the assembler know whether a flag being set or not is an intended side effect or an overlooked programming error? That's low-level-language programming for you...

The assembler could figure out (in most cases) whether an optimization that messes with the flags is going to cause a problem or not. All it needs to do is examine the next instruction to see if it relies on the flags, sets the flags, or is neutral. If it's a neutral instruction, the assembler could check the instruction after that, and so on. Sooner or later the assembler would find out whether the flags need to be preserved or can be changed. In some cases it might reach an instruction where it's impossible to tell (e.g. a "ret") or where it's too complex to check further (e.g. "jmp [eax*4]"). In that case the assembler would do the "safe" thing.

Even if the assembler only checked one instruction it'd be conclusive half the time. In any case it's not an easy optimization for the assembler to make - not as easy as swapping instructions with identical operation.
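The lookahead described above could be sketched like this in Python. It's only a toy under stated assumptions: instructions are bare mnemonics, the classification sets are a small hand-picked subset (not a complete x86 table), and anything unrecognised (including "ret", "call", "jmp", and "inc"/"dec", which leave CF untouched) makes it fall back to the safe choice. All names here are hypothetical:

```python
# Hand-picked subsets for illustration only; a real assembler would need full tables.
READS_FLAGS = {"je", "jne", "jz", "jnz", "jc", "jnc", "jb", "ja", "jl", "jg",
               "sete", "setne", "cmove", "cmovne", "adc", "sbb", "pushfd"}
# Each of these overwrites all six status flags (CF, PF, AF, ZF, SF, OF).
WRITES_ALL_FLAGS = {"add", "sub", "cmp", "neg", "and", "or", "xor", "test"}
# These neither read nor write the status flags.
NEUTRAL = {"mov", "lea", "push", "pop", "xchg", "nop"}


def flags_are_dead(program: list[str], i: int, max_lookahead: int = 8) -> bool:
    """True only if the flags after program[i] are provably overwritten
    before any later instruction reads them."""
    for mnemonic in program[i + 1:i + 1 + max_lookahead]:
        if mnemonic in READS_FLAGS:
            return False      # something depends on the current flags
        if mnemonic in WRITES_ALL_FLAGS:
            return True       # flags are overwritten first, so they're dead
        if mnemonic not in NEUTRAL:
            return False      # ret/call/jmp/unknown: can't tell, play safe
    return False              # lookahead limit or end of code: play safe


def pick_doubling(program: list[str], i: int) -> str:
    """Choose the encoding for the 'lea eax,[eax*2]' at program[i]."""
    return "add eax,eax" if flags_are_dead(program, i) else "lea eax,[eax*2]"


print(pick_doubling(["cmp", "lea", "je"], 1))          # -> lea eax,[eax*2]
print(pick_doubling(["lea", "mov", "cmp", "je"], 0))   # -> add eax,eax
```

Stopping at the first full flag writer is what keeps this cheap; the cost is that partial writers like "inc" are treated as "can't tell".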

Cheers,

Brendan

Posted: **Thu Jun 09, 2005 2:21 am**

Some optimization on the assembler side is good, like how fasm automatically picks the jump size so you don't have to add 'short' in front of the target.
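One common way for an assembler to pick jump sizes automatically is "relaxation": assume every jump is short, then repeatedly grow any jump whose target is out of rel8 range until nothing changes (growing one jump can push another out of range, hence the loop). This is a minimal Python sketch of that idea, not fasm's actual code. It only models unconditional "jmp" (2-byte EB rel8 vs 5-byte E9 rel32 in 32-bit code), and the program format is made up:

```python
SHORT, NEAR = 2, 5   # jmp rel8 (EB cb) and jmp rel32 (E9 cd) in 32-bit code


def relax_jumps(program: list) -> dict:
    """program items (hypothetical format): ("label", name), ("jmp", name)
    or ("insn", size_in_bytes). Returns {index_of_jmp: chosen_size}."""
    sizes = {i: SHORT for i, (kind, _) in enumerate(program) if kind == "jmp"}
    while True:
        # Lay out the code using the current size guesses.
        addr, labels, starts = 0, {}, {}
        for i, (kind, arg) in enumerate(program):
            starts[i] = addr
            if kind == "label":
                labels[arg] = addr
            elif kind == "jmp":
                addr += sizes[i]
            else:
                addr += arg
        # Grow every short jump whose displacement doesn't fit in a signed byte.
        changed = False
        for i in sizes:
            if sizes[i] == SHORT:
                # rel8 is measured from the end of the 2-byte jump
                disp = labels[program[i][1]] - (starts[i] + SHORT)
                if not -128 <= disp <= 127:
                    sizes[i] = NEAR
                    changed = True
        if not changed:
            return sizes


program = [("jmp", "done"), ("insn", 100), ("label", "done"),
           ("jmp", "far"), ("insn", 200), ("label", "far")]
print(relax_jumps(program))   # {0: 2, 3: 5}
```

Starting short and only ever growing guarantees the loop terminates. Conditional jumps work the same way, just with 2-byte and 6-byte forms.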

ASM syntax standard