code alignment

Post by sancho1980 »

Hi,
this is a general question: I realised that after compiling and linking what is going to become my kernel, procedures aren't automatically aligned. Is this bad? I thought that actually everything should be aligned, but thinking about it, that would mean every single instruction would also need alignment. So I suppose it's not bad to call procedures at unaligned addresses?

Post by XCHG »

Some (or maybe all) AMD and Intel PM microprocessors fetch the code stream on 16-byte boundaries. So if your code is not aligned on a 16-byte boundary, the processor fetches the 16-byte block that contains your EIP, part of which lies before your entry point, and then needs to fetch further 16-byte blocks to read the rest of your code.
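To make the fetch-block point concrete, here is a minimal C sketch that counts how many aligned 16-byte blocks a run of code bytes touches. The 16-byte fetch width is taken from the post above and is an assumption about the CPU, not a measured value; `FETCH_BLOCK` and `blocks_spanned` are hypothetical names made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fetch width in bytes (from the discussion above, not measured). */
#define FETCH_BLOCK 16u

/* Number of aligned FETCH_BLOCK-sized blocks touched by `len` bytes of
   code starting at address `start`. `len` must be greater than zero. */
static unsigned blocks_spanned(uintptr_t start, unsigned len)
{
    assert(len > 0);
    uintptr_t first = start / FETCH_BLOCK;             /* block holding the first byte */
    uintptr_t last  = (start + len - 1) / FETCH_BLOCK; /* block holding the last byte  */
    return (unsigned)(last - first + 1);
}
```

Under these assumptions, 16 bytes of code starting at 0x1000 fit in one block, while the same 16 bytes starting at 0x100D straddle two.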

Aligning code can be beneficial for time-critical subroutines where a label aligned on a certain boundary is called over and over again. However, I don't think it is worth aligning unimportant labels that are only called a few times. In fact, the padding wastes cache space and makes your code segment grow unnecessarily.

It is a good programming practice to align subroutine entries on a 16-byte boundary but not every loop label. For example:

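  ALIGN 0x10, NOP         ; pad with NOPs up to the next 16-byte boundary (NASM syntax assumed)
  __MyProcedure:          ; procedure entry point, 16-byte aligned
    MOV     ECX , 10
    ALIGN 0x10, NOP       ; this padding sits on the fall-through path into the loop
    .LoopLabel:
      ; Process something here
      DEC     ECX
      JNZ     .LoopLabel
    RET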
The [__MyProcedure] label, the entry point of the procedure, is aligned on a 16-byte boundary, which is good. But when you also align the [.LoopLabel] local label, you force the processor to decode and execute the NOPs that pad it out to a 16-byte boundary every time execution falls through into the loop. That is not optimal.
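How much padding that second ALIGN costs can be worked out by hand. The following is a rough C sketch of the arithmetic, not what any particular assembler does internally; `padding_to_align` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of padding (for example single-byte NOPs) needed to move `addr`
   up to the next multiple of `align`. `align` must be a power of two.
   Returns 0 when `addr` is already aligned. */
static unsigned padding_to_align(uintptr_t addr, unsigned align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (unsigned)((align - (addr & (align - 1))) & (align - 1));
}
```

Assuming 32-bit code, where `MOV ECX, 10` encodes in 5 bytes, and assuming [__MyProcedure] itself sits on a 16-byte boundary, `padding_to_align(5, 16)` gives 11 bytes of filler in front of [.LoopLabel].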
Post by sancho1980 »

XCHG wrote: It is a good programming practice to align subroutine entries on a 16-byte boundary but not every loop label.

I take it that you meant to say 16-bit boundary. Why 16 bits? Why not dword alignment on a 32-bit machine?

Post by Candy »

sancho1980 wrote:
I take it that you meant to say 16-bit boundary. Why 16 bits? Why not dword alignment on a 32-bit machine?
I strongly suspect he meant 16 bytes. 16-bit alignment is pointless, and 32-bit alignment is not much less pointless. Most likely you want the function to begin at the start of a cache line, so that the first fetch loads a full line of useful code; for that you align it to a power of 2, preferably the cache line size. x86 cache lines have ranged from 16 to 64 bytes, so 16 bytes is a useful amount. I prefer 64 bytes, but I think I also align on 16-byte boundaries.
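If the kernel is built from C with a GCC-compatible compiler (an assumption; the thread does not say which toolchain is in use), the entry alignment of a single function can be requested with the `aligned` function attribute, and GCC's `-falign-functions=16` option does the same for every function. A minimal sketch, where `hot_routine`, `ENTRY_ALIGN` and `entry_is_aligned` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

#define ENTRY_ALIGN 16  /* 16 bytes as discussed above; 64 would match a 64-byte cache line */

/* GCC/Clang extension: request a minimum alignment for the function's
   first instruction. */
__attribute__((aligned(ENTRY_ALIGN)))
int hot_routine(int x)
{
    return x * 2 + 1;
}

/* Returns 1 if the entry address of `fn` is a multiple of `align`.
   Casting a function pointer to an integer is implementation-defined,
   but works on the usual x86 toolchains. */
static int entry_is_aligned(int (*fn)(int), uintptr_t align)
{
    assert(align != 0);
    return ((uintptr_t)fn % align) == 0;
}
```

Applying this only to routines that are called very often keeps the padding cost down, in line with the advice earlier in the thread.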