code alignment

sancho1980 · Post by **sancho1980** » Sat Aug 04, 2007 3:53 am

hi
this is a general question: I realised that after compiling and linking what is going to become my kernel, procedures aren't automatically aligned. Is this bad? I thought that *actually* everything should be aligned, but thinking about it, this would mean that every single instruction would also need alignment... so I suppose it's not bad to call procedures at unaligned addresses?

XCHG · Post by **XCHG** » Sat Aug 04, 2007 6:05 am

Some (possibly all) AMD processors and Intel PM processors fetch the code stream in 16-byte blocks aligned on 16-byte boundaries. So if your code is not aligned on a 16-byte boundary, the processor fetches the 16-byte block that contains EIP, of which only the bytes from EIP onward are useful, and then has to fetch further 16-byte blocks to read the rest of your code.
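
A minimal sketch of that arithmetic, assuming the simplified model in the post (code is fetched only in naturally aligned 16-byte blocks and nothing else affects the count). `fetch_blocks` and `FETCH_BLOCK` are hypothetical names, not part of any toolchain:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed fetch-block size, taken from the 16-byte figure above. */
#define FETCH_BLOCK 16u

/* Hypothetical helper: how many aligned FETCH_BLOCK-byte blocks
   the code bytes [start, start + len) touch. Requires len > 0. */
static unsigned fetch_blocks(uintptr_t start, size_t len)
{
    assert(len > 0);
    uintptr_t first = start / FETCH_BLOCK;             /* block holding the first byte */
    uintptr_t last  = (start + len - 1) / FETCH_BLOCK; /* block holding the last byte  */
    return (unsigned)(last - first + 1);
}
```

For example, a 16-byte routine placed at 0x1000 fits in one block (`fetch_blocks(0x1000, 16)` is 1), while the same routine placed at 0x1009 straddles two (`fetch_blocks(0x1009, 16)` is 2).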

Aligning code can pay off for time-critical subroutines, where a label aligned on a suitable boundary is called over and over again. However, I don't think it is worth aligning unimportant labels that are only called a few times. In fact, the padding wastes cache space and makes your code segment grow unnecessarily.

It is good programming practice to align subroutine entry points on a 16-byte boundary, but not every loop label. For example:

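```nasm
  ALIGN 0x10, NOP          ; pad with NOPs up to a 16-byte boundary
  __MyProcedure:           ; procedure entry point: aligned
    MOV     ECX , 10
    ALIGN 0x10, NOP        ; loop label aligned as well (the case argued against below)
    .LoopLabel:
      ; Process something here
      DEC     ECX
      JNZ     .LoopLabel
    RET
```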

The [__MyProcedure] label, the entry point of the procedure, is aligned on a 16-byte boundary, which is good. But when you also align the [.LoopLabel] local label, you force the processor to decode and execute the padding NOPs placed before it whenever execution falls through into the loop. That is not optimal.
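
For reference, the number of padding bytes an `ALIGN n` directive has to emit at an address `addr` (relative to an aligned section start) is `(n - addr % n) % n`; for a power-of-two `n` this equals `-addr & (n - 1)`. A small sketch, with `align_padding` as a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes of padding needed to move addr up to the
   next multiple of align. align must be a nonzero power of two. */
static uintptr_t align_padding(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    /* Unsigned negation wraps modulo 2^N, so (0 - addr) & (align - 1)
       equals (align - addr % align) % align. */
    return ((uintptr_t)0 - addr) & (align - 1);
}
```

Assuming the example above is assembled as 32-bit code with `__MyProcedure` on a 16-byte boundary, `MOV ECX, 10` encodes in 5 bytes, so the `ALIGN` in front of `.LoopLabel` emits `align_padding(5, 16)` = 11 bytes of NOP padding; in general, a 16-byte boundary can cost up to 15 bytes.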

sancho1980 · Post by **sancho1980** » Sun Aug 12, 2007 1:26 am

XCHG wrote: It is a good programming practice to align subroutine entries on a 16-byte boundary but not every loop label.

I take it you meant to say 16-bit boundary? Why 16 bits? Why not dword alignment on a 32-bit machine?

Candy · Post by **Candy** » Sun Aug 12, 2007 1:35 am

sancho1980 wrote:
I take it you meant to say 16-bit boundary? Why 16 bits? Why not dword alignment on a 32-bit machine?

I strongly suspect he meant 16 bytes. 16-bit alignment is pointless, and 32-bit alignment is hardly better. What you're most likely trying to do is make sure the first cache line loaded for the function is filled entirely with its code, so you align it to a power of 2, preferably the cache line size. x86 processors have used cache line sizes between 16 and 64 bytes, so 16 bytes is a useful minimum. I prefer 64 bytes, but I think I also align on 16-byte boundaries.
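
Aligning to the largest line size also covers the smaller ones because the sizes are powers of two: any multiple of 64 is also a multiple of 32 and of 16. A sketch of the usual bit tricks, where `is_aligned` and `align_up` are hypothetical helper names and the 16/32/64-byte sizes are taken from the post above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if addr is a multiple of align
   (align must be a nonzero power of two). */
static bool is_aligned(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* Hypothetical helper: round addr up to the next multiple of align
   (align must be a nonzero power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}
```

For instance, `align_up(0x1234, 64)` gives 0x1240, which passes `is_aligned` for 16, 32 and 64, whereas 0x1010 is 16-byte aligned but not 64-byte aligned. The trade-off is padding: up to 63 bytes per function at 64-byte alignment instead of up to 15 at 16-byte alignment.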