Interrupt Service Routine alignment

Posted: Tue May 02, 2017 12:00 pm
by IanSeyler
This wiki page notes that the alignment is 4: http://wiki.osdev.org/Interrupt_Service_Routines

I would guess this is referring to 32-bit code. What should the alignment be in 64-bit mode? Is 8 correct? Currently I have everything aligned to 16.

Thanks,
Ian

Re: Interrupt Service Routine alignment

Posted: Tue May 02, 2017 12:17 pm
by Octocontrabass
Alignment is just a speed optimization. You don't need to align ISRs at all. (I suspect 4 bytes isn't enough to be useful on modern 32-bit x86 anyway. If you're going to align your ISRs, look at what your compiler does.)
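As a rough illustration of "what your compiler does": GCC and Clang accept an `aligned` function attribute that sets a minimum alignment for a function's entry point. The sketch below is an assumption-laden example, not a real interrupt handler; `handler_stub` and `entry_is_aligned` are hypothetical names, the attribute is a compiler extension rather than standard C, and the check assumes a target such as x86 where a function pointer is the plain entry address.

```c
#include <assert.h>
#include <stdint.h>

/* GCC/Clang extension: request at least 64-byte alignment for the
 * entry point. handler_stub is a hypothetical, empty placeholder. */
__attribute__((aligned(64))) void handler_stub(void)
{
}

/* Returns 1 if fn's entry address is a multiple of align.
 * align must be a power of two. Converting a function pointer to
 * uintptr_t is implementation-defined; it works on GCC/Clang for x86. */
static int entry_is_aligned(void (*fn)(void), uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)fn & (align - 1)) == 0;
}
```

Calling `entry_is_aligned(handler_stub, 64)` should then report that the requested alignment was honoured. Dropping the attribute and repeating the check with other values shows what the compiler picks by default at a given optimization level.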

Re: Interrupt Service Routine alignment

Posted: Tue May 02, 2017 12:22 pm
by IanSeyler
Good to know. Thanks! This is in Assembly so there is no compiler.

https://github.com/ReturnInfinity/BareM ... errupt.asm

Each one of the exception handlers is 16-byte aligned so this will save me some bytes.

-Ian

Re: Interrupt Service Routine alignment

Posted: Wed May 03, 2017 6:25 am
by LtG
IanSeyler wrote:Each one of the exception handlers is 16-byte aligned so this will save me some bytes.
How will that save you some bytes?

As for aligning for performance, it depends on what you mean by ISR. If your ISR is just a stub that calls the actual ISR in some driver, then the alignment doesn't matter that much, and you might want to make the stubs tiny so they have a greater chance of being cached.

If you are talking about the actual ISRs (a hundred bytes or more), then I might even align on pages, or at least on cache lines. The memory wasted due to cache line alignment is minimal: assuming 64 B cache lines, half a line (32 B) wasted per ISR on average, and all 256 interrupts in use, you're wasting 8 KiB of memory on the whole system, which shouldn't matter at all. Without alignment, it's possible the CPU will jump to an ISR that starts at the very end of some cache line; it then can't process multiple instructions immediately and has to wait for the next cache line to be read from memory.
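The arithmetic above can be sketched in a few lines of C. This is only an illustration under the post's assumptions (64-byte lines, 32 bytes lost per ISR on average, 256 ISRs); `align_up`, `padding_for` and `total_waste` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u  /* assumed line size, as in the post */

/* Round addr up to the next multiple of align (a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Padding bytes needed before a routine that would otherwise
 * start at addr, so that it starts on an align boundary. */
static size_t padding_for(uintptr_t addr, uintptr_t align)
{
    return (size_t)(align_up(addr, align) - addr);
}

/* Total padding if each of count routines loses avg_pad bytes. */
static size_t total_waste(size_t count, size_t avg_pad)
{
    return count * avg_pad;
}
```

With these helpers, `total_waste(256, CACHE_LINE / 2)` gives 8192 bytes, i.e. the 8 KiB quoted above. The worst case for a single routine is `padding_for(0x1001, CACHE_LINE)`, which is 63 bytes.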

Note, I don't know how the CPU handles interrupts internally, or whether it fetches only the cache line that the IDT entry points to or several of them. Either do some testing or assume the worst and align at least to the cache line size.

On x86 the cache line is commonly (these days always?) 64 B.

Google Agner Fog: he has done great research into x86 optimization, not just the latency/throughput tables but also the C and ASM optimization guides. I find them very well written and quite easy to understand. They explain pretty much everything from basics to advanced.

Re: Interrupt Service Routine alignment

Posted: Wed May 03, 2017 7:33 am
by Brendan
Hi,
IanSeyler wrote:From this wiki page it notes the alignment is 4: http://wiki.osdev.org/Interrupt_Service_Routines

I would guess this is referring to 32-bit code. What should the alignment be in 64-bit mode? Is 8 correct? Currently I have everything aligned to 16.
In general (for both code and data) you either want to:
  • Align to a page boundary (4 KiB), or
  • Align to a cache line boundary (64 bytes), or
  • Align to a natural boundary (64 bytes for AVX512, 32 bytes for AVX2, 16 bytes for SSE, 8 bytes for 64-bit integers and double floating point, 4 bytes for 32-bit integers and "float", etc).
Aligning to a cache line boundary or page boundary hurts cache locality (more cache lines used means more cache misses) and is usually only done when there's a specific reason (e.g. because you need different page permissions, or to avoid "false sharing" problems within a cache line).
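For data, the difference between natural alignment and cache-line alignment can be sketched in C11. This is a minimal example assuming a 64-byte cache line; `struct natural`, `struct per_cpu_counter`, `counters` and `is_aligned` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define CACHE_LINE 64  /* assumed; common on current x86 */

/* Natural alignment: the compiler derives it from the member types. */
struct natural {
    uint32_t a;  /* 4-byte natural boundary */
    uint64_t b;  /* 8-byte natural boundary on x86-64 */
};

/* Cache-line alignment: each counter occupies its own line, which
 * avoids false sharing at the cost of padding every element. */
struct per_cpu_counter {
    alignas(CACHE_LINE) uint64_t value;
};

static struct per_cpu_counter counters[4];

/* Returns 1 if p is a multiple of align (a power of two). */
static int is_aligned(const void *p, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

Here `sizeof(struct per_cpu_counter)` becomes 64 even though only 8 bytes hold data, which is the cache-locality cost mentioned above: four counters take four cache lines instead of sharing one.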

For instructions (which are variable length) there is no natural boundary - everything after the first instruction will be misaligned regardless of what you do, so the CPU has to be designed to handle "randomly misaligned instructions".

However, on top of all of this, for each CPU there are "specific and different" arcane rules that exist due to artefacts in the behaviours of various types of caches (that depend on things like cache associativity, "tag" formats, etc). This includes artefacts from the CPU's branch target buffer (and obscure rules like "don't put more than X conditional branches in the same Y bytes of code, for CPU vendor A family B model C"). The "4-byte code alignment theory" is a guideline that originated from historical branch target buffer artefacts, but it probably doesn't actually make sense for a lot of CPUs, and probably doesn't make sense for ISRs (because ISRs aren't the target of a normal branch).


Cheers,

Brendan