Alignment on x86_64
Posted: Mon Oct 10, 2022 10:51 am
The tests were done on a 3.60 GHz Intel Xeon E5-1620 0 (Sandy Bridge, the generation that introduced AVX) with 1600 MHz DDR3 memory, NUMA system.
What is the memory access granularity of an x86_64 CPU?
My tests have shown that the granularity depends on the word size of the load/store instruction, and that the page in which the memory is accessed has some effect.
I wrote a loop that timed aligned and unaligned write operations.
The two had almost the same performance, only a few milliseconds apart, with aligned winning.
About 450 ms for aligned and 470 ms for unaligned.
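For reference, a loop of this kind could look roughly like the sketch below. This is not my exact test code: the helper names (`store_u32`, `time_stores_ms`, `alloc_pages`) are made up for illustration, it assumes a POSIX system with 4 KiB pages, and the empty `__asm__` compiler barrier is GCC/Clang specific.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Store a 32-bit value at an arbitrary byte address. memcpy avoids the
   undefined behavior of dereferencing a misaligned pointer; optimizing
   compilers normally turn it into a single 32-bit mov on x86_64. */
static inline void store_u32(unsigned char *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}

/* Allocate `pages` page-aligned 4 KiB pages (assumed page size).
   Returns NULL on failure. */
static unsigned char *alloc_pages(size_t pages)
{
    void *p = NULL;
    if (posix_memalign(&p, 4096, pages * 4096) != 0)
        return NULL;
    memset(p, 0, pages * 4096);   /* touch the pages before timing */
    return p;
}

/* Time `iters` 32-bit stores to buf + offset, in milliseconds. */
static double time_stores_ms(unsigned char *buf, size_t offset, uint64_t iters)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (uint64_t i = 0; i < iters; i++) {
        store_u32(buf + offset, (uint32_t)i);
        /* Compiler barrier so the store is not elided or hoisted. */
        __asm__ volatile("" ::: "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) * 1e3
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
}
```

Calling `time_stores_ms(buf, 0, n)` would give the aligned case and `time_stores_ms(buf, 1, n)` an unaligned one inside the same page. Build with optimization (for example `-O2`), otherwise call overhead dominates the numbers.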
But let's say you want to load or store a DWORD at address 0x1FFD, where 3 bytes are in one page and 1 byte is in the next page.
The result was a massive difference compared to a load/store inside a single page: 1900 ms against 450 ms.
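To make the address arithmetic explicit, here is a small sketch that works out how an access is split across a page boundary. It assumes 4 KiB pages, and the names `PAGE_BYTES`, `bytes_in_first_page` and `crosses_page` are hypothetical, chosen only for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096u   /* assumed page size (4 KiB) */

/* Number of bytes of the access [addr, addr + size) that land in the
   page containing addr; the rest spills into the following page. */
static size_t bytes_in_first_page(uintptr_t addr, size_t size)
{
    size_t room = PAGE_BYTES - (size_t)(addr & (PAGE_BYTES - 1));
    return size < room ? size : room;
}

/* Nonzero if the access straddles a page boundary. */
static int crosses_page(uintptr_t addr, size_t size)
{
    return bytes_in_first_page(addr, size) < size;
}
```

For a DWORD at 0x1FFD, `bytes_in_first_page(0x1FFD, 4)` gives 3, so the fourth byte is at 0x2000 in the next page. A DWORD at 0x1FFC still fits entirely in the first page.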
Is this what you guys call page granularity?
Or is this an effect of paging?
Sometimes (at first) accessing address 0x1000 is very slow, but accessing 0x1010 is very fast.