Alignment on x86_64

devc1
Member
Posts: 439
Joined: Fri Feb 11, 2022 4:55 am
Location: behind the keyboard

Alignment on x86_64

Post by devc1 »

The tests were run on a 3.60 GHz Intel Xeon E5-1620 0 (Sandy Bridge, the first generation with AVX), with 1600 MHz DDR3 memory, on a NUMA system.

What is the memory access granularity of an x86_64 CPU ?

My tests have shown that the granularity depends on the word size of the load/store instruction, and that the page in which the memory is accessed also has some effect.

I wrote a loop that timed aligned and unaligned write operations.
The two performed almost the same, only about 20 ms apart, with aligned slightly ahead:
roughly 450 ms for aligned and 470 ms for unaligned.

But let's say you want to load or store a DWORD at address 0x1FFD, where 3 bytes are in one page and 1 byte is in the next page.

The result was a massive difference compared to load/store inside a single page with 450 ms against 1900 ms.
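
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of such a timing loop. It is not the code used for the numbers above: it assumes a hosted POSIX environment (clock_gettime), a GCC/Clang-style compiler barrier, and 4 KiB pages, and the name time_stores is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/*
 * Store a 32-bit value at buf + offset `iters` times and return the
 * elapsed time in nanoseconds. memcpy is used so that an unaligned
 * offset is well-defined C; on x86_64 compilers typically turn it into
 * a single 4-byte store.
 */
static uint64_t time_stores(unsigned char *buf, size_t offset, uint64_t iters)
{
    struct timespec t0, t1;

    assert(buf != NULL);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (uint64_t i = 0; i < iters; i++) {
        uint32_t v = (uint32_t)i;
        memcpy(buf + offset, &v, sizeof v);
        /* Compiler barrier (GCC/Clang syntax, an assumption): forces the
         * store to be emitted on every iteration instead of only once. */
        __asm__ volatile("" : : : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    int64_t ns = (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
               + (int64_t)(t1.tv_nsec - t0.tv_nsec);
    return (uint64_t)ns;
}
```

On a page-aligned buffer, calling it with offsets such as 0x1000 (aligned), 0x1001 (unaligned, same cache line) and 0x1FFD (crossing into the next page) should exercise the three cases. The absolute numbers will differ from machine to machine.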

Is this what you guys call page granularity ?
Or is this an effect of paging ?

Sometimes (at first) accessing address 0x1000 is so slow but accessing 0x1010 is so fast.
nullplan
Member
Posts: 1790
Joined: Wed Aug 30, 2017 8:24 am

Re: Alignment on x86_64

Post by nullplan »

devc1 wrote:What is the memory access granularity of an x86_64 CPU ?
Typically 64 bytes, at least with write-back caching enabled (L1 cache line size).
devc1 wrote:The result was a massive difference compared to load/store inside a single page with 450 ms against 1900 ms.
Yeah, the manuals warn about unaligned accesses crossing page boundaries. Within a cache line, the effect is barely measurable; across cache lines, there is some effect; across page boundaries, latency spikes.
devc1 wrote:Sometimes (at first) accessing address 0x1000 is so slow but accessing 0x1010 is so fast.
Same L1 cache line. So accessing 0x1000 has already created the TLB entry and the L1 cache line, and then the access to 0x1010 hits the same cache line.
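
To make the three cases concrete, here is a small sketch that classifies an access by the boundaries it crosses. It assumes 64-byte cache lines and 4 KiB pages; the names classify_access and same_cache_line are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u    /* assumed L1 line size */
#define PAGE_SIZE  4096u  /* assumed 4 KiB pages */

enum access_kind { WITHIN_LINE, CROSSES_LINE, CROSSES_PAGE };

/* Classify an access of `size` bytes starting at `addr`. */
static enum access_kind classify_access(uintptr_t addr, size_t size)
{
    assert(size > 0);
    uintptr_t last = addr + size - 1;  /* address of the last byte touched */

    if (addr / PAGE_SIZE != last / PAGE_SIZE)
        return CROSSES_PAGE;           /* also crosses a cache line */
    if (addr / CACHE_LINE != last / CACHE_LINE)
        return CROSSES_LINE;
    return WITHIN_LINE;
}

/* Nonzero if both addresses fall into the same cache line. */
static int same_cache_line(uintptr_t a, uintptr_t b)
{
    return a / CACHE_LINE == b / CACHE_LINE;
}
```

With these assumptions, a DWORD at 0x1FFD classifies as CROSSES_PAGE, a DWORD at 0x103E as CROSSES_LINE, and 0x1000 and 0x1010 land in the same cache line.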
Carpe diem!
devc1
Member
Posts: 439
Joined: Fri Feb 11, 2022 4:55 am
Location: behind the keyboard

Re: Alignment on x86_64

Post by devc1 »

I meant that when I started the testing program, accessing page-aligned addresses was slow at first. Then it suddenly became fast.

I will use what I discovered to optimize my OS.


Anyway, thanks.
nexos
Member
Posts: 1081
Joined: Tue Feb 18, 2020 3:29 pm
Libera.chat IRC: nexos

Re: Alignment on x86_64

Post by nexos »

That's because of the TLB and the cache. When you first access a page, its translation gets cached in the TLB, avoiding extra memory accesses to the page tables for successive accesses to that page. Also, on the first access, the CPU must go to memory to read the page's data, and then it keeps that data in the cache for successive accesses.
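
If you want to keep this first-touch cost out of a benchmark, one common approach is to touch the buffer once before starting the timer. A minimal sketch, assuming 4 KiB pages (the name warm_up is made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096u  /* assumed 4 KiB pages */

/*
 * Touch one byte in every page of buf[0..len) so that the translation
 * for each page is loaded before the timed loop starts. Returns the
 * number of pages touched. Only the cache line holding the touched
 * byte is brought into the cache, not the whole page.
 */
static size_t warm_up(volatile unsigned char *buf, size_t len)
{
    size_t touched = 0;

    assert(buf != NULL);

    for (size_t off = 0; off < len; off += PAGE_SIZE) {
        buf[off] = buf[off];  /* volatile read followed by a write-back */
        touched++;
    }
    return touched;
}
```

This only helps while the buffer fits in the TLB and the cache; for large buffers the early entries will already have been evicted by the time the timed loop runs.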
"How did you do this?"
"It's very simple — you read the protocol and write the code." - Bill Joy
Projects: NexNix | libnex | nnpkg