AMD64 -> intel memory ordering : question on fences
Posted: Tue May 12, 2015 1:53 am
by JulienDarc
Hello,
I know that the amd64 architecture has strong memory ordering, but a store can still become visible after a later load.
So it basically means that I may only need a compiler barrier (especially with gcc at -O2 or -O3) and an sfence in my code, right?
No need for lfence and mfence, since load/load and store/store are always in the right order?
Thanks
Re: AMD64 -> intel memory ordering : question on fences
Posted: Tue May 12, 2015 3:12 am
by Brendan
Hi,
JulienDarc wrote: I know that the amd64 architecture has strong memory ordering, but a store can still become visible after a later load.
So it basically means that I may only need a compiler barrier (especially with gcc at -O2 or -O3) and an sfence in my code, right?
No need for lfence and mfence, since load/load and store/store are always in the right order?
What you need (or don't need) depends on what your code does.
In general, code written in C uses "C abstract machine ordering", which is mostly weak ordering (unless the variable is 'volatile'), and has nothing to do with the underlying architecture.
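To make the C side concrete, here is a minimal sketch of the store-then-load case from the question, assuming a C11 compiler with `<stdatomic.h>`. The names (`x`, `y`, `cpu0_store_x_then_load_y`, `cpu1_store_y_then_load_x`) are made up for illustration. The ordering is stated in the language and the compiler emits whatever the target needs; on x86-64 that is typically an mfence or a lock-prefixed instruction for the full fence.

```c
#include <assert.h>
#include <stdatomic.h>

/* Assumption for this sketch: atomic_int is lock-free on the target. */
static_assert(ATOMIC_INT_LOCK_FREE == 2, "atomic_int must be lock-free");

static atomic_int x, y;   /* both start at 0 */

/* Meant to run on one CPU: store x, then load y. */
int cpu0_store_x_then_load_y(void)
{
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    /* Full fence: keeps the load below from passing the store above. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&y, memory_order_relaxed);
}

/* Meant to run on another CPU: store y, then load x. */
int cpu1_store_y_then_load_x(void)
{
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&x, memory_order_relaxed);
}
```

With both fences in place, the two functions cannot both return 0 when run concurrently. A compiler-only barrier would not give that guarantee, since it does not stop the CPU from letting the load pass the earlier store.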
Cheers,
Brendan
Re: AMD64 -> intel memory ordering : question on fences
Posted: Tue May 12, 2015 5:14 am
by JulienDarc
Thanks Brendan,
Yes, that's fine for the C part.
But as far as CPU fences/barriers are concerned, one should never use lfence/mfence on an amd64 arch, right?
https://en.wikipedia.org/wiki/Memory_or ... rd_pdf_6-0
Because it seems useless: load/load, store/store and load/store operations are strongly ordered, but store/load is not.
So I guess on this arch, one should only use sfence and forget about the other ones, as they are already implied.
Am I correct?
Re: AMD64 -> intel memory ordering : question on fences
Posted: Tue May 12, 2015 7:14 am
by Brendan
Hi,
JulienDarc wrote: Thanks Brendan,
Yes, that's fine for the C part.
But as far as CPU fences/barriers are concerned, one should never use lfence/mfence on an amd64 arch, right?
No. You should always use lfence/mfence in cases where they're necessary.
JulienDarc wrote: Because it seems useless: load/load, store/store and load/store operations are strongly ordered, but store/load is not.
So I guess on this arch, one should only use sfence and forget about the other ones, as they are already implied.
Am I correct?
The default memory ordering on 80x86 is "write ordering with store forwarding". This is only the default. There are various instructions (e.g. non-temporal stores) that use weaker ordering, and there are caching types (e.g. write combining) that also weaken the ordering. In addition to that, the CPU does some (rare) things that ignore the memory ordering (e.g. setting the accessed and dirty flags in page table entries can happen "out of order").
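As an illustration of the non-temporal case, here is a minimal C sketch, assuming an x86 target with SSE2 and the `<emmintrin.h>` intrinsics. The buffer, the flag and the function names are hypothetical. On builds without SSE2 it falls back to plain stores so that it still compiles.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#ifdef __SSE2__
#include <emmintrin.h>   /* _mm_stream_si32, _mm_sfence */
#endif

#define BUF_LEN 16       /* hypothetical buffer size */

static int buffer[BUF_LEN];
static atomic_int buffer_ready;   /* 0 = not ready, 1 = buffer is filled */

void fill_and_publish(int value)
{
    for (size_t i = 0; i < BUF_LEN; i++) {
#ifdef __SSE2__
        /* Non-temporal store: weakly ordered, unlike a normal store. */
        _mm_stream_si32(&buffer[i], value);
#else
        buffer[i] = value;        /* fallback: ordinary store */
#endif
    }
#ifdef __SSE2__
    /* Make the non-temporal stores visible before the flag store. */
    _mm_sfence();
#endif
    atomic_store_explicit(&buffer_ready, 1, memory_order_release);
}

int buffer_is_ready(void)
{
    return atomic_load_explicit(&buffer_ready, memory_order_acquire);
}

int read_buffer(size_t i)
{
    assert(i < BUF_LEN);
    return buffer[i];
}
```

The sfence is there because the release store on its own is only specified in terms of ordinary C stores; the compiler does not account for the weaker ordering of the streaming stores.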
Also, loads can happen in any order. This may create problems when there are multiple CPUs (or just one CPU and a device). For example, imagine the first CPU does this:
Code:
mov dword [data],123456 ;Set the data for the second CPU
mov dword [flag],1 ;Tell the second CPU it can continue
And the second CPU does this:
Code:
.l1: cmp dword [flag],0 ;Can this CPU continue yet?
je .l1 ; no, keep waiting
mov eax,[data] ; yes, get the data
In this case the second CPU may read from "data" speculatively (before the first CPU has set "flag"), so the value in EAX may not be the value stored by the first CPU (123456). A fence (e.g. lfence) is needed to ensure the second load ("mov eax,[data]") doesn't happen before the earlier load says "flag" is non-zero.
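For code written in C rather than assembly, the same flag/data handshake can be sketched with C11 atomics. This is only a sketch: `publish` and `try_consume` are hypothetical names, and which instructions (fence or otherwise) end up in the generated code is left to the compiler for the target.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static int data;          /* plain payload, guarded by the flag */
static atomic_int flag;   /* 0 = not ready, 1 = data is valid */

/* First CPU: set the data, then tell the second CPU it can continue. */
void publish(int value)
{
    /* This sketch only supports publishing once. */
    assert(atomic_load_explicit(&flag, memory_order_relaxed) == 0);
    data = value;
    /* Release: the write to data is ordered before the flag store. */
    atomic_store_explicit(&flag, 1, memory_order_release);
}

/* Second CPU: returns false while the flag is still clear. */
bool try_consume(int *out)
{
    /* Acquire: the read of data below cannot move before this load. */
    if (atomic_load_explicit(&flag, memory_order_acquire) == 0)
        return false;
    *out = data;
    return true;
}
```

The second CPU would call `try_consume` in a loop, like the `.l1` loop above, and only use the value once it returns true.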
Cheers,
Brendan
Re: AMD64 -> intel memory ordering : question on fences
Posted: Tue May 12, 2015 7:47 am
by JulienDarc
Glad you are here Brendan,
I would have run into some major bugs without your help.
Thanks for your clarifications,
julien