AMD64 -> intel memory ordering : question on fences

JulienDarc
Member
Posts: 97
Joined: Tue Mar 10, 2015 10:08 am

Post by JulienDarc »

Hello,

I know that the amd64 architecture is strongly ordered, but a store can still become visible after a later load.
So basically I may only need a compiler barrier (especially with gcc at -O2 or -O3) and an sfence in my code, right?

No need for lfence and mfence, since load/load and store/store are always in the right order?

Thanks
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Post by Brendan »

Hi,
JulienDarc wrote: I know that the amd64 architecture is strongly ordered, but a store can still become visible after a later load.
So basically I may only need a compiler barrier (especially with gcc at -O2 or -O3) and an sfence in my code, right?

No need for lfence and mfence, since load/load and store/store are always in the right order?

What you need (or don't need) depends on what your code does.

In general, code written in C uses "C abstract machine ordering", which is mostly weak ordering (unless the variable is 'volatile'), and has nothing to do with the underlying architecture.
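
To make the compiler side of this concrete, here is a minimal sketch, assuming a C11 compiler with <stdatomic.h>. The names payload, ready, compiler_barrier, full_barrier and set_payload_then_flag are made up for illustration. atomic_signal_fence only stops the compiler from moving memory accesses across it (similar in spirit to gcc's asm volatile("" ::: "memory")), while atomic_thread_fence also asks for whatever CPU-level ordering the target needs.

```c
#include <stdatomic.h>

int payload;   /* hypothetical shared data */
int ready;     /* hypothetical "data is valid" flag */

/* Constrains the compiler only: memory accesses are not moved across
   this point, but no fence instruction is emitted for the CPU. */
void compiler_barrier(void)
{
    atomic_signal_fence(memory_order_seq_cst);
}

/* Constrains the compiler and, where the target needs it, the CPU. */
void full_barrier(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

/* Writes payload, then ready, in that order as far as the compiler is
   concerned. Whether another CPU observes the same order depends on
   the hardware, not on this barrier. */
void set_payload_then_flag(int value)
{
    payload = value;
    compiler_barrier();
    ready = 1;
}
```

Note that plain int variables like these are not a correct way to talk to another thread in standard C; the sketch only shows where each kind of barrier sits.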


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.

Post by JulienDarc »

Thanks Brendan,

Yes, that's all right for the C part.
But as far as CPU fences/barriers are concerned, one should never use lfence/mfence on an amd64 arch, right?

https://en.wikipedia.org/wiki/Memory_or ... rd_pdf_6-0

Because they seem useless: load/load, store/store and load/store operations are strongly ordered, but store/load is not.
So I guess on this arch one should only use sfence and forget about the other ones, as they are already implied.

Am I correct?

Post by Brendan »

Hi,
JulienDarc wrote: Thanks Brendan,

Yes, that's all right for the C part.
But as far as CPU fences/barriers are concerned, one should never use lfence/mfence on an amd64 arch, right?

No. You should always use lfence/mfence in cases where they're necessary.
JulienDarc wrote: Because they seem useless: load/load, store/store and load/store operations are strongly ordered, but store/load is not.
So I guess on this arch one should only use sfence and forget about the other ones, as they are already implied.

Am I correct?

The default memory ordering on 80x86 is "write ordered with store-buffer forwarding". This is only the default. There are various instructions (e.g. non-temporal stores) that use weaker ordering, and there are caching types (e.g. write combining) that also weaken the ordering. In addition to that, the CPU does some (rare) things that ignore the memory ordering (e.g. setting the accessed and dirty flags in page table entries can happen "out of order").
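
As an illustration of the non-temporal store case, here is a hedged sketch in C. It assumes an x86 compiler that provides the SSE2 intrinsics _mm_stream_si32 and _mm_sfence in <emmintrin.h>; the function name fill_streaming is made up, and the fallback branch only exists so the code still builds elsewhere. The point is that the sfence is what orders the weakly ordered stores ahead of anything stored afterwards (such as a "ready" flag).

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_stream_si32, _mm_sfence */
#endif

/* Fill a buffer using non-temporal stores, then fence so that those
   stores become globally visible before any store issued afterwards. */
void fill_streaming(int *dst, size_t count, int value)
{
#if defined(__SSE2__)
    for (size_t i = 0; i < count; i++)
        _mm_stream_si32(&dst[i], value);   /* weakly ordered store */
    _mm_sfence();                          /* order them before later stores */
#else
    for (size_t i = 0; i < count; i++)     /* fallback: ordinary stores */
        dst[i] = value;
#endif
}
```

A caller would typically set its "data is ready" flag only after fill_streaming returns, i.e. after the sfence.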

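Also, the CPU can perform loads speculatively and out of program order. For normal loads from write-back memory it hides this (loads are never seen to be reordered with other loads), but that guarantee doesn't cover the weaker cases above, which may create problems when there are multiple CPUs (or just one CPU and a device). For example, imagine the first CPU does this: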

     mov dword [data],123456  ;Set the data for the second CPU
     mov dword [flag],1       ;Tell the second CPU it can continue
And the second CPU does this:

.l1: cmp dword [flag],0         ;Can this CPU continue yet?
     je .l1                    ; no, keep waiting
     mov eax,[data]            ; yes, get the data
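In this case the second CPU may read from "data" speculatively (before the first CPU has set "flag"). With plain loads and stores to normal write-back memory, as written here, the CPU notices and redoes the load, so EAX ends up with the value stored by the first CPU (123456) and no fence is needed. Once weaker ordering is involved that is no longer true: if the first CPU wrote "data" with a non-temporal store it needs an sfence before setting "flag", and if the second CPU reads "data" with a weakly ordered load (e.g. from write-combining memory) it needs a fence (e.g. lfence) to ensure the second load ("mov eax,[data]") doesn't happen before the earlier load says "flag" is non-zero.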
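
For code written in C, the same handshake can be sketched with C11 atomics, which leave it to the compiler to emit whatever instructions (fences or none) the target needs. This is only a sketch, assuming <stdatomic.h> is available; the names msg_data, msg_flag, producer_publish and consumer_poll are made up for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static uint32_t msg_data;      /* plain payload, like [data] above */
static atomic_uint msg_flag;   /* 0 = not ready, 1 = ready, like [flag] */

/* First CPU: write the data, then publish it. The release store keeps
   the write to msg_data ordered before the write to msg_flag. */
void producer_publish(uint32_t value)
{
    msg_data = value;
    atomic_store_explicit(&msg_flag, 1, memory_order_release);
}

/* Second CPU: one polling step. The acquire load keeps the read of
   msg_data ordered after the read of msg_flag. Returns true and fills
   *out once the flag has been seen as non-zero. */
bool consumer_poll(uint32_t *out)
{
    if (atomic_load_explicit(&msg_flag, memory_order_acquire) == 0)
        return false;          /* not ready yet, caller keeps waiting */
    *out = msg_data;
    return true;
}
```

The second CPU would call consumer_poll in a loop, which corresponds to the cmp/je loop in the assembly.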


Cheers,

Brendan

Post by JulienDarc »

Glad you are here Brendan,

I would have run into some major bugs without your help.

Thanks for your clarifications,

julien