Hi,
gerryg400 wrote:Is a memory barrier required for this on x86 ? I thought that if core A wrote to a variable then core B would see the new value. Doesn't cache coherency guarantee that ?
I thought that making the variable volatile to ensure the compiler does the right thing would be sufficient in this case. A fence would only be required for cache consistency with multiple variables.
There are two different issues. The first is that the "volatile" keyword in C doesn't do what people think it does. From the Wikipedia article:
In C and C++, the volatile keyword was intended to allow C and C++ programs to directly access memory-mapped I/O. Memory-mapped I/O generally requires that the reads and writes specified in source code happen in the exact order specified with no omissions. Omissions or reorderings of reads and writes by the compiler would break the communication between the program and the device accessed by memory-mapped I/O. A C or C++ compiler may not reorder reads and writes to volatile memory locations, nor may it omit a read or write to a volatile memory location, allowing a pointer to volatile memory to be used for memory-mapped I/O.
The C and C++ standards do not address multiple threads (or multiple processors), and as such, the usefulness of volatile depends on the compiler and hardware. Although volatile guarantees that the volatile reads and volatile writes will happen in the exact order specified in the source code, the compiler may generate code (or the CPU may re-order execution) such that a volatile read or write is reordered with regard to non-volatile reads or writes, thus limiting its usefulness as an inter-thread flag or mutex. Preventing such is compiler specific, but some compilers, like gcc, will not reorder operations around in-line assembly code with volatile and "memory" tags, like in: asm volatile ("" : : : "memory"); (See more examples in compiler memory barrier). Moreover, it is not guaranteed that volatile reads and writes will be seen in the same order by other processors due to caching, cache coherence protocol and relaxed memory ordering, meaning volatile variables alone may not even work as inter-thread flags or mutexes.
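To make that concrete, here's a rough sketch of the "inter-thread flag" case the article describes (C with GCC-style inline assembly assumed; "ready", "data" and "producer" are just names I've made up for illustration). Making the flag volatile keeps the compiler from omitting or reordering accesses to the flag itself, but it doesn't stop the compiler from moving the non-volatile store to "data" past it; the empty asm statement with the "memory" clobber is what forbids that:

Code:
volatile int ready = 0;   /* flag polled by another thread/CPU */
int data = 0;             /* not volatile */

void producer(void)
{
    data = 42;                          /* the compiler may move this past the flag store... */
    asm volatile ("" : : : "memory");   /* ...unless a compiler barrier forbids it */
    ready = 1;                          /* the other CPU polls this flag */
}

Note that this only restrains the compiler; on CPUs with weaker ordering you'd still need a hardware barrier too (on 80x86 the store-ordering guarantees cover this particular case).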
The second issue is when the write will be visible to other CPUs. 80x86 guarantees that writes will appear to occur in order, but does not guarantee when. For example, if you write to an address and then go into a loop or something (that doesn't do any writes), then that first write could (at least in theory) be postponed until after the loop terminates.
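As a sketch of that scenario (the names are invented for illustration, and "reply" is a hypothetical variable that the loop only reads):

Code:
volatile int done = 0;      /* flag another CPU is waiting for */
extern volatile int reply;  /* hypothetical: something this loop only reads */

void worker(void)
{
    done = 1;               /* could sit in the store buffer... */
    while (reply == 0)      /* ...while this loop runs, since it does no further writes */
        ;                   /* in theory, other CPUs might not see done == 1 yet */
}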
Also note that Intel introduced instructions specifically for memory barriers (MFENCE, LFENCE and SFENCE). For example, if you write to an address and then execute an "MFENCE" instruction, then no later reads (or writes) will occur until after that write has been made visible to other CPUs. This still doesn't guarantee "when", but it does stall the CPU (starving it of other work), which tends to prevent the write from being postponed. There's also a group of (older) operations that force stronger ordering (and behave a little like "MFENCE"): any IO port access, any instruction with a LOCK prefix, and any serialising instruction (e.g. CPUID, WBINVD, INVLPG, IRET, LGDT, etc).
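A rough sketch of the "write, then MFENCE" pattern (again GCC-style inline assembly assumed, names invented for illustration):

Code:
volatile int done = 0;

void signal_done(void)
{
    done = 1;                               /* the write we want other CPUs to see */
    asm volatile ("mfence" : : : "memory"); /* no later reads or writes happen until
                                               the store above is globally visible */
    /* safe to go into a long (read-only) loop from here on */
}

A LOCK'ed instruction (e.g. "lock addl $0, (%esp)") is the older way of getting much the same full-barrier effect on CPUs without MFENCE.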
Cheers,
Brendan