Inserting another memory clobber before the inline assembly acts as an appropriate barrier.
Code:
inline void force_read(int* address)
{
asm volatile ("": : : "memory" );
asm volatile ("": :"r"(*address):"memory" );
}
Octocontrabass wrote: That means that, if the value in memory matches the value in the register, there is no reason to re-read it from memory
Exactly.
Code:
inline void force_read(int* address)
{
int dummy;
asm volatile ("mov (%[addr]) , %[dummy]":[dummy]"=r"(dummy) : [addr]"r"(address));
}
Code:
#include <stdint.h>
#include <stdio.h>
inline void force_read(int* address)
{
int dummy;
asm volatile ("mov (%[addr]) , %[dummy]":[dummy]"=r"(dummy) : [addr]"r"(address));
}
int main()
{
int target;
scanf("%d",&target);
target *= 12432;
target += 1231;
target %= 12;
force_read((int*)&target);
}
Code:
.LC0:
.string "%d"
main:
pushq %rbx
movl $.LC0, %edi
xorl %eax, %eax
subq $16, %rsp
leaq 12(%rsp), %rbx
movq %rbx, %rsi
call __isoc99_scanf
mov (%rbx) , %ebx
addq $16, %rsp
xorl %eax, %eax
popq %rbx
ret
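In the listing above the multiply, add, and modulo never appear: the asm only names the pointer as an input, so the compiler is free to assume the stores to target are dead. A minimal sketch of one possible fix follows, assuming an x86 target. It passes the pointed-to object as an extra, otherwise unused "m" input, the pattern the GCC documentation describes for asm that reads memory through a pointer. The names force_read_value and compute_and_read are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Hypothetical variant of force_read that returns the loaded value,
   so the effect can be checked from C. */
static inline int force_read_value(const int *address)
{
    int value;
#if defined(__x86_64__) || defined(__i386__)
    /* The "m" operand is never used in the template. It only tells the
       compiler that the asm reads *address, so earlier stores to that
       object must be completed before the asm runs. */
    __asm__ volatile ("movl (%[addr]), %[value]"
                      : [value] "=r" (value)
                      : [addr] "r" (address), "m" (*address));
#else
    /* Assumption: on other targets, fall back to a plain volatile load. */
    value = *(const volatile int *)address;
#endif
    return value;
}

/* Same arithmetic as the test program, with a parameter instead of scanf. */
static int compute_and_read(int seed)
{
    int target = seed;
    target *= 12432;
    target += 1231;
    target %= 12;
    return force_read_value(&target);
}
```

With the extra operand, the arithmetic should survive optimization because its result is now visibly consumed by the asm.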
sunnysideup wrote: I'm guessing that a memory clobber will solve the issue, but that seems like overkill. How do I solve this issue? I'm guessing that a really simple input constraint will do the trick too!
Your guess is correct, and the GCC documentation explains it with examples.
Quote: Here is a fictitious sum of squares instruction, that takes two pointers to floating point values in memory and produces a floating point register output. Notice that x, and y both appear twice in the asm parameters, once to specify memory accessed, and once to specify a base register used by the asm. You won’t normally be wasting a register by doing this as GCC can use the same register for both purposes. However, it would be foolish to use both %1 and %3 for x in this asm and expect them to be the same. In fact, %3 may well not be a register. It might be a symbolic memory reference to the object pointed to by x.
Code:
asm ("sumsq %0, %1, %2" : "+f" (result) : "r" (x), "r" (y), "m" (*x), "m" (*y));
But while reading about that, I came up with a different solution. (Though I'm still not convinced it'll be reliable with local variables.)
Code:
inline void force_read(int* address)
{
asm volatile("":"=m"(*address):"m"(*address));
asm volatile(""::"r"(*address));
}
Octocontrabass wrote: However, it would be foolish to use both %1 and %3 for x in this asm and expect them to be the same. In fact, %3 may well not be a register. It might be a symbolic memory reference to the object pointed to by x
Btw, what does "symbolic memory reference" mean? I don't follow. They say that %3 may not be a register. But doesn't the input constraint "m"(*x) imply that %3 will be a memory reference? (I'm guessing that %3 will be replaced by something like 12(%rsp) wherever it appears in the assembly template.) How can %3 be a register?
Code:
asm volatile (" ": :"r"(*address));
Code:
inline void force_read(int* address)
{
int dummy;
asm volatile ("mov %[addr] , %[dummy]":[dummy]"=r"(dummy) : [addr]"m"(*address));
}
Octocontrabass wrote: But while reading about that, I came up with a different solution. (Though I'm still not convinced it'll be reliable with local variables.)
Code:
inline void force_read(int* address)
{
asm volatile("":"=m"(*address):"m"(*address));
asm volatile(""::"r"(*address));
}
I'm guessing that "=m"(*address) (the output constraint) is used to prevent compiler reordering, since the value produced by the first asm block is used by the second asm block. (I also think that a "+m"(*address) output constraint is equivalent... Am I wrong?)
sunnysideup wrote: Btw, what does this mean.. symbolic memory reference? I don't follow...
For example, a global variable can be referenced by a label, so the compiler may choose to emit that label instead of loading the address into a register.
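To make the label case concrete, here is a small sketch, assuming an x86 target. The global demo_counter and the function read_demo_counter are hypothetical names used only for illustration. With an "m" operand on a global, the compiler may substitute a symbol-based operand (something like demo_counter(%rip) on x86-64) rather than a register holding the address.

```c
#include <assert.h>

int demo_counter = 41;  /* hypothetical global, for illustration only */

static inline int read_demo_counter(void)
{
    int value;
#if defined(__x86_64__) || defined(__i386__)
    /* %[src] is whatever memory operand the compiler picks for
       demo_counter; for a global this can be a reference through its
       label, so no address register is needed. */
    __asm__ volatile ("movl %[src], %[dst]"
                      : [dst] "=r" (value)
                      : [src] "m" (demo_counter));
#else
    /* Assumption: on other targets, fall back to a plain volatile load. */
    value = *(volatile int *)&demo_counter;
#endif
    return value;
}
```

Compiling with -S and looking at the operand of the mov is the easiest way to see which form the compiler chose.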
sunnysideup wrote: The reason that I don't want to use
Code:
asm volatile (" ": :"r"(*address));
is because sometimes I want more flexibility while choosing the exact memory instruction. This would be really useful for stores where I can have a non-temporal (aka streaming) instruction, whereas the compiler would just have some default behaviour...
There are intrinsics for non-temporal stores, which you probably want to use if you're not hand-optimizing for one specific CPU. If you need a specific temporal load or store, you're either working with MMIO or performing some kind of magic. For MMIO, if you need to worry about the instruction then you won't be using normal pointer access anyway, so no memory clobber is necessary. Magic is outside the scope of inline assembly, so there's no guarantee it will be able to do what you want.
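As a rough sketch of the intrinsic route for non-temporal stores, assuming an x86 target with SSE2: _mm_stream_si32 and _mm_sfence are the standard intrinsics from <emmintrin.h>, while the wrapper name stream_store is hypothetical.

```c
#include <assert.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_stream_si32; also provides _mm_sfence */
#endif

/* Hypothetical wrapper: store `value` to `dst` with a non-temporal hint. */
static inline void stream_store(int *dst, int value)
{
#if defined(__SSE2__)
    _mm_stream_si32(dst, value);  /* non-temporal 32-bit store (MOVNTI) */
    _mm_sfence();                 /* order the streaming store before later stores */
#else
    /* Assumption: without SSE2, fall back to an ordinary store. */
    *dst = value;
#endif
}
```

Because the compiler understands the intrinsic as a store to *dst, no inline assembly or memory clobber is involved here.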
sunnysideup wrote: I'm guessing that "=m"(*address) (the output constraint) is used to prevent compiler reordering since the value produced by the first asm block is used by the second asm block. (I also think that "+m"(*address) output constraint is equivalent... Am I wrong?)
With no output constraint, the value is not clobbered, so the compiler is not forced to re-read it from memory. I believe you're correct about the "+" modifier; it probably should use "+m" instead of "=m" (and wouldn't need any input operands since the output is the input).
Code:
inline void force_read(int* address)
{
asm volatile("":"+m"(*address));
asm volatile(""::"r"(*address));
}
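A small usage sketch of the "+m" idea, with hypothetical helper names (reload, sum_twice). Since the empty asm claims to both read and write *address, the compiler has to store any pending value first and should not reuse a copy it already holds in a register afterwards.

```c
#include <assert.h>

/* Hypothetical helper: returns *address after an empty asm that is
   declared to read and modify it. */
static inline int reload(int *address)
{
    __asm__ volatile ("" : "+m" (*address));
    return *address;
}

static int sum_twice(int *p)
{
    int first = *p;          /* ordinary load */
    int second = reload(p);  /* expected to be a second load, not a copy of `first` */
    return first + second;
}
```

Unlike a "memory" clobber, this only invalidates the one object named in the operand, so unrelated values can stay cached in registers.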
sunnysideup wrote: Moreover, for local variables, you'd simply get input is not directly addressable compile-time error if you'd use the & operator I believe.
I think that means you've tried to pass the address of a variable to your inline assembly without also passing the value of the variable in memory. No memory reference means the compiler doesn't need to spill it to memory, and if it doesn't get spilled it has no address.