Proper way to write inline assembly

Octocontrabass · Post by **Octocontrabass** » Tue Aug 04, 2020 1:05 am

The memory clobber tells the compiler that the inline assembly could read or write values in memory for which it has the address. That means that, if the value in memory matches the value in the register, there is no reason to re-read it from memory. But it also means that future accesses must re-read the value in memory.

Inserting another memory clobber before the inline assembly acts as an appropriate barrier.

Code: Select all

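inline void force_read(int* address)
{
    /* Barrier: register copies of memory values are treated as stale from here on. */
    asm volatile ("": : : "memory" );
    /* Requires *address in a register, so it has to be loaded from memory. */
    asm volatile ("": :"r"(*address):"memory" );
}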
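To make the barrier effect concrete, here is a minimal sketch (not from the thread) that puts the same empty asm statement with a memory clobber between two updates of a global variable. The names `shared_counter`, `compiler_barrier` and `bump_twice` are made up for illustration.

```c
/* Hypothetical global: it has a fixed address, so the "memory" clobber covers it. */
int shared_counter = 0;

/* Empty asm with a "memory" clobber: emits no instructions, but the compiler
   must assume that addressable memory may have been read or written here. */
static inline void compiler_barrier(void)
{
    __asm__ volatile ("" : : : "memory");
}

int bump_twice(void)
{
    shared_counter += 1;
    compiler_barrier();   /* the store above has to reach memory before this
                             point, and shared_counter is re-read afterwards */
    shared_counter += 1;
    return shared_counter;
}
```

Note that this only constrains the compiler. It does not emit a CPU fence instruction.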

However, there is another problem: you're using a local variable. A local variable has no address until its address is taken, so if the address is never taken, the compiler is free to keep the variable in a register and optimize away your attempts at putting its value into memory. Right now you're relying on scanf() to force it to be spilled to memory, but there's no guarantee that will keep working: a future compiler may become aware that scanf() does not retain the addresses of its arguments, so the variable would never become reachable through the memory that the clobber refers to. (There are ways to work around this as well, but at that point you're working against the optimizer, and it really doesn't make sense to use inline assembly unless the optimizer is helping you.)

sunnysideup · Post by **sunnysideup** » Tue Aug 04, 2020 10:31 pm

Octocontrabass wrote:That means that, if the value in memory matches the value in the register, there is no reason to re-read it from memory

Exactly.

Moreover, I don't like the idea of abusing memory clobbers, because it would cause a forced read/write of all the variables in scope. This was the reason that I'd asked if something like this would be better (I've modified a little to comply with @nullplan's concern):

Code: Select all

inline void force_read(int* address)
{
    int dummy;
    asm  volatile ("mov  (%[addr]) , %[dummy]":[dummy]"=r"(dummy) : [addr]"r"(address));
}

The above code snippet seems to work at a cursory glance... However, the semantics seem to be incorrect...

Consider this:

Code: Select all

#include <stdint.h>
#include <stdio.h>

inline void force_read(int* address)
{
    int dummy;
    asm  volatile ("mov  (%[addr]) , %[dummy]":[dummy]"=r"(dummy) : [addr]"r"(address));
}

int main()
{
    int target;
    scanf("%d",&target);
    target *= 12432; 
    target += 1231;
    target %= 12;

    force_read((int*)&target);
}

It compiles to:

Code: Select all

.LC0:
  .string "%d"
main:
  pushq %rbx
  movl $.LC0, %edi
  xorl %eax, %eax
  subq $16, %rsp
  leaq 12(%rsp), %rbx
  movq %rbx, %rsi
  call __isoc99_scanf
  mov (%rbx) , %ebx
  addq $16, %rsp
  xorl %eax, %eax
  popq %rbx
  ret

Have a look here: https://godbolt.org/z/W1bEcf

Clearly, the arithmetic that I do is missing, probably because the compiler assumes that the asm block in force_read only does calculations with the address of target and never dereferences it. I'm guessing that a memory clobber will solve the issue, but that seems like overkill. How do I solve this issue? I'm guessing that a really simple input constraint will do the trick too!

Octocontrabass · Post by **Octocontrabass** » Wed Aug 05, 2020 1:08 am

sunnysideup wrote:I'm guessing that a memory clobber will solve the issue, but that seems like overkill. How do I solve this issue? I'm guessing that a really simple input constraint will do the trick too!

Your guess is correct, and the GCC documentation explains it with examples.

Here is a fictitious sum of squares instruction, that takes two pointers to floating point values in memory and produces a floating point register output. Notice that x, and y both appear twice in the asm parameters, once to specify memory accessed, and once to specify a base register used by the asm. You won’t normally be wasting a register by doing this as GCC can use the same register for both purposes. However, it would be foolish to use both %1 and %3 for x in this asm and expect them to be the same. In fact, %3 may well not be a register. It might be a symbolic memory reference to the object pointed to by x.
Code: Select all
asm ("sumsq %0, %1, %2"
     : "+f" (result)
     : "r" (x), "r" (y), "m" (*x), "m" (*y));

But while reading about that, I came up with a different solution. (Though I'm still not convinced it'll be reliable with local variables.)

Code: Select all

inline void force_read(int* address)
{
    asm volatile("":"=m"(*address):"m"(*address));
    asm volatile(""::"r"(*address));
}
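The pattern from the documentation quoted above can be sketched with a real x86 load instead of the fictitious `sumsq`. This is only an illustration under assumptions: `load_via_pointer` and `compute_then_load` are hypothetical names, and the non-x86 branch is a plain C fallback so the snippet still builds elsewhere.

```c
/* Sketch: load an int through a pointer with an explicit mov.
   "r"(ptr) supplies the base register used in the template;
   "m"(*ptr) is never referenced in the template, it only tells the
   compiler that the asm reads the pointed-to object. */
static inline int load_via_pointer(const int *ptr)
{
    int value;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile ("movl (%[p]), %[v]"
                      : [v] "=r" (value)
                      : [p] "r" (ptr), "m" (*ptr));
#else
    value = *(const volatile int *)ptr;   /* fallback when not on x86 */
#endif
    return value;
}

/* Same arithmetic as in the example earlier in the thread, on a local. */
int compute_then_load(int seed)
{
    int target = seed;
    target *= 12432;
    target += 1231;
    target %= 12;
    return load_via_pointer(&target);
}
```

Because of the extra "m" input, the compiler has to store the computed value of `target` to memory before the asm runs, so the arithmetic can no longer be dropped.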

sunnysideup · Post by **sunnysideup** » Wed Aug 05, 2020 2:38 am

That looks interesting...

Octocontrabass wrote:However, it would be foolish to use both %1 and %3 for x in this asm and expect them to be the same. In fact, %3 may well not be a register. It might be a symbolic memory reference to the object pointed to by x

Btw, what does this mean.. symbolic memory reference? I don't follow... They say that %3 may not be a register. But doesn't the input constraint "m"(*x) imply that %3 will be a memory reference? (I'm guessing that %3 will be replaced by something like 12(%rsp) wherever it is present in the assembly template.) How can %3 be a register?

The reason that I don't want to use

Code: Select all

asm volatile (" ": :"r"(*address));

is because sometimes I want more flexibility while choosing the exact memory instruction. This would be really useful for stores where I can have a non-temporal (aka streaming) instruction, whereas the compiler would just have some default behaviour...

sunnysideup · Post by **sunnysideup** » Wed Aug 05, 2020 3:20 am

Code: Select all

inline void force_read(int* address)
{
    int dummy;
    asm  volatile ("mov  %[addr] , %[dummy]":[dummy]"=r"(dummy) : [addr]"m"(*address));
}

I think that the above snippet is the right one for my use case (as far as my understanding goes). Since I use [addr]"m"(*address) as an input constraint, I don't even have to use parentheses '()' for the memory dereference! I would love to be proven wrong though!

In @octocontrabass's answer:

Octocontrabass wrote:But while reading about that, I came up with a different solution. (Though I'm still not convinced it'll be reliable with local variables.)

Code: Select all

inline void force_read(int* address)
{
    asm volatile("":"=m"(*address):"m"(*address));
    asm volatile(""::"r"(*address));
}

I'm guessing that "=m"(*address) (the output constraint) is used to prevent compiler reordering since the value produced by the first asm block is used by the second asm block. (I also think that "+m"(*address) output constraint is equivalent... Am I wrong?)

Moreover, for local variables, you'd simply get input is not directly addressable compile-time error if you'd use the & operator I believe.

Octocontrabass · Post by **Octocontrabass** » Wed Aug 05, 2020 12:26 pm

sunnysideup wrote:Btw, what does this mean.. symbolic memory reference? I don't follow...

For example, a global variable can be referenced by a label, so the compiler may choose to emit that label instead of loading the address into a register.
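As a rough illustration of that point, here is a sketch with a made-up global, `tick_count`, passed through an "m" constraint. On x86-64 the compiler may substitute the operand with something like `tick_count(%rip)` instead of a register-based address. The exact form depends on the target and on options such as PIC, so treat the comment as an assumption rather than guaranteed output.

```c
/* Hypothetical global used only for this illustration. */
int tick_count = 41;

static inline int read_tick_count(void)
{
    int value;
#if defined(__x86_64__) || defined(__i386__)
    /* %[src] is a memory operand; for a global it may be emitted as a
       symbolic reference (a label) rather than as (register). */
    __asm__ volatile ("movl %[src], %[dst]"
                      : [dst] "=r" (value)
                      : [src] "m" (tick_count));
#else
    value = *(volatile int *)&tick_count;   /* fallback when not on x86 */
#endif
    return value;
}
```

Compiling with `-S` and looking at the generated mov is the easiest way to see which form your compiler picked.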

sunnysideup wrote:The reason that I don't want to use
Code: Select all
asm volatile (" ": :"r"(*address));
is because sometimes I want more flexibility while choosing the exact memory instruction. This would be really useful for stores where I can have a non-temporal (aka streaming) instruction, whereas the compiler would just have some default behaviour...

There are intrinsics for non-temporal stores, which you probably want to use if you're not hand-optimizing for one specific CPU. If you need a specific temporal load or store, you're either working with MMIO or performing some kind of magic. For MMIO, if you need to worry about the instruction then you won't be using normal pointer access anyway, so no memory clobber is necessary. Magic is outside the scope of inline assembly, so there's no guarantee it will be able to do what you want.
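For reference, a minimal sketch of the intrinsic route, assuming an x86 target with SSE2. `_mm_stream_si32` and `_mm_sfence` are existing intrinsics available through `<emmintrin.h>`; the function name `stream_fill` is made up for this example, and the non-SSE2 branch is only a plain-store fallback so the snippet builds elsewhere.

```c
#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_stream_si32; also provides _mm_sfence */
#endif

/* Fill a buffer with non-temporal (streaming) stores, no inline asm needed. */
void stream_fill(int *dst, int value, int count)
{
    for (int i = 0; i < count; ++i) {
#if defined(__SSE2__)
        _mm_stream_si32(&dst[i], value);   /* non-temporal 32-bit store */
#else
        dst[i] = value;                    /* ordinary store as a fallback */
#endif
    }
#if defined(__SSE2__)
    _mm_sfence();   /* order the streaming stores before later stores */
#endif
}
```

Since the compiler understands the intrinsic as a store to `dst[i]`, no clobbers or memory operands have to be written by hand.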

sunnysideup wrote:I'm guessing that "=m"(*address) (the output constraint) is used to prevent compiler reordering since the value produced by the first asm block is used by the second asm block. (I also think that "+m"(*address) output constraint is equivalent... Am I wrong?)

With no output constraint, the value is not clobbered, so the compiler is not forced to re-read it from memory. I believe you're correct about the "+" modifier; it probably should use "+m" instead of "=m" (and wouldn't need any input operands since the output is the input).

Code: Select all

inline void force_read(int* address)
{
    asm volatile("":"+m"(*address));
    asm volatile(""::"r"(*address));
}
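A small usage sketch for the "+m" version, reusing the arithmetic from the earlier example. `compute_and_touch` is a hypothetical name, and `static` is added here only so the snippet links as plain C. Whether this stays reliable for local variables is, as said above, not something I'm certain of.

```c
static inline void force_read(int *address)
{
    /* "+m": the asm is treated as both reading and writing *address,
       so the current value has to be in memory before this statement. */
    __asm__ volatile ("" : "+m" (*address));
    /* "r": the value must be loaded back into a register. */
    __asm__ volatile ("" : : "r" (*address));
}

int compute_and_touch(int seed)
{
    int target = seed;
    target *= 12432;
    target += 1231;
    target %= 12;
    force_read(&target);   /* target is passed as a memory operand here */
    return target;
}
```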

sunnysideup wrote:Moreover, for local variables, you'd simply get input is not directly addressable compile-time error if you'd use the & operator I believe.

I think that means you've tried to pass the address of a variable to your inline assembly without also passing the value of the variable in memory. No memory reference means the compiler doesn't need to spill it to memory, and if it doesn't get spilled it has no address.

OSDev.org
