Page 1 of 2
Proper way to write inline assembly
Posted: Mon Jun 01, 2020 7:58 am
by sunnysideup
Checkout this code segment:
Code: Select all
void put_zmm_value()
{
struct zmm_value buffer;
asm volatile(
"vmovdqa64 %%zmm0, (%[buffer]) \n": :[buffer]"r"(buffer):"%zmm0");
for(int i=0;i<8;i++)
printf("%lx ",buffer.word[i]);
}
I'm wondering whether my inline assembly code is correct, and will work for all optimizations.
Re: Proper way to write inline assembly
Posted: Mon Jun 01, 2020 8:30 am
by Octocontrabass
Probably not, since it doesn't appear to do anything useful. What is it supposed to do?
Re: Proper way to write inline assembly
Posted: Mon Jun 01, 2020 8:40 am
by sunnysideup
Correction:
Code: Select all
void put_zmm_value()
{
struct zmm_value buffer;
asm volatile(
"vmovdqa64 %%zmm0, %[buffer] \n": :[buffer]"m"(buffer):"%zmm0");
for(int i=0;i<8;i++)
printf("%lx ",buffer.word[i]);
}
The previous snippet doesn't even compile.
The purpose of this code is to print out the value of the zmm0 register. Nothing else!
Note: buffer is cache line aligned... No issues there... I won't get segfault
Re: Proper way to write inline assembly
Posted: Mon Jun 01, 2020 9:04 am
by Octocontrabass
sunnysideup wrote:The purpose of this code is to print out the value of the zmm0 register.
Why? According to the ABI, there is nothing useful in zmm0 at this point in time.
Re: Proper way to write inline assembly
Posted: Mon Jun 01, 2020 9:31 am
by sunnysideup
I'm want to get familiar with zmm0 as is important for implementing fast memcpy and so on. I also manually 'fill' zmm0 using:
Code: Select all
struct zmm_value
{
uint64_t word[8];
} __attribute__((packed)) __attribute__ ((aligned(64)));
void set_zmm_value(struct zmm_value* val_address)
{
asm volatile
("vmovntdqa (%[val_address]),%%zmm0\n":: [val_address]"r"(val_address):"%zmm0");
}
Re: Proper way to write inline assembly
Posted: Mon Jun 01, 2020 10:32 am
by Octocontrabass
That won't work. The compiler is free to do whatever it wants with zmm0 outside your asm block.
If you want to move data through zmm registers, you must either load and store within the same asm block, or you must tell the compiler to do the load/store on your behalf by passing around __m512/__m512d/__m512i values.
Re: Proper way to write inline assembly
Posted: Mon Jun 01, 2020 10:57 am
by sunnysideup
Makes sense
Re: Proper way to write inline assembly
Posted: Sun Jun 28, 2020 6:21 am
by sunnysideup
Alright, moving on to something new here: I've often seen this:
in C code that is compiled using gcc. What is its significance and what is the concept here? Is this
extended asm? I can't wrap my head around inline assembly in gcc. Any good resources?
Also, what's the difference between __asm and just asm?
Re: Proper way to write inline assembly
Posted: Sun Jun 28, 2020 9:41 am
by Korona
That's a memory barrier for the compiler.¹ Yes, it is extended asm. The memory clobber forces all loads/stores to globally visible variables that occur before/after the barrier in program order to happen before/after the barrier.
__asm can be used in contexts where asm is not available (e.g., because somebody chose to #define asm) but other than that, there is no difference.
¹ But not for the CPU! On some architectures (e.g., ARM), that makes a vast difference.
Re: Proper way to write inline assembly
Posted: Sun Jun 28, 2020 11:35 am
by sunnysideup
Korona wrote:That's a memory barrier for the compiler.¹ Yes, it is extended asm. The memory clobber forces all loads/stores to globally visible variables that occur before/after the barrier in program order to happen before/after the barrier.
Alright, I understand that why it's used for now - as a way for the
compiler to ensure that no compile time reordering occurs across this 'barrier'
However, why does it work this way? Or as a mathematician would say -
can you derive it from first principles?
I've also read this piece of code:
Code: Select all
static void force_read(uint8_t *p) {
asm volatile("" : : "r"(*p) : "memory");
}
It's supposed to force a read from memory location p. But
how does it work, i.e. why does gcc make it work that way?
Re: Proper way to write inline assembly
Posted: Sun Jun 28, 2020 2:06 pm
by nullplan
sunnysideup wrote:However, why does it work this way?
It's an assembler statement with a memory clobber. The fact that it's empty is incidental to this. The memory clobber tells GCC that this statement will change "memory", but not which memory and in what way it will be changed. Therefore, GCC cannot assume anything about the state of memory, and must write all changes to memory before the statement, and read all things that are in memory again after the statement.
sunnysideup wrote:I've also read this piece of code:
Code: Select all
static void force_read(uint8_t *p) {
asm volatile("" : : "r"(*p) : "memory");
}
It's supposed to force a read from memory location p. But how does it work, i.e. why does gcc make it work that way?
This time it is a memory clobber and an input constraint. So in addition to the above, this statement requires that the value of "*p" be put iinto a register beforehand. The statement is empty and doesn't do anything with the value, but GCC doesn't know that, and is therefore forced to emit a read of this memory location. And since memory is clobbered, even multiple reads of this location have to be read, since they might have changed now.
Re: Proper way to write inline assembly
Posted: Sun Jun 28, 2020 2:47 pm
by Octocontrabass
sunnysideup wrote:However, why does it work this way? Or as a mathematician would say -
can you derive it from first principles?
Because the GCC developers say so.
Using the "memory" clobber effectively forms a read/write memory barrier for the compiler.
Here's the part of the manual that explains it.
nullplan wrote:And since memory is clobbered, even multiple reads of this location have to be read, since they might have changed now.
But this holds true only if you use functions like this one with memory barriers to access that location. If you also access it without a memory barrier and the function gets inlined, the read may be combined with prior accesses. There is also no guarantee that the read will occur after all prior statements if the function is inlined.
Re: Proper way to write inline assembly
Posted: Mon Aug 03, 2020 8:37 am
by sunnysideup
Hello people, It's been a long time....
I've been experimenting with inline assembly for a bit (again). I just realized that the
force_read() function isn't really correct (according to my understanding at least) when optimizations come into the picture.
Here is the function again:
Code: Select all
static void force_read(uint8_t *p)
{
asm volatile("" : : "r"(*p) : "memory");
}
Let us assume that the compiler inlines it, and we finally get:
Code: Select all
static inline void force_read(uint8_t *p)
{
asm volatile("" : : "r"(*p) : "memory");
}
It is my understanding that a call to
force_read would not really guarantee a
memory read. Consider this program:
Code: Select all
inline void force_read(int* address)
{
asm volatile ("": :"r"(*address):"memory" );
}
int main()
{
int a;
scanf("%d",&a);
//I'm doing some calculations with a
a *= 12432;
a += 1231;
force_read(&a); //This shouldn't actually compile to a memory read
}
As expected, Godbolt gives me:
Code: Select all
.LC0:
.string "%d"
main:
subq $24, %rsp
movl $.LC0, %edi
xorl %eax, %eax
leaq 12(%rsp), %rsi
call __isoc99_scanf
imull $12432, 12(%rsp), %eax
addl $1231, %eax
movl %eax, 12(%rsp)
xorl %eax, %eax
addq $24, %rsp
ret
Clearly, there is no "forced read"...
I would have guessed that a better function would have been:
Code: Select all
inline void force_read(int* address)
{
asm volatile ("mov (%[addr]) , %%rax": :[addr]"m"(address):"memory","rax" ); //Do we even need the memory clobber??
}
Is my understanding correct?
Re: Proper way to write inline assembly
Posted: Mon Aug 03, 2020 9:12 am
by nullplan
sunnysideup wrote:Clearly, there is no "forced read"...
Well, there is. The snippet (empty string) is instantiated with the value you wanted in a register, namely in EAX. Question is what this is even supposed to prove and how it is useful in any way. You are calling force_read() on a local variable. But local variables are in normal memory, where a force read is both unnecessary and not useful. Or if it is, I don't know how. force_read() is useful for MMIO, where you sometimes need to read a register because that has side effects, even if you don't need the value. But that means, in terms of C, that you are reading a non-local object through a pointer. For example:
Code: Select all
struct whatever *foo = get_foo();
force_read(&foo->reg);
Result:
https://godbolt.org/z/ss6nda
See, it does what it is supposed to.
Your replacement function has at least the problem that it is reading 8 bytes from a 4-byte pointer, and it is using one specific register that is used very often, thereby increasing register pressure. Mind you, my solution is to not use inline assembly at all, thereby removing all optimization opportunities. But less of a headache in the long run.
Re: Proper way to write inline assembly
Posted: Mon Aug 03, 2020 10:53 pm
by sunnysideup
I'm afraid I didn't explain myself very well.
Let me explain why I'd want a force_read function first: You can use it for cache side-channel attacks, where you'd measure the time that it takes to access a variable from memory to figure out whether it was accessed from the cache, or main memory.
So I'd expect that, whenever I call force_read(), It would compile to a memory load. In my example, I purposefully do some arithmetic before calling force_read(), because I want the value to be loaded into a register anyway, and when I call force_read(), the variable isn't really forcefully read.... and the compiler uses that register that was already used for the arithmetic...
Btw, the memory clobber is messing things up a bit... removing it makes it a lot more clearer in my example.
I hope what I am saying makes sense XD