Page 1 of 1
Having a hard time with __atomic_xxx intrinsics on aarch64
Posted: Sat Dec 17, 2022 8:18 pm
by kzinti
I have this very simple implementation of std::shared_ptr<> for my kernel. It's been working just fine in single-thread mode, now I want to support SMP. The way to support this is to basically use atomic operations on the reference counts.
OK, so I replaced the "++refCount" and "--refCount" with "__atomic_add_fetch()" and "__atomic_sub_fetch()". It still works just fine on qemu (x86_64 and aarch64), but on the Raspberry Pi 3 (aarch64), it doesn't work anymore. I have tried using memory barriers (dmb and dsb), compiler barrier, etc. Nothing seems to help. Not that it should, I am still running with a single CPU. The generated code looks fine. If I try to use these two intrinsics and print the value of variables before and after, everything looks fine. Using __sync_add_fetch() doesn't work either. Code is running in EL1 at virtual address 0xFFFFFFFF8000xxxx, I do wonder if that can be an issue here. The problem exists with clang-12 and also clang-15 (I just upgraded to it hoping it would solve the issue, it didn't).
I've spend a few days on this and I am out of idea.
You can see the code here:
https://github.com/kiznit/rainbow-os/bl ... tr.hpp#L57
If I comment line 59 and use the intrinsic on line 60 instead, it doesn't work anymore. I have a hard time getting any info out of it because I am using shared_ptr<> everywhere in my logging code
Re: Having a hard time with __atomic_xxx intrinsics on aarch
Posted: Sat Dec 17, 2022 9:54 pm
by Octocontrabass
kzinti wrote:The way to support this is to basically use atomic operations on the reference counts.
This is very easy to do: declare the reference count as an atomic type, e.g. using "std::atomic<long>" instead of "long". All operations on an atomic variable will be atomic unless you explicitly relax the memory order.
kzinti wrote:OK, so I replaced the "++refCount" and "--refCount"
You don't need to change how you access an atomic variable unless you want to relax the memory order.
kzinti wrote:with "__atomic_add_fetch()" and "__atomic_sub_fetch()".
Generally, you should use std::atomic instead of the __atomic builtins.
kzinti wrote:Using __sync_add_fetch() doesn't work either.
The __sync builtins are deprecated.
kzinti wrote:Code is running in EL1 at virtual address 0xFFFFFFFF8000xxxx, I do wonder if that can be an issue here.
As long as your memory types are correct, I don't see how this could be the problem.
kzinti wrote:The problem exists with clang-12 and also clang-15
How did you convince Clang to not call GCC?
You may need to pass "-mno-outline-atomics" to ensure lock-free atomic operations are inline.
Re: Having a hard time with __atomic_xxx intrinsics on aarch
Posted: Sat Dec 17, 2022 10:55 pm
by kzinti
Octocontrabass wrote:This is very easy to do: declare the reference count as an atomic type, e.g. using "std::atomic<long>" instead of "long". All operations on an atomic variable will be atomic unless you explicitly relax the memory order.
There is no such thing as std::atomic<> in bare metal. I have my own implementation which is what I am trying to debug here.
Octocontrabass wrote:You don't need to change how you access an atomic variable unless you want to relax the memory order.
I know all about memory orders. I am just trying to narrow down what the problem is without using my std::atomic<> implementation (see
https://github.com/kiznit/rainbow-os/bl ... %2B/atomic).
Octocontrabass wrote:Generally, you should use std::atomic instead of the __atomic builtins.
That's what I am trying to do, debug my implementation of std::atomic<>.
Octocontrabass wrote:The __sync builtins are deprecated.
Yes they are. Was this not worth a try? They still compile just fine.
Octocontrabass wrote:How did you convince Clang to not call GCC?
That doesn't seem to make any sense. I am not using GCC. Why would clang call GCC? What am I missing here?
Octocontrabass wrote:You may need to pass "-mno-outline-atomics" to ensure lock-free atomic operations are inline.
The generated code is inlining the atomics. I don't see how that flag would do anything here. Using "-moutline-atomics" does produce link errors as expected (again, bare metal). I suppose I could implement the missing functions here and see if it helps with anything...
I appreciate you are trying to help, maybe I should have been more clear that this is kernel code (i.e. bare metal) and that I don't have an implementation of std::atomic<> provided by the compiler.
Re: Having a hard time with __atomic_xxx intrinsics on aarch
Posted: Sat Dec 17, 2022 11:04 pm
by Octocontrabass
kzinti wrote:There is no such thing as std::atomic<> in bare metal.
It's part of the freestanding <atomic> header. It's always available.
Re: Having a hard time with __atomic_xxx intrinsics on aarch
Posted: Sat Dec 17, 2022 11:13 pm
by kzinti
Octocontrabass wrote:It's part of the freestanding <atomic> header. It's always available.
Very interesting... I didn't know that. I found confirmation here:
https://en.cppreference.com/w/cpp/freestanding.
I do see <stdatomic.h> and all the other C freestanding headers, but I don't see any of the C++ freestanding header.
This is how I configure clang:
https://github.com/kiznit/rainbow-os/bl ... xt#L43-L48. I must be missing something there to enable C++ headers.
Thanks a lot!
edit: I can't find any file named "atomic" in the LLVM archive. Mmm.
Re: Having a hard time with __atomic_xxx intrinsics on aarch
Posted: Sat Dec 17, 2022 11:22 pm
by kzinti
From
https://en.cppreference.com/w/cpp/freestanding:
Compiler vendors might not correctly support freestanding implementation, for either implementation issues, or just ignoring freestanding as a whole (LLVM libcxx and msvc stl).
So it doesn't look like these headers are available with clang... But perhaps I can implement it on top of <stdatomic.h>.
edit: using C's atomic_long and atomic_fetch_add() doesn't solve the issue. I want to conclude that something is wrong with my std::shared_ptr<> implementation. What is puzzling is that it works just fine without atomics.
https://github.com/kiznit/rainbow-os/bl ... tr.hpp#L61
https://github.com/kiznit/rainbow-os/bl ... tr.hpp#L81
Re: Having a hard time with __atomic_xxx intrinsics on aarch
Posted: Sun Dec 18, 2022 12:28 am
by kzinti
I found the implementation of std::atomic<> for clang (libc++). It turns out that std::atomic<>::fetch_add() basically resolves to a __atomic_fetch_add(), which is what I was doing before switching to C's version of atomics.
Something else is going on here...
One thing I forgot to mention previously is that this code works just fine in my bootloader but not the kernel. This is was wondering if high-address space is a problem for the Cortex-A53 processor. It doesn't seem to make sense to me either, but I've come to expect the unexpected from hardware.
Re: Having a hard time with __atomic_xxx intrinsics on aarch
Posted: Sun Dec 18, 2022 12:41 am
by nullplan
So what is actually the problem? Does the increment function not adequately increment the counter? I would try to write a test function that calls the increment function from multiple threads, then collects all the results and looks to see if the result is what you expect.
Re: Having a hard time with __atomic_xxx intrinsics on aarch
Posted: Sun Dec 18, 2022 12:45 am
by Octocontrabass
kzinti wrote:
Compiler vendors might not correctly support freestanding implementation, for either implementation issues, or just ignoring freestanding as a whole (LLVM libcxx and msvc stl).
So it doesn't look like these headers are available with clang...
At least one LLVM developer disagrees with that statement. I'm not sure how you're supposed to get the headers, but they should work once you have them.
kzinti wrote:edit: using C's atomic_long and atomic_fetch_add() doesn't solve the issue.
Using atomic_fetch_add() shouldn't make any difference, operating on atomic variables is always atomic. (And if it
does make a difference, your compiler is broken.)
I really can't think of what would cause atomics to fail like this, unless you've somehow messed up something fundamental like alignment or memory type.
kzinti wrote:One thing I forgot to mention previously is that this code works just fine in my bootloader but not the kernel.
...Maybe you should double-check the alignment and the memory type...
Re: Having a hard time with __atomic_xxx intrinsics on aarch
Posted: Sun Dec 18, 2022 8:31 am
by qookie
Have you enabled the data cache (SCTLR.C=1)? With it disabled, the stxr instruction will just continuously report failure (on Cortex-A72 at least, and I'm guessing A53 too), presumably because the exclusive monitor lock is per cache line on these cores.
Re: Having a hard time with __atomic_xxx intrinsics on aarch
Posted: Sun Dec 18, 2022 12:32 pm
by kzinti
clang doesn't provide the freestanding headers. That comment above basically says that the libc++ people try to make their library work in freestanding mode, but it doesn't appear that they provide a way to install it as such. I will see if I can get to build the whole thing with my cross-compiler or just copy the files I need for now. But I doubt it's a C++ problem, I think it might have to do with aarch64 memory configuration as suggested by both of you.
SCTLR_EL1 is configured as 0xc5183d in QEMU (and the Pi starts et EL2 where SCTLR_EL2 is also 0xc5183d). The breakdown is:
Code: Select all
SCTLR_EL1: 0000000000c5183d
SPAN : 1 - The value of PSTATE.PAN is left unchanged on taking an exception to EL1.
EIS : 1 - The taking of an exception to EL1 is a context synchronizing event.
nTWE : 1 - This control does not cause any instructions to be trapped.
nTWI : 1 - This control does not cause any instructions to be trapped.
I : 1 - Stage 1 instruction access Cacheability control, for accesses at EL0 and EL1.
EOS : 1 - An exception return from EL1 is a context synchronizing event.
CP15BEN: 1 - EL0 using AArch32: EL0 execution of the CP15DMB, CP15DSB, and CP15ISB instructions is enabled.
SA0 : 1 - SP Alignment check enable for EL0.
SA : 1 - SP Alignment check enable.
C : 1 - Stage 1 Cacheability control, for data accesses.
M : 1 - EL1&0 stage 1 address translation enabled.
In the case of the Pi, I programe SCTLR_EL1 like this:
Code: Select all
// Enable MMU at EL1
uint64_t sctlr_el1{};
sctlr_el1 |= (1 << 0); // M = 1 EL1&0 stage 1 address translation enabled.
sctlr_el1 |= (1 << 2); // C = 1 Enable Data Cache
sctlr_el1 |= (1 << 3); // SA = 1 SP Alignment check enable for EL1
sctlr_el1 |= (1 << 4); // SA0 = 1 SP Alignment check enable for EL0
sctlr_el1 |= (1 << 12); // I = 1 Enable Instruction Cache
mtl::Write_SCTLR_EL1(sctlr_el1); // 0x101D
So I do enable both instruction (I) and data (C) cache... The rest doesn't seem related. I did try hardcoding 0xc5183d but it didn't make any difference.
Re: Having a hard time with __atomic_xxx intrinsics on aarch
Posted: Mon Dec 19, 2022 5:41 am
by linuxyne
Was the qemu-aarch64 test done with kvm enabled?
If not, running your OS within a kvm-enabled VM on RPi3 Linux can probably help with the debug.
Since the problem occurs within the kernel, one can create a small stub containing the instructions that seem to trigger the problem, and call that stub at various eligible locations, starting at the kernel's entry point.
Re: Having a hard time with __atomic_xxx intrinsics on aarch
Posted: Mon Dec 19, 2022 9:15 am
by kzinti
Trying this under qemu with kvm is an excellent idea. I was running qemu on my desktop x86_64, so no kvm. I probably won't have time to try this before next year, but I look forward to it.
Re: Having a hard time with __atomic_xxx intrinsics on aarch
Posted: Tue May 09, 2023 2:57 pm
by kzinti
For posterity, I found the issue: I had a bug in how I was initializing MAIR_EL1. Basically my code is setup to use MAIR index 3 for writeback memory, but MAIR index 3 was initialized to 0x00 which is uncacheable memory. Atomic operations on aarch64 don't work on uncacheable memory.