OK, so I replaced "++refCount" and "--refCount" with "__atomic_add_fetch()" and "__atomic_sub_fetch()". It still works just fine on qemu (x86_64 and aarch64), but on the Raspberry Pi 3 (aarch64) it doesn't work anymore. I have tried memory barriers (dmb and dsb), compiler barriers, etc. Nothing seems to help. Not that it should: I am still running on a single CPU. The generated code looks fine. If I use these two intrinsics and print the values of the variables before and after, everything looks fine. Using __sync_add_and_fetch() doesn't work either. The code is running in EL1 at virtual address 0xFFFFFFFF8000xxxx; I do wonder if that could be an issue here. The problem exists with both clang-12 and clang-15 (I just upgraded to it hoping it would solve the issue; it didn't).
I've spent a few days on this and I am out of ideas.
You can see the code here:
https://github.com/kiznit/rainbow-os/bl ... tr.hpp#L57
If I comment out line 59 and use the intrinsic on line 60 instead, it doesn't work anymore. I have a hard time getting any info out of it because I am using shared_ptr<> everywhere in my logging code.
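For reference, the change amounts to roughly the following. This is a simplified sketch, not the actual code from the repository: `RefCount`, its member names and `CheckRefCount()` are made up for illustration, and the `__ATOMIC_SEQ_CST` ordering is an assumption.

```cpp
#include <cassert>

// Hypothetical stand-in for the shared_ptr control block counter.
struct RefCount
{
    int count{1};

    // Plain version: what "++refCount" / "--refCount" did before.
    int IncrementPlain() { return ++count; }
    int DecrementPlain() { return --count; }

    // Replacement using the GCC/Clang builtins; both return the new value.
    // The memory order here is an assumption made for this sketch.
    int IncrementAtomic() { return __atomic_add_fetch(&count, 1, __ATOMIC_SEQ_CST); }
    int DecrementAtomic() { return __atomic_sub_fetch(&count, 1, __ATOMIC_SEQ_CST); }
};

// Single-threaded sanity check: both variants should produce the same values.
inline void CheckRefCount()
{
    RefCount rc;
    assert(rc.IncrementAtomic() == 2);
    assert(rc.DecrementAtomic() == 1);
    assert(rc.IncrementPlain() == 2);
    assert(rc.DecrementPlain() == 1);
}
```

On a hosted build the two variants are interchangeable in a single thread, which matches what I see on qemu.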
