OSDev.org

Posted: **Mon Jul 12, 2021 1:02 pm**

Hi!

Recently I've been trying to get managarm running on the Raspberry Pi 4, but while getting SMP working I noticed that reads from the GIC's GICD_ITARGETSR0-7 registers return inconsistent values: on CPU0 a read returns 0x00000000, 0x01010101, or 0x02020202, and on CPU1 it returns 0x00000000, 0x01010101, 0x02020202, or 0x04040404. As per the GICv2 specification, these registers are read-only, and reading them yields the CPU interface number of the CPU that issued the read. The inconsistent CPU interface numbers also seem to affect SGI delivery (with GICD_SGIR.TargetListFilter=1), since the wrong CPUs sometimes receive the SGI: CPU 0 sends an SGI to everyone but itself, yet receives it.
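For readers unfamiliar with the SGI register, here is a minimal sketch of how a GICD_SGIR write is encoded, based on the field layout in the GICv2 specification. The names `SgiFilter` and `makeSgir` are made up for illustration and are not taken from managarm. With TargetListFilter=1 the distributor itself has to work out which CPU interface issued the write, which is presumably why a misidentified requester can end up receiving its own SGI.

```cpp
#include <cstdint>

// TargetListFilter values of GICD_SGIR (GICv2 layout).
enum class SgiFilter : uint32_t {
    targetList = 0b00,  // deliver to the CPU interfaces named in CPUTargetList
    allButSelf = 0b01,  // deliver to every CPU interface except the requester
    selfOnly   = 0b10,  // deliver only to the requesting CPU interface
};

// Hypothetical helper: build the 32-bit value to write to GICD_SGIR.
constexpr uint32_t makeSgir(SgiFilter filter, uint8_t cpuTargetList, uint8_t sgiId) {
    return (static_cast<uint32_t>(filter) << 24)         // TargetListFilter, bits [25:24]
         | (static_cast<uint32_t>(cpuTargetList) << 16)  // CPUTargetList, bits [23:16]
         | (sgiId & 0xFu);                                // SGIINTID, bits [3:0]
}

// SGI 1 to everyone but the sender: only the filter and the ID are set.
static_assert(makeSgir(SgiFilter::allButSelf, 0, 1) == 0x01000001u,
              "filter=1 lands in bit 24");
// SGI 3 to CPU interface 2 only: one bit set in CPUTargetList.
static_assert(makeSgir(SgiFilter::targetList, 0b0100, 3) == 0x00040003u,
              "target list lands in bits 23:16");
```

Note that in the `allButSelf` case the CPUTargetList field is ignored, so the sender never states its own identity; it relies entirely on the GIC's view of who is writing.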

I've confirmed the issue happens only under managarm: if I run "for i in 0 1 2 3; do taskset -c $i devmem2 0xFF841800 w; done" in Linux, it always prints the expected result (and Linux would probably have panicked during initialization if the results were inconsistent). Despite this, I have been unable to find the cause. The GIC registers are mapped as outer-shareable device nGnRnE memory, and whether the reads are relaxed or not does not seem to matter (Linux only does relaxed I/O to the GIC). I suspect some system register misconfiguration, but I have no clue how to continue debugging this.
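To make the expected behaviour concrete, here is a rough sketch of how a CPU interface mask can be derived from the eight banked registers (the address 0xFF841800 used above is GICD_ITARGETSR0, at offset 0x800 from the distributor base). It operates on values that were already read, so it can be tried off-target. `cpuMaskFromTargets` and `isOneHot` are hypothetical names, and the byte-folding approach is modelled on what Linux's gic_get_cpumask does as far as I understand it, not on the managarm driver.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: fold the four byte lanes of each GICD_ITARGETSR0-7
// value together and return the first non-zero result. Returns 0 if every
// register read as zero.
inline uint8_t cpuMaskFromTargets(const uint32_t (&regs)[8]) {
    for (size_t i = 0; i < 8; i++) {
        uint32_t v = regs[i];
        v |= v >> 16;  // merge lanes 2 and 3 into lanes 0 and 1
        v |= v >> 8;   // merge lane 1 into lane 0
        if (v & 0xFF)
            return static_cast<uint8_t>(v & 0xFF);
    }
    return 0;
}

// A sane mask names exactly one CPU interface, so exactly one bit is set.
inline bool isOneHot(uint8_t mask) {
    return mask != 0 && (mask & (mask - 1)) == 0;
}
```

On a healthy system every core should get a different one-hot mask, and repeated calls on the same core should always agree.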

The code for the GIC driver can be found here: https://github.com/qookei/managarm/blob ... rm/gic.cpp and https://github.com/qookei/managarm/blob ... ch/gic.hpp. The code that reads the IRQ target regs starts at line 252. Do note that the code makes use of C++ features like operator overloading for accessing register fields etc.

Example output on CPU 0 (before entering the SMP code) from that function:

Code:

thor: zero v at i = 4? ignoring
thor: zero v at i = 5? ignoring
thor: bad v = 2020202 at i = 1, prev mask = 1

Any help would be greatly appreciated.
qookie

Posted: **Mon Jul 12, 2021 3:56 pm**

The manual says these registers are byte-accessible. Maybe try accessing them that way.

Posted: **Mon Jul 12, 2021 9:10 pm**

GIC manual claims:

All registers support 32-bit word accesses ...

In addition, the GICD_IPRIORITYRn, GICD_ITARGETSRn, GICD_CPENDSGIRn, and GICD_SPENDSGIRn registers support byte accesses.

Reading them as 32-bit words should work.

Your Linux test seems to read only GICD_ITARGETSR0. Were the other registers also read from within Linux? The function gic_get_cpumask stops as soon as it reads a non-zero value into the mask variable. If the implementation bothered to return the correct mask only for GICD_ITARGETSR0, things would still work for Linux.

Do i=4 and i=5 consistently return 0? If so, that may be because the IDs they control (16-23, i.e. some of the PPIs) are not implemented on the rpi4, and so they return 0. The bcm2711 doc shows PPIs beginning at ID 26 (i.e. in i=6). If this is really the case, then the read of i=6 should return 0x1010000 (for CPU interface #0), since IDs 24 and 25 are (assumed to be) unimplemented as well.
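The arithmetic behind that prediction can be sketched as follows. Each GICD_ITARGETSRn holds one target byte for each of four interrupt IDs, so ID m lives in register m / 4 at byte lane m % 4. `TargetSlot`, `slotForId`, and `targetsForId` are illustrative names only.

```cpp
#include <cstdint>

// Where the target byte of an interrupt ID lives (GICv2 layout).
struct TargetSlot {
    unsigned reg;    // n in GICD_ITARGETSRn
    unsigned shift;  // bit offset of the byte lane inside the 32-bit word
};

constexpr TargetSlot slotForId(unsigned id) {
    return TargetSlot{id / 4, (id % 4) * 8};
}

// Extract the target byte for interrupt `id` from the value read from its register.
constexpr uint8_t targetsForId(uint32_t regValue, unsigned id) {
    return static_cast<uint8_t>((regValue >> slotForId(id).shift) & 0xFF);
}

// IDs 16-23 are covered by i=4 and i=5; ID 26 sits in i=6, byte lane 2.
static_assert(slotForId(16).reg == 4 && slotForId(23).reg == 5, "IDs 16-23 span i=4..5");
static_assert(slotForId(26).reg == 6 && slotForId(26).shift == 16, "ID 26 is lane 2 of i=6");

// With IDs 24 and 25 unimplemented, i=6 reads 0x01010000 on CPU interface 0.
static_assert(targetsForId(0x01010000u, 24) == 0x00, "ID 24 reads as zero");
static_assert(targetsForId(0x01010000u, 25) == 0x00, "ID 25 reads as zero");
static_assert(targetsForId(0x01010000u, 26) == 0x01, "ID 26 targets CPU interface 0");
```

The same helper can be pointed at any logged value to see which IDs in that register appear to be implemented.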

Posted: **Tue Jul 13, 2021 8:36 am**

linuxyne wrote: Do i=4 and i=5 consistently return 0? If so, that may be because the IDs they control (16-23, i.e. some of the PPIs) are not implemented on the rpi4, and so they return 0. The bcm2711 doc shows PPIs beginning at ID 26 (i.e. in i=6). If this is really the case, then the read of i=6 should return 0x1010000 (for CPU interface #0), since IDs 24 and 25 are (assumed to be) unimplemented as well.

Indeed they do, and it seems your theory about unimplemented PPIs does check out since with some extra logging we see:

Code:

thor: v = 0 @ i = 4
thor: v = 0 @ i = 5
thor: v = 1010100 @ i = 6

Unfortunately the main problem still stands: even when considering only ITARGETSR0-3 (or even only ITARGETSR0), the values read are inconsistent on the same core, and even between two reads of the same register, in an unpredictable manner:

Code:

thor: v = 2020202 @ i = 0
thor: v = 1010101 @ i = 1
thor: v = 1010101 @ i = 2
thor: v = 1010101 @ i = 3
...
thor: v = 2020202 @ i = 0
thor: v = 1010101 @ i = 1
thor: v = 1010101 @ i = 2
thor: v = 1010101 @ i = 3
...
thor: v = 1010101 @ i = 0
thor: v = 4040404 @ i = 1
thor: v = 1010101 @ i = 2
thor: v = 1010101 @ i = 3

A similar situation is also observed on CPU1 etc.

qookie

Posted: **Tue Jul 13, 2021 10:23 am**

qookie wrote: Unfortunately the main problem still stands

If the OS doesn't bring up the other CPUs (or keeps all except one looping at the loader/firmware stage), does that change the behaviour? I would also test by reading the register at various places during and after boot to see whether there is a particular stage after which the issue occurs (e.g. before and after the MMU is enabled).

Is it possible for us to run the OS under a simulator such as the arm fvp models? The model provides a trace (the tarmac trace) which can provide details about the execution of the instructions, afaik.

Other than that, there seems to be an (admittedly far-fetched and most probably incorrect) possibility that the thread is being migrated to other CPUs, and/or that some form of caching is taking place.

I found the source code difficult to parse. For example, where should I look for the definition of arch::scalar_load? Please excuse my unfamiliarity with the C++ language.

Posted: **Thu Jul 15, 2021 3:25 pm**

linuxyne wrote: If the OS doesn't bring up the other CPUs (or keeps all except one looping at the loader/firmware stage), does that change the behaviour? I would also test by reading the register at various places during and after the boot to see if there's any particular stage after which the issue occurs (for e.g. before and after the MMU is enabled).

Thank you very much for your help! After some further debugging I noticed that the issue occurred only when the MMU was enabled, and after some more poking around I found that I was accidentally mapping the GIC MMIO region as device GRE instead of nGnRnE, due to a small typo in the page flag constants.
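For anyone hitting something similar, here is a small sketch of a compile-time guard against this kind of slip. It assumes the usual ARMv8-A scheme, where the AttrIndx field of a stage 1 descriptor (bits [4:2]) selects one attribute byte of MAIR_ELx, and Device memory bytes have the form 0b0000dd00. The MAIR value, the flag constants, and all function names below are invented for the example and do not reflect managarm's actual constants or where exactly the typo was.

```cpp
#include <cstdint>

enum class DeviceType { nGnRnE, nGnRE, nGRE, GRE, notDevice };

// Classify one MAIR_ELx attribute byte. Device encodings are 0b0000dd00;
// anything else is Normal memory or a reserved encoding.
constexpr DeviceType classifyAttr(uint8_t attr) {
    return (attr & 0xF3) != 0 ? DeviceType::notDevice
         : attr == 0x00 ? DeviceType::nGnRnE
         : attr == 0x04 ? DeviceType::nGnRE
         : attr == 0x08 ? DeviceType::nGRE
         : DeviceType::GRE;
}

// Attribute byte selected by a given index (0-7) in a MAIR_ELx value.
constexpr uint8_t mairAttr(uint64_t mair, unsigned index) {
    return static_cast<uint8_t>((mair >> (index * 8)) & 0xFF);
}

// AttrIndx field of a stage 1 block/page descriptor, bits [4:2].
constexpr unsigned attrIndxOf(uint64_t descriptor) {
    return static_cast<unsigned>((descriptor >> 2) & 0x7);
}

// Hypothetical setup: index 0 = Normal write-back (0xFF),
// index 1 = Device-nGnRnE (0x00), index 2 = Device-GRE (0x0C).
constexpr uint64_t exampleMair = 0x0C00FF;
constexpr uint64_t flagsMmio  = uint64_t{1} << 2;  // intended: AttrIndx = 1
constexpr uint64_t flagsTypod = uint64_t{2} << 2;  // off by one: AttrIndx = 2

static_assert(classifyAttr(mairAttr(exampleMair, attrIndxOf(flagsMmio))) == DeviceType::nGnRnE,
              "MMIO flags must select Device-nGnRnE");
static_assert(classifyAttr(mairAttr(exampleMair, attrIndxOf(flagsTypod))) == DeviceType::GRE,
              "the mistyped flags silently select Device-GRE");
```

The G, R, and E attributes permit gathering, reordering, and early write acknowledgement respectively, so a check like the first `static_assert` next to the MMIO page flags turns a single-digit slip into a build error.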

linuxyne wrote: Is it possible for us to run the OS under a simulator such as the arm fvp models? The model provides a trace (the tarmac trace) which can provide details about the execution of the instructions, afaik.

I'm not too familiar with FVP, but our goal is to support HW with at least partially standard peripherals. We currently support running under QEMU's virt machine, and we're working on RPi4 support. In theory, getting it working on platforms that support the Linux boot protocol and provide a device tree shouldn't be much of an issue.

linuxyne wrote: I found the source code difficult to parse - for e.g. where should I look for the definition of arch::scalar_load? Please excuse my unfamiliarity with the c++ language.

The code being difficult to approach is mainly on us. arch::scalar_load (and everything else in the arch namespace) comes from our helper library (https://github.com/managarm/libarch; it was in-tree before, but it was split off so other projects can use it), and eventually it calls into these functions here: https://github.com/managarm/libarch/blo ... p#L15-L169

Again, thank you very much for your help.

qookie

Inconsistent reads from GICD_ITARGETSR0-7 on a real GIC-400
