
[solved] PCIe config space access hangs on aarch64

Posted: Mon Oct 31, 2022 2:55 pm
by kzinti
I am trying to enumerate devices using the PCIe (ECAM) mechanism on qemu's aarch64 virt machine. I can find the MCFG table just fine, and it tells me that the PCIe config space is at 0x0000004010000000.

I have mapped that memory using the following flags:

Code: Select all

MMIO = Valid | Page | AccessFlag | UXN | PXN | Uncacheable,
"Uncacheable" is MAIR index 2 which is initialized to 0x00 (Device nGnRnE).

When I try to read the very first entry ("vendor id" for bus 0, slot 0, function 0), qemu just hangs. No exception is shown (synchronous or otherwise).

I have tried many things involving page flags, identity mapping, etc... Nothing seems to work.
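For reference, the flags in the MMIO line above correspond to architectural bits of a VMSAv8-64 stage-1 descriptor. This is a minimal sketch that assumes a 4 KiB granule and a level-3 (page) entry. The macro and function names are hypothetical and are not taken from the poster's code:

```c
#include <stdint.h>

/* Stage-1 descriptor bits (VMSAv8-64, 4 KiB granule). Names are hypothetical,
 * bit positions are architectural. */
#define DESC_VALID       (1ULL << 0)
#define DESC_PAGE        (1ULL << 1)                      /* level 3: page */
#define DESC_ATTRINDX(i) ((uint64_t)((i) & 0x7u) << 2)    /* index into MAIR_EL1 */
#define DESC_AF          (1ULL << 10)                     /* access flag */
#define DESC_PXN         (1ULL << 53)
#define DESC_UXN         (1ULL << 54)
#define DESC_OA_MASK     0x0000FFFFFFFFF000ULL            /* output address [47:12] */

#define MAIR_DEVICE_nGnRnE 0x00u
#define MAIR_NORMAL_WB     0xFFu  /* Normal, inner/outer write-back, R/W-allocate */

/* Store an 8-bit memory attribute in slot idx (0-7) of a MAIR_EL1 value. */
static uint64_t mair_set_attr(uint64_t mair, unsigned idx, uint8_t attr)
{
    unsigned shift = (idx & 0x7u) * 8;
    mair &= ~(0xFFULL << shift);
    return mair | ((uint64_t)attr << shift);
}

/* Build a level-3 page descriptor for one 4 KiB device (MMIO) page.
 * Shareability bits [9:8] are left at 0 in this sketch. */
static uint64_t make_mmio_page_desc(uint64_t pa, unsigned attr_index)
{
    return (pa & DESC_OA_MASK) | DESC_VALID | DESC_PAGE |
           DESC_ATTRINDX(attr_index) | DESC_AF | DESC_PXN | DESC_UXN;
}
```

`make_mmio_page_desc(0x4010000000, 2)` evaluates to 0x006000401000040B. Because AttrIndx is 2, byte 2 of MAIR_EL1 (bits [23:16]) is the byte that must hold 0x00 for Device-nGnRnE.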

The code works just fine on x86_64 (qemu and 3 computers).

What am I missing here to get this working on aarch64?

Thanks!

Re: PCIe config space access hangs on aarch64 -qemu virt mac

Posted: Mon Oct 31, 2022 4:31 pm
by kzinti
Device Tree has this info:

Code: Select all

        pcie@10000000 {
                interrupt-map-mask = <0x1800 0x00 0x00 0x07>;
                interrupt-map = <0x00 0x00 0x00 0x01 0x8001 0x00 0x00 0x00 0x03 0x04 0x00 0x00 0x00 0x02 0x8001 0x00 0x00 0x00 0x04 0x04 0x00 0x00 0x00 0x03 0x8001 0x00 0x00 0x00 0x05 0x04 0x00 0x00 0x00 0x04 0x8001 0x00 0x00 0x00 0x06 0x04 0x800 0x00 0x00 0x01 0x8001 0x00 0x00 0x00 0x04 0x04 0x800 0x00 0x00 0x02 0x8001 0x00 0x00 0x00 0x05 0x04 0x800 0x00 0x00 0x03 0x8001 0x00 0x00 0x00 0x06 0x04 0x800 0x00 0x00 0x04 0x8001 0x00 0x00 0x00 0x03 0x04 0x1000 0x00 0x00 0x01 0x8001 0x00 0x00 0x00 0x05 0x04 0x1000 0x00 0x00 0x02 0x8001 0x00 0x00 0x00 0x06 0x04 0x1000 0x00 0x00 0x03 0x8001 0x00 0x00 0x00 0x03 0x04 0x1000 0x00 0x00 0x04 0x8001 0x00 0x00 0x00 0x04 0x04 0x1800 0x00 0x00 0x01 0x8001 0x00 0x00 0x00 0x06 0x04 0x1800 0x00 0x00 0x02 0x8001 0x00 0x00 0x00 0x03 0x04 0x1800 0x00 0x00 0x03 0x8001 0x00 0x00 0x00 0x04 0x04 0x1800 0x00 0x00 0x04 0x8001 0x00 0x00 0x00 0x05 0x04>;
                #interrupt-cells = <0x01>;
                ranges = <0x1000000 0x00 0x00 0x00 0x3eff0000 0x00 0x10000 0x2000000 0x00 0x10000000 0x00 0x10000000 0x00 0x2eff0000 0x3000000 0x80 0x00 0x80 0x00 0x80 0x00>;
                reg = <0x40 0x10000000 0x00 0x10000000>; 
                msi-parent = <0x8002>;
                dma-coherent;
                bus-range = <0x00 0xff>;
                linux,pci-domain = <0x00>;
                #size-cells = <0x02>;
                #address-cells = <0x03>;
                device_type = "pci";
                compatible = "pci-host-ecam-generic";
        };
The config space address is in "reg". It shows 0x4010000000 which matches what I see in the MCFG.
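This sketch shows how the `reg` cells and the ECAM layout translate into addresses. It assumes the parent node uses `#address-cells = <2>` and `#size-cells = <2>`. The `#address-cells = <0x03>` inside the node describes child PCI addresses and does not apply to `reg`. The names `dt_cells_to_u64`, `ecam_offset` and `ecam_read32` are hypothetical helpers:

```c
#include <stdint.h>

/* Combine two 32-bit device-tree cells (high cell first) into one value. */
static uint64_t dt_cells_to_u64(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* ECAM layout: 1 MiB per bus, 32 KiB per device, 4 KiB per function.
 * The register offset is rounded down to a dword boundary. */
static uint64_t ecam_offset(unsigned bus, unsigned dev, unsigned fn, unsigned reg)
{
    return ((uint64_t)(bus & 0xFFu) << 20) | ((uint64_t)(dev & 0x1Fu) << 15) |
           ((uint64_t)(fn & 0x7u) << 12) | (reg & 0xFFCu);
}

/* 32-bit config read through an ECAM window that is already mapped.
 * 'ecam' is the virtual address of the window (bus 0 at offset 0). */
static uint32_t ecam_read32(volatile uint8_t *ecam, unsigned bus, unsigned dev,
                            unsigned fn, unsigned reg)
{
    return *(volatile uint32_t *)(ecam + ecam_offset(bus, dev, fn, reg));
}
```

With the base 0x4010000000, bus 0 / device 1 / function 0 starts at 0x4010008000, and bus 16 starts at 0x4011000000. The size of 0x10000000 bytes is exactly 256 buses of 1 MiB each. That matches the 65536 pages of 4 KiB reported in the mapping log further down.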

Re: PCIe config space access hangs on aarch64 -qemu virt mac

Posted: Mon Oct 31, 2022 9:46 pm
by linuxyne
Some debugging tips:

- Enable tracing for pci_cfg_*
- Once the VM hangs, go to qemu monitor and dump the cpu registers for all cpus; see if any of them is at an unexpected PC.

Re: PCIe config space access hangs on aarch64 -qemu virt mac

Posted: Tue Nov 01, 2022 9:52 am
by kzinti
Using the trace, it looks like it is reading from PCI just fine:

Code: Select all

pci_cfg_read gpex-root 00:0 @0x0 -> 0x81b36
I am ramping up on QEMU monitor's usage... Thanks for the help.

Code: Select all

(qemu) info registers
 PC=000000023bb34a00 X00=ffffffff8004c020 X01=ffffffff8004ace0
X02=0000000000000000 X03=0060000000000613 X04=00000000000000f5
X05=ffffffff8004b4c5 X06=0000000000000000 X07=0000000000000032
X08=ffffffff800225e6 X09=0000000000000049 X10=ffffffff800225ed
X11=0000000009000000 X12=0000000000000007 X13=0000000000000000
X14=0000000000000032 X15=0000000240000000 X16=0000000000000000
X17=ffffffff80020d24 X18=000000000000002e X19=ffffffff8004ace0
X20=ffffffff8004c050 X21=ffffffff8004c050 X22=ffffffff80021209
X23=ffffffff8004b4d0 X24=ffff8002384a002c X25=ffffffff8004ac08
X26=0000000000000000 X27=0000000000000000 X28=0000000000000000
X29=ffffffff8004aba0 X30=ffffffff8001dadc  SP=ffffffff8004aba0
PSTATE=000003c5 ---- EL1h     FPCR=00000000 FPSR=00000000
ffffffff800xxxxx is my kernel
PC is at 000000023bb34a00, which according to my memory map is "available conventional memory". That is a bit puzzling. It is near UEFI Runtime Services Code; I wonder if this is some exception handler (but I have already exited boot services).

Code: Select all

(qemu) x/32i 0x000000023bb34a00
0x23bb34a00:  4cf33c50  .byte    0x50, 0x3c, 0xf3, 0x4c
0x23bb34a04:  000055c0  .byte    0xc0, 0x55, 0x00, 0x00
0x23bb34a08:  4c2037b0  .byte    0xb0, 0x37, 0x20, 0x4c
0x23bb34a0c:  000055c0  .byte    0xc0, 0x55, 0x00, 0x00
0x23bb34a10:  00000020  .byte    0x20, 0x00, 0x00, 0x00
0x23bb34a14:  00000000  .byte    0x00, 0x00, 0x00, 0x00
0x23bb34a18:  00000010  .byte    0x10, 0x00, 0x00, 0x00
0x23bb34a1c:  00000000  .byte    0x00, 0x00, 0x00, 0x00
0x23bb34a20:  00000000  .byte    0x00, 0x00, 0x00, 0x00
0x23bb34a24:  00000000  .byte    0x00, 0x00, 0x00, 0x00
0x23bb34a28:  00000000  .byte    0x00, 0x00, 0x00, 0x00
0x23bb34a2c:  00000000  .byte    0x00, 0x00, 0x00, 0x00
0x23bb34a30:  00000000  .byte    0x00, 0x00, 0x00, 0x00
0x23bb34a34:  00000000  .byte    0x00, 0x00, 0x00, 0x00
Every time I dump the above, the first 3 bytes are different. This is very puzzling.

Dumping from xxx9f0 I get constant memory at xxxa00:

Code: Select all

(qemu) x/32i 0x000000023bb349f0
...
0x23bb34a00:  00000020  .byte    0x20, 0x00, 0x00, 0x00
0x23bb34a04:  00000000  .byte    0x00, 0x00, 0x00, 0x00
0x23bb34a08:  00000015  .byte    0x15, 0x00, 0x00, 0x00
0x23bb34a0c:  00000000  .byte    0x00, 0x00, 0x00, 0x00
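Suppose 0x23bb34a00 is the first instruction of an exception vector. The exception log posted later does show exceptions being taken "to EL1 PC 0x23bb34a00". In that case its low 11 bits identify the vector slot, because VBAR_EL1 is 2 KiB aligned and each of the 16 entries is 0x80 bytes. This small sketch shows that arithmetic; the helper names are hypothetical:

```c
#include <stdint.h>

/* AArch64 vector table: 4 groups x 4 entries, 0x80 bytes each (2 KiB total). */
static const char *const vector_names[16] = {
    "Synchronous, current EL, SP_EL0", "IRQ, current EL, SP_EL0",
    "FIQ, current EL, SP_EL0",         "SError, current EL, SP_EL0",
    "Synchronous, current EL, SP_ELx", "IRQ, current EL, SP_ELx",
    "FIQ, current EL, SP_ELx",         "SError, current EL, SP_ELx",
    "Synchronous, lower EL, AArch64",  "IRQ, lower EL, AArch64",
    "FIQ, lower EL, AArch64",          "SError, lower EL, AArch64",
    "Synchronous, lower EL, AArch32",  "IRQ, lower EL, AArch32",
    "FIQ, lower EL, AArch32",          "SError, lower EL, AArch32",
};

/* Only meaningful if pc is the first instruction of a vector entry. */
static uint64_t guess_vbar(uint64_t pc)  { return pc & ~0x7FFULL; }
static unsigned vector_slot(uint64_t pc) { return (unsigned)((pc & 0x7FFu) >> 7); }
```

For 0x23bb34a00 this gives VBAR_EL1 = 0x23bb34800 and slot 4, "Synchronous, current EL, SP_ELx". That fits a synchronous exception taken at EL1h. The base lies in a range that the kernel's memory map (posted later) lists as Available. VBAR_EL1 may therefore still point at a vector table left behind by the firmware.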

Re: PCIe config space access hangs on aarch64 -qemu virt mac

Posted: Tue Nov 01, 2022 10:22 am
by kzinti
I found something interesting: if I skip bus 0, device 0, function 0 and try to read device 1 first, I am able to read the vendor and device ids just fine:

Code: Select all

pci_cfg_read virtio-net-pci 01:0 @0x0 -> 0x1af4
pci_cfg_read virtio-net-pci 01:0 @0x2 -> 0x1000
pci_cfg_read virtio-net-pci 01:0 @0x0 -> 0x1af4
pci_cfg_read virtio-net-pci 01:0 @0x2 -> 0x1000
Info   : [PCI] (0000/00/01/00) PCI Device 1af4:1000 (Unknown device)
This is vendor id 0x1af4, device id 0x1000, which is a Virtio Network Device.
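The first config dword packs both IDs. The two 16-bit reads at @0x0 and @0x2 in the trace therefore return the two halves of the single 32-bit value (0x10001af4) that appears in the later trace. In the sketch below, `known_device_name` is a hypothetical helper that only knows the IDs appearing in this thread:

```c
#include <stdint.h>

/* Config offset 0x00: Vendor ID in bits [15:0], Device ID in bits [31:16]. */
static uint16_t cfg_vendor_id(uint32_t dw0) { return (uint16_t)(dw0 & 0xFFFFu); }
static uint16_t cfg_device_id(uint32_t dw0) { return (uint16_t)(dw0 >> 16); }

/* Names for the vendor/device pairs seen in this thread only. */
static const char *known_device_name(uint16_t vendor, uint16_t device)
{
    if (vendor == 0x1b36 && device == 0x0008)
        return "Red Hat PCIe host bridge";
    if (vendor == 0x1af4) {
        switch (device) {
        case 0x1000: return "virtio network (transitional ID)";
        case 0x1001: return "virtio block (transitional ID)";
        case 0x1050: return "virtio GPU (modern ID, 0x1040 + 16)";
        }
    }
    return "unknown";
}
```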

But then it hangs on the next device. Looks like I will have to do some tinkering to figure out what is going on.

Re: PCIe config space access hangs on aarch64 -qemu virt mac

Posted: Tue Nov 01, 2022 12:18 pm
by kzinti
So it appears bus 0, device 0, function 0 is the only read that hangs the CPU. If I skip 0/0/0, I can enumerate all devices just fine:

Code: Select all

Info   : [PCI] Mapped PCIE configuration space: 0000004010000000 to ffff804010000000, page count 65536
pci_cfg_read virtio-net-pci 01:0 @0x0 -> 0x10001af4
Info   : Ids: 10001af4
pci_cfg_read virtio-net-pci 01:0 @0xe -> 0x0
pci_cfg_read virtio-gpu-pci 02:0 @0x0 -> 0x10501af4
Info   : Ids: 10501af4
pci_cfg_read virtio-gpu-pci 02:0 @0xe -> 0x0
pci_cfg_read virtio-blk-pci 03:0 @0x0 -> 0x10011af4
Info   : Ids: 10011af4
pci_cfg_read virtio-blk-pci 03:0 @0xe -> 0x0
I am not sure why that would be, but it certainly is annoying.
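The trace above shows the usual probe order: read the ID dword at 0x0, then the header type at 0xE. Below is a minimal brute-force scan along those lines, assuming a hypothetical `cfg_read32_fn` accessor. Absent functions read back as all ones (vendor ID 0xFFFF). Functions 1 to 7 are only probed when bit 7 of the header type is set. `fake_cfg_read32` is a stand-in modelled on bus 0 of the trace, not real hardware:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accessor: returns the config dword at 'reg' of bus/dev/fn. */
typedef uint32_t (*cfg_read32_fn)(unsigned bus, unsigned dev, unsigned fn,
                                  unsigned reg);

/* A vendor ID of 0xFFFF means no function responds at this address. */
static bool function_present(cfg_read32_fn rd, unsigned bus, unsigned dev,
                             unsigned fn)
{
    return (rd(bus, dev, fn, 0x00) & 0xFFFFu) != 0xFFFFu;
}

/* Header Type is byte 0x0E, i.e. bits [23:16] of the dword at 0x0C.
 * Bit 7 set means the device implements more than one function. */
static bool is_multifunction(cfg_read32_fn rd, unsigned bus, unsigned dev)
{
    return ((rd(bus, dev, 0, 0x0C) >> 16) & 0x80u) != 0;
}

/* Scan one bus and call 'found' (if non-NULL) for every present function.
 * Returns the number of functions found. */
static unsigned scan_bus(cfg_read32_fn rd, unsigned bus,
                         void (*found)(unsigned, unsigned, unsigned))
{
    unsigned count = 0;
    for (unsigned dev = 0; dev < 32; dev++) {
        if (!function_present(rd, bus, dev, 0))
            continue;
        unsigned nfn = is_multifunction(rd, bus, dev) ? 8 : 1;
        for (unsigned fn = 0; fn < nfn; fn++) {
            if (!function_present(rd, bus, dev, fn))
                continue;
            if (found != NULL)
                found(bus, dev, fn);
            count++;
        }
    }
    return count;
}

/* Stand-in config space: bus 0, devices 0-3, single-function, IDs from the trace. */
static uint32_t fake_cfg_read32(unsigned bus, unsigned dev, unsigned fn,
                                unsigned reg)
{
    static const uint32_t ids[4] = { 0x00081b36u, 0x10001af4u, 0x10501af4u,
                                     0x10011af4u };
    if (bus != 0 || dev > 3 || fn != 0)
        return 0xFFFFFFFFu;          /* nothing there: all ones */
    return reg == 0x00 ? ids[dev] : 0; /* header type 0: single function */
}
```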

Re: PCIe config space access hangs on aarch64 -qemu virt mac

Posted: Tue Nov 01, 2022 12:34 pm
by linuxyne
We can try reading 0xffff804010000000 from the qemu monitor to see if that causes any problems.

Does gva2gpa on 0xffff804010000000 correctly resolve the address?

Since only 0/0/0 causes the failure, it seems 0/0/1, 0/0/2, etc. work (although they might return 0xffffffff).

Edit: Tried running qemu-system-aarch64 with tracing on. The UEFI code also reads 0/0/0, but then it moves on without hanging. UEFI seems to set up an identity map for the PCIe config space, and reading it from the monitor using both x and xp works without any problems.

There might be ways to dump each instruction qemu runs leading up to the unexpected PC. GDB might help in dumping the stack when the cpu hangs.

Edit2: With aarch64 we also must take care of the differences between descriptor formats: the last-level (level 3) page descriptor versus the block and table descriptors used at the other levels. (But I think that is handled.)
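The descriptor-format difference mentioned above comes down to bits [1:0]. The value 0b11 means "table" at levels 0 to 2 but "page" at level 3. The value 0b01 is a block at levels 1 and 2 and is reserved at level 3. A leaf entry built with a Page bit and stored at level 2 would therefore be walked as a pointer to another table. This sketch assumes a 4 KiB granule without FEAT_LPA2, so there are no level-0 blocks. The enum and function names are hypothetical:

```c
#include <stdint.h>

enum desc_kind { DK_INVALID, DK_BLOCK, DK_TABLE, DK_PAGE };

/* Classify a VMSAv8-64 stage-1 descriptor found at 'level' (0-3),
 * assuming a 4 KiB granule without FEAT_LPA2. */
static enum desc_kind classify_desc(uint64_t desc, unsigned level)
{
    unsigned type = (unsigned)(desc & 0x3u);
    if ((type & 1u) == 0)
        return DK_INVALID;                     /* bit 0 clear: invalid */
    if (level == 3)
        return type == 0x3u ? DK_PAGE : DK_INVALID; /* 0b01 reserved at L3 */
    if (type == 0x3u)
        return DK_TABLE;                       /* L0-L2: next-level table */
    return level == 0 ? DK_INVALID : DK_BLOCK; /* 0b01: block at L1/L2 only */
}

/* Bytes covered by one block/page entry: L3 4 KiB, L2 2 MiB, L1 1 GiB. */
static uint64_t entry_span(unsigned level)
{
    return 1ULL << (12 + 9 * (3 - level));
}
```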

Re: PCIe config space access hangs on aarch64 -qemu virt mac

Posted: Tue Nov 01, 2022 1:10 pm
by kzinti
Thanks for all the pointers, I will investigate them today.

I did notice that when I change the code a bit, the behaviour changes. It will hang at other locations (I saw device 3 instead of 0).

This sequence is interesting: it reads the vendor id (0x1b36) 3 times, the device id (0x08) once, and then hangs. I did verify that these values match what UEFI reads at startup.

Code: Select all

Info   : [PCI] Mapped PCIE configuration space: 0000004010000000 to ffff804010000000, page count 65536
pci_cfg_read gpex-root 00:0 @0x0 -> 0x1b36
pci_cfg_read gpex-root 00:0 @0x0 -> 0x1b36
pci_cfg_read gpex-root 00:0 @0x0 -> 0x1b36
pci_cfg_read gpex-root 00:0 @0x2 -> 0x8
<hang>
I think there might be something going on with MMIO memory accesses... Do I need memory barriers on aarch64 to access MMIO / PCI config space? I'll also try to see how exactly UEFI maps the memory (flags, MAIR, etc).

Re: PCIe config space access hangs on aarch64 -qemu virt mac

Posted: Tue Nov 01, 2022 1:29 pm
by linuxyne
It is surprising that qemu doesn't report any exceptions with "-d int". The contents of the PC pasted earlier were clearly invalid instructions. If this is a kvm-enabled VM, switching to tcg should provide more debug control over the VM.

I would try relocating the testing into the boot loader, so that the entire kernel is bypassed. If the bootloader too suffers from the same problem, that should narrow down the scope of the problem. If not, comparing the bootloader and the kernel sources should again narrow down the scope within the kernel.
(Edit: Or, inside the kernel, rely on UEFI built memory map when accessing PCIe config space instead of creating new mappings. That should also confirm if the MMIO maps are at fault).

There's no need for memory barriers here.

Re: PCIe config space access hangs on aarch64 -qemu virt mac

Posted: Tue Nov 01, 2022 2:19 pm
by kzinti
There are exceptions; I forgot to log them initially. Here is what I see (it starts with a few PCI reads and then hangs). Notice our friend 0x23bb34a00, which was the PC's value above.

Code: Select all

Exception return from AArch64 EL1 to AArch64 EL1 PC 0x23bb9f01c
pci_cfg_read gpex-root 00:0 @0x0 -> 0x1b36
pci_cfg_read gpex-root 00:0 @0x0 -> 0x1b36
pci_cfg_read gpex-root 00:0 @0x0 -> 0x1b36
pci_cfg_read gpex-root 00:0 @0x2 -> 0x8
Taking exception 4 [Data Abort]
...from EL1 to EL1
...with ESR 0x25/0x96000009
...with FAR 0x9000018
...with ELR 0xffffffff80001634
...to EL1 PC 0x23bb34a00 PSTATE 0x3c5
Taking exception 3 [Prefetch Abort]
...from EL1 to EL1
...with ESR 0x21/0x86000005
...with FAR 0x23bb34a00
...with ELR 0x23bb34a00
...to EL1 PC 0x23bb34a00 PSTATE 0x3c5
Taking exception 3 [Prefetch Abort]
...from EL1 to EL1
...with ESR 0x21/0x86000005
...with FAR 0x23bb34a00
...with ELR 0x23bb34a00
...to EL1 PC 0x23bb34a00 PSTATE 0x3c5
Taking exception 3 [Prefetch Abort]
...from EL1 to EL1
...with ESR 0x21/0x86000005
...with FAR 0x23bb34a00
...with ELR 0x23bb34a00
...to EL1 PC 0x23bb34a00 PSTATE 0x3c5
<keeps repeating>

Re: PCIe config space access hangs on aarch64 -qemu virt mac

Posted: Tue Nov 01, 2022 2:54 pm
by kzinti
Trying to run this code in the bootloader (before ExitBootServices) and using address 0000004010000000 doesn't work, I am getting synchronous exceptions. Presumably the mapping is not kept around or is not accessible to the bootloader.

How did you determine that the PCIe config space was identity-mapped by UEFI?

This memory is also not showing up in the UEFI memory map.

Re: PCIe config space access hangs on aarch64 -qemu virt mac

Posted: Tue Nov 01, 2022 3:27 pm
by kzinti
Might be time to install some exception handlers... QEMU spits out so much log output that I can't make head or tail of what is happening.

It does seem to point to a bus error / data abort. Depending on the order and number of accesses I make to the config space, it will trigger at different times.

I can't access the original location in the bootloader or my kernel. Either it was not identity-mapped or that mapping is being removed before my bootloader starts.

Re: PCIe config space access hangs on aarch64 -qemu virt mac

Posted: Tue Nov 01, 2022 5:36 pm
by kzinti
I found a few different ways to reproduce this, and it always ends up being a bus error (I think?):

Code: Select all

Taking exception 4 [Data Abort]
...from EL1 to EL1
...with ESR 0x25/0x96000009
...with FAR 0x9000018
...with ELR 0xffffffff80001634
...to EL1 PC 0x23bb34a00 PSTATE 0x3c5
And then because there is nothing valid at 0x23bb34a00, I get an infinite number of:

Code: Select all

Taking exception 3 [Prefetch Abort]
...from EL1 to EL1
...with ESR 0x21/0x86000005
...with FAR 0x23bb34a00
...with ELR 0x23bb34a00
...to EL1 PC 0x23bb34a00 PSTATE 0x3c5
My current code only does 32-bit reads from the PCI config space, so I am not sure what these bus errors are.
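QEMU prints ESR as "EC/full value", and decoding the fields says what these aborts are. 0x96000009 is EC 0x25 (data abort without a change in exception level), WnR = 0 (a read) and DFSC 0x09 (access flag fault, level 1). That suggests a read of FAR 0x9000018 through a level-1 block entry whose access flag is clear, which is a page-table issue rather than a bus error. 0x86000005 is EC 0x21 (instruction abort, same EL) with IFSC 0x05 (translation fault, level 1). This would mean the vector address itself is unmapped, so every attempt to run the handler faults again. The decoding can be sketched as follows; the helper and enum names are hypothetical:

```c
#include <stdint.h>

#define EC_IABT_CUR 0x21u  /* instruction abort, same EL */
#define EC_DABT_CUR 0x25u  /* data abort, same EL */

/* ESR_ELx fields. */
static unsigned esr_ec(uint64_t esr)  { return (unsigned)((esr >> 26) & 0x3Fu); }
static unsigned esr_il(uint64_t esr)  { return (unsigned)((esr >> 25) & 0x1u); }
static unsigned esr_iss(uint64_t esr) { return (unsigned)(esr & 0x1FFFFFFu); }

/* Abort-specific ISS fields: fault status code and, for data aborts, WnR. */
static unsigned esr_fsc(uint64_t esr) { return (unsigned)(esr & 0x3Fu); }
static unsigned esr_wnr(uint64_t esr) { return (unsigned)((esr >> 6) & 0x1u); }

enum fsc_kind { FSC_ADDR_SIZE, FSC_TRANSLATION, FSC_ACCESS_FLAG,
                FSC_PERMISSION, FSC_OTHER };

/* Classify the level-qualified fault codes 0x00-0x0F; anything else
 * (external aborts, alignment, TLB conflict, ...) is FSC_OTHER here. */
static enum fsc_kind fsc_classify(unsigned fsc)
{
    if (fsc >= 0x10u)
        return FSC_OTHER;
    switch (fsc >> 2) {
    case 0:  return FSC_ADDR_SIZE;
    case 1:  return FSC_TRANSLATION;
    case 2:  return FSC_ACCESS_FLAG;
    default: return FSC_PERMISSION;
    }
}

/* Translation table level for the four classes above. */
static unsigned fsc_level(unsigned fsc) { return fsc & 0x3u; }
```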

Skipping 0/0/0 I get all the way to 16/0/0 before exception 4 is triggered. Very puzzling.

Code: Select all

Info   : Slot: 31
Info   : Function: 0
Info   : Function: 1
Info   : Function: 2
Info   : Function: 3
Info   : Function: 4
Info   : Function: 5
Info   : Function: 6
Info   : Function: 7
Info   : Bus: 16
Info   : Slot: 0
Info   : Function: 0

Re: PCIe config space access hangs on aarch64 -qemu virt mac

Posted: Tue Nov 01, 2022 5:53 pm
by kzinti
The fault address register (FAR) says the access was to 0x9000018, and sure enough there is nothing in the memory map there:

Code: Select all

Info   : [KRNL] System memory map:
Info   : [KRNL] 0000000004000000 - 0000000007ffffff: Mapped I/O
Info   : [KRNL] 0000000009010000 - 0000000009010fff: Mapped I/O
Info   : [KRNL] 0000000040000000 - 00000000fff7afff: Available
Info   : [KRNL] 00000000fff7b000 - 00000000fffc0fff: Kernel Data
Info   : [KRNL] 00000000fffc1000 - 00000000fffdffff: Kernel Code
Info   : [KRNL] 00000000fffe0000 - 00000000fffeffff: Kernel Data
Info   : [KRNL] 00000000ffff0000 - 000000023848ffff: Available
Info   : [KRNL] 0000000238490000 - 00000002384effff: ACPI Reclaimable
Info   : [KRNL] 00000002384f0000 - 00000002384fffff: Available
Info   : [KRNL] 0000000238500000 - 000000023852ffff: ACPI Reclaimable
Info   : [KRNL] 0000000238530000 - 00000002385affff: UEFI Runtime Services Code
Info   : [KRNL] 00000002385b0000 - 000000023869ffff: UEFI Runtime Services Data
Info   : [KRNL] 00000002386a0000 - 000000023874ffff: UEFI Runtime Services Code
Info   : [KRNL] 0000000238750000 - 000000023bc1ffff: Available
Info   : [KRNL] 000000023bc20000 - 000000023bdaffff: UEFI Runtime Services Code
Info   : [KRNL] 000000023bdb0000 - 000000023bffffff: UEFI Runtime Services Data
Info   : [KRNL] 000000023c000000 - 000000023f81bfff: Available
Info   : [KRNL] 000000023f81c000 - 000000023f843fff: Kernel Data
Info   : [KRNL] 000000023f844000 - 000000023fffffff: Available
Now I need to figure out what is trying to access that address, and why...
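One lead, assuming QEMU's usual virt layout: 0x09000000 is the PL011 UART, and offset 0x18 is its flag register (UARTFR), which a console driver typically polls before writing each character. X11 in the earlier register dump also holds 0x0000000009000000. The two "Mapped I/O" entries look like the firmware's runtime-services MMIO: the second flash bank and the PL031 RTC at 0x09010000. The UART is not among them. The sketch below checks a fault address against those entries; the struct, table and macro names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

struct region { uint64_t start, end; const char *type; };  /* end inclusive */

/* The two MMIO entries from the kernel's memory map above. */
static const struct region mmio_regions[] = {
    { 0x0000000004000000ULL, 0x0000000007ffffffULL, "Mapped I/O" },
    { 0x0000000009010000ULL, 0x0000000009010fffULL, "Mapped I/O" },
};

/* Return the region containing addr, or NULL if no entry covers it. */
static const struct region *find_region(const struct region *r, size_t n,
                                        uint64_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= r[i].start && addr <= r[i].end)
            return &r[i];
    return NULL;
}

/* Assumed QEMU virt PL011 location and flag register offset. */
#define VIRT_PL011_BASE 0x09000000ULL
#define PL011_UARTFR    0x18ULL
```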

Re: PCIe config space access hangs on aarch64 -qemu virt mac

Posted: Tue Nov 01, 2022 7:09 pm
by kzinti
Mappings look ok. I've done a lot of googling, and I am thinking that some error is being raised by the PCIe controller. I just can't find anyone having the same issue with QEMU/virt.

Code: Select all

(qemu) x /i 0xffff804010000000
pci_cfg_read gpex-root 00:0 @0x0 -> 0x81b36
pci_cfg_read gpex-root 00:0 @0x4 -> 0x0
pci_cfg_read gpex-root 00:0 @0x8 -> 0x6000000
pci_cfg_read gpex-root 00:0 @0xc -> 0x0
pci_cfg_read gpex-root 00:0 @0x10 -> 0x0
pci_cfg_read gpex-root 00:0 @0x14 -> 0x0
pci_cfg_read gpex-root 00:0 @0x18 -> 0x0
pci_cfg_read gpex-root 00:0 @0x1c -> 0x0
0xffff804010000000:  00081b36  .byte    0x36, 0x1b, 0x08, 0x00
(qemu) gva2gpa 0xffff804010000000
gpa: 0x4010000000
(qemu)                              
81b36 --> Red Hat PCIe Host Bridge, as expected.
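The dword read at offset 0x8 (0x6000000) gives a second confirmation. It holds the class code, and base class 0x06 with subclass 0x00 is a host bridge. The decoding can be sketched like this; `decode_class` and `is_host_bridge` are hypothetical helpers:

```c
#include <stdint.h>

/* Config offset 0x08: Revision ID [7:0], Prog IF [15:8],
 * Subclass [23:16], Base Class [31:24]. */
struct class_code { uint8_t base, sub, prog_if, revision; };

static struct class_code decode_class(uint32_t dw8)
{
    struct class_code c = {
        .base     = (uint8_t)(dw8 >> 24),
        .sub      = (uint8_t)(dw8 >> 16),
        .prog_if  = (uint8_t)(dw8 >> 8),
        .revision = (uint8_t)dw8,
    };
    return c;
}

/* Base class 0x06 (bridge), subclass 0x00 (host bridge). */
static int is_host_bridge(uint32_t dw8)
{
    struct class_code c = decode_class(dw8);
    return c.base == 0x06 && c.sub == 0x00;
}
```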