
[SOLVED] xHCI Address Device (BSR=0) and Device Descriptor queries return completion code 4

Posted: Thu Jan 30, 2025 6:41 am
by alnyannn
Hi everyone,

I'm continuing to make my OS more functional on real hardware. Right now I'm working on the xHCI driver, specifically on getting it to work on a physical machine.

I've got a few issues with xHCI on my laptop:

* There exist two xHCs, of which (for now) I select the one that's used for USB2.0, because configuring both means that the same physical port will generate events to both xHCs, which I don't want to deal with right now
* My Root Hub port setup function, which works under QEMU, fails to issue an address to a device, with the Command Ring reporting an Event TRB with completion code 4 (USB Transaction Error) once I try to issue an Address Device TRB with BSR=0.
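
For reference, a minimal sketch of how the completion code and TRB type can be pulled out of an event TRB, following the field layout in the xHCI specification (Completion Code in bits 31:24 of the third dword, TRB Type in bits 15:10 of the fourth). The `EventTrb` struct and the function names are hypothetical and are not taken from the linked driver.

```rust
/// Raw 16-byte event TRB viewed as four dwords (hypothetical helper type).
#[derive(Clone, Copy, Debug)]
pub struct EventTrb {
    pub parameter_lo: u32,
    pub parameter_hi: u32,
    pub status: u32,
    pub control: u32,
}

impl EventTrb {
    /// Completion Code: bits 31:24 of the status dword.
    pub fn completion_code(&self) -> u8 {
        (self.status >> 24) as u8
    }

    /// TRB Type: bits 15:10 of the control dword.
    pub fn trb_type(&self) -> u8 {
        ((self.control >> 10) & 0x3f) as u8
    }

    /// Cycle bit: bit 0 of the control dword.
    pub fn cycle(&self) -> bool {
        self.control & 1 != 0
    }

    /// Port ID of a Port Status Change Event: bits 31:24 of the first dword.
    pub fn port_id(&self) -> u8 {
        (self.parameter_lo >> 24) as u8
    }
}

/// Names for a few common completion codes; everything else is "Other".
pub fn completion_code_name(code: u8) -> &'static str {
    match code {
        1 => "Success",
        3 => "Babble Detected Error",
        4 => "USB Transaction Error",
        5 => "TRB Error",
        6 => "Stall Error",
        13 => "Short Packet",
        17 => "Parameter Error",
        19 => "Context State Error",
        _ => "Other",
    }
}
```

With this layout, the failure described above would show up as `completion_code() == 4` on a Command Completion Event (TRB type 33).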

My port initialization process looks like this:

1. I discover that a device has been connected to a port, because PORTSC's CCS=1 and CSC=1
2. I do a reset of that port, because it is a USB2.0 port and the spec says it will stay in the Disabled state until a reset is performed, after which it will transition to Enabled state (and the device will go Powered → Default state).
3. Issue an Enable Slot TRB with appropriate slot_type, set up a Device Context in DCBAA
4. Issue an Address Device TRB with BSR=1 and Input Context set up as described by the specification (block SET_ADDRESS, but let the xHC do its internal thing)
5. Read the device descriptor from its Default Control Endpoint (necessary for Full-speed devices to figure out their bMaxPacketSize0)
6. Reset the Default Control Endpoint's ring back to its initial state so I again get a DCS=1 and a fresh dequeue pointer
7. Finally, I issue an Address Device, this time with BSR=0 and a proper max packet size for EP0 to make the xHC send a SET_ADDRESS packet

In QEMU, everything works fine and the device is initialized and handed off to a generic USB stack driver. But on real hardware, I keep receiving the "USB Transaction Error" (4) completion code on step 5, or on step 7 if steps 5-6 are omitted and a max packet size of 8 is assumed.
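
As an illustration of steps 5 and 7, here is a rough sketch of choosing the EP0 max packet size: first an initial guess from the port speed, then the real value from byte 7 (bMaxPacketSize0) of the first 8 bytes of the device descriptor. It assumes the default speed ID mapping (1 = Full, 2 = Low, 3 = High, 4 = SuperSpeed); the function names are hypothetical and not from the linked code.

```rust
/// Initial EP0 max packet size to put into the Input Context before the
/// device descriptor has been read. Assumes the default port speed IDs.
pub fn initial_ep0_max_packet(port_speed: u8) -> Option<u16> {
    match port_speed {
        1 => Some(8),   // Full-speed: real value may be 8, 16, 32 or 64
        2 => Some(8),   // Low-speed
        3 => Some(64),  // High-speed
        4 => Some(512), // SuperSpeed
        _ => None,      // other speed IDs are out of scope for this sketch
    }
}

/// Extract the EP0 max packet size from the first 8 bytes of a device
/// descriptor. Returns None if the bytes do not look like a device descriptor
/// or the value is not valid for the given speed.
pub fn ep0_max_packet_from_descriptor(first8: &[u8; 8], port_speed: u8) -> Option<u16> {
    // bLength must cover at least these 8 bytes, bDescriptorType 1 = DEVICE.
    if first8[0] < 8 || first8[1] != 1 {
        return None;
    }
    let raw = first8[7]; // bMaxPacketSize0
    match port_speed {
        1 | 2 | 3 => match raw {
            8 | 16 | 32 | 64 => Some(raw as u16),
            _ => None,
        },
        // SuperSpeed devices report an exponent: 9 means 2^9 = 512 bytes.
        4 => {
            if raw == 9 {
                Some(512)
            } else {
                None
            }
        }
        _ => None,
    }
}
```

The value returned by the second function is what would go into the EP0 context before the Address Device command with BSR=0.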

I know QEMU is quite forgiving in terms of configuration, but I've spent quite a lot of time this week going through the spec's device init process to find what I might be missing, and can't find anything.

Here's the code involved in initializing the root hub port:

https://git.alnyan.me/alnyan/yggdrasil/ ... er.rs#L320

Re: xHCI Address Device (BSR=0) and Device Descriptor queries return completion code 4

Posted: Thu Jan 30, 2025 7:26 am
by BenLunt
Can you place a bootable image somewhere for download? If so, I can run it through my tests.

Have you tried Bochs? When configured correctly, it can output quite a log file to show what is going on.

Ben
- https://www.fysnet.net/the_universal_serial_bus.htm

Re: xHCI Address Device (BSR=0) and Device Descriptor queries return completion code 4

Posted: Thu Jan 30, 2025 8:35 am
by alnyannn
BenLunt wrote: Thu Jan 30, 2025 7:26 am Can you place a bootable image somewhere for download? If so, I can run it through my tests.
Sure. I don't have a proper .iso build process yet, but I have a fat32 FS image + build artifacts:

* FAT-32 FS Image https://alnyan.me/public/image.fat32
* kernel.elf: https://alnyan.me/public/kernel.elf
* initrd.img: https://alnyan.me/public/initrd.img
* EFI bootloader: https://alnyan.me/public/yboot.efi

I'm not sure if you can just flash a FAT-32 image onto a USB stick, but you can set up a FAT-32 FS there with an EFI partition and do it manually: just place the yboot.efi into EFI/Boot/BootX64.efi, and the rest goes to the FS root. Note that the kernel is programmed to just attempt to initialize the first device it finds and then hang for debugging purposes (so I can see the output of what's happening on the screen).


UPD: Never mind, I've just made a bootable .iso: https://alnyan.me/public/image.iso

As an additional note, I've just broken out a spare USB cable and compared the same device setup in Linux vs my OS using a logic analyzer:
* In Linux, I see a SETUP packet almost immediately after reset, followed by a couple of extra packets
* In my OS, there is a reset, but after that only some broken (?) frames follow, with no real packets being sent

Re: xHCI Address Device (BSR=0) and Device Descriptor queries return completion code 4

Posted: Thu Jan 30, 2025 6:47 pm
by BenLunt
Hi,

Thanks for the bootable image, however, may I suggest that next time you use some kind of compression on the ISO image. The image is ~525meg, but when compressed, it is ~190meg :-)

Side note, may have nothing to do with current issue, but Bochs complains early on:

Code: Select all

#PAE: asked to set dirty on paging leaf entry with R/W bit clear
My first impression is that the controller never sees any reads or writes to the Operational Registers.

You write a base address to BAR #0 of 0xF0500000

Code: Select all

00517849738i[XHCI  ] BAR #0: mem base address = 0xf0500000
If you read BAR #0, you will notice that bit 2 is set, which marks it as a 64-bit BAR.
However, you never write to BAR #1. It just so happens that BAR #1 is zero.

Anyway, my tests never see any writes to any addresses between 0x00000000F0500000 and 0x00000000F05008xx.

Ben

Re: xHCI Address Device (BSR=0) and Device Descriptor queries return completion code 4

Posted: Fri Jan 31, 2025 2:03 am
by alnyannn
Wow, does Bochs support UEFI now? It's been quite a while since I used it. Is some custom patching required?

The test result looks weird: both on my real machine and in QEMU I actually see a pretty large 64-bit address (which means two BARs) being used. In QEMU it's 0x380000000000, for example.
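
To make the two-BAR point concrete, here is a small sketch of decoding a BAR pair as laid out in the PCI specification: bit 0 selects I/O vs memory, bits 2:1 give the memory type (0b10 means 64-bit), and for a 64-bit BAR the next BAR holds the upper 32 bits. The `Bar` enum and `decode_bar` are hypothetical names, not code from either OS discussed here.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Bar {
    Io(u32),
    Memory32(u32),
    Memory64(u64),
}

/// Decode BAR `n` given its raw value and the raw value of BAR `n + 1`.
/// `next` is only used when `low` describes a 64-bit memory BAR.
pub fn decode_bar(low: u32, next: u32) -> Option<Bar> {
    if low & 1 != 0 {
        // I/O space BAR: bits 1:0 are flags.
        return Some(Bar::Io(low & !0x3));
    }
    match (low >> 1) & 0x3 {
        // 32-bit memory BAR: bits 3:0 are flags.
        0b00 => Some(Bar::Memory32(low & !0xf)),
        // 64-bit memory BAR: the following BAR supplies address bits 63:32.
        0b10 => Some(Bar::Memory64(((next as u64) << 32) | (low & !0xf) as u64)),
        // Other type encodings are not handled in this sketch.
        _ => None,
    }
}
```

With BAR #0 reading 0xF0500004 and BAR #1 reading zero this yields 0xF0500000, while a BAR #1 value of 0x3800 over a zero base yields the 0x380000000000 address seen in QEMU.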

Re: xHCI Address Device (BSR=0) and Device Descriptor queries return completion code 4

Posted: Fri Jan 31, 2025 2:57 am
by alnyannn
I've just made an extra observation: when I connect a device to a port, for USB3.1 ports I get PORTSC.PED=0. If I ignore the USB3.1 ports and try to set up the same physical port using USB2.0 registers, the port seems to reset fine (PED=1, PLS=0 immediately after reset), but almost immediately after a successful reset PED changes back to zero and PLS goes back to 7. So it looks like the device just disconnects a really short time after a port reset is performed.

So, for a USB2.0 port:

* Immediately after a reset: PED=1, PLS=0, CCS=1, PS=3 (High), PP=1
* Sleeping for 20ms after a reset: PED=0, PLS=7, CCS=1, PS=3, PP=1

When ignoring USB2.0 ports and only initializing USB3.1 ports (because the USB storage drive is a USB3 device):

* I briefly get CCS=1 with CSC=1, indicating a device has been connected
* Then immediately CCS clears back to 0 before I even manage to reset the port

This looks like the problem is not actually with SET_ADDRESS/Contexts, but even before that — with port connection/enabled state
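
For readers following the PORTSC values quoted above, a minimal decoder sketch based on the bit layout in the xHCI specification (CCS bit 0, PED bit 1, PR bit 4, PLS bits 8:5, PP bit 9, Port Speed bits 13:10, CSC bit 17, PRC bit 21). The struct and function names are hypothetical.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct PortStatus {
    pub ccs: bool, // Current Connect Status
    pub ped: bool, // Port Enabled/Disabled
    pub pr: bool,  // Port Reset in progress
    pub pls: u8,   // Port Link State
    pub pp: bool,  // Port Power
    pub speed: u8, // Port Speed ID
    pub csc: bool, // Connect Status Change
    pub prc: bool, // Port Reset Change
}

pub fn decode_portsc(value: u32) -> PortStatus {
    PortStatus {
        ccs: value & (1 << 0) != 0,
        ped: value & (1 << 1) != 0,
        pr: value & (1 << 4) != 0,
        pls: ((value >> 5) & 0xf) as u8,
        pp: value & (1 << 9) != 0,
        speed: ((value >> 10) & 0xf) as u8,
        csc: value & (1 << 17) != 0,
        prc: value & (1 << 21) != 0,
    }
}

/// Human-readable name for a Port Link State value.
pub fn pls_name(pls: u8) -> &'static str {
    match pls {
        0 => "U0",
        1 => "U1",
        2 => "U2",
        3 => "U3",
        4 => "Disabled",
        5 => "RxDetect",
        6 => "Inactive",
        7 => "Polling",
        8 => "Recovery",
        9 => "Hot Reset",
        10 => "Compliance Mode",
        11 => "Test Mode",
        15 => "Resume",
        _ => "Reserved",
    }
}
```

Decoded this way, the state right after reset (PED=1, PLS=0) reads as an enabled port in U0, and the state 20ms later (PED=0, PLS=7) as a disabled port whose link is back in Polling.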

Re: xHCI Address Device (BSR=0) and Device Descriptor queries return completion code 4

Posted: Fri Jan 31, 2025 5:03 am
by rdos
BenLunt wrote: Thu Jan 30, 2025 6:47 pm Hi,

Thanks for the bootable image, however, may I suggest that next time you use some kind of compression on the ISO image. The image is ~525meg, but when compressed, it is ~190meg :-)

Side note, may have nothing to do with current issue, but Bochs complains early on:

Code: Select all

#PAE: asked to set dirty on paging leaf entry with R/W bit clear
My first impression is that the controller never sees any reads or writes to the Operational Registers.

You write a base address to BAR #0 of 0xF0500000

Code: Select all

00517849738i[XHCI  ] BAR #0: mem base address = 0xf0500000
If you read BAR #0, you will notice that bit 2 is set, which marks it as a 64-bit BAR.
However, you never write to BAR #1. It just so happens that BAR #1 is zero.

Anyway, my tests never see any writes to any addresses between 0x00000000F0500000 and 0x00000000F05008xx.

Ben
I don't think an OS should ever change BARs, only read their settings.

Also, I note that newer motherboards no longer set the bus master and memory access bits; if you fail to do that in the driver code, the BARs will not work and will always return FF when read. This is probably a protection measure so that non-activated PCI functions cannot access PCI, but it doesn't really work against malicious hardware since it doesn't stop direct access to PCI from the hardware.

Re: xHCI Address Device (BSR=0) and Device Descriptor queries return completion code 4

Posted: Fri Jan 31, 2025 5:06 am
by alnyannn
rdos wrote: Fri Jan 31, 2025 5:03 am Also, I note that newer motherboards no longer set the bus master and memory access bits; if you fail to do that in the driver code, the BARs will not work and will always return FF when read. This is probably a protection measure so that non-activated PCI functions cannot access PCI, but it doesn't really work against malicious hardware since it doesn't stop direct access to PCI from the hardware.
I do set bus mastering + MMIO, I also clear port IO enable and IRQ disable bits in the PCI configuration space's command register in my probe() function:

Code: Select all

    let mut cmd = PciCommandRegister::from_bits_retain(info.config_space.command());
    cmd &= !(PciCommandRegister::DISABLE_INTERRUPTS | PciCommandRegister::ENABLE_IO);
    cmd |= PciCommandRegister::ENABLE_MEMORY | PciCommandRegister::BUS_MASTER;
    info.config_space.set_command(cmd.bits());
Also, an OS can update BARs if it wants to. I believe Linux remaps some BARs even on x86(-64). And on platforms like arm/riscv, you basically need to do that, because there might be no proper firmware to set up the BARs for you.
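
For anyone without a bitflags wrapper like the one above, the same update can be sketched on the raw 16-bit command register value. The bit positions are those defined by the PCI specification (I/O Space bit 0, Memory Space bit 1, Bus Master bit 2, Interrupt Disable bit 10); the constant and function names are made up for this example.

```rust
pub const PCI_CMD_IO_SPACE: u16 = 1 << 0;
pub const PCI_CMD_MEMORY_SPACE: u16 = 1 << 1;
pub const PCI_CMD_BUS_MASTER: u16 = 1 << 2;
pub const PCI_CMD_INTERRUPT_DISABLE: u16 = 1 << 10;

/// Compute the new command register value for an MMIO-only, DMA-capable
/// device: clear I/O decoding and Interrupt Disable, set memory decoding and
/// bus mastering, and leave every other bit as it was.
pub fn command_for_mmio_device(current: u16) -> u16 {
    let cleared = current & !(PCI_CMD_IO_SPACE | PCI_CMD_INTERRUPT_DISABLE);
    cleared | PCI_CMD_MEMORY_SPACE | PCI_CMD_BUS_MASTER
}

/// True if both memory decoding and bus mastering are enabled.
pub fn mmio_and_dma_enabled(command: u16) -> bool {
    let needed = PCI_CMD_MEMORY_SPACE | PCI_CMD_BUS_MASTER;
    command & needed == needed
}
```

Reading the register back and checking it with `mmio_and_dma_enabled` is a cheap way to confirm the write actually took effect on a given machine.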

Re: xHCI Address Device (BSR=0) and Device Descriptor queries return completion code 4

Posted: Fri Jan 31, 2025 4:41 pm
by alnyannn
If I just dump the register state each time I get an event, I see that the USB3.1 port, when a device is connected, shows a correct PORTSC value with CCS=1, PED=1, PLS=0, PS=4 (SuperSpeed), but then immediately I receive a new event about that port, with PORTSC reading CCS=0, PED=0, PLS=4, PS=0.
So it looks like the xHC sees the device trying to connect, briefly shows that it actually managed to connect (CCS=1 + PLS=0 with the correct speed value), but then immediately drops the connection for some reason without me doing anything. Also, after this, I receive no further events from that port in the Event Ring.

Re: xHCI Address Device (BSR=0) and Device Descriptor queries return completion code 4

Posted: Fri Jan 31, 2025 6:05 pm
by BenLunt
alnyannn wrote: Fri Jan 31, 2025 2:03 am Wow, does Bochs support UEFI now? It's been quite a while since I used it. Is some custom patching required?
It was a surprise to me a while back too. It was quite embarrassing since I have contributed so much to the Bochs project. :-)
alnyannn wrote: Fri Jan 31, 2025 4:41 pm ...but then immediately the connection is dropped for some reason without me doing anything. Also, after this, I receive no further events from that port in the Event Ring
This sounds like a power thing. Either you are inadvertently clearing the power bit or (on real hardware) the port is finding an unrecoverable error and kills the power to the port.

After a bit more digging, it turns out that, for unknown reasons, Bochs with OVMF-pure-efi.fd doesn't even boot to the EFI shell. All of the xHCI accesses I was testing came from the EFI firmware loading, not from your code. I have no idea why Bochs and/or OVMF-pure-efi.fd won't boot to at least the EFI shell. QEMU does, as you know.

Edit: I tried another configuration/OS and Bochs and OVMF-pure-efi.fd booted it just fine. Most likely your OS does or doesn't do something that Bochs likes.

Re: xHCI Address Device (BSR=0) and Device Descriptor queries return completion code 4

Posted: Sat Feb 01, 2025 2:08 am
by alnyannn
BenLunt wrote: Fri Jan 31, 2025 6:05 pm This sounds like a power thing. Either you are inadvertently clearing the power bit or (on real hardware) the port is finding an unrecoverable error and kills the power to the port.
I think it's most likely the latter: I set the port power bit to 1 before starting polling any events and then sleep for 250ms just in case. Also, the PORTSC dumps I do on each event show that PP stays at 1, which means at least there's no problem here. I've also checked the PCI power capability D-state, it's D0 when I begin initializing the xHC, so I guess no problems here as well.
BenLunt wrote: Fri Jan 31, 2025 6:05 pm It was a surprise to me a while back too. It was quite embarrassing since I have contributed so much to the Bochs project. :-)
This is interesting. I'll make sure to try it out some time. I've tried that once, hit the "ROM size too large" error and gave up.

Re: xHCI Address Device (BSR=0) and Device Descriptor queries return completion code 4

Posted: Sat Feb 01, 2025 11:18 am
by BenLunt
When I run your image through Bochs and UEFI, it looks like your boot code repeats an inquiry to ACPI PM. At 0x03EB0097C, your code does an in eax,dx with dx = 0xb008. It then does a few tests on the returned value, does a pause and then, after a few other calls, returns to the in eax,dx a little while later. Does your initial boot check for something in the Power Management and loop indefinitely until satisfied?

If you have the latest version of Bochs, this is the bochsrc.txt file I use to boot your image:
(Currently I have xHCI off to see if it boots first)

Code: Select all

display_library: win32, options="gui_debug:globalini" # use Win32 debugger gui
romimage: file="OVMF-pure-efi.fd", address=0x0, options=none
config_interface: win32config
cpu: model=tigerlake
cpu: count=1, ips=750000000, reset_on_triple_fault=1, ignore_bad_msrs=1
cpu: cpuid_limit_winnt=0
memory: guest=1024, host=1024
vgaromimage: file=VGABIOS-lgpl-latest-cirrus.bin
vga: extension=cirrus, update_freq=10
ata0: enabled=1, ioaddr1=0x1f0, ioaddr2=0x3f0, irq=14
ata1: enabled=0, ioaddr1=0x170, ioaddr2=0x370, irq=15
ata2: enabled=0, ioaddr1=0x1e8, ioaddr2=0x3e0, irq=11
ata3: enabled=0, ioaddr1=0x168, ioaddr2=0x360, irq=9
ata0-master: type=cdrom, path=image.iso, status=inserted, model="image.iso"
boot: cdrom
clock: sync=both, time0=local
floppy_bootsig_check: disabled=0
log: log.txt
panic: action=ask
error: action=report, cpu0=ignore
info: action=report, cpu0=ignore
debug: action=ignore
parport1: enabled=1, file="parport.out"
mouse: enabled=0, type=imps2
private_colormap: enabled=0
pci: enabled=1, chipset=i440fx, slot1=cirrus
#pci: enabled=1, chipset=i440fx, slot1=cirrus, slot2=usb_xhci
#usb_debug: type=xhci, doorbell=1
#usb_xhci: enabled=1, model="uPD720202", n_ports=4
#usb_xhci: port4=floppy, options4="speed:full, path:floppy.img, model:teac"
magic_break: enabled=1 cx
print_timestamps: enabled=0

Re: xHCI Address Device (BSR=0) and Device Descriptor queries return completion code 4

Posted: Sat Feb 01, 2025 1:30 pm
by alnyannn
BenLunt wrote: Sat Feb 01, 2025 11:18 am 0x03EB0097C, your code does an in eax,dx with dx = 0xb008.
Hmm, this is weird, I don't access anything ACPI-related from my UEFI boot stub directly (and I'm fairly certain the EFI calls I do shouldn't as well). And this is certainly not a location within the kernel, because it's not a higher-half address. Are there any logs from the serial port? Most of the code prints what's going on right now there, so you can see what the UEFI stub/kernel were doing.

UPD: Just set up Bochs with OVMF, it's doing the same for me, but I'm absolutely sure it's OVMF — it's doing that even if I supply no cdrom image

Re: xHCI Address Device (BSR=0) and Device Descriptor queries return completion code 4

Posted: Sat Feb 01, 2025 2:26 pm
by alnyannn
Here's an example of qemu's xhci trace events along with my OS's log messages when detecting the connected/disconnected events for ports. Prior to this, the OS has not written anything to the port registers, just initialized the HC by resetting, halting, configuring and starting it. I don't try to initialize anything, I just read and print the current connect status.

Code: Select all

(qemu) device_add usb-kbd,id=kbd0
usb_xhci_port_link port 5, pls 7
usb_xhci_port_notify port 5, bits 0x20000
usb_xhci_queue_event v 0, idx 0, ER_PORT_STATUS_CHANGE, CC_SUCCESS, p 0x0000000005000000, s 0x01000000, c 0x00008801
usb_xhci_oper_read off 0x0004, ret 0x00000008       #### USBSTS read
usb_xhci_oper_write off 0x0004, val 0x00000008      #### USBSTS write-back to clear RW1C bits
usb_xhci_runtime_read off 0x0020, ret 0x00000002 #### Interrupter 0 IMAN read
usb_xhci_runtime_write off 0x0020, val 0x00000003 #### Interrupter 0 IMAN write-back to clear IP
000010:0:ygg_driver_usb_xhci::controller:149: IRQ started: 0x8   ##### This is the USBSTS value at the time of interrupt
000010:0:ygg_driver_usb_xhci::controller:85: Event Ring: port 5 changed
usb_xhci_port_read port 5, off 0x0000, ret 0x00020ee1
usb_xhci_port_write port 5, off 0x0000, val 0x00020ee1
000010:0:ygg_driver_usb_xhci::controller:94: port 5 connected = true
usb_xhci_runtime_read off 0x0038, ret 0x00594008
usb_xhci_runtime_read off 0x003c, ret 0x00000000
usb_xhci_runtime_write off 0x0038, val 0x00594018 #### Interrupter 0 ERDP update
usb_xhci_runtime_write off 0x003c, val 0x00000000
000010:0:ygg_driver_usb_xhci::controller:159: IRQ finished
(qemu) device_del kbd0
usb_xhci_port_link port 5, pls 5
usb_xhci_port_notify port 5, bits 0x20000
usb_xhci_queue_event v 0, idx 1, ER_PORT_STATUS_CHANGE, CC_SUCCESS, p 0x0000000005000000, s 0x01000000, c 0x00008801
usb_xhci_oper_read off 0x0004, ret 0x00000008
usb_xhci_oper_write off 0x0004, val 0x00000008
usb_xhci_runtime_read off 0x0020, ret 0x00000002
usb_xhci_runtime_write off 0x0020, val 0x00000003
000018:0:ygg_driver_usb_xhci::controller:149: IRQ started: 0x8
000018:0:ygg_driver_usb_xhci::controller:85: Event Ring: port 5 changed
usb_xhci_port_read port 5, off 0x0000, ret 0x000202a0
usb_xhci_port_write port 5, off 0x0000, val 0x000202a0
000018:0:ygg_driver_usb_xhci::controller:94: port 5 connected = false
usb_xhci_runtime_read off 0x0038, ret 0x00594018
usb_xhci_runtime_read off 0x003c, ret 0x00000000
usb_xhci_runtime_write off 0x0038, val 0x00594028
usb_xhci_runtime_write off 0x003c, val 0x00000000
000018:0:ygg_driver_usb_xhci::controller:159: IRQ finished
And here's what happens on my laptop (obviously, no handy xhci regs tracing) when a USB flash drive is plugged into a USB3.1 port:

Code: Select all

IRQ started 0x18
Event Ring: port 6 changed (the USB3.1 port number)
port 6 connected = true
Event Ring: port 6 changed
port 6 connected = false
IRQ finished
IRQ started 0x18
Event Ring: port 2 changed
port 2 connected = true
IRQ finished
##### I disconnect the flash drive
IRQ started 0x18
Event Ring: port 2 changed
port 2 connected = false
IRQ finished
So it looks like when a USB flash drive is connected to a USB3.1 port, the xHC first sends two events (connect+disconnect) immediately for the USB3.1 port (6), then it sends an event for the USB2.0 port (2). Both are the same physical port.
After that, I never receive any events from the 6th port again.
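
One detail worth keeping in mind when reading the QEMU trace above: each PORTSC read is followed by a write of the same value. Per the xHCI specification, PED is a write-1-to-clear bit just like the change bits, so writing back a value that was read while PED=1 disables the port. In the QEMU trace PED happens to be 0, so the write-back is harmless there. Whether this plays any role in the behaviour on real hardware is not established in this part of the thread. A sketch of a write-back that only acknowledges change bits, with hypothetical names:

```rust
pub const PORTSC_PED: u32 = 1 << 1;
pub const PORTSC_PP: u32 = 1 << 9;
pub const PORTSC_PIC: u32 = 0x3 << 14;
pub const PORTSC_WAKE: u32 = 0x7 << 25; // WCE, WDE, WOE
/// CSC, PEC, WRC, OCC, PRC, PLC, CEC: bits 17..=23, all write-1-to-clear.
pub const PORTSC_CHANGE: u32 = 0x7f << 17;

/// Sticky read-write bits that must be written back unchanged.
pub const PORTSC_PRESERVE: u32 = PORTSC_PP | PORTSC_PIC | PORTSC_WAKE;

/// Build the value to write to PORTSC in order to acknowledge whatever change
/// bits are set in `read`, without touching PED, PR or the link state.
pub fn portsc_ack_changes(read: u32) -> u32 {
    (read & PORTSC_PRESERVE) | (read & PORTSC_CHANGE)
}
```

The same masking idea applies to any other PORTSC write, such as setting PR for a reset: start from the preserved bits only and OR in the one bit being changed.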

I've also noticed that unlike QEMU, a real xHC also sets the PCD=1 in USBSTS when issuing interrupts, which means "any port has a change bit transition from 0 to 1". Not quite sure if that's relevant here.
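
The two USBSTS values in the logs above (0x8 under QEMU, 0x18 on the laptop) can be decoded with a small helper. The bit positions follow the xHCI specification (HCH bit 0, HSE bit 2, EINT bit 3, PCD bit 4, SRE bit 10); the names are hypothetical.

```rust
pub const USBSTS_HCH: u32 = 1 << 0; // HC Halted (read-only)
pub const USBSTS_HSE: u32 = 1 << 2; // Host System Error
pub const USBSTS_EINT: u32 = 1 << 3; // Event Interrupt
pub const USBSTS_PCD: u32 = 1 << 4; // Port Change Detect
pub const USBSTS_SRE: u32 = 1 << 10; // Save/Restore Error

/// Names of the status bits set in `value`, lowest bit first.
pub fn usbsts_flags(value: u32) -> Vec<&'static str> {
    const NAMES: [(u32, &str); 5] = [
        (USBSTS_HCH, "HCH"),
        (USBSTS_HSE, "HSE"),
        (USBSTS_EINT, "EINT"),
        (USBSTS_PCD, "PCD"),
        (USBSTS_SRE, "SRE"),
    ];
    let mut out = Vec::new();
    for (bit, name) in NAMES {
        if value & bit != 0 {
            out.push(name);
        }
    }
    out
}

/// Value to write back to USBSTS to clear the write-1-to-clear bits that are
/// currently set, leaving read-only bits out of the write.
pub fn usbsts_ack(value: u32) -> u32 {
    value & (USBSTS_HSE | USBSTS_EINT | USBSTS_PCD | USBSTS_SRE)
}
```

So 0x8 is EINT alone, and 0x18 is EINT together with PCD, matching the observation above.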

Re: xHCI Address Device (BSR=0) and Device Descriptor queries return completion code 4

Posted: Sat Feb 01, 2025 4:22 pm
by BenLunt
alnyannn wrote: Sat Feb 01, 2025 1:30 pm UPD: Just set up Bochs with OVMF, it's doing the same for me, but I'm absolutely sure it's OVMF — it's doing that even if I supply no cdrom image
There seems to be an issue with Bochs or OVMF. Make the following change to the bochsrc.txt file and see if it will now boot.

Code: Select all

-   ata1: enabled=0, ioaddr1=0x170, ioaddr2=0x370, irq=15
+   ata1: enabled=1, ioaddr1=0x170, ioaddr2=0x370, irq=15
i.e.: Enable ATA1 even though it isn't used...