MSI problems on Virtualbox

Question about which tools to use, bugs, the best way to implement a function, etc should go here. Don't forget to see if your question is answered in the wiki first! When in doubt post here.
Post Reply
User avatar
justinian
Posts: 19
Joined: Mon Apr 23, 2018 6:25 pm
Location: San Francisco

MSI problems on Virtualbox

Post by justinian »

So I've been developing an AHCI driver and using MSI-based interrupts. My main dev/test iteration is with QEMU/OVMF (where it works as expected), but I periodically test on virtualbox/uefi64, and there I've been running into a weird issue:

On virtualbox, I find the PCI capability structure for MSI, with a control word of 0x0170. This seems like an incorrect value, as the PCI spec says the following about MSI's control word:
  • 15::09 reserved (always 0 on read)
  • 8 Per-vector masking capable (RO)
  • 7 64 bit address capable (RO)
  • 6::4 Multiple message enable - indicates 2**n allocated vectors (RW, 0 on reset, 110 & 111 are reserved values)
  • 3::1 Multiple message capable - indicates 2**n enabled vectors (R0)
  • 0 MSI enable (RW, 0 after reset)
So:
  • current control value don't at all match post-reset (6::4 are 111 not 000) - this one actually seems conceptually fine, it could be set that way by UEFI firmware before my bootloader
  • 111 in 6::4 is a reserved value and doesn't make sense
  • even if 111 (which should signify 2**7 interrupts) were valid, 3::1 say the device only supports 2**0 (1) interrupt, so it's still invalid.
I'm sure this is somehow a bug of mine and not Virtualbox, but I'm not sure where. The capability pointers seem valid - MSI is first in the linked list, and points to a Power Management capability, and a SATA capability, so that looks ok. The result of this (or, i think this is the cause) is that I don't get the MSI interrupt that I expect to get on an AHCI read. Reading the per-vector mask returns 0, but pending interrupts returns 0x01000000, which again, doesn't make sense when control bits 3::1 are 0, because the device should only support bit 0 in the pending register.

Does anyone have any ideas about what the mistake I'm making is?

You can see my PCI device class here, but here are the relevant bits:

Code: Select all

struct pci_cap_msi
{
    pci_cap::type id;
    uint8_t next;
    uint16_t control;
} __attribute__ ((packed));

struct pci_cap_msi32
{
    pci_cap::type id;
    uint8_t next;
    uint16_t control;
    uint32_t address;
    uint16_t data;
    uint16_t reserved;
    uint32_t mask;
    uint32_t pending;
} __attribute__ ((packed));

struct pci_cap_msi64
{
    pci_cap::type id;
    uint8_t next;
    uint16_t control;
    uint64_t address;
    uint16_t data;
    uint16_t reserved;
    uint32_t mask;
    uint32_t pending;
} __attribute__ ((packed));

pci_device::pci_device(pci_group &group, uint8_t bus, uint8_t device, uint8_t func) :
    m_base(group.base_for(bus, device, func)),
    m_msi(nullptr),
    m_bus_addr(bus_addr(bus, device, func)),
    m_irq(isr::isrIgnoreF)
{
    // snipped saving out the vendor/device/class/etc fields into class members

    uint16_t *command = reinterpret_cast<uint16_t *>(&m_base[1]);
    *command |= 0x400; // Mask old INTx style interrupts

    log::info(logs::device, "Found PCIe device at %02d:%02d:%d of type %d.%d id %04x:%04x",
            bus, device, func, m_class, m_subclass, m_vendor, m_device);

    // m_base is a uint32_t* pointing at the start of the PCIe configuration space, so [13] is offset 0x34
    // which is the capabilities pointer byte
    uint8_t next = m_base[13] & 0xff;
    while (next) {
        pci_cap *cap = reinterpret_cast<pci_cap *>(kutil::offset_pointer(m_base, next));
        next = cap->next;
        log::debug(logs::device, "  - found PCI cap type %02x", cap->id);

        if (cap->id == pci_cap::type::msi) {
            m_msi = cap;
            pci_cap_msi *mcap = reinterpret_cast<pci_cap_msi *>(cap);
            mcap->control |= ~0x1; // Mask interrupts
            log::debug(logs::device, "  - MSI control %04x", mcap->control);
        }
    }
}

void
pci_device::write_msi_regs(addr_t address, uint16_t data)
{
    kassert(m_msi, "Tried to write MSI for a device without that cap");
    if (m_msi->id == pci_cap::type::msi) {
        pci_cap_msi *mcap = reinterpret_cast<pci_cap_msi *>(m_msi);
        if (mcap->control & 0x0080) {
            pci_cap_msi64 *mcap64 = reinterpret_cast<pci_cap_msi64 *>(m_msi);
            mcap64->address = address;
            mcap64->data = data;
            if (mcap64->control & 0x0100)
                log::debug(logs::device, "  - MSI mask %08x pending %08x", mcap64->mask, mcap64->pending);
        } else {
            pci_cap_msi32 *mcap32 = reinterpret_cast<pci_cap_msi32 *>(m_msi);
            mcap32->address = address;
            mcap32->data = data;
            if (mcap32->control & 0x0100)
                log::debug(logs::device, "  - MSI mask %08x pending %08x", mcap32->mask, mcap32->pending);
        }

        uint16_t control = mcap->control;
        control &= 0xff8f; // We're allocating one vector, clear 6::4
        control |= 0x0001; // Enable MSI
        mcap->control = control;
    } else {
        kassert(0, "MIS-X is NYI");
    }
}
Relevant log output is:

Code: Select all

dev  debug: Found MCFG entry: base dc000000  group 0  bus 0-63
dev  debug: Probing PCI group at base ffffff80dc000000
dev   info: Found PCIe device at 00:02:0 of type 3.0 id 80ee:beef
dev   info: Found PCIe device at 00:03:0 of type 2.0 id 1022:2000
dev   info: Found PCIe device at 00:04:0 of type 8.128 id 80ee:cafe
dev   info: Found PCIe device at 00:07:0 of type 6.128 id 8086:7113
dev   info: Found PCIe device at 00:31:0 of type 6.1 id 8086:27b9
dev   info: Found PCIe device at 00:31:2 of type 1.6 id 8086:2829
dev  debug:   - found PCI cap type 05
dev  debug:   - MSI control 0170
dev  debug:   - found PCI cap type 01
dev  debug:   - found PCI cap type 12
dev  debug:   Found table BGRT
driv  info: AHCI registering device 0:31:2:
driv debug:   1 ports
driv debug:   32 command slots
driv  info: Found device type SATA at port 0
mem  debug: Got request to offset map 4 pages
driv debug: Rebasing address for AHCI port 0 to ffffff8000050000 [4]
dev  debug: Allocating IRQ 10 to AHCI Device.
dev  debug:   - MSI mask 00000000 pending 01000000
driv debug: Created command, slot 0, 1 PRD entries.
driv debug: Reading 1 sectors, starting from 1 (0x200)
Developing: jsix - UEFI-booted x64 kernel
linuxyne
Member
Member
Posts: 211
Joined: Sat Jul 02, 2016 7:02 am

Re: MSI problems on Virtualbox

Post by linuxyne »

This line looks like the cause of reading the unexpected value:

Code: Select all

mcap->control |= ~0x1; // Mask interrupts
User avatar
justinian
Posts: 19
Joined: Mon Apr 23, 2018 6:25 pm
Location: San Francisco

Re: MSI problems on Virtualbox

Post by justinian »

linuxyne wrote:This line looks like the cause of reading the unexpected value:

Code: Select all

mcap->control |= ~0x1; // Mask interrupts
Yep, it totally was. (Derp.) Sadly it was a red herring. I found that a little while after but then was still not getting interrupts, but the SNotification flag was being set after issuing my read command. The HBA reports not supporting SNotification, and also isn't a port multiplier. (The way the spec is written, it seems like SNotification is for reporting events from devices attached through a multiplier? I can't actually tell.)

I suspect that's still a red herring, though. I think I must just have some bug preventing me from getting intterupts at all from the HBA.

Thanks for the code review, though!
Developing: jsix - UEFI-booted x64 kernel
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: MSI problems on Virtualbox

Post by Brendan »

Hi,
justinian wrote:Does anyone have any ideas about what the mistake I'm making is?
I didn't see anything that should cause your problem; but I noticed an unrelated bug.

For safety; you want to check bit 4 of the "status register" (offset 0x0006 in the device's PCI configuration space) to determine if the capability list exists before trying to parse capability list. If you assume it exists and it doesn't, then you end up reading "random" data and misinterpreting it.

Note that a PCI-E device that has no IRQs may not have a capability list; and neither MSI nor the capability list is required for any "PCI conventional" device that's behind a PCI-E to PCI bridge.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
User avatar
justinian
Posts: 19
Joined: Mon Apr 23, 2018 6:25 pm
Location: San Francisco

Re: MSI problems on Virtualbox

Post by justinian »

Brendan wrote: For safety; you want to check bit 4 of the "status register" (offset 0x0006 in the device's PCI configuration space) to determine if the capability list exists before trying to parse capability list. If you assume it exists and it doesn't, then you end up reading "random" data and misinterpreting it.

Note that a PCI-E device that has no IRQs may not have a capability list; and neither MSI nor the capability list is required for any "PCI conventional" device that's behind a PCI-E to PCI bridge.
Oh hey, good catch. I somehow assumed that if there was no capability list, it had to be 0. Thanks for the code review!
Developing: jsix - UEFI-booted x64 kernel
User avatar
justinian
Posts: 19
Joined: Mon Apr 23, 2018 6:25 pm
Location: San Francisco

Re: MSI problems on Virtualbox

Post by justinian »

justinian wrote: I suspect that's still a red herring, though. I think I must just have some bug preventing me from getting intterupts at all from the HBA.
Thanks for all the help so far. Now I'm pretty sure the SNotification stuff was a useless line of investigation - the virtualbox debugger shows the AHCI registers without the SNotification one, with the values I expect for the ones it does have. Setting up a timeout and a dump afterwards makes it look like it's not that I'm not getting the interrupts from the device, but rather that it's never processing the commands and generating interrupts in the first place. :?

Both before and after timeout the registers contain the same values:

Code: Select all

HBA Registers:
   CAP: c8241f80
   GHC: 80000002
    IS: 00000000
    PI: 00000001
    VS: 00010100
   C3C: 00000000
   C3P: 00000000
   EML: 00000000
   EMC: 00000000
  CAP2: 00000000
  BOHC: 00000000

Port Registers:
   CLB: 00050000
  +CLB: 00000000
    FB: 00050400
   +FB: 00000000
    IS: 00000000
    IE: fdc000ff
   CMD: 0011c117
   TFD: 00000050
   SIG: 00000101
  SSTS: 00000123
  SCTL: 00000300
  SERR: 00000000
  SACT: 00000000
    CI: 00000003
  SNTF: 00000003
   FBS: 00000000
  DEVS: 00000000
After the timeout, the virtualbox debugger has this to say:

Code: Select all

VBoxDbg> info ahci0
ahci#0: mmio=00000000e8404000 ports=1 GC=true  R0=true 
HbaCap=0xc8241f80
HbaCtrl=0x80000002
HbaIs=0x0
HbaPi=0x1
HbaVs=0x10100
HbaCccCtl=0x0
HbaCccPorts=0x0
PortsInterrupted=0x0
Port 0: device-attached=true 
PortClb=0x50000
PortClbU=0x0
PortFb=0x50400
PortFbU=0x0
PortIs=0x0
PortIe=0xfdc000ff
PortCmd=0x11c017
PortTfd=0x50
PortSig=0x101
PortSSts=0x123
PortSCtl=0x300
PortSErr=0x0
PortSAct=0x0
PortCi=0x3
PortPhysClb=0000000000050000
PortPhysFb=0000000000050400
PortActTasksActive=0
PortPoweredOn=true 
PortSpunUp=true 
PortFirstD2HFisSent=true 
PortATAPI=false
PortTasksFinished=0x0
PortQueuedTasksFinished=0x0
PortTasksNew=0x0
That PortTasksFinished=0x0 makes me think it's just not running commands. But the command register is reporting "command list running" and "FIS receive running" in bits 15/16.

I feel like everything you need to know to implement AHCI is split among like five different specs (PCI, ATA Commands, SATA, AHCI... okay four.), so I'm probably piecing it all together badly and missing some piece, and qemu/ovmf just happens to set things up for me whereas vbox doesn't. Any pointers?
Developing: jsix - UEFI-booted x64 kernel
linuxyne
Member
Member
Posts: 211
Joined: Sat Jul 02, 2016 7:02 am

Re: MSI problems on Virtualbox

Post by linuxyne »

vbox checks for a valid CFL value.

Code: Select all

    AssertMsgReturn((cmdHdr.u32DescInf & AHCI_CMDHDR_CFL_MASK) * sizeof(uint32_t) == AHCI_CMDFIS_TYPE_H2D_SIZE,
                    ("This is not a command FIS!!\n"),
                    false);
If I did not miss anything vital, it seems that popcorn keeps it at 0. Can't say for sure if this alone is a fix, but it looks like a potential issue.
User avatar
justinian
Posts: 19
Joined: Mon Apr 23, 2018 6:25 pm
Location: San Francisco

Re: MSI problems on Virtualbox

Post by justinian »

linuxyne wrote: If I did not miss anything vital, it seems that popcorn keeps it at 0. Can't say for sure if this alone is a fix, but it looks like a potential issue.
I.. wow. Yeah, I even had a cmd_list_fis_size() helper function to set it, but never call it. Seems like I deleted that by accident when I moved to interrupt-based reads. I am kind of astonished that QEMU just happily went along with it.

Seems like that wasn't the problem, but it was certainly a problem. Thanks a ton for catching that.
Developing: jsix - UEFI-booted x64 kernel
linuxyne
Member
Member
Posts: 211
Joined: Sat Jul 02, 2016 7:02 am

Re: MSI problems on Virtualbox

Post by linuxyne »

To test the interrupt generation, one can exercise the port reset path.
Instead of sending the identify command, reset the port (after the port setup, update, rebase, etc. are done, and interrupts are enabled on the port and the hba.)

set sctl.det = 1
set sctl.det = 0

The function ahciPortResetFinish queues the firstd2hfis, sets DHRS and fires an interrupt for this port.

Is it possible to run a debug build of vbox? It has plenty of log messages, but they may be unavailable in a release build.

Late Edit: Does the OS also guard against optimizations surrounding the reading/writing of ahci registers?
User avatar
justinian
Posts: 19
Joined: Mon Apr 23, 2018 6:25 pm
Location: San Francisco

Re: MSI problems on Virtualbox

Post by justinian »

linuxyne wrote:To test the interrupt generation, one can exercise the port reset path.
Instead of sending the identify command, reset the port (after the port setup, update, rebase, etc. are done, and interrupts are enabled on the port and the hba.)

set sctl.det = 1
set sctl.det = 0

The function ahciPortResetFinish queues the firstd2hfis, sets DHRS and fires an interrupt for this port.
Well that certainly generated a ton of interrupts. :D (Just one, really, but I wasn't yet handling/clearing IS.PCS and SERR.X) Ok, so now I know my interrupt paths are actually enabled and working. Thanks for that suggestion.
linuxyne wrote:Is it possible to run a debug build of vbox? It has plenty of log messages, but they may be unavailable in a release build.
I was just working on building a debug vbox, actually. (Your last comment made me go double-check log settings to make sure I wasn't seeing those errors, but I discovered that the release log settings vbox exposes are not as useful.)
Late Edit: Does the OS also guard against optimizations surrounding the reading/writing of ahci registers?
Not yet, but I'm building with no optimizations and AFAICT haven't seen any in the disassembly. (I haven't been looking explicitly for memory ordering issues, but I have been over the disassembly for these functions a lot lately.) I should still go through and at least add some 'volatile's in there.
Developing: jsix - UEFI-booted x64 kernel
linuxyne
Member
Member
Posts: 211
Joined: Sat Jul 02, 2016 7:02 am

Re: MSI problems on Virtualbox

Post by linuxyne »

That the little test succeeded in generating interrupts could mean that the programming of the identify (or read) command is not as vbox expects it to be. VBox's debug logs seem to be an easy way to discover its expectations.
Post Reply