Qemu: NVMe controller gets stuck during controller config

Ethin
Member
Posts: 625
Joined: Sun Jun 23, 2019 5:36 pm
Location: North Dakota, United States

Post by Ethin »

I'm working on an NVMe driver and it seems to be getting stuck, either when the controller is reset or when it's being enabled. (QEMU also seems to alternate at random between supporting the NVM command set and supporting only the admin command set, which is weird.) The output of my kernel is below:
[INFO] [nvme] initializing controller
[INFO] [nvme] running controller checks
[INFO] [nvme] Checking controller version
[DEBUG] [nvme] VS = 7000800, 111000000000000100000000000
[INFO] [nvme] Checking command set support
[INFO] [nvme] NVM command set supported
[DEBUG] [nvme] CSS = 69, 1101001
[INFO] [nvme] device supports 4KiB pages
[DEBUG] [nvme] MPSMIN = 2, 10
[INFO] [nvme] resetting controller
[DEBUG] [nvme] CC = 216876D, 10000101101000011101101101
[DEBUG] [nvme] CSTS = 1C090703, 11100000010010000011100000011
[DEBUG] [nvme] CC[0] = 0
[DEBUG] [nvme] CC = 216876C, 10000101101000011101101100
I'm not honestly sure what's causing this -- it could be an issue with my code or with QEMU. My code is available over here. Initialization code for the NVMe driver is on lines 564-746 (I need to break this file up into chunks). I'm doing my best to follow the NVMe article as well as the NVMe base specification, revision 1.4a.
nakst
Member
Posts: 51
Joined: Sun Jan 17, 2016 7:57 am

Post by nakst »

I'm surprised your version check for 1.4 passes on line 568, since Qemu's NVMe implementation only reports version 1.3: https://github.com/qemu/qemu/blob/maste ... nvme.c#L75

Are you sure you're accessing the registers correctly? You should be taking the address in BAR 0 from the PCI configuration space of the NVMe controller, clearing out the lower 4 bits, and mapping as uncacheable memory.

Post by Ethin »

nakst wrote:I'm surprised your version check for 1.4 passes on line 568, since Qemu's NVMe implementation only reports version 1.3: https://github.com/qemu/qemu/blob/maste ... nvme.c#L75

Are you sure you're accessing the registers correctly? You should be taking the address in BAR 0 from the PCI configuration space of the NVMe controller, clearing out the lower 4 bits, and mapping as uncacheable memory.
I believe I am. I am definitely mapping as uncacheable and I am using BAR 0. My function for getting BARs does this when calculating the actual BAR:

Code:

fn calculate_bar_addr(dev: &PCIDevice, addr: u32) -> usize {
    let bar1 = read_dword(dev.phys_addr as usize, addr);
    if !bar1.get_bit(0) {
        match bar1.get_bits(1..=2) {
            0 => (bar1 & 0xFFFF_FFF0) as usize,
            1 => (bar1 & 0xFFF0) as usize,
            2 => {
                let bar2 = read_dword(
                    dev.phys_addr as usize,
                    match addr {
                        BAR0 => BAR1,
                        BAR1 => BAR2,
                        BAR2 => BAR3,
                        BAR3 => BAR4,
                        BAR4 => BAR5,
                        BAR5 | _ => 0,
                    },
                );
                (((bar1 as u64) & 0xFFFF_FFF0) + (((bar2 as u64) & 0xFFFF_FFFF) << 32)) as usize
            }
            _ => bar1 as usize,
        }
    } else {
        (bar1 & 0xFFFF_FFFC) as usize
    }
}

Post by nakst »

Where is the code to map the registers as uncacheable memory?

Post by Ethin »

Mapping as uncacheable is in memory.rs, lines 264-295.

Post by nakst »

But you don't seem to call the function allocate_phys_range with the address in BAR0? Or perhaps I am missing something since I don't know Rust.

The value for the version register you gave in your first post looks very wrong.
[DEBUG] [nvme] VS = 7000800, 111000000000000100000000000
My guess is you are reading from the physical address by mistake?
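For reference, here is a minimal sketch of decoding the VS register, assuming the layout in the NVMe base specification (major version in bits 31:16, minor in bits 15:8, tertiary in bits 7:0). The `NvmeVersion` type and the `decode_vs` function are hypothetical names for illustration and are not taken from Ethin's driver.

```rust
/// Decoded NVMe version (hypothetical helper type, not from the driver under discussion).
/// Field order matters: the derived ordering compares major, then minor, then tertiary.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct NvmeVersion {
    pub major: u16,
    pub minor: u8,
    pub tertiary: u8,
}

/// Split a raw VS register value into its three fields.
pub fn decode_vs(vs: u32) -> NvmeVersion {
    NvmeVersion {
        major: (vs >> 16) as u16,
        minor: ((vs >> 8) & 0xFF) as u8,
        tertiary: (vs & 0xFF) as u8,
    }
}
```

Under this layout a controller reporting NVMe 1.3 would read back 0x00010300, whereas the logged value 7000800 decodes to a major version of 0x0700, which no controller reports. That supports the guess that the read is not hitting the controller's registers.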

Post by Ethin »

My architecture for drivers is a bit unique. The driver takes a function to allocate physical addresses; it doesn't do memory allocation like that itself.
The BARs, according to my kernel, are (in hex) 408010, 0, 0, 0, 920010, and 20110000 (if it's reading the PCIe address space properly). I've got two boot logs written; you can find them here and here, respectively. My PCIe memory space read functions are on lines 110-129 of pci.rs (I don't have a read_qword function). I'm not sure how much the boot logs help; I'm not reading the physical address but the data at that address.
Edit: yes, I'm definitely reading from BAR0.
Octocontrabass
Member
Member
Posts: 5562
Joined: Mon Mar 25, 2013 7:01 pm

Post by Octocontrabass »

I don't think it's reading the PCIe space properly, those BARs look like nonsense. BARs should point to physical addresses that aren't already being used elsewhere, but your memory map says all of those addresses are already occupied by RAM.
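The sanity check described here can be sketched as a simple interval test against the memory map. This is only an illustration: `MemoryRegion` and `bar_overlaps_ram` are hypothetical names, and the sketch assumes the kernel already has a list of usable-RAM ranges from the firmware.

```rust
/// One usable-RAM entry from the firmware memory map (hypothetical type).
#[derive(Clone, Copy)]
pub struct MemoryRegion {
    pub start: u64,
    pub len: u64,
}

/// Returns true if [bar_base, bar_base + bar_len) intersects any RAM region.
/// A BAR that overlaps RAM almost certainly means the BAR was read incorrectly.
pub fn bar_overlaps_ram(bar_base: u64, bar_len: u64, ram: &[MemoryRegion]) -> bool {
    let bar_end = bar_base.saturating_add(bar_len);
    ram.iter().any(|r| {
        let r_end = r.start.saturating_add(r.len);
        // Two half-open intervals overlap when each starts before the other ends.
        bar_base < r_end && r.start < bar_end
    })
}
```

With a made-up memory map that has RAM from 1 MiB up to 2 GiB, a BAR value such as 408010 would be flagged, while an address in the usual PCI hole just below 4 GiB would not.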

Post by Ethin »

Octocontrabass wrote:I don't think it's reading the PCIe space properly, those BARs look like nonsense. BARs should point to physical addresses that aren't already being used elsewhere, but your memory map says all of those addresses are already occupied by RAM.
That's what I was thinking. What is the "right way"? I get the address for the device in PCIe space, and then I read PCIe physical address + an offset of 0x10. The device PCIe address is at B0018000, so it would be reading from address B0018010. Is that how I do it, or am I missing something?
Edit: Okay, so I was calling .offset() on the address pointer. I've changed it to do simple addition (physical address + addr) instead, and that drastically changed things. (Version check still passes though... weird.) Now, when we boot, the BARs are FEBD4000, 0, 0, 0, FEBD7000, and 0. The PCIe command register is 107h, and altering bits 10, 2, and 1 doesn't change that. The interrupt line is still 43. VS is now 10200h, CSS is 1 (bits 44:37 of CAP). MPSMIN is 0h. The original AQA is FF003F (and I then change that to 7FF07FF). Now I'm just getting stuck on the controller enable process. I still think I'm doing something wrong though, because my version check passes.
Edit 2: okay, so... I think I wasn't thinking clearly when I wrote some of this code :) because I use an and and not an or for my version check. Whoops? (If memory serves, I did code some of this while drunk, so... :D) I now get the expected "error" for version incompatibility, though I get version 1.2 and not 1.3. Might be my QEMU version though. Are there any "unexpected" changes between 1.3 and 1.4/1.2 and 1.4 that I should know about? Or is it safe, for now, to follow the 1.4 spec?
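The .offset() mix-up is easy to reproduce. Rust's pointer `offset` family counts in units of the pointee type, so on a `*const u32` a count of 0x10 advances 0x40 bytes. The following is a minimal sketch, assuming the original read function used a `u32` pointer (the thread does not show it); `scaled_addr` and `byte_addr` are hypothetical names.

```rust
/// Address reached by an `offset`-style call on a `*const u32`.
/// The count is multiplied by size_of::<u32>() == 4.
pub fn scaled_addr(base: usize, count: isize) -> usize {
    // wrapping_offset only computes the address; nothing is dereferenced here.
    (base as *const u32).wrapping_offset(count) as usize
}

/// Address reached by plain byte arithmetic, which is what a
/// configuration-space register offset such as 0x10 actually means.
pub fn byte_addr(base: usize, offset: usize) -> usize {
    base + offset
}
```

For the device address mentioned above, `scaled_addr(0xB0018000, 0x10)` lands on B0018040 while `byte_addr(0xB0018000, 0x10)` lands on B0018010, so the scaled version would read a different register than BAR0.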

Post by Octocontrabass »

Ethin wrote:Are there any "unexpected" changes between 1.3 and 1.4/1.2 and 1.4 that I should know about? Or is it safe, for now, to follow the 1.4 spec?
You might be interested in the lists of changes between NVMe 1.2 and 1.3 and between NVMe 1.3 and 1.4. I don't know how significant any of these changes are, though.

Post by Ethin »

Octocontrabass wrote:
Ethin wrote:Are there any "unexpected" changes between 1.3 and 1.4/1.2 and 1.4 that I should know about? Or is it safe, for now, to follow the 1.4 spec?
You might be interested in the lists of changes between NVMe 1.2 and 1.3 and between NVMe 1.3 and 1.4. I don't know how significant any of these changes are, though.
Thanks for those! Looking at the QEMU source for 5.1.0, it does only implement NVMe 1.2. That sadly means I'll have to go acquire the 1.2 specifications, wait for the next QEMU release and then acquire 1.3. I wonder if the NVMe driver is relatively new?

Post by Ethin »

Update: so I know why the kernel is hanging when it enables the controller. Apparently CSTS.CFS is being set. I'm not quite sure why this is happening. Looking at the QEMU source code for NVMe doesn't really help much, unfortunately. According to the NVMe specification, I am to do the following to initialize the controller:
1. Set PCIe configuration,
2. Wait for CSTS.RDY to be zero (I don't do this),
3. Configure the admin queue, including the AQA, ASQ, and ACQ registers,
4. Configure the controller settings, specifically the arbitration mechanism, the memory page size, and the I/O command set, and
5. Set CC.EN to one (this is where it all breaks).
There are more, but this is what works (and then dies).
Edit: So I know why it's failing (thank you, QEMU traces). I just have no idea how to actually fix it. The QEMU traces say:
pci_nvme_err_startfail_cqent_too_small nvme_start_ctrl failed because the completion queue entry size is too small: log2size=0, min=15
pci_nvme_err_startfail setting controller enable bit failed
The problem is that the NVMe specification says this for queue entry sizes:
23:20 RW/RO 0h
I/O Completion Queue Entry Size (IOCQES): This field defines the I/O Completion Queue entry size that is used for the selected I/O Command Set. The required and maximum values for this field are specified in the CQES field in the Identify Controller data structure in Figure 247 for each I/O Command Set. The value is in bytes and is specified as a power of two (2^n). If any I/O Completion Queues exist, then write operations that change the value in this field produce undefined results. If the controller does not support I/O queues, then this field shall be read-only with a value of 0h.
19:16 RW/RO 0h
I/O Submission Queue Entry Size (IOSQES): This field defines the I/O Submission Queue entry size that is used for the selected I/O Command Set. The required and maximum values for this field are specified in the SQES field in the Identify Controller data structure in Figure 247 for each I/O Command Set. The value is in bytes and is specified as a power of two (2^n). If any I/O Submission Queues exist, then write operations that change the value in this field produce undefined results. If the controller does not support I/O queues, then this field shall be read-only with a value of 0h.
I can rely on cap.MQES, but the problem is that that's a 16-bit integer, whereas these only want four bits. SeaBIOS does this for its initialization:

Code:

    ctrl->reg->cc = NVME_CC_EN | (NVME_CQE_SIZE_LOG << 20)
        | (NVME_SQE_SIZE_LOG << 16 /* IOSQES */);
NVME_CQE_SIZE_LOG is 4 and NVME_SQE_SIZE_LOG is 6, but this seems... really arbitrary and I don't understand how they get that.
BenLunt
Member
Posts: 941
Joined: Sat Nov 22, 2014 6:33 pm
Location: USA

Post by BenLunt »

Ethin wrote:My code is available over here. Initialization code for the NVMe driver is on lines 564-746 (I need to break this file up into chunks). I'm doing my best to follow the NVMe article as well as the NVMe base specification, revision 1.4a.
Hi Ethin,

I have recently started my NVMe driver as well, so it is nice to see someone else posting on this subject. May I ask what article you are referring to above?

Also, may I make a few comments about your code? Don't take them as criticism, since first I am not like that :-), and second, I don't know RUST at all, so I may be all wet here.

Using the latest code you have posted:
Starting with line 578, if bit 37 or bit 44 in the CAPS register is set, line 582 is redundant. In fact, line 583 will never be executed if bit 37 or bit 44 is set.

Line 593 is in error. You use "greater than or equal to". What if the minimum is greater than 4096? If this is true, the next line is wrong.

Line 604 writes a zero to the CC.EN bit whether it was already zero or not. As you later state, you probably should check this first. If it is already zero, no need to write to it. If it is one, you need to wait for the CSTS.RDY bit to also become zero, possibly timing out after so long if it doesn't.

What is the intent of line 606? "self.cc.write(cc);" Does this read from itself, then write the same value back? This would be my assumption, but there are no "Write Clear" bits in CC, so I am confused by its use.

I would put a timeout in the loop at Line 608. If the bit never becomes clear, you won't get very far.

Just asking a question and then providing some thoughts.

Thanks,
Ben
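The timeout suggested above can be sketched without a calibrated timer by bounding the number of polls. This is only an approximation: the specification expresses the worst-case ready time in CAP.TO, in units of 500 ms, so a real driver would turn that into a deadline. `wait_for_rdy`, `RdyTimeout`, and the closure-based register access are assumptions for illustration and are not part of Ethin's code.

```rust
/// CSTS.RDY is bit 0 of the controller status register.
const CSTS_RDY: u32 = 1 << 0;

/// Error returned when RDY never reaches the wanted state (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct RdyTimeout;

/// Poll CSTS until RDY equals `want`, giving up after `max_polls` reads.
/// `read_csts` stands in for a volatile MMIO read of the CSTS register.
pub fn wait_for_rdy(
    mut read_csts: impl FnMut() -> u32,
    want: bool,
    max_polls: u32,
) -> Result<(), RdyTimeout> {
    for _ in 0..max_polls {
        if ((read_csts() & CSTS_RDY) != 0) == want {
            return Ok(());
        }
        // Hint to the CPU that this is a busy-wait loop.
        std::hint::spin_loop();
    }
    Err(RdyTimeout)
}
```

A reset path would clear CC.EN only if it is set, then call `wait_for_rdy(read, false, limit)`; the enable path would set CC.EN and call it with `want` set to `true`. In a `no_std` kernel, `core::hint::spin_loop` is the equivalent call.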

Post by Ethin »

BenLunt wrote:
Ethin wrote:My code is available over here. Initialization code for the NVMe driver is on lines 564-746 (I need to break this file up into chunks). I'm doing my best to follow the NVMe article as well as the NVMe base specification, revision 1.4a.
Hi Ethin,

I have recently started my NVMe driver as well, so it is nice to see someone else posting on this subject. May I ask what article you are referring to above?
I am referring to the NVMe article, https://wiki.osdev.org/NVMe
BenLunt wrote: Also, may I make a few comments about your code? Don't take them as criticism, since first I am not like that :-), and second, I don't know RUST at all, so I may be all wet here.

Using the latest code you have posted:
Starting with line 578, if bit 37 or bit 44 in the CAPS register is set, line 582 is redundant. In fact, line 583 will never be executed if bit 37 or bit 44 is set.
Good point, thanks!
BenLunt wrote:Line 593 is in error. You use "greater than or equal to". What if the minimum is greater than 4096? If this is true, the next line is wrong.
I'll definitely fix that. Thanks!
BenLunt wrote:Line 604 writes a zero to the CC.EN bit whether it was already zero or not. As you later state, you probably should check this first. If it is already zero, no need to write to it. If it is one, you need to wait for the CSTS.RDY bit to also become zero, possibly timing out after so long if it doesn't.
I don't have a timeout mechanism set up yet; though I have access to the RTC and such, waiting exactly 500ms is something I'm not quite sure about. My ACPI library, I don't believe, lets me access the HPET yet (I'll check). My kernel does have a sleep function but it's a bit convoluted.
BenLunt wrote:What is the intent of line 606? "self.cc.write(cc);" Does this read from itself, then write the same value back? This would be my assumption, but there are no "Write Clear" bits in CC, so I am confused by its use.
self.cc is a VolAddress<> from the voladdress crate, and write() performs a volatile store to it. It makes it easy to read and write MMIO values in a volatile manner without having to mess with pointers (pointers are managed internally).
BenLunt wrote:I would put a timeout in the loop at Line 608. If the bit never becomes clear, you won't get very far.

Just asking a question and then providing some thoughts.

Thanks,
Ben
Again, Thanks for all your suggestions!

Post by Octocontrabass »

Ethin wrote:I can rely on cap.MQES, but the problem is that that's a 16-bit integer, whereas these only want four bits.
The size is specified as a power of 2, so you need to write the binary logarithm of the size you want. It just so happens that four bits is enough to represent it in that form.
Ethin wrote:SeaBIOS does this for its initialization:
[...]
NVME_CQE_SIZE_LOG is 4 and NVME_SQE_SIZE_LOG is 6, but this seems... really arbitrary and I don't understand how they get that.
They took the binary logarithm of 16 and 64.
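Put into code, the fields are derived from the entry sizes in bytes. The following is a minimal sketch, assuming the CC layout from the NVMe base specification (EN in bit 0, IOSQES in bits 19:16, IOCQES in bits 23:20). `entry_size_log2` and `cc_enable_value` are hypothetical helpers, and the remaining CC fields are left at zero for brevity.

```rust
const CC_EN: u32 = 1 << 0;
const CC_IOSQES_SHIFT: u32 = 16;
const CC_IOCQES_SHIFT: u32 = 20;

/// Binary logarithm of a power-of-two entry size, or None if the size
/// is not a power of two (including zero).
pub fn entry_size_log2(bytes: u32) -> Option<u32> {
    if bytes.is_power_of_two() {
        Some(bytes.trailing_zeros())
    } else {
        None
    }
}

/// CC value that enables the controller with the given queue entry sizes.
/// CSS, MPS, and AMS are left at zero in this sketch.
pub fn cc_enable_value(sq_entry_bytes: u32, cq_entry_bytes: u32) -> Option<u32> {
    let iosqes = entry_size_log2(sq_entry_bytes)?;
    let iocqes = entry_size_log2(cq_entry_bytes)?;
    // Each field is only four bits wide.
    if iosqes > 0xF || iocqes > 0xF {
        return None;
    }
    Some(CC_EN | (iosqes << CC_IOSQES_SHIFT) | (iocqes << CC_IOCQES_SHIFT))
}
```

With 64-byte submission entries and 16-byte completion entries, `cc_enable_value(64, 16)` produces the same bit pattern as the SeaBIOS expression: 6 in IOSQES, 4 in IOCQES, and EN set.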