Qemu: NVMe controller gets stuck during controller config

BenLunt
Member
Posts: 941
Joined: Sat Nov 22, 2014 6:33 pm
Location: USA

Post by BenLunt »

BenLunt wrote:Update: (At a glance) it looks like version 1.2 (of the QEMU code) doesn't support Scatter Gather
Just an update: the new version of QEMU (5.2.0-rc2) now includes the patch for version 1.3 of the NVMe specification.

Ben
- http://www.fysnet.net/osdesign_book_series.htm
IanSeyler
Member
Posts: 326
Joined: Mon Jul 28, 2008 9:46 am
Location: Ontario, Canada

Post by IanSeyler »

Sorry to necro this thread, but I'm in the middle of writing an NVMe driver myself. I ran into the same issue with IOCQES and IOSQES needing to be set. According to QEMU, the NVMe controller is now successfully enabled.

I'm just trying to wrap my head around the data structures involved now. My ASQ base is temporarily set to 0x8000.

Two questions:
Is 0x8000 where I can build my first 64-byte Submission Queue Entry? Attempting an IDENTIFY (0x06) first.
How does one work the doorbell? I need to point it to the next queue entry?

My current work in progress is here.

Thanks,
-Ian
BareMetal OS - http://www.returninfinity.com/
Mono-tasking 64-bit OS for x86-64 based computers, written entirely in Assembly
Octocontrabass
Member
Posts: 5563
Joined: Mon Mar 25, 2013 7:01 pm

Post by Octocontrabass »

IanSeyler wrote:Is 0x8000 where I can build my first 64-byte Submission Queue Entry?
If you mean physical address 0x8000, yes. (Unless you're using an IOMMU with DMAR...)
IanSeyler wrote:How does one work the doorbell? I need to point it to the next queue entry?
Yes. You can make multiple submissions with a single write to the doorbell.
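
A minimal sketch of the batching described above: several 64-byte entries are copied into the ring, and one doorbell write publishes all of them. The `nvme_sq` structure and the `sq_push`/`sq_ring` names are hypothetical, the queue memory is assumed to be identity mapped, and the sketch does not check for a full queue or issue the memory barrier a real driver would likely want before the doorbell write.

```c
#include <stdint.h>
#include <string.h>

#define SQE_SIZE 64u  /* each submission queue entry is 64 bytes */

/* Hypothetical driver-side bookkeeping for one submission queue. */
struct nvme_sq {
    uint8_t *base;                /* queue memory, e.g. the buffer at 0x8000 */
    uint32_t entries;             /* number of slots in the ring */
    uint32_t tail;                /* next free slot, owned by the driver */
    volatile uint32_t *doorbell;  /* this queue's SQ tail doorbell register */
};

/* Copy one 64-byte command into the next slot. The controller is not
 * notified yet, so several commands can be queued back to back. */
static void sq_push(struct nvme_sq *sq, const void *cmd)
{
    memcpy(sq->base + (size_t)sq->tail * SQE_SIZE, cmd, SQE_SIZE);
    sq->tail = (sq->tail + 1) % sq->entries;  /* wrap at the end of the ring */
}

/* Publish everything queued so far with a single doorbell write.
 * The value written is the new tail: the index after the last new entry. */
static void sq_ring(struct nvme_sq *sq)
{
    *sq->doorbell = sq->tail;
}
```

Calling `sq_push` four times and `sq_ring` once would write the value 4 to the doorbell, which is the pattern of queuing several admin commands and submitting them together.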

Post by IanSeyler »

Thanks! What I don't understand is the doorbell head/tail location. Where is that specified? I'm not sure how QEMU got NVME_Base+0x1000 and NVME_Base+0x1004.

QEMU:

pci_nvme_admin_cmd cid 0 sqid 0 opc 0x6 opname 'NVME_ADM_CMD_IDENTIFY'
pci_nvme_identify cid 0 cns 0x1 ctrlid 0 csi 0x0
pci_nvme_identify_ctrl identify controller
pci_nvme_map_prp trans_len 4096 len 4096 prp1 0xff96000 prp2 0x0 num_prps 2
pci_nvme_map_addr addr 0xff96000 len 4096
pci_nvme_enqueue_req_completion cid 0 cqid 0 dw0 0x0 dw1 0x0 status 0x0
pci_nvme_irq_pin pulsing IRQ pin
My code:

pci_nvme_admin_cmd cid 0 sqid 0 opc 0x6 opname 'NVME_ADM_CMD_IDENTIFY'
pci_nvme_identify cid 0 cns 0x1 ctrlid 0 csi 0x0
pci_nvme_identify_ctrl identify controller
pci_nvme_map_prp trans_len 4096 len 4096 prp1 0xc000 prp2 0x0 num_prps 2
pci_nvme_map_addr addr 0xc000 len 4096
pci_nvme_enqueue_req_completion cid 0 cqid 0 dw0 0x0 dw1 0x0 status 0x0
pci_nvme_irq_pin pulsing IRQ pin
Making progress it seems at least!

Post by Octocontrabass »

IanSeyler wrote:What I don't understand is the doorbell head/tail location. Where is that specified?
It's in the NVMe over PCIe transport specification. It used to be part of the base specification.
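
For reference, the doorbell layout in that specification starts at offset 0x1000 from the controller's MMIO base and alternates SQ tail and CQ head registers, spaced by a stride taken from CAP.DSTRD. The sketch below computes those offsets; the function names are hypothetical. With DSTRD = 0 it yields 0x1000 and 0x1004 for the admin queue, which matches the offsets QEMU used above.

```c
#include <stdint.h>

/* CAP.DSTRD (doorbell stride) lives in bits 35:32 of the 64-bit CAP register. */
static uint32_t nvme_cap_dstrd(uint64_t cap)
{
    return (uint32_t)((cap >> 32) & 0xF);
}

/* Byte offset of a doorbell register from the controller's MMIO base.
 *   qid:   queue ID (0 is the admin queue pair)
 *   is_cq: 0 for the submission queue tail doorbell,
 *          1 for the completion queue head doorbell
 *   dstrd: the CAP.DSTRD field; the stride is 4 << dstrd bytes */
static uint32_t nvme_doorbell_offset(uint32_t qid, uint32_t is_cq, uint32_t dstrd)
{
    uint32_t stride = 4u << dstrd;
    return 0x1000u + (2u * qid + is_cq) * stride;
}
```

Reading DSTRD from CAP instead of hard-coding a 4-byte stride keeps the same code working on controllers that space their doorbells further apart.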

Post by IanSeyler »

Thanks for that! I see how that works now.

Ok next up: Processing the Admin Completion ring (stored at 0x9000 for now).

Currently I queue up 4 commands and run them: Identify Controller, Get the Active Namespace, Create I/O Completion Queue, and Create I/O Submission Queue

pci_nvme_mmio_write addr 0x1004 data 0x0 size 4
pci_nvme_mmio_doorbell_cq cqid 0 new_head 0
pci_nvme_mmio_write addr 0x1000 data 0x4 size 4
pci_nvme_mmio_doorbell_sq sqid 0 new_tail 4
pci_nvme_admin_cmd cid 0 sqid 0 opc 0x6 opname 'NVME_ADM_CMD_IDENTIFY'
pci_nvme_identify cid 0 cns 0x1 ctrlid 0 csi 0x0
pci_nvme_identify_ctrl identify controller
pci_nvme_map_prp trans_len 4096 len 4096 prp1 0xc000 prp2 0x0 num_prps 2
pci_nvme_map_addr addr 0xc000 len 4096
pci_nvme_enqueue_req_completion cid 0 cqid 0 dw0 0x0 dw1 0x0 status 0x0
pci_nvme_admin_cmd cid 0 sqid 0 opc 0x6 opname 'NVME_ADM_CMD_IDENTIFY'
pci_nvme_identify cid 0 cns 0x2 ctrlid 0 csi 0x0
pci_nvme_identify_nslist nsid 0
pci_nvme_map_prp trans_len 4096 len 4096 prp1 0xd000 prp2 0x0 num_prps 2
pci_nvme_map_addr addr 0xd000 len 4096
pci_nvme_enqueue_req_completion cid 0 cqid 0 dw0 0x0 dw1 0x0 status 0x0
pci_nvme_admin_cmd cid 1 sqid 0 opc 0x5 opname 'NVME_ADM_CMD_CREATE_CQ'
pci_nvme_create_cq create completion queue, addr=0xb000, cqid=1, vector=0, qsize=1, qflags=1, ien=0
pci_nvme_enqueue_req_completion cid 1 cqid 0 dw0 0x0 dw1 0x0 status 0x0
pci_nvme_admin_cmd cid 1 sqid 0 opc 0x1 opname 'NVME_ADM_CMD_CREATE_SQ'
pci_nvme_create_sq create submission queue, addr=0xa000, sqid=1, cqid=1, qsize=1, qflags=1
pci_nvme_enqueue_req_completion cid 1 cqid 0 dw0 0x0 dw1 0x0 status 0x0
pci_nvme_irq_pin pulsing IRQ pin
Admin Completion Queue:

(qemu) xp /64xw 0x9000
0000000000009000: 0x00000000 0x00000000 0x00000004 0x00010000
0000000000009010: 0x00000000 0x00000000 0x00000004 0x00010000
0000000000009020: 0x00000000 0x00000000 0x00000004 0x00010001
0000000000009030: 0x00000000 0x00000000 0x00000004 0x00010001
0000000000009040: 0x00000000 0x00000000 0x00000000 0x00000000
0000000000009050: 0x00000000 0x00000000 0x00000000 0x00000000
So based on that the ring head is currently set to 0x4 and the phase tag (bit 16 of DWORD 3) is currently set.
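
As a sketch of how the last two dwords of each 16-byte completion entry break down: DWORD 2 carries the submission queue head pointer and submission queue ID, and DWORD 3 carries the command ID, the phase tag, and the status field. The struct and function names are hypothetical. Applied to the dump above, 0x00000004 / 0x00010001 decodes to SQ head 4, SQ ID 0, command ID 1, phase 1, status 0.

```c
#include <stdint.h>

/* Fields taken from DWORD 2 and DWORD 3 of a completion queue entry. */
struct nvme_cqe_info {
    uint16_t sq_head;  /* DW2 bits 15:0  - how far the controller has consumed the SQ */
    uint16_t sq_id;    /* DW2 bits 31:16 - which submission queue the command came from */
    uint16_t cid;      /* DW3 bits 15:0  - command identifier chosen by the driver */
    uint8_t  phase;    /* DW3 bit 16     - phase tag */
    uint16_t status;   /* DW3 bits 31:17 - status field, 0 means success */
};

static struct nvme_cqe_info nvme_decode_cqe(uint32_t dw2, uint32_t dw3)
{
    struct nvme_cqe_info info;
    info.sq_head = (uint16_t)(dw2 & 0xFFFFu);
    info.sq_id   = (uint16_t)(dw2 >> 16);
    info.cid     = (uint16_t)(dw3 & 0xFFFFu);
    info.phase   = (uint8_t)((dw3 >> 16) & 1u);
    info.status  = (uint16_t)(dw3 >> 17);
    return info;
}
```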

Do I clear the phase tag myself? Can I manually start the ring at entry 0 again if I want to?

Thanks,
-Ian

Post by Octocontrabass »

IanSeyler wrote:Do I clear the phase tag myself?
Before you enable a completion queue, you need to clear the tag, which you'll probably want to do by zeroing the whole buffer. Once the completion queue is enabled, the controller automatically switches between setting and clearing the tag every time it reaches the end of the buffer and restarts at the beginning.
IanSeyler wrote:Can I manually start the ring at entry 0 again if I want to?
I think the only way to restart the ACQ at entry 0 is by resetting the whole controller. I don't think you'll want to do that very often.
Ethin
Member
Posts: 625
Joined: Sun Jun 23, 2019 5:36 pm
Location: North Dakota, United States

Post by Ethin »

No, you can't arbitrarily restart the ring at position zero without a controller reset. The phase bit will be toggled whenever the controller wraps around and starts processing new entries. You use the new state of the phase bit to tell the difference between new and old CQEs.
Edit: to clarify about the phase bit, assume you've got a ring of 64 entries and you've just zeroed the buffer. When you send the "Identify" command, your interrupt handler will loop through the ring checking the phase bit of each entry. If the phase bit is set (as an example), you know that's a new entry. So all new entries will have their phase bits set to indicate new entries. The setting of that bit continues until you send a 65th command. That command obviously won't be placed at index 64; rather, it'll be placed at position 64%64 (that is, index 0), so the controller will write the new CQE there. However, now you can't tell the difference between new and old entries! To solve this problem, the controller changes the "new entry" indicator to a clearing of the phase bit. So now when your interrupt handler goes through the ring, it must check which entries have their phase bits cleared, not set. So it's your responsibility to know when to wrap around and what the new phase bit state should be for new entries. That thankfully isn't too difficult to do, which is what makes NVMe so pleasant to program. I really hope you have fun working with NVMe -- I certainly do!
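
The bookkeeping described above can be sketched as a small polling routine. This is only an illustration under stated assumptions: the `nvme_cq` structure and `cq_poll` are hypothetical names, the ring was zeroed before the queue was enabled, and the expected phase therefore starts at 1. The driver keeps its own head index and expected phase value, and flips the expected phase each time the head wraps.

```c
#include <stdint.h>

#define CQE_DWORDS 4u  /* each completion queue entry is 16 bytes */

/* Hypothetical driver-side state for one completion queue. */
struct nvme_cq {
    volatile uint32_t *base;      /* ring memory, zeroed before enabling */
    uint32_t entries;             /* number of slots in the ring */
    uint32_t head;                /* next slot the driver will examine */
    uint32_t phase;               /* phase value that marks a new entry; starts at 1 */
    volatile uint32_t *doorbell;  /* this queue's CQ head doorbell register */
};

/* Consume every new completion and return how many were handled. */
static uint32_t cq_poll(struct nvme_cq *cq)
{
    uint32_t handled = 0;

    for (;;) {
        uint32_t dw3 = cq->base[cq->head * CQE_DWORDS + 3];
        if (((dw3 >> 16) & 1u) != cq->phase)
            break;                 /* slot still holds an old entry */

        /* A real driver would inspect the command ID and status here. */
        handled++;

        if (++cq->head == cq->entries) {
            cq->head = 0;
            cq->phase ^= 1u;       /* new entries use the other value after a wrap */
        }
    }

    if (handled)
        *cq->doorbell = cq->head;  /* tell the controller these slots are free */
    return handled;
}
```

Because the expected phase flips on every wrap, the loop always stops at the first stale entry, so there is no need to clear consumed entries by hand.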

Post by IanSeyler »

EDIT: I'm writing the I/O queue head and tail backwards below (it should be head 0, tail 1). Changing this I get a read error now at least!

Thanks everyone! I think I have a decent enough grasp of the admin queues now. I don't have interrupts enabled so I'm checking the completion queue before I add new admin commands and update the doorbell again.

Now on to actually attempting a disk read. I'm assuming the I/O submission and completion buffers I created earlier work like the Admin ones.

My I/O rings are as follows:

pci_nvme_admin_cmd cid 1 sqid 0 opc 0x5 opname 'NVME_ADM_CMD_CREATE_CQ'
pci_nvme_create_cq create completion queue, addr=0xb000, cqid=1, vector=0, qsize=1, qflags=1, ien=0
pci_nvme_enqueue_req_completion cid 1 cqid 0 dw0 0x0 dw1 0x0 status 0x0
pci_nvme_admin_cmd cid 1 sqid 0 opc 0x1 opname 'NVME_ADM_CMD_CREATE_SQ'
pci_nvme_create_sq create submission queue, addr=0xa000, sqid=1, cqid=1, qsize=1, qflags=1
pci_nvme_enqueue_req_completion cid 1 cqid 0 dw0 0x0 dw1 0x0 status 0x0
pci_nvme_irq_pin pulsing IRQ pin
I build a 64-byte command in memory at 0xA000 and update the doorbell for it:

pci_nvme_mmio_write addr 0x1008 data 0x0 size 4
pci_nvme_mmio_doorbell_sq sqid 1 new_tail 0
pci_nvme_mmio_write addr 0x100c data 0x1 size 4
pci_nvme_mmio_doorbell_cq cqid 1 new_head 1
That seems to be where it ends. No read actually takes place. I was at least expecting QEMU to display something.

I'm pretty sure I'm writing to the correct doorbell locations for that I/O queue, because if I try to write to the next doorbell location for QID 2 I get the following:

pci_nvme_mmio_write addr 0x1010 data 0x0 size 4
pci_nvme_ub_db_wr_invalid_sq submission queue doorbell write for nonexistent queue, sqid=2, ignoring
pci_nvme_mmio_write addr 0x1014 data 0x1 size 4
pci_nvme_ub_db_wr_invalid_cq completion queue doorbell write for nonexistent queue, cqid=2, ignoring
Any ideas?

Thanks,
-Ian

Post by IanSeyler »

Haven't made too much progress with a read from disk just yet. I keep getting errors back from the controller that the opcode is invalid.

I dumped the namespace (via Identify with CNS=2) to memory and it has one entry:

(qemu) xp /16xw 0xD000
000000000000d000: 0x00000001 0x00000000 0x00000000 0x00000000
000000000000d010: 0x00000000 0x00000000 0x00000000 0x00000000
That NSID of 0x1 matches what I see QEMU reading from boot up:

pci_nvme_io_cmd cid 0 nsid 0x1 sqid 1 opc 0x2 opname 'NVME_NVM_CMD_READ'
pci_nvme_read cid 0 nsid 1 nlb 1 count 512 lba 0x0
pci_nvme_map_prp trans_len 512 len 512 prp1 0x7c00 prp2 0x0 num_prps 1
pci_nvme_map_addr addr 0x7c00 len 512
pci_nvme_rw_cb cid 0 blk 'disk1'
pci_nvme_rw_complete_cb cid 0 blk 'disk1'
pci_nvme_enqueue_req_completion cid 0 cqid 1 dw0 0x0 dw1 0x0 status 0x0
pci_nvme_irq_masked IRQ is masked
However I get an invalid opcode when using NSID 0x1. Any other NSID gives an Invalid NSID error:

pci_nvme_io_cmd cid 0 nsid 0x1 sqid 1 opc 0x2 opname 'NVME_NVM_CMD_READ'
pci_nvme_err_invalid_opc invalid opcode 0x2
pci_nvme_enqueue_req_completion cid 0 cqid 1 dw0 0x0 dw1 0x0 status 0x4001
pci_nvme_err_req_status cid 0 nsid 0 status 0x4001 opc 0x2
pci_nvme_irq_masked IRQ is masked
My I/O entry is as follows:

0x00000002			; CDW0 CID (31:16), PRP used (15:14 clear), FUSE normal (bits 9:8 clear), command Read (0x02)
0x00000001			; CDW1 NSID
0x0000000000000000	; CDW2-3 ELBST EILBRT (47:00)
0x0000000000000000	; CDW4-5 MPTR
0x000000000000E000	; CDW6-7 DPTR1
0x0000000000000000	; CDW8-9 DPTR2
0x00000000			; CDW10 SLBA (31:00)
0x00000000			; CDW11 SLBA (63:32)
0x00000001			; CDW12 Number of Logical Blocks (15:00)
0x00000000			; CDW13 DSM (07:00)
0x00000000			; CDW14 ELBST EILBRT (31:00)
0x00000000			; CDW15 ELBATM (31:16), ELBAT (15:00)
I've confirmed the NSID for the disk via Identify Namespace. It appears the disk size it reports is correct (0x8000 x 4096 byte sectors = 128MB). Any ideas?
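
For comparison, here is a sketch of the same 64-byte Read entry built as sixteen dwords. `nvme_sqe` and `nvme_build_read` are hypothetical names, only PRP1 is filled in (so the transfer is assumed to fit in one memory page), and no protection information is used. One detail worth noting: the Number of Logical Blocks field in CDW12 is zero-based in the NVMe specification, so the sketch stores `blocks - 1`, and a raw value of 1 in that field requests two blocks.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One submission queue entry: 16 dwords = 64 bytes. */
struct nvme_sqe {
    uint32_t cdw[16];
};

/* Build an NVM Read (opcode 0x02) using PRP1 only.
 *   cid:    command identifier, echoed back in the completion entry
 *   nsid:   namespace ID, e.g. 1 from the Identify namespace list
 *   prp1:   physical address of the destination buffer
 *   slba:   starting logical block address
 *   blocks: number of blocks to read, 1..65536 */
static struct nvme_sqe nvme_build_read(uint16_t cid, uint32_t nsid,
                                       uint64_t prp1, uint64_t slba,
                                       uint32_t blocks)
{
    struct nvme_sqe e;

    assert(blocks >= 1 && blocks <= 65536);
    memset(&e, 0, sizeof e);

    e.cdw[0]  = 0x02u | ((uint32_t)cid << 16);  /* opcode Read, PRP, not fused */
    e.cdw[1]  = nsid;
    e.cdw[6]  = (uint32_t)prp1;                 /* PRP1 low dword */
    e.cdw[7]  = (uint32_t)(prp1 >> 32);         /* PRP1 high dword */
    e.cdw[10] = (uint32_t)slba;                 /* SLBA bits 31:0 */
    e.cdw[11] = (uint32_t)(slba >> 32);         /* SLBA bits 63:32 */
    e.cdw[12] = blocks - 1u;                    /* NLB is zero-based */
    return e;
}
```

`nvme_build_read(0, 1, 0xE000, 0, 1)` produces an entry that matches the listing above except for CDW12, which becomes 0 for a single-block read.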

Thanks,
-Ian

Post by Ethin »

Hi there,
Are you reading from the doorbell registers? If so, don't (it's undefined). But as for your main issue, I'm honestly not sure why you're getting invalid opcode. Try setting the LR and FUA bits.

Post by IanSeyler »

Hi Ethin,

I'm reading the entries from the Command Completion Queue to verify they were done - not the doorbell registers.

I tried setting the LR and FUA bits (Bits 31 and 30 in CDW12) but it had no effect.

Thanks,
-Ian

Post by Ethin »

I'm not sure, but I suspect this might be a bug; I haven't tested it on my version of Qemu. Your command looks entirely correct, even assuming that the Qemu NVMe controller is version 1.4 and not 2.0. I might be missing something, though.
linuxyne
Member
Posts: 211
Joined: Sat Jul 02, 2016 7:02 am

Post by linuxyne »

IanSeyler wrote:...However I get an invalid opcode when using NSID 0x1. Any other NSID gives an Invalid NSID error:

pci_nvme_io_cmd cid 0 nsid 0x1 sqid 1 opc 0x2 opname 'NVME_NVM_CMD_READ'
pci_nvme_err_invalid_opc invalid opcode 0x2
pci_nvme_enqueue_req_completion cid 0 cqid 1 dw0 0x0 dw1 0x0 status 0x4001
pci_nvme_err_req_status cid 0 nsid 0 status 0x4001 opc 0x2
pci_nvme_irq_masked IRQ is masked
The status is 0x4001, which under qemu is

NVME_INVALID_OPCODE | NVME_DNR
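
A short sketch of how a status value such as 0x4001 splits into fields. The value in the trace is the 15-bit status field from bits 31:17 of completion DWORD 3, shifted down so that bit 0 is the low bit of the status code. The struct and function names are hypothetical. For 0x4001 the result is status code 0x01 (Invalid Command Opcode) in the generic status code type, with Do Not Retry set.

```c
#include <stdint.h>

/* Fields of the 15-bit completion status, as printed in the trace. */
struct nvme_status {
    uint8_t sc;    /* bits 7:0   - status code */
    uint8_t sct;   /* bits 10:8  - status code type (0 = generic) */
    uint8_t crd;   /* bits 12:11 - command retry delay */
    uint8_t more;  /* bit 13     - more information is available */
    uint8_t dnr;   /* bit 14     - do not retry */
};

static struct nvme_status nvme_decode_status(uint16_t status)
{
    struct nvme_status s;
    s.sc   = (uint8_t)(status & 0xFFu);
    s.sct  = (uint8_t)((status >> 8) & 0x7u);
    s.crd  = (uint8_t)((status >> 11) & 0x3u);
    s.more = (uint8_t)((status >> 13) & 0x1u);
    s.dnr  = (uint8_t)((status >> 14) & 0x1u);
    return s;
}
```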
Qemu is most likely returning the error from this location inside its nvme_io_cmd function.

That is, the bit CSUPP within the Commands Supported and Effects Data Structure for opcode# 2 is not set. The source contains functions such as nvme_select_iocs_ns, which are responsible for selecting the appropriate iocs, at the time of attachment, or of start of the controller. Either an attachment is not performed, or the state at the start of the controller may cause qemu to select empty IOCS.

---

Qemu can be instrumented with fprintf's at selected locations and built, for greater debuggability, like so:

# /home/user/src/qemu is the source
$ mkdir /home/user/src/bin
$ cd /home/user/src/bin
$ ../qemu/configure --prefix=/home/user/tools/qemu --target-list=x86_64-softmmu
$ make -j4 install
Edit: Qemu sets bits 0, 6, and 7 within CAP.CSS field. Based on the BareMetal nvme driver, that causes CC.CSS to be 0b111, which very likely results in selecting the empty IOCS.

According to the spec, for CC.CSS,
If CAP.CSS bit 7 is set to ‘1’, then the value 111b indicates that only the Admin
Command Set is supported and that no I/O Command Set or I/O Command Set
specific Admin commands are supported. When only the Admin Command Set
is supported, any command submitted on an I/O Submission Queue and any
I/O Command Set specific Admin command submitted on the Admin
Submission Queue is completed with status Invalid Command Opcode. If
CAP.CSS bit 7 is cleared to ‘0’, then the value of 111b is reserved.
The wording above can be read as allowing CC.CSS to be set to a value other than 7 even when CAP.CSS bit 7 is 1.

Otherwise, this might well be a bug in Qemu, though for now you can work around it by setting CC.CSS to a supported value other than 7.

Linux checks for CAP_CSS_CSI (6) first, and if set, selects CC_CSS_CSI. If not, it selects CC_CSS_NVM. It doesn't seem to bother itself with the admin-only CAP_CSS bit.

Edit2: OpenBSD and FreeBSD seem to set CC_CSS == 0, without checking capabilities, afaics.

Edit3: SeaBios checks for CAP_CSS_NVM, and if not set, errors out. Otherwise, it too sets CC.CSS = 0.
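
A minimal sketch of the selection order described above for Linux: prefer the I/O Command Set bit (CAP.CSS bit 6) and otherwise fall back to the NVM command set, never choosing the admin-only value 111b. The macro and function names are hypothetical and not taken from any of the projects mentioned. CAP.CSS occupies bits 44:37 of CAP and CC.CSS occupies bits 6:4 of CC.

```c
#include <stdint.h>

#define CAP_CSS_SHIFT 37u        /* CAP.CSS is bits 44:37 of CAP */
#define CAP_CSS_CSI   (1u << 6)  /* one or more I/O command sets supported */

#define CC_CSS_SHIFT  4u         /* CC.CSS is bits 6:4 of CC */
#define CC_CSS_MASK   (0x7u << CC_CSS_SHIFT)
#define CC_CSS_NVM    0x0u       /* 000b: NVM command set */
#define CC_CSS_CSI    0x6u       /* 110b: all supported I/O command sets */

/* Choose a CC.CSS value from CAP, ignoring the admin-only bit (bit 7). */
static uint32_t nvme_pick_css(uint64_t cap)
{
    uint32_t cap_css = (uint32_t)((cap >> CAP_CSS_SHIFT) & 0xFFu);

    if (cap_css & CAP_CSS_CSI)
        return CC_CSS_CSI;
    return CC_CSS_NVM;
}

/* Return cc with its CSS field replaced; all other bits are preserved. */
static uint32_t nvme_cc_with_css(uint32_t cc, uint32_t css)
{
    return (cc & ~CC_CSS_MASK) | ((css & 0x7u) << CC_CSS_SHIFT);
}
```

With bits 0, 6, and 7 set in CAP.CSS, as reported for Qemu above, `nvme_pick_css` returns 110b rather than 111b. Leaving the field at 000b, as the BSDs and SeaBios are described as doing, is the other option.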

Post by IanSeyler »

Hi linuxyne,

Thanks! I commented out my code for setting CC.CSS and the read works now. I'll review that part later on.

-Ian