creating NVMe IO submission and completion queues

glolichen
Posts: 15
Joined: Wed Jul 10, 2024 9:23 pm

Post by glolichen »

I've been trying to create NVMe I/O submission and completion queues. The completion queue creation looks like it works (though I am not sure of this), but the submission queue creation command never finishes.

The code to create those queues (with structures defined as in the spec) is:

Code:

	u64 io_complete_addr = kmalloc_page();
	asq[admin_submission_tail] = (const struct SubmissionEntry) {0};
	asq[admin_submission_tail].opcode = 5;
	asq[admin_submission_tail].identifier = 1;
	asq[admin_submission_tail].nsid = 0;
	asq[admin_submission_tail].dword10 = (63 << 16) | 1; // size 64 identifier 1
	asq[admin_submission_tail].dword11 = 1; // disable interrupts, physically contiguous
	asq[admin_submission_tail].prp1 = page_virt_to_phys_addr(io_complete_addr);
	(*admin_tail_doorbell)++;
	admin_submission_tail++;
	// wait for phase change / new entry in completion queue
	while (!(acq[admin_completion_head].dword3 & 1))
		asm volatile("nop");

	vga_printf("completion ok\n");
	serial_info("ACQ: dword 0: 0x%x", acq[admin_completion_head].dword0);
	serial_info("ACQ: dword 1: 0x%x", acq[admin_completion_head].dword1);
	serial_info("ACQ: dword 2: 0x%x", acq[admin_completion_head].dword2);
	serial_info("ACQ: dword 3: 0x%x", acq[admin_completion_head].dword3);
	admin_completion_head++;

	// create io submission queue (base spec 102-103)
	u64 io_submit_addr = kmalloc_page();
	asq[admin_submission_tail] = (const struct SubmissionEntry) {0};
	asq[admin_submission_tail].opcode = 1;
	asq[admin_submission_tail].identifier = 2;
	asq[admin_submission_tail].nsid = 0;
	asq[admin_submission_tail].dword10 = (63 << 16) | 1; // size 64 identifier 1
	asq[admin_submission_tail].dword11 = (1 << 16) | 1; // completion identifier 1, physically contiguous
	asq[admin_submission_tail].prp1 = page_virt_to_phys_addr(io_submit_addr);
	(*admin_tail_doorbell)++;
	admin_submission_tail++;
	while (!(acq[admin_completion_head].dword3 & 1)) {
		vga_printf("waiting %u\n", acq[admin_completion_head].dword3);
		asm volatile("nop");
	}
	
	vga_printf("submission ok");
	serial_info("ACQ: dword 0: 0x%x", acq[admin_completion_head].dword0);
	serial_info("ACQ: dword 1: 0x%x", acq[admin_completion_head].dword1);
	serial_info("ACQ: dword 2: 0x%x", acq[admin_completion_head].dword2);
	serial_info("ACQ: dword 3: 0x%x", acq[admin_completion_head].dword3);
	admin_completion_head++;
The last while loop (the one with vga_printf("waiting %u...) never finishes. The first group of serial output, from creating the completion queue, is:

Code:

INFO:  ACQ: dword 0: 0x0
INFO:  ACQ: dword 1: 0x0
INFO:  ACQ: dword 2: 0x1
INFO:  ACQ: dword 3: 0x82050001
dword 0 is 0, which should be good.

Full code is at https://github.com/glolichen/os. The lines above are src/pci/nvme.c lines 179-221.

Is there something wrong with the commands I am sending, for both completion and submission creation? I based this on the wiki page and the base spec, and I'm sorry that everything is a mess since the NVMe specs are quite hard to understand. Thanks all.
Octocontrabass
Member
Posts: 5571
Joined: Mon Mar 25, 2013 7:01 pm

Re: creating NVMe IO submission and completion queues

Post by Octocontrabass »

glolichen wrote: Thu Nov 21, 2024 9:09 pm

Code:

	(*admin_tail_doorbell)++;
Doorbell registers are write-only. Also, you usually need to use a volatile pointer when you access MMIO.

Have you tried QEMU's trace log? One of its trace events might tell you what's wrong.
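
A minimal sketch of the doorbell point above, not the poster's actual fix: keep the tail index in an ordinary variable, advance it there, and only ever store to the doorbell through a volatile pointer. The names `ring_admin_sq_doorbell` and `ADMIN_QUEUE_DEPTH`, and the depth of 64, are assumptions made for illustration.

```c
#include <stdint.h>

#define ADMIN_QUEUE_DEPTH 64u  /* assumed depth, not taken from the thread */

/* Shadow copy of the admin submission queue tail, kept in normal memory. */
static uint32_t admin_sq_tail = 0;

/*
 * Advance the shadow tail (wrapping at the queue depth) and store it to the
 * doorbell. The doorbell is only written, never read back, and the volatile
 * qualifier keeps the compiler from merging, splitting or dropping the store.
 */
static uint32_t ring_admin_sq_doorbell(volatile uint32_t *doorbell)
{
    admin_sq_tail = (admin_sq_tail + 1u) % ADMIN_QUEUE_DEPTH;
    *doorbell = admin_sq_tail;
    return admin_sq_tail;
}
```

Compared with `(*admin_tail_doorbell)++`, this never performs a read of the register, so the value written depends only on state the driver itself controls.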
glolichen
Posts: 15
Joined: Wed Jul 10, 2024 9:23 pm

Re: creating NVMe IO submission and completion queues

Post by glolichen »

Octocontrabass wrote: Thu Nov 21, 2024 10:52 pm
glolichen wrote: Thu Nov 21, 2024 9:09 pm

Code:

	(*admin_tail_doorbell)++;
Doorbell registers are write-only. Also, you usually need to use a volatile pointer when you access MMIO.

Have you tried QEMU's trace log? One of its trace events might tell you what's wrong.
QEMU does give

Code:

pci_nvme_ub_mmiord_toosmall in nvme_mmio_read: MMIO read smaller than 32-bits, offset=0x1c
a few times, but changing nvme_base and the doorbell pointers to volatile fixes this. Looks like the compiler is optimizing the memory reads to smaller sizes.

I have also changed the increment to

Code:

	*admin_tail_doorbell = ++admin_submission_tail;
so the first time (creating the completion queue) it is set to 1, and creating the submission queue sets it to 2. The problem still remains, though dword3 now has a new value in the infinite loop.

I do suspect that the IO completion queue creation is wrong. Even if I deliberately change the queue size to a broken value (0 instead of 63) with

Code:

asq[admin_submission_tail].dword10 = (0 << 16) | 1; // deliberately invalid size 0, identifier 1
I get the same debug output:

Code:

INFO:  ACQ: dword 0: 0x0
INFO:  ACQ: dword 1: 0x0
INFO:  ACQ: dword 2: 0x1
INFO:  ACQ: dword 3: 0x82050001
dword0 should be 1 to indicate an invalid queue size.

Am I doing something wrong with any of these steps?
Octocontrabass
Member
Posts: 5571
Joined: Mon Mar 25, 2013 7:01 pm

Re: creating NVMe IO submission and completion queues

Post by Octocontrabass »

glolichen wrote: Sat Nov 23, 2024 2:36 pm
dword0 should be 1 to indicate an invalid queue size.
Huh? The commands to create I/O submission and completion queues don't use dword 0, it will always be 0. If there's an error, it will be reported in dword 3.
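
To make the dword 3 layout concrete, here is a small sketch that splits it into its fields. The bit positions follow the completion queue entry layout in the NVMe base spec; the struct and function names are hypothetical and not taken from the linked repository.

```c
#include <stdint.h>

/* Fields of completion queue entry dword 3. */
struct cqe_status {
    uint16_t command_id;  /* bits 15:0                      */
    uint8_t  phase;       /* bit  16, phase tag             */
    uint8_t  sc;          /* bits 24:17, status code        */
    uint8_t  sct;         /* bits 27:25, status code type   */
    uint8_t  dnr;         /* bit  31, do not retry          */
};

static struct cqe_status decode_cqe_dword3(uint32_t dword3)
{
    struct cqe_status s;
    s.command_id = (uint16_t)(dword3 & 0xFFFFu);
    s.phase      = (uint8_t)((dword3 >> 16) & 0x1u);
    s.sc         = (uint8_t)((dword3 >> 17) & 0xFFu);
    s.sct        = (uint8_t)((dword3 >> 25) & 0x7u);
    s.dnr        = (uint8_t)((dword3 >> 31) & 0x1u);
    return s;
}
```

Fed the logged value 0x82050001, this gives command ID 1, phase 1, status code type 1 (command specific), status code 2 and DNR set, so the first command did complete, but with an error.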
glolichen
Posts: 15
Joined: Wed Jul 10, 2024 9:23 pm

Re: creating NVMe IO submission and completion queues

Post by glolichen »

Octocontrabass wrote: Sat Nov 23, 2024 6:51 pm
glolichen wrote: Sat Nov 23, 2024 2:36 pm
dword0 should be 1 to indicate an invalid queue size.
Huh? The commands to create I/O submission and completion queues don't use dword 0, it will always be 0. If there's an error, it will be reported in dword 3.
Thanks, I don't know why I made that assumption.

The dword3 value of 0x82050001 corresponds to status code 2, which according to page 140 of revision 2.1 of the NVMe base spec is:
Invalid Queue Size: The host attempted to create an I/O Completion Queue:
• with an invalid number of entries (e.g., a value of 0h or a value which exceeds the maximum supported by the controller, specified in CAP.MQES); or
• before initializing the CC.IOCQES field.
I put in a queue size of 63, which means 64, which is not 0 and definitely does not exceed the maximum. And I'm also pretty sure I set CC.IOCQES with

Code:

nvme_base->CC = (3 << 16) | (3 << 20);
which I think sets IO completion and submission queue entry sizes to 2^3 bytes, which is 64 bits.

It's also possible I'm interpreting the error wrong, or there is another problem somewhere in this whole process.
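
To rule out the first bullet of that error description, the requested size can be checked against CAP.MQES before the command is submitted. A minimal sketch, assuming the 64-bit CAP register has already been read into `cap`; `cap_mqes` and `io_queue_size_valid` are hypothetical helpers, not code from the linked repository.

```c
#include <stdbool.h>
#include <stdint.h>

/* CAP.MQES: bits 15:0 of the 64-bit CAP register, a 0's based maximum. */
static uint16_t cap_mqes(uint64_t cap)
{
    return (uint16_t)(cap & 0xFFFFu);
}

/*
 * qsize is the 0's based value placed in bits 31:16 of command dword 10,
 * so 63 requests a queue of 64 entries. A value of 0 is rejected, as is
 * anything larger than CAP.MQES.
 */
static bool io_queue_size_valid(uint64_t cap, uint16_t qsize)
{
    return qsize != 0 && qsize <= cap_mqes(cap);
}
```

If this check passes and the controller still reports Invalid Queue Size, the second bullet (the CC.IOCQES field) is the more likely cause.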
Octocontrabass
Member
Posts: 5571
Joined: Mon Mar 25, 2013 7:01 pm

Re: creating NVMe IO submission and completion queues

Post by Octocontrabass »

glolichen wrote: Wed Nov 27, 2024 2:26 am
And I'm also pretty sure I set CC.IOCQES with

Code:

nvme_base->CC = (3 << 16) | (3 << 20);
which I think sets IO completion and submission queue entry sizes to 2^3 bytes, which is 64 bits.
That's too small. The minimum for I/O completion queue entries is 16 bytes (2^4) and the minimum for I/O submission queue entries is 64 bytes (2^6).
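
As a sketch of what those minimums mean for the register value: both fields hold the log2 of the entry size in bytes, with CC.IOSQES in bits 19:16 and CC.IOCQES in bits 23:20. The macro and function names below are assumptions for illustration, and any other CC bits the driver needs still have to be OR'd in separately.

```c
#include <stdint.h>

#define NVME_CC_IOSQES_SHIFT 16  /* CC.IOSQES, bits 19:16 */
#define NVME_CC_IOCQES_SHIFT 20  /* CC.IOCQES, bits 23:20 */

/* Entry sizes are encoded as log2 of the size in bytes. */
static uint32_t nvme_cc_entry_sizes(uint32_t log2_sqe, uint32_t log2_cqe)
{
    return ((log2_sqe & 0xFu) << NVME_CC_IOSQES_SHIFT) |
           ((log2_cqe & 0xFu) << NVME_CC_IOCQES_SHIFT);
}
```

With 64-byte submission entries and 16-byte completion entries, `nvme_cc_entry_sizes(6, 4)` evaluates to 0x00460000, whereas the original `(3 << 16) | (3 << 20)` is 0x00330000.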