Re: Qemu: NVMe controller gets stuck during controller confi
Posted: Mon Sep 28, 2020 9:36 am
Thanks, Octocontrabass -- I forgot about that.
The Place to Start for Operating System Developers
http://f.osdev.org/
Code:
(8 * size_of::<u64>() - (n.leading_zeros() as usize) - 1) as u64
Code:
#define LOG2(X) ((unsigned) (8*sizeof (unsigned long long) - __builtin_clzll((X)) - 1))
Thoughts?
pci_nvme_err_startfail_cqent_too_large nvme_start_ctrl failed because the completion queue entry size is too large: log2size=10, max=15
pci_nvme_err_startfail setting controller enable bit failed
Ethin wrote:
> This gets me 10 (MQES is 2047). Using math.log2 in Python yields 10.99 (so my implementation is correct).
Remember that the value is zero-based. You need to add 1 before you calculate the log2. CAP.MQES is the index of the highest allowed entry, not the count; add 1 to get a count.
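The zero-based adjustment can be shown with a small C sketch. It is not code from the thread: `ilog2_u64` and `mqes_entries` are hypothetical helper names, and `__builtin_clzll` assumes GCC or Clang. The log2 formula is the same one used in the snippets above; the only change is adding 1 to MQES before taking the log.

```c
#include <assert.h>
#include <stdint.h>

/* floor(log2(x)) using the GCC/Clang builtin __builtin_clzll; same formula as
 * the Rust and C snippets above (64 - leading zeros - 1).
 * x must be nonzero: __builtin_clzll(0) is undefined. */
static unsigned ilog2_u64(uint64_t x)
{
    assert(x != 0);
    return (unsigned)(8 * sizeof(unsigned long long) - __builtin_clzll(x) - 1);
}

/* CAP.MQES (CAP bits 15:0) is zero-based: it is the highest allowed index,
 * so the number of entries a queue may hold is MQES + 1. */
static uint32_t mqes_entries(uint64_t cap)
{
    return (uint32_t)(cap & 0xFFFF) + 1;
}
```

With MQES = 2047 this gives 2048 entries and a log2 of 11 rather than 10.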
Ethin wrote:
> Thoughts?
> pci_nvme_err_startfail_cqent_too_large nvme_start_ctrl failed because the completion queue entry size is too large: log2size=10, max=15
> pci_nvme_err_startfail setting controller enable bit failed
I am getting the same results.
Ethin wrote:
> Thoughts?
QEMU is broken. The error message is printing garbage instead of the actual maximum allowed value.
Octocontrabass wrote:
> Ethin wrote:
> > Thoughts?
> QEMU is broken.
Yep, after a little more looking (I don't know why I didn't catch this before), QEMU is broken.
Code:
id->sqes = (0x6 << 4) | 0x6;
id->cqes = (0x4 << 4) | 0x4;
Code:
if (unlikely(NVME_CC_IOCQES(n->bar.cc) < NVME_CTRL_CQES_MIN(n->id_ctrl.cqes))) {
// return error
}
if (unlikely(NVME_CC_IOCQES(n->bar.cc) > NVME_CTRL_CQES_MAX(n->id_ctrl.cqes))) {
// return error
}
if (unlikely(NVME_CC_IOSQES(n->bar.cc) < NVME_CTRL_SQES_MIN(n->id_ctrl.sqes))) {
// return error
}
if (unlikely(NVME_CC_IOSQES(n->bar.cc) > NVME_CTRL_SQES_MAX(n->id_ctrl.sqes))) {
// return error
}
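The range check above can be reproduced outside QEMU. This is a sketch, not QEMU's code: it assumes you have the raw CC value and the SQES/CQES bytes (Identify Controller bytes 512 and 513), and all helper names are hypothetical. In each byte, bits 3:0 hold the required (minimum) entry size and bits 7:4 the maximum, both as log2 of the size in bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SQES/CQES bytes: bits 3:0 = required (minimum), bits 7:4 = maximum. */
static unsigned es_min(uint8_t es) { return es & 0xF; }
static unsigned es_max(uint8_t es) { return (es >> 4) & 0xF; }

/* CC.IOSQES is bits 19:16, CC.IOCQES is bits 23:20. */
static unsigned cc_iosqes(uint32_t cc) { return (cc >> 16) & 0xF; }
static unsigned cc_iocqes(uint32_t cc) { return (cc >> 20) & 0xF; }

/* Each CC field must lie within the [min, max] range the controller reports. */
static bool cc_entry_sizes_ok(uint32_t cc, uint8_t sqes, uint8_t cqes)
{
    unsigned sq = cc_iosqes(cc), cq = cc_iocqes(cc);
    return sq >= es_min(sqes) && sq <= es_max(sqes) &&
           cq >= es_min(cqes) && cq <= es_max(cqes);
}
```

With QEMU's defaults of 0x66 and 0x44 the minimum and maximum are equal, so only IOSQES = 6 and IOCQES = 4 pass.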
Code:
if (unlikely(NVME_CC_IOCQES(n->bar.cc) != NVME_CTRL_CQES_MIN(n->id_ctrl.cqes))) {
// return error
}
if (unlikely(NVME_CC_IOSQES(n->bar.cc) != NVME_CTRL_SQES_MIN(n->id_ctrl.sqes))) {
// return error
}
Code:
id->sqes = (0x6 << 4) | 0x6;
id->cqes = (0x4 << 4) | 0x4;
Code:
id->sqes = (0xF << 4) | 0x0;
id->cqes = (0xF << 4) | 0x0;
BenLunt wrote:
> QEMU needs to do one of two things:
> 1) skip the check above if the IDENTIFY command has not been called.
> 2) set the code to be
> id->sqes = (0xF << 4) | 0x0;
> id->cqes = (0xF << 4) | 0x0;
> by default and once the IDENTIFY command has been called, then set the valid values.
Both of these suggestions violate the NVMe spec and would cause QEMU to behave differently from real hardware. Only the error messages are broken.
Octocontrabass wrote:
> BenLunt wrote:
> > QEMU needs to do one of two things:
> > 1) skip the check above if the IDENTIFY command has not been called.
> > 2) set the code to be
> > id->sqes = (0xF << 4) | 0x0;
> > id->cqes = (0xF << 4) | 0x0;
> > by default and once the IDENTIFY command has been called, then set the valid values.
> Both of these suggestions violate the NVMe spec and would cause QEMU to behave differently from real hardware. Only the error messages are broken.
Hi Octocontrabass,
Where are these numbers coming from? Does the specification state that these are to be used as defaults? I don't see where it does.
BenLunt wrote:
> Where are these numbers coming from? Does the specification state that these are to be used as defaults? I don't see where it does.
Those numbers come from page 186 of the current NVMe spec (1.4a), where it's mandatory for all devices to use those values as their minimum supported queue entry sizes. Since those values are mandatory, you don't need to read the Identify Controller block to know what they are.
Octocontrabass wrote:
> BenLunt wrote:
> > Where are these numbers coming from? Does the specification state that these are to be used as defaults? I don't see where it does.
> Those numbers come from page 186 of the current NVMe spec (1.4a), where it's mandatory for all devices to use those values as their minimum supported queue entry sizes. Since those values are mandatory, you don't need to read the Identify Controller block to know what they are.
Indeed. I guess it all comes down to how you interpret the specification.
You do need to read the Identify Controller block to find the maximum queue entry sizes, but I'm not sure why you would want to use queue entries bigger than the minimum size.
Ethin wrote:
> Edit: setting bits 23:20 and 19:16 to 6 and 4 does not actually fix this issue. QEMU still fails to initialize the controller, and the trace events are no help. It appears that Linux also defines IOSQES and IOCQES as 6 and 4, so I'm not sure how my code differs from theirs (although it's significantly less evolved).
After setting my values to 6 and 4, I get the controller to enable. Here is my setup:
Code:
mem_write_io_regs(addr, SSS_HC_CC,
SSS_HC_CC_SET_IOCQES(4) | // (1 << 4) = 16 Defined minimum
SSS_HC_CC_SET_IOSQES(6) | // (1 << 6) = 64 Defined minimum
SSS_HC_CC_SET_SHN(0) | // no shutdown notification
SSS_HC_CC_SET_AMS(0) | // round robin arbitration
SSS_HC_CC_SET_MPS(0) | // 0 = 4096 page size
SSS_HC_CC_SET_CSS(SSS_HC_CC_CSS_NVM)| // NVM command set
SSS_HC_CC_EN); // enable the controller
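The SSS_HC_CC_* macros themselves are not shown in the thread. Below is a sketch of equivalent definitions under hypothetical CC_* names, with bit positions taken from the Controller Configuration (CC, offset 14h) register layout in the NVMe specification: EN in bit 0, CSS in bits 6:4, MPS in bits 10:7, AMS in bits 13:11, SHN in bits 15:14, IOSQES in bits 19:16 and IOCQES in bits 23:20.

```c
#include <assert.h>
#include <stdint.h>

/* CC register fields (hypothetical names, spec bit positions). */
#define CC_EN         (1u << 0)
#define CC_CSS(x)     (((uint32_t)(x) & 0x7) << 4)   /* bits 6:4   */
#define CC_MPS(x)     (((uint32_t)(x) & 0xF) << 7)   /* bits 10:7, page = 2^(12+MPS) */
#define CC_AMS(x)     (((uint32_t)(x) & 0x7) << 11)  /* bits 13:11 */
#define CC_SHN(x)     (((uint32_t)(x) & 0x3) << 14)  /* bits 15:14 */
#define CC_IOSQES(x)  (((uint32_t)(x) & 0xF) << 16)  /* bits 19:16 */
#define CC_IOCQES(x)  (((uint32_t)(x) & 0xF) << 20)  /* bits 23:20 */
#define CC_CSS_NVM    0

/* Same field values as the setup above: 16-byte CQ entries, 64-byte SQ
 * entries, no shutdown, round robin, 4 KiB pages, NVM command set, enabled. */
static uint32_t build_cc(void)
{
    return CC_IOCQES(4) | CC_IOSQES(6) | CC_SHN(0) | CC_AMS(0) |
           CC_MPS(0) | CC_CSS(CC_CSS_NVM) | CC_EN;
}
```

build_cc() evaluates to 0x00460001: IOCQES = 4 in bits 23:20, IOSQES = 6 in bits 19:16, and EN set.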
> 02: Invalid Field in Command: An invalid or unsupported field specified in the command parameters.
Therefore I thought it might be the maximum allowed transfer size per transfer (Specs: v1.2, page 100, Figure 90, Byte 77):
> If a command is submitted that exceeds the transfer size, then the command is aborted with a status of Invalid Field in Command.
However, the value at Byte 77 returns 0 (Specs: v1.2, page 100, Figure 90, Byte 77):
> A value of 0h indicates no restrictions on transfer size.
I am using a Scatter/Gather list (not PRPs) with a single Data Block Segment Entry, since the buffer used is physically contiguous.
> The value is in units of the minimum memory page size (CAP.MPSMIN) and is reported as a power of two (2^n).
Therefore, the limit is calculated with the minimum page size, NOT the current page size. If you use a page size other than the minimum, remember that this limit is still based on the minimum page size, not the page size you specify in CC.MPS.
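The MDTS arithmetic above as a C sketch. The function name is hypothetical, and returning 0 for "no restriction" is a convention chosen here, not something the spec defines. CAP.MPSMIN is taken from CAP bits 51:48, so the minimum page size is 2^(12 + MPSMIN) bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Maximum Data Transfer Size in bytes.
 * mdts:   Identify Controller byte 77, reported as a power of two (2^mdts)
 * mpsmin: CAP.MPSMIN; the unit is the MINIMUM page size, not CC.MPS.
 * Returns 0 when mdts == 0, meaning no restriction on transfer size. */
static uint64_t mdts_bytes(uint8_t mdts, uint8_t mpsmin)
{
    if (mdts == 0)
        return 0;
    assert(12u + mpsmin + mdts < 64); /* avoid shifting past 64 bits */
    uint64_t min_page = 1ull << (12 + mpsmin);
    return min_page << mdts;
}
```

For example, MDTS = 5 with MPSMIN = 0 allows 2^5 * 4096 = 131072 bytes per command, even if CC.MPS selects a larger page size.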
- adds support for scatter gather lists (SGLs)
Ethin wrote:
> I haven't gotten there yet. In general I'm going to strive to use PRPs as much as possible; I'm not very good with SGLs, and I'm not exactly sure how to construct one (and more things seem to work with PRPs than SGLs). I'm getting stuck just sending identify. For some reason, my memory allocator goes rogue when my NVMe driver starts. At first I thought that my math was wrong, so I switched it to just allocate a 16KiB ring buffer that I can reuse over and over (is that a bad idea, by the way?). I'm using 4KiB pages, so that should equal four memory frames of size 4096, right? I ask because the last time I ran my code, my memory allocator allocated more than a thousand frames (actually it was more like 5 thousand and rising). The addresses of those higher frames exceeded the 16 KiB I'd requested too. And I've no idea exactly how to debug it either -- because there's no error, there's no condition... there's not much for me to go on. I've pushed my commit -- would appreciate some help because I'm at a complete and utter loss.
I guess I don't understand what your issue is.
BenLunt wrote:
> Ethin wrote:
> > I haven't gotten there yet. In general I'm going to strive to use PRPs as much as possible; I'm not very good with SGLs, and I'm not exactly sure how to construct one (and more things seem to work with PRPs than SGLs). I'm getting stuck just sending identify. For some reason, my memory allocator goes rogue when my NVMe driver starts. At first I thought that my math was wrong, so I switched it to just allocate a 16KiB ring buffer that I can reuse over and over (is that a bad idea, by the way?). I'm using 4KiB pages, so that should equal four memory frames of size 4096, right? I ask because the last time I ran my code, my memory allocator allocated more than a thousand frames (actually it was more like 5 thousand and rising). The addresses of those higher frames exceeded the 16 KiB I'd requested too. And I've no idea exactly how to debug it either -- because there's no error, there's no condition... there's not much for me to go on. I've pushed my commit -- would appreciate some help because I'm at a complete and utter loss.
> I guess I don't understand what your issue is.
Yes, and that shows me what I need to do. However, I can't even queue the command. As I said, my memory allocation routine goes rogue when I ask it to allocate the buffer for the PRP. And yes, I enable the controller after allocating queues.
You need to allocate physically contiguous memory for your Submission and Completion rings. CAP.MQES gives you the limit on how many entries each ring can hold, though I only use 64 each.
Therefore, since the Submission Ring uses 64-byte entries, 64 of them occupy exactly one 4k page (64 * 64 = 4096 bytes). The Completion Ring uses 16-byte entries, so 64 of them occupy only 1024 bytes, less than a single 4k page.
This is the same for the I/O ring(s) as well.
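The ring sizing arithmetic as a C sketch, assuming 4 KiB frames and the minimum entry sizes (64-byte SQ entries, 16-byte CQ entries). The macro and function names are hypothetical. The same rounding answers the 16 KiB question quoted earlier: 16384 bytes with 4 KiB frames is exactly four frames.

```c
#include <assert.h>
#include <stddef.h>

#define FRAME_SIZE    4096u
#define SQ_ENTRY_SIZE 64u   /* 2^6, IOSQES = 6 */
#define CQ_ENTRY_SIZE 16u   /* 2^4, IOCQES = 4 */

/* Number of 4 KiB frames needed to hold `bytes`, rounded up. */
static size_t frames_needed(size_t bytes)
{
    return (bytes + FRAME_SIZE - 1) / FRAME_SIZE;
}

/* Bytes of physically contiguous memory for a ring of `entries` entries. */
static size_t ring_bytes(size_t entries, size_t entry_size)
{
    assert(entries > 0);
    return entries * entry_size;
}
```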
The IDENTIFY data structures (CNS values 0, 1, and 2) each require a single 4k block, no matter which page size you use.
So to keep it simple, you need the following:
1) One 4k block for the Admin Submission Ring
2) One 4k block for the Admin Completion Ring
3) One 4k block for returning IDENTIFY data
4) One 4k block for each I/O Submission Ring
5) One 4k block for each I/O Completion Ring
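A sketch of allocating the three admin-path blocks from the list above. It runs in a hosted environment: C11 `aligned_alloc` stands in for a kernel frame allocator, and `struct nvme_admin_mem` and the helper names are hypothetical. A real driver must hand the controller the PHYSICAL address of each frame, which this sketch does not model. The I/O rings are left out, as they are not needed until IDENTIFY works.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ADMIN_BLOCK_SIZE 4096u

struct nvme_admin_mem {
    void *asq;       /* Admin Submission Ring: 64 x 64-byte entries */
    void *acq;       /* Admin Completion Ring: 64 x 16-byte entries */
    void *identify;  /* IDENTIFY data buffer */
};

/* One zeroed, 4 KiB-aligned, 4 KiB block (stand-in for one physical frame). */
static void *alloc_block(void)
{
    void *p = aligned_alloc(ADMIN_BLOCK_SIZE, ADMIN_BLOCK_SIZE);
    if (p)
        memset(p, 0, ADMIN_BLOCK_SIZE);
    return p;
}

static int admin_mem_init(struct nvme_admin_mem *m)
{
    m->asq = alloc_block();
    m->acq = alloc_block();
    m->identify = alloc_block();
    return (m->asq && m->acq && m->identify) ? 0 : -1;
}

/* free(NULL) is a no-op, so this is safe after a partial failure. */
static void admin_mem_free(struct nvme_admin_mem *m)
{
    free(m->asq);
    free(m->acq);
    free(m->identify);
}
```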
Since you haven't gotten past the IDENTIFY command yet, you don't need to worry about the I/O rings yet.
From previous posts, you have been able to enable the controller. Did you create your Admin rings before or after enabling the controller? You should have done this before enabling it.
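The order described above (admin rings first, then CC.EN) can be sketched in C. The register offsets (CC at 14h, AQA at 24h, ASQ at 28h, ACQ at 30h) come from the NVMe controller register map; the `w32`/`w64` helpers are hypothetical and write into a plain byte array so the sketch runs anywhere, whereas a real driver would use volatile accesses to BAR0. AQA holds the zero-based admin queue sizes: ASQS in bits 11:0 and ACQS in bits 27:16.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define REG_CC   0x14
#define REG_AQA  0x24
#define REG_ASQ  0x28
#define REG_ACQ  0x30

/* Hypothetical MMIO helpers backed by an ordinary buffer. */
static void w32(uint8_t *bar, uint32_t off, uint32_t v) { memcpy(bar + off, &v, 4); }
static void w64(uint8_t *bar, uint32_t off, uint64_t v) { memcpy(bar + off, &v, 8); }

/* AQA: ASQS in bits 11:0, ACQS in bits 27:16, both zero-based. */
static uint32_t aqa_value(uint32_t sq_entries, uint32_t cq_entries)
{
    assert(sq_entries >= 2 && cq_entries >= 2);
    return ((cq_entries - 1) << 16) | (sq_entries - 1);
}

/* Program the admin queues BEFORE setting CC.EN. The caller should then
 * wait for CSTS.RDY = 1 before submitting commands. */
static void setup_admin_queues(uint8_t *bar, uint64_t asq_phys, uint64_t acq_phys,
                               uint32_t entries, uint32_t cc)
{
    assert((asq_phys & 0xFFF) == 0 && (acq_phys & 0xFFF) == 0); /* 4 KiB aligned */
    w32(bar, REG_AQA, aqa_value(entries, entries));
    w64(bar, REG_ASQ, asq_phys);
    w64(bar, REG_ACQ, acq_phys);
    w32(bar, REG_CC, cc | 1u); /* CC.EN is written last */
}
```

With 64 entries per admin ring, AQA ends up as 0x003F003F.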
At this point, just after enabling the controller, you should have an empty Admin Submission Ring and Completion Ring.
You can now send the IDENTIFY command.
CDW0 = CID, USE_PRPS, FUSE_NORMAL, OPCODE_IDENTIFY;
NSID = NSID_NONE; (for CNS 0, Identify Namespace, this must be the namespace ID instead)
MPTR = NULL;
PRP1 = 4k page aligned pointer to the 4k page of physical memory to store the data
PRP2 = 0
CDW10 = CNS (0, 1, or 2)
CDW11 = 0;
etc = 0;
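The pseudocode above maps onto the 64-byte Submission Queue Entry roughly as follows. This is a sketch assuming a little-endian host (as on x86), since NVMe structures are little-endian. The struct and function names are hypothetical; the `nsid` parameter is there because CNS 0 (Identify Namespace) needs a namespace ID, while CNS 1 ignores it. Identify's opcode is 06h, and CDW0 packs the opcode in bits 7:0, FUSE in bits 9:8, PSDT in bits 15:14 (0 = PRPs) and the command ID in bits 31:16.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 64-byte Submission Queue Entry (CDW0..CDW15). */
struct nvme_sqe {
    uint32_t cdw0;      /* opcode 7:0, FUSE 9:8, PSDT 15:14, CID 31:16 */
    uint32_t nsid;      /* CDW1 */
    uint32_t rsvd[2];   /* CDW2-3 */
    uint64_t mptr;      /* CDW4-5 */
    uint64_t prp1;      /* CDW6-7 */
    uint64_t prp2;      /* CDW8-9 */
    uint32_t cdw10, cdw11, cdw12, cdw13, cdw14, cdw15;
};

#define OPC_IDENTIFY 0x06

/* IDENTIFY using PRPs (PSDT = 0) and normal fusing (FUSE = 0), with PRP1
 * pointing at one 4 KiB-aligned physical page. */
static void build_identify(struct nvme_sqe *e, uint16_t cid, uint32_t nsid,
                           uint64_t buf_phys, uint8_t cns)
{
    assert((buf_phys & 0xFFF) == 0);
    memset(e, 0, sizeof *e);           /* MPTR, PRP2, CDW11-15 = 0 */
    e->cdw0 = ((uint32_t)cid << 16) | OPC_IDENTIFY;
    e->nsid = nsid;
    e->prp1 = buf_phys;
    e->cdw10 = cns;                    /* CNS in bits 7:0 */
}
```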
Insert the command into the Admin Submission Queue (Ring) and ring the Admin Doorbell.
Wait for the interrupt
Process the Admin Completion Ring (Using the Phase Bit)
Return
You should now have the 4k data you are looking for.
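The doorbell and phase-bit steps can be sketched in C. Doorbell offsets follow the spec formula: the SQ y tail doorbell is at 1000h + (2y * (4 << CAP.DSTRD)) and the CQ y head doorbell at 1000h + ((2y + 1) * (4 << CAP.DSTRD)). To submit, copy the entry into ring[tail], advance tail modulo the ring size, and write the new tail to the SQ tail doorbell. On the completion side, the Phase Tag is bit 16 of CQE dword 3; the expected phase starts at 1 and flips every time the head wraps. All names are hypothetical, and a real driver would read the ring through a volatile pointer or with a memory barrier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 16-byte Completion Queue Entry. */
struct nvme_cqe {
    uint32_t dw0;      /* command specific */
    uint32_t dw1;      /* reserved */
    uint16_t sq_head;  /* DW2 15:0 */
    uint16_t sq_id;    /* DW2 31:16 */
    uint16_t cid;      /* DW3 15:0 */
    uint16_t status;   /* DW3 31:16: bit 0 = Phase Tag, bits 15:1 = Status */
};

/* Doorbell offsets from BAR0; dstrd is CAP.DSTRD (CAP bits 35:32). */
static uint32_t sq_tail_doorbell(uint16_t qid, uint32_t dstrd)
{
    return 0x1000 + (2u * qid) * (4u << dstrd);
}

static uint32_t cq_head_doorbell(uint16_t qid, uint32_t dstrd)
{
    return 0x1000 + (2u * qid + 1) * (4u << dstrd);
}

struct cq_state {
    struct nvme_cqe *ring;
    uint16_t head, size;
    uint8_t phase;     /* expected Phase Tag, starts at 1 */
};

/* Consume one completion if the entry at head carries the expected phase.
 * The caller then writes cq->head to the CQ head doorbell. */
static bool cq_pop(struct cq_state *cq, struct nvme_cqe *out)
{
    const struct nvme_cqe *e = &cq->ring[cq->head];
    if ((e->status & 1u) != cq->phase)
        return false;  /* no new completion yet */
    *out = *e;
    if (++cq->head == cq->size) {
        cq->head = 0;
        cq->phase ^= 1;
    }
    return true;
}
```

A status of zero after shifting out the Phase Tag (status >> 1) means the command succeeded.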
Does this help?
Ben