Qemu: NVMe controller gets stuck during controller config
Re: Qemu: NVMe controller gets stuck during controller confi
Thanks, Octocontrabass -- I forgot about that.
Re: Qemu: NVMe controller gets stuck during controller confi
So I think qemu is broken. My log2 function is like this:
Code: Select all
(8 * size_of::<u64>() - (n.leading_zeros() as usize) - 1) as u64
Which is equivalent to, in C:
Code: Select all
#define LOG2(X) ((unsigned) (8*sizeof (unsigned long long) - __builtin_clzll((X)) - 1))
This gets me 10 (MQES is 2047). Using math.log2 in Python yields 10.99 (so my implementation is correct). This doesn't handle zero, but the likelihood of MQES being zero is nil anyway. However, I still get this same error, no matter what binary logarithm algorithm I try:
8896@1601321519.344101:pci_nvme_err_startfail_cqent_too_large nvme_start_ctrl failed because the completion queue entry size is too large: log2size=10, max=15
8896@1601321519.344101:pci_nvme_err_startfail setting controller enable bit failed
Thoughts?
Re: Qemu: NVMe controller gets stuck during controller confi
Ethin wrote:This gets me 10 (MQES is 2047). Using math.log2 in Python yields 10.99 (so my implementation is correct).
Remember that the value is zero based. You need to add 1 before you calculate the value. The CAP.MQES value is the index of the highest allowed entry, not the count. Add 1 to get a count.
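A minimal sketch of the zero-based adjustment, assuming MQES occupies bits 15:0 of the 64-bit CAP register as the NVMe spec lays it out. The helper names are hypothetical and not taken from any driver in this thread.

```c
#include <stdint.h>

/* CAP.MQES sits in bits 15:0 of the 64-bit CAP register (NVMe spec layout). */
static inline uint32_t nvme_cap_mqes(uint64_t cap)
{
    return (uint32_t)(cap & 0xFFFFu);
}

/* MQES is zero-based, so the number of entries a queue may hold is MQES + 1. */
static inline uint32_t nvme_max_queue_entries(uint64_t cap)
{
    return nvme_cap_mqes(cap) + 1u;
}
```

With the MQES of 2047 reported here, nvme_max_queue_entries() returns 2048, which is an exact power of two.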
Ethin wrote:
8896@1601321519.344101:pci_nvme_err_startfail_cqent_too_large nvme_start_ctrl failed because the completion queue entry size is too large: log2size=10, max=15
8896@1601321519.344101:pci_nvme_err_startfail setting controller enable bit failed
Thoughts?
I am getting the same results.
Here are my thoughts, though don't take them as fact. I haven't researched it enough yet.
Starting with line 1707 is where QEMU makes the checks you are showing above.
However, after initialization and before you get the IDENTIFY block, you have no clue what the value is to compare to. Yet, QEMU is already comparing these values.
It is a chicken-and-egg problem.
You need to send the IDENTIFY command to get the minimum and maximum values before you can place these values, yet QEMU is comparing these values before you have a chance to send the IDENTIFY command.
However, since QEMU displays the values it compared with, even after placing these values in my initialization, QEMU still complains at this point.
If I figure something out, I will post here. I ask that you do the same.
Thanks,
Ben
- http://www.fysnet.net/osdesign_book_series.htm
Re: Qemu: NVMe controller gets stuck during controller confi
Ethin wrote:Thoughts?
QEMU is broken. The error message is printing garbage instead of the actual maximum allowed value.
As for the bug in your code, MQES indicates the maximum number of queue entries, not the maximum size of each entry. For that, you need to look at SQES and CQES in the Identify Controller data structure, which you can retrieve using the Identify command.
In order to send the Identify command, you still need to set the entry size; lucky for you, the minimum entry sizes are fixed values, so you can just use those to start out.
I'm not sure you actually need to know the maximum entry size. The spec seems to imply that the minimum is good enough for typical use.
Re: Qemu: NVMe controller gets stuck during controller confi
Ethin wrote:Thoughts?
Octocontrabass wrote:QEMU is broken.
Yep, after a little more looking (I don't know why I didn't catch this before), QEMU is broken.
Look at lines 2357 and 2358.
Code: Select all
id->sqes = (0x6 << 4) | 0x6;
id->cqes = (0x4 << 4) | 0x4;
Now look at the code starting at line 1707: (I have removed the inside part of the block for clarity here)
Code: Select all
if (unlikely(NVME_CC_IOCQES(n->bar.cc) < NVME_CTRL_CQES_MIN(n->id_ctrl.cqes))) {
// return error
}
if (unlikely(NVME_CC_IOCQES(n->bar.cc) > NVME_CTRL_CQES_MAX(n->id_ctrl.cqes))) {
// return error
}
if (unlikely(NVME_CC_IOSQES(n->bar.cc) < NVME_CTRL_SQES_MIN(n->id_ctrl.sqes))) {
// return error
}
if (unlikely(NVME_CC_IOSQES(n->bar.cc) > NVME_CTRL_SQES_MAX(n->id_ctrl.sqes))) {
// return error
}
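Because the minimum and maximum nibbles are equal in both Identify bytes, those four range checks are effectively equality checks:
Code: Select all
if (unlikely(NVME_CC_IOCQES(n->bar.cc) != NVME_CTRL_CQES_MIN(n->id_ctrl.cqes))) {
// return error
}
if (unlikely(NVME_CC_IOSQES(n->bar.cc) != NVME_CTRL_SQES_MIN(n->id_ctrl.sqes))) {
// return error
}
given the values set at lines 2357 and 2358:
Code: Select all
id->sqes = (0x6 << 4) | 0x6;
id->cqes = (0x4 << 4) | 0x4;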
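For reference, a sketch of how such a range check decodes the Identify bytes. The macro and function names are hypothetical. The definitions assume the NVMe spec layout (required/minimum size in bits 3:0, maximum in bits 7:4, both as log2 of the entry size in bytes) and are not copied from QEMU's source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of the SQES/CQES Identify bytes:
 * bits 3:0 = minimum (required) size, bits 7:4 = maximum size, both log2(bytes). */
#define ES_MIN(x) ((uint8_t)((x) & 0xF))
#define ES_MAX(x) ((uint8_t)(((x) >> 4) & 0xF))

/* True when the value programmed into CC.IOSQES or CC.IOCQES lies inside
 * the range advertised by the matching Identify byte. */
static bool entry_size_ok(uint8_t cc_value, uint8_t id_byte)
{
    return cc_value >= ES_MIN(id_byte) && cc_value <= ES_MAX(id_byte);
}
```

With id->sqes = 0x66 and id->cqes = 0x44, only 6 passes for the submission side and only 4 passes for the completion side; a value of 10 fails either check.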
IMHO, QEMU needs to do one of two things:
1) skip the check above if the IDENTIFY command has not been called, ignoring the current settings of CC.IOSQES and CC.IOCQES.
or
2) set the code to be
Code: Select all
id->sqes = (0xF << 4) | 0x0;
id->cqes = (0xF << 4) | 0x0;
As a work-around, if you initialize your driver to use 6 and 4 (respectively), the emulation will enable the controller and continue on.
Ben
Last edited by BenLunt on Mon Sep 28, 2020 10:15 pm, edited 1 time in total.
Re: Qemu: NVMe controller gets stuck during controller confi
BenLunt wrote:QEMU needs to do one of two things:
1) skip the check above if the IDENTIFY command has not been called.
2) set the code to be
Code: Select all
id->sqes = (0xF << 4) | 0x0;
id->cqes = (0xF << 4) | 0x0;
by default and once the IDENTIFY command has been called, then set the valid values.
Both of these suggestions violate the NVMe spec and would cause QEMU to behave differently from real hardware. Only the error messages are broken.
Re: Qemu: NVMe controller gets stuck during controller confi
Octocontrabass wrote:Both of these suggestions violate the NVMe spec and would cause QEMU to behave differently from real hardware. Only the error messages are broken.
Hi Octocontrabass,
Please explain, because the way I see it, the current code requires you to set the CC.IOSQES and CC.IOCQES values to 6 and 4 (respectively) just to enable the controller (set the CC.EN bit).
Where are these numbers coming from? Does the specification state that these are to be used as defaults? I don't see where it does.
One cannot request and receive the IDENTIFY block, which contains these MAX and MIN values, without first enabling the controller.
Therefore, using the current QEMU code, you are required to use 6 and 4 simply to enable the controller to then send the IDENTIFY command to get these values. Chicken and egg.
Am I wrong here? If so, please correct me.
Ben
P.S. I agree that the error messages need to be fixed. It is comparing one set of values and then displaying a second set of values. However, fixing the error messages doesn't fix the problem I describe here.
Re: Qemu: NVMe controller gets stuck during controller confi
BenLunt wrote:Where are these numbers coming from? Does the specification state that these are to be used as defaults? I don't see where it does.
Those numbers come from page 186 of the current NVMe spec (1.4a), where it's mandatory for all devices to use those values as their minimum supported queue entry sizes. Since those values are mandatory, you don't need to read the Identify Controller block to know what they are.
You do need to read the Identify Controller block to find the maximum queue entry sizes, but I'm not sure why you would want to use queue entries bigger than the minimum size.
Re: Qemu: NVMe controller gets stuck during controller confi
Octocontrabass is right; I had completely forgotten to re-read that part of the specification. It does indeed define the absolute minimums as 6 and 4, respectively, in bits 3:0 of bytes 512 and 513 of Fig. 249 (it says figure 247, but it's actually 249).
Edit: setting bits 23:20 and 19:16 to 6 and 4 does not actually fix this issue. QEMU still fails to initialize the controller, and the trace events are no help. It appears that Linux also defines IOSQES and IOCQES to 6 and 4, so I'm not sure how my code differs from theirs (although it's significantly less evolved).
Re: Qemu: NVMe controller gets stuck during controller confi
Octocontrabass wrote:Those numbers come from page 186 of the current NVMe spec (1.4a), where it's mandatory for all devices to use those values as their minimum supported queue entry sizes. Since those values are mandatory, you don't need to read the Identify Controller block to know what they are. You do need to read the Identify Controller block to find the maximum queue entry sizes, but I'm not sure why you would want to use queue entries bigger than the minimum size.
Indeed. I guess it is all how you interpret the specification.
After reading all the comments here, I now understand the interpretation that was meant. The minimum values are, by requirement, set at 6 and 4 respectively, but the maximum values can be higher.
Therefore, when initializing (enabling) the controller for the first time, since we have not yet sent the IDENTIFY command, we can set the values to 6 and 4. Then, after receiving the IDENTIFY response, we can choose a new set of values as long as they fall within the range reported. My interpretation of the specification is that these values can be changed while CC.EN is 1 as long as they are within the range given, i.e., CC.IOCQES and CC.IOSQES are writable while CC.EN is 1.
Ethin wrote:Edit: setting bits 23:20 and 19:16 to 6 and 4 do not actually fix this issue. QEMU still fails to initialize the controller, and the trace events are no help. It appears that Linux also defines IOSQES and IOCQES to 6 and 4 so I'm not sure how my code differs from theirs (although its significantly less evolved).
After setting my values to 6 and 4, I get the controller to enable. Here is my setup:
Code: Select all
mem_write_io_regs(addr, SSS_HC_CC,
SSS_HC_CC_SET_IOCQES(4) | // (1 << 4) = 16 Defined minimum
SSS_HC_CC_SET_IOSQES(6) | // (1 << 6) = 64 Defined minimum
SSS_HC_CC_SET_SHN(0) | // no shutdown notification
SSS_HC_CC_SET_AMS(0) | // round robin arbitration
SSS_HC_CC_SET_MPS(0) | // 0 = 4096 page size
SSS_HC_CC_SET_CSS(SSS_HC_CC_CSS_NVM)| // NVM command set
SSS_HC_CC_EN); // enable the controller
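For anyone without the SSS_HC_CC_* macros, a self-contained sketch of the same register value. The CC_* names are hypothetical; the bit positions are the CC field positions from the NVMe spec (EN bit 0, CSS 6:4, MPS 10:7, AMS 13:11, SHN 15:14, IOSQES 19:16, IOCQES 23:20).

```c
#include <stdint.h>

/* Field positions in the Controller Configuration (CC) register. */
#define CC_EN        (1u << 0)
#define CC_CSS(v)    (((uint32_t)(v) & 0x7u) << 4)
#define CC_MPS(v)    (((uint32_t)(v) & 0xFu) << 7)
#define CC_AMS(v)    (((uint32_t)(v) & 0x7u) << 11)
#define CC_SHN(v)    (((uint32_t)(v) & 0x3u) << 14)
#define CC_IOSQES(v) (((uint32_t)(v) & 0xFu) << 16)
#define CC_IOCQES(v) (((uint32_t)(v) & 0xFu) << 20)

/* Value to write when first enabling the controller:
 * 16-byte completion entries (2^4), 64-byte submission entries (2^6),
 * 4096-byte pages, round-robin arbitration, NVM command set (CSS = 0). */
static inline uint32_t nvme_initial_cc(void)
{
    return CC_IOCQES(4) | CC_IOSQES(6) | CC_SHN(0) | CC_AMS(0) |
           CC_MPS(0) | CC_CSS(0) | CC_EN;
}
```

This evaluates to 0x00460001. Putting 6 in bits 23:20 and 4 in bits 19:16 instead gives 0x00640001, which the QEMU checks quoted earlier reject.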
Similar to Ethin's comment, I had completely missed the two statements where it defined a required minimum of 6 and 4 so my interpretation was off a bit.
Thanks,
Ben
Re: Qemu: NVMe controller gets stuck during controller confi
Idk what I'm doing wrong... It's definitely not working for me. Pushed my latest code just now. Lines 314-326 now (I restructured my NVMe code).
Edit: So I got it to work. Using Ben's code, I found out that I had reversed IOCQES and IOSQES.
Re: Qemu: NVMe controller gets stuck during controller confi
(See update at end of post)
Just out of curiosity, have you got your driver to the point where you can read sectors from the disk?
I have a working driver and can read up to 16 sectors just fine. However, when I try to read more than 16 sectors at a time, it fails.
It fails with the error: (Specs: v1.2, page 65, Figure 30)
"02: Invalid Field in Command: An invalid or unsupported field specified in the command parameters."
Therefore I thought it might be the maximum allowed transfer size per transfer. (Specs: v1.2, page 100, Figure 90, Byte 77)
"If a command is submitted that exceeds the transfer size, then the command is aborted with a status of Invalid Field in Command."
However, the value at Byte 77 returns 0. (Specs: v1.2, page 100, Figure 90, Byte 77)
"A value of 0h indicates no restrictions on transfer size."
I am using a Scatter/Gather list (not PRPs) with a single Data Block Segment Entry since the buffer used is physically contiguous.
Here are my concerns.
1) Sixteen 512-byte sectors is 8192 bytes, exactly two (2) pages of data. Doesn't mean anything at the moment, that I can tell.
2) Does QEMU's emulation actually have a transfer limit, and they just forgot to update Byte 77 in the Identify block?
The check is at: https://github.com/qemu/qemu/blob/maste ... vme.c#L575
I believe that the mdts member is 7 (https://github.com/qemu/qemu/blob/maste ... me.c#L2453) which using the test at Line 575 is well above the 8192+ bytes I am trying to transfer.
Line 18 shows that I can add a parameter to set this value, though my version of QEMU barks at the parameter stating it is not a member of nvme.
(A note if you haven't thought of it already. The value at Byte 77 is:
"The value is in units of the minimum memory page size (CAP.MPSMIN) and is reported as a power of two (2^n)."
Therefore, the limit is calculated with the minimum page size, NOT the current page size. If you use a page size other than the minimum, remember that this limit is calculated on the minimum page size, not the page size you specify in CC.MPS.)
I don't find where QEMU actually sets Byte 77 of the (Controller) Identify block, but I am reading a value of zero from that byte, whereas I believe Line 2453 is setting it to 7 (4096 << 7 = 524,288).
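The Byte 77 (MDTS) arithmetic can be sketched as below. The function name is hypothetical, and the sketch assumes the minimum page size is 2^(12 + CAP.MPSMIN) bytes, as the NVMe spec defines it.

```c
#include <stdint.h>

/* Maximum data transfer size in bytes for a given MDTS (Identify Controller
 * byte 77) and CAP.MPSMIN. Returns 0 when MDTS is 0, meaning the controller
 * reports no restriction. */
static inline uint64_t nvme_max_transfer_bytes(uint8_t mdts, uint8_t mpsmin)
{
    unsigned shift = 12u + mpsmin + mdts;

    if (mdts == 0)
        return 0;               /* 0h = no restriction on transfer size */
    if (shift >= 64)
        return UINT64_MAX;      /* would overflow; treat as effectively unlimited */
    return (uint64_t)1 << shift;
}
```

For MDTS = 7 and MPSMIN = 0 this returns 524,288 bytes, matching the 4096 << 7 figure above. MDTS = 1 would give 8192 bytes.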
Just wondering if anyone has any thoughts about this. I am sure I am missing something simple. Just can't pin-point it at the moment.
Ben
P.S. I guess one thing I need to mention and it has a bit to do with it (though I didn't think of it until just now), I am using the Windows version of QEMU which has a different source listing still reporting version 1.2. I will have to study this code instead.
Update: (At a glance) it looks like version 1.2 (of the QEMU code) doesn't support Scatter Gather, so it was treating my SGL address as the PRP1 address and my SGL data length as the PRP2 address. Since PRP1 points to the first page and PRP2 points to the second page, 8192 bytes is all it will transfer. Again, (at a glance) it looks like version 1.2 doesn't support Scatter Gather.
.
.
.
Proof: Patch for version 1.3 states
- adds support for scatter gather lists (SGLs)
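To illustrate why an emulator without SGL support misreads the command: the 16-byte data pointer field holds either PRP1/PRP2 or one SGL descriptor. The sketch below overlays an SGL Data Block descriptor onto two 64-bit words. The struct and function names are hypothetical, the layout follows the NVMe spec, and a little-endian host is assumed.

```c
#include <stdint.h>
#include <string.h>

/* SGL Data Block descriptor, 16 bytes (NVMe spec layout). */
struct nvme_sgl_desc {
    uint64_t addr;      /* physical address of the data block */
    uint32_t length;    /* length of the data block in bytes */
    uint8_t  rsvd[3];
    uint8_t  id;        /* type in bits 7:4 (0h = data block), subtype in bits 3:0 */
};
_Static_assert(sizeof(struct nvme_sgl_desc) == 16, "SGL descriptor must be 16 bytes");

/* Reinterpret the same 16 bytes the way a PRP-only controller would:
 * prp[0] is read as PRP1 and prp[1] as PRP2 (little-endian host assumed). */
static void sgl_as_prps(const struct nvme_sgl_desc *sgl, uint64_t prp[2])
{
    memcpy(prp, sgl, sizeof *sgl);
}
```

For a descriptor with address 0x100000 and length 16384, a PRP-only controller would see PRP1 = 0x100000 and PRP2 = 16384, that is, the length is taken as an address.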
Re: Qemu: NVMe controller gets stuck during controller confi
I haven't gotten there yet. In general I'm going to strive to use PRPs as much as possible; I'm not very good with SGLs, and I'm not exactly sure how to construct one (and more things seem to work with PRPs than SGLs). I'm getting stuck just sending identify. For some reason, my memory allocator goes rogue when my NVMe driver starts.
At first I thought that my math was wrong, so I switched it to just allocate a 16KiB ringbuffer that I can just reuse over and over (is that a bad idea, by the way?). I'm using 4KiB pages, so that should equal four memory frames of size 4096, right? Because the last time I ran my code my memory allocator allocated more than a thousand frames (actually it was more like 5 thousand and rising). The addresses of those higher frames exceeded the 16 KiB I'd requested too. And I've no idea exactly how to debug it either -- because there's no error, there's no condition... there's not much for me to go on. I've pushed my commit -- would appreciate some help because I'm at a complete and utter loss.
Re: Qemu: NVMe controller gets stuck during controller confi
I guess I don't understand what your issue is.
You need to allocate physically contiguous memory for your Submission and Completion rings. The CAP.MQES will give you a limit of how many entries per ring, though I only use 64 each.
Therefore, since the Submission Ring uses 64-byte entries, 64 of them would occupy a single 4k page. The Completion Ring uses 16-byte entries, 64 of them would occupy less than a single 4k page.
This is the same for the I/O ring(s) as well.
The IDENTIFY blocks (CNS values 0, 1, and 2), all require a single 4k block, no matter the page size you use.
So to keep it simple, you need the following:
1) One 4k block for the Admin Submission Ring
2) One 4k block for the Admin Completion Ring
3) One 4k block for returning IDENTIFY data
4) One 4k block for each I/O Submission Ring
5) One 4k block for each I/O Completion Ring
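The sizing above can be checked with a few lines of C. This is only a sketch with hypothetical names, assuming 4096-byte pages and the minimum entry sizes of 64 and 16 bytes.

```c
#include <stddef.h>

#define NVME_PAGE_BYTES 4096u   /* 4k page, as used throughout this thread */
#define SQ_ENTRY_BYTES  64u     /* submission queue entry size (2^6) */
#define CQ_ENTRY_BYTES  16u     /* completion queue entry size (2^4) */

/* Total bytes occupied by a ring of `entries` entries. */
static inline size_t ring_bytes(size_t entries, size_t entry_bytes)
{
    return entries * entry_bytes;
}

/* Number of whole 4k pages needed to hold `bytes`, rounding up. */
static inline size_t pages_needed(size_t bytes)
{
    return (bytes + NVME_PAGE_BYTES - 1) / NVME_PAGE_BYTES;
}
```

A 64-entry submission ring is exactly 4096 bytes (one page), a 64-entry completion ring is 1024 bytes (still one page), and a 16 KiB buffer comes out to four pages.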
Since you haven't gotten past the IDENTIFY command yet, you don't need to worry about the I/O rings yet.
From previous posts, you have been able to enable the controller. Did you create your Admin rings before or after enabling the controller? You should have done this before enabling it.
At this point, just after enabling the controller, you should have an empty Admin Submission Ring and Completion Ring.
You can now send the IDENTIFY command.
CDW0 = CID, USE_PRP's, FUSE_NORMAL, OPCODE_IDENTIFY;
NSID = NSID_NONE;
MPTR = NULL;
PRP1 = 4k page aligned pointer to the 4k page of physical memory to store the data
PRP2 = 0
CDW10 = CNS (0, 1, or 2)
CDW11 = 0;
etc = 0;
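The field list above might translate to C roughly as follows. The struct and function names are hypothetical. The layout assumes the 64-byte common command format from the NVMe spec, with the opcode in CDW0 bits 7:0, the command identifier in bits 31:16, and the Identify opcode 06h; leaving the remaining CDW0 bits at zero selects PRPs and a non-fused command.

```c
#include <stdint.h>
#include <string.h>

/* 64-byte submission queue entry (common command format). */
struct nvme_sqe {
    uint32_t cdw0;      /* opcode 7:0, FUSE 9:8, PSDT 15:14, CID 31:16 */
    uint32_t nsid;
    uint32_t cdw2;
    uint32_t cdw3;
    uint64_t mptr;
    uint64_t prp1;
    uint64_t prp2;
    uint32_t cdw10;
    uint32_t cdw11;
    uint32_t cdw12;
    uint32_t cdw13;
    uint32_t cdw14;
    uint32_t cdw15;
};
_Static_assert(sizeof(struct nvme_sqe) == 64, "submission entry must be 64 bytes");

#define NVME_ADMIN_OPC_IDENTIFY 0x06u

/* Fill in an Identify command. buf_phys must be the physical address of a
 * 4k-aligned, 4k-sized buffer; cns selects which structure is returned. */
static void nvme_build_identify(struct nvme_sqe *sqe, uint16_t cid,
                                uint8_t cns, uint32_t nsid, uint64_t buf_phys)
{
    memset(sqe, 0, sizeof *sqe);            /* MPTR, PRP2, CDW11.. stay zero */
    sqe->cdw0  = ((uint32_t)cid << 16) | NVME_ADMIN_OPC_IDENTIFY;
    sqe->nsid  = nsid;
    sqe->prp1  = buf_phys;
    sqe->cdw10 = cns;
}
```

For example, nvme_build_identify(&sqe, 1, 1, 0, buf_phys) prepares an Identify Controller request (CNS = 1) with command identifier 1.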
Insert the Submission into the Admin Submission Queue (Ring) and ring the Admin Doorbell.
Wait for the interrupt
Process the Admin Completion Ring (Using the Phase Bit)
Return
You should now have the 4k data you are looking for.
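Processing the completion ring with the Phase Bit could look like the sketch below. All names are hypothetical. It assumes the 16-byte completion entry layout from the NVMe spec, with the phase tag in bit 0 of the status word, and a ring that was zeroed before the controller was enabled, so the first pass expects phase 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* 16-byte completion queue entry. */
struct nvme_cqe {
    uint32_t dw0;
    uint32_t dw1;
    uint16_t sq_head;
    uint16_t sq_id;
    uint16_t cid;
    uint16_t status;    /* bit 0 = phase tag, bits 15:1 = status field */
};

struct cq_state {
    volatile struct nvme_cqe *ring;
    uint16_t entries;   /* number of entries in the ring */
    uint16_t head;      /* next entry to examine */
    uint8_t  phase;     /* phase value that marks a new entry; starts at 1 */
};

/* Returns true and copies the entry to *out when a new completion is present.
 * The caller is expected to write cq->head to the completion queue head
 * doorbell afterwards. */
static bool cq_poll(struct cq_state *cq, struct nvme_cqe *out)
{
    volatile struct nvme_cqe *e = &cq->ring[cq->head];
    uint16_t status = e->status;

    if ((status & 1u) != cq->phase)
        return false;                       /* controller has not written this slot yet */

    out->dw0 = e->dw0;
    out->dw1 = e->dw1;
    out->sq_head = e->sq_head;
    out->sq_id = e->sq_id;
    out->cid = e->cid;
    out->status = status;

    if (++cq->head == cq->entries) {        /* wrap and flip the expected phase */
        cq->head = 0;
        cq->phase ^= 1u;
    }
    return true;
}
```

The phase flip on wrap-around is what lets the driver tell stale entries from fresh ones without clearing the ring.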
Does this help?
Ben
Re: Qemu: NVMe controller gets stuck during controller confi
Yes, and that shows me what I need to do. However, I can't even queue the command. As I said, my memory allocation routine goes rogue when I ask it to allocate the buffer for the PRP. And yes, I enable the controller after allocating queues.