Qemu: NVMe controller gets stuck during controller config
I'm working on an NVMe driver and it seems to be getting stuck, either when the controller is reset or when it's being enabled. (QEMU also seems to alternate at random between supporting the NVM command set and supporting only the admin command set, which is weird.) The output of my kernel is below:
[INFO] [nvme] initializing controller
[INFO] [nvme] running controller checks
[INFO] [nvme] Checking controller version
[DEBUG] [nvme] VS = 7000800, 111000000000000100000000000
[INFO] [nvme] Checking command set support
[INFO] [nvme] NVM command set supported
[DEBUG] [nvme] CSS = 69, 1101001
[INFO] [nvme] device supports 4KiB pages
[DEBUG] [nvme] MPSMIN = 2, 10
[INFO] [nvme] resetting controller
[DEBUG] [nvme] CC = 216876D, 10000101101000011101101101
[DEBUG] [nvme] CSTS = 1C090703, 11100000010010000011100000011
[DEBUG] [nvme] CC[0] = 0
[DEBUG] [nvme] CC = 216876C, 10000101101000011101101100
I'm not honestly sure what's causing this -- it could be an issue with my code or with QEMU. My code is available over here. Initialization code for the NVMe driver is on lines 564-746 (I need to break this file up into chunks). I'm doing my best to follow the NVMe article as well as the NVMe base specification, revision 1.4a.
Re: Qemu: NVMe controller gets stuck during controller confi
I'm surprised your version check for 1.4 passes on line 568, since Qemu's NVMe implementation only reports version 1.3: https://github.com/qemu/qemu/blob/maste ... nvme.c#L75
Are you sure you're accessing the registers correctly? You should be taking the address in BAR 0 from the PCI configuration space of the NVMe controller, clearing out the lower 4 bits, and mapping as uncacheable memory.
Re: Qemu: NVMe controller gets stuck during controller confi
nakst wrote:I'm surprised your version check for 1.4 passes on line 568, since Qemu's NVMe implementation only reports version 1.3: https://github.com/qemu/qemu/blob/maste ... nvme.c#L75
Are you sure you're accessing the registers correctly? You should be taking the address in BAR 0 from the PCI configuration space of the NVMe controller, clearing out the lower 4 bits, and mapping as uncacheable memory.
I believe I am. I am definitely mapping as uncacheable and I am using BAR 0. My function for getting BARs does this when calculating the actual BAR:
Code: Select all
fn calculate_bar_addr(dev: &PCIDevice, addr: u32) -> usize {
    let bar1 = read_dword(dev.phys_addr as usize, addr);
    // Bit 0 clear: memory space BAR. Bit 0 set: I/O space BAR.
    if !bar1.get_bit(0) {
        // Bits 2:1 encode the memory BAR type.
        match bar1.get_bits(1..=2) {
            // 32-bit BAR: mask off the low four flag bits.
            0 => (bar1 & 0xFFFF_FFF0) as usize,
            1 => (bar1 & 0xFFF0) as usize,
            // 64-bit BAR: the next BAR register holds the upper 32 bits.
            2 => {
                let bar2 = read_dword(
                    dev.phys_addr as usize,
                    match addr {
                        BAR0 => BAR1,
                        BAR1 => BAR2,
                        BAR2 => BAR3,
                        BAR3 => BAR4,
                        BAR4 => BAR5,
                        BAR5 | _ => 0,
                    },
                );
                (((bar1 as u64) & 0xFFFF_FFF0) + (((bar2 as u64) & 0xFFFF_FFFF) << 32)) as usize
            }
            _ => bar1 as usize,
        }
    } else {
        // I/O BAR: mask off the low two flag bits.
        (bar1 & 0xFFFF_FFFC) as usize
    }
}
Re: Qemu: NVMe controller gets stuck during controller confi
Where is the code to map the registers as uncacheable memory?
Re: Qemu: NVMe controller gets stuck during controller confi
Mapping as uncacheable is in memory.rs, lines 264-295.
Re: Qemu: NVMe controller gets stuck during controller confi
But you don't seem to call the function allocate_phys_range with the address in BAR0? Or perhaps I am missing something since I don't know Rust.
The value for the version register you gave in your first post looks very wrong.
[DEBUG] [nvme] VS = 7000800, 111000000000000100000000000
My guess is you are reading from the physical address by mistake?
Re: Qemu: NVMe controller gets stuck during controller confi
My architecture for drivers is a bit unique. The driver takes a function to allocate physical addresses; it doesn't do memory allocation like that itself.
The BARs, according to my kernel, are (in hex) 408010, 0, 0, 0, 920010, and 20110000 (if it's reading the PCIe address space properly). I've got two boot logs written, you can find them here and here, respectively. My PCIe memory space read functions are on lines 110-129 of pci.rs (I don't have a read_qword function). I'm not sure how much the boot logs help; I'm not reading the physical address but the data at that address.
Edit: yes, I'm definitely reading from BAR0.
-
- Member
- Posts: 5563
- Joined: Mon Mar 25, 2013 7:01 pm
Re: Qemu: NVMe controller gets stuck during controller confi
I don't think it's reading the PCIe space properly, those BARs look like nonsense. BARs should point to physical addresses that aren't already being used elsewhere, but your memory map says all of those addresses are already occupied by RAM.
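A quick way to catch this kind of problem early is to compare each BAR against the firmware memory map before using it. The following is only a sketch: the `Region` type, the `bar_overlaps_ram` helper, and the idea that the kernel keeps its memory map as a slice of regions are all assumptions for illustration, not part of the driver discussed in this thread.

```rust
/// One entry of the firmware memory map (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug)]
pub struct Region {
    pub start: u64,
    pub len: u64,
    pub is_ram: bool,
}

/// Returns true if the half-open range [bar_base, bar_base + bar_len)
/// overlaps any region that the memory map marks as usable RAM.
/// A BAR that overlaps RAM was almost certainly read incorrectly.
pub fn bar_overlaps_ram(map: &[Region], bar_base: u64, bar_len: u64) -> bool {
    let bar_end = bar_base.saturating_add(bar_len);
    map.iter().filter(|r| r.is_ram).any(|r| {
        let region_end = r.start.saturating_add(r.len);
        // Two half-open ranges overlap when each starts before the other ends.
        bar_base < region_end && r.start < bar_end
    })
}
```

With a memory map that reports RAM from 0 up to 2 GiB (an assumed layout), a BAR value of 408010 would be flagged, while an address such as FEBD4000 would pass.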
Re: Qemu: NVMe controller gets stuck during controller confi
Octocontrabass wrote:I don't think it's reading the PCIe space properly, those BARs look like nonsense. BARs should point to physical addresses that aren't already being used elsewhere, but your memory map says all of those addresses are already occupied by RAM.
That's what I was thinking. What is the "right way"? I get the address for the device in PCIe space, and then I read PCIe physical address + an offset of 0x10. The device PCIe address is at B0018000, so it would be reading from address B0018010. Is that how I do it, or am I missing something?
Edit: Okay, so I was calling .offset() on the address pointer. I've changed it to do simple addition, physical address + addr, instead, and that drastically changed things. (Version check still passes though... weird.) Now, when we boot, BARs are FEBD4000, 0, 0, 0, FEBD7000, and 0. The PCIe command register is 107h, and altering bits 10, 2, and 1 doesn't change that. The interrupt line is still 43. VS is now 10200h, CSS is 1 (bits 44:37 of CAP). MPSMIN is 0h. The original AQA is FF003F (and I then change that to 7FF07FF). Now I'm just getting stuck on the controller enable process. I still think I'm doing something wrong though because my version check passes.
Edit 2: okay, so... I think I wasn't thinking clearly when I wrote some of this code because I use an AND and not an OR for my version check. Whoops? (If memory serves, I did code some of this while drunk, so... ) I now get the expected "error" for version incompatibility, though I get version 1.2 and not 1.3. Might be my QEMU version though. Are there any "unexpected" changes between 1.3 and 1.4/1.2 and 1.4 that I should know about? Or is it safe, for now, to follow the 1.4 spec?
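The .offset() mistake is easy to reproduce in isolation. The sketch below assumes the pointer had type `*const u32` (the thread does not say), and both function names are made up for illustration. It uses `wrapping_add` only so that no `unsafe` is needed; like `.offset()`, it counts in elements rather than bytes.

```rust
/// Address reached by stepping a `*const u32` forward by `n`.
/// Raw pointer arithmetic counts in elements, so the byte distance
/// is `n * size_of::<u32>()`, not `n`.
pub fn addr_via_typed_pointer(base: usize, n: usize) -> usize {
    (base as *const u32).wrapping_add(n) as usize
}

/// Address reached by plain integer addition, where `n` is a byte offset.
pub fn addr_via_byte_addition(base: usize, n: usize) -> usize {
    base + n
}
```

With the device at B0018000 and a register offset of 0x10, the typed-pointer version lands on B0018040 instead of B0018010, which would explain reading unrelated configuration data in place of BAR0.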
Re: Qemu: NVMe controller gets stuck during controller confi
Ethin wrote:Are there any "unexpected" changes between 1.3 and 1.4/1.2 and 1.4 that I should know about? Or is it safe, for now, to follow the 1.4 spec?
You might be interested in the lists of changes between NVMe 1.2 and 1.3 and between NVMe 1.3 and 1.4. I don't know how significant any of these changes are, though.
Re: Qemu: NVMe controller gets stuck during controller confi
Octocontrabass wrote:
Ethin wrote:Are there any "unexpected" changes between 1.3 and 1.4/1.2 and 1.4 that I should know about? Or is it safe, for now, to follow the 1.4 spec?
You might be interested in the lists of changes between NVMe 1.2 and 1.3 and between NVMe 1.3 and 1.4. I don't know how significant any of these changes are, though.
Thanks for those! Looking at the QEMU source for 5.1.0, it does only implement NVMe 1.2. That sadly means I'll have to go acquire the 1.2 specifications, wait for the next QEMU release and then acquire 1.3. I wonder if the NVMe driver is relatively new?
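For reference, the VS register packs the major version into bits 31:16, the minor version into bits 15:8, and the tertiary version into bits 7:0, so the value 10200h seen earlier decodes to 1.2. A minimal sketch of decoding it and doing an "at least this version" check follows; the `NvmeVersion` type and the helper names are hypothetical.

```rust
/// Decoded contents of the NVMe VS register (hypothetical helper type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NvmeVersion {
    pub major: u16,
    pub minor: u8,
    pub tertiary: u8,
}

/// Splits a raw VS value into its fields:
/// MJR = bits 31:16, MNR = bits 15:8, TER = bits 7:0.
pub fn decode_vs(vs: u32) -> NvmeVersion {
    NvmeVersion {
        major: (vs >> 16) as u16,
        minor: ((vs >> 8) & 0xFF) as u8,
        tertiary: (vs & 0xFF) as u8,
    }
}

/// True if the controller reports at least `major.minor`.
/// Tuples compare lexicographically, which avoids hand-written
/// AND/OR chains over the individual fields.
pub fn is_at_least(v: NvmeVersion, major: u16, minor: u8) -> bool {
    (v.major, v.minor) >= (major, minor)
}
```

Decoding the value from the first post, 7000800h, gives a major version of 700h, which is a quick sign that the register read itself was wrong.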
Re: Qemu: NVMe controller gets stuck during controller confi
Update: so I know why the kernel is hanging when it enables the controller. Apparently CSTS.CFS is being set. I'm not quite sure why this is happening. Looking at the QEMU source code for NVMe doesn't really help much, unfortunately. According to the NVMe specification, I am to do the following to initialize the controller:
1. Set PCIe configuration,
2. Wait for CSTS.RDY to be zero (I don't do this),
3. Configure the admin queue, including the AQA, ASQ, and ACQ registers,
4. Configure the controller settings, specifically the arbitration mechanism, the memory page size, and the I/O command set, and
5. Set CC.EN to one (this is where it all breaks).
There are more, but this is what works (and then dies).
Edit: So I know why it's failing (thank you, qemu traces). I just have no idea how to actually fix it. The qemu traces say:
[email protected]:pci_nvme_err_startfail_cqent_too_small nvme_start_ctrl failed because the completion queue entry size is too small: log2size=0, min=15
[email protected]:pci_nvme_err_startfail setting controller enable bit failed
The problem is that the NVMe specification says this for queue entry sizes:
23:20 RW/RO 0h
I/O Completion Queue Entry Size (IOCQES): This field defines the I/O Completion Queue entry size that is used for the selected I/O Command Set. The required and maximum values for this field are specified in the CQES field in the Identify Controller data structure in Figure 247 for each I/O Command Set. The value is in bytes and is specified as a power of two (2^n). If any I/O Completion Queues exist, then write operations that change the value in this field produce undefined results. If the controller does not support I/O queues, then this field shall be read-only with a value of 0h.
19:16 RW/RO 0h
I/O Submission Queue Entry Size (IOSQES): This field defines the I/O Submission Queue entry size that is used for the selected I/O Command Set. The required and maximum values for this field are specified in the SQES field in the Identify Controller data structure in Figure 247 for each I/O Command Set. The value is in bytes and is specified as a power of two (2^n). If any I/O Submission Queues exist, then write operations that change the value in this field produce undefined results. If the controller does not support I/O queues, then this field shall be read-only with a value of 0h.
I can rely on cap.MQES, but the problem is that that's a 16-bit integer, whereas these only want four bits. SeaBIOS does this for its initialization:
Code: Select all
ctrl->reg->cc = NVME_CC_EN | (NVME_CQE_SIZE_LOG << 20)
    | (NVME_SQE_SIZE_LOG << 16 /* IOSQES */);
NVME_CQE_SIZE_LOG is 4 and NVME_SQE_SIZE_LOG is 6, but this seems... really arbitrary and I don't understand how they get that.
Re: Qemu: NVMe controller gets stuck during controller confi
Ethin wrote:My code is available over here. Initialization code for the NVMe driver is on lines 564-746 (I need to break this file up into chunks). I'm doing my best to follow the NVMe article as well as the NVMe base specification, revision 1.4a.
Hi Ethin,
I have recently started my NVMe driver as well, so it is nice to see someone else posting on this subject. May I ask what article you are referring to above?
Also, may I make a few comments about your code? Don't take them as criticism, since first I am not like that :-), and second, I don't know RUST at all, so I may be all wet here.
Using the latest code you have posted:
Starting with line 578, if bit 37 or bit 44 in the CAPS register is set, line 582 is redundant. In fact, line 583 will never be executed if bit 37 or bit 44 is set.
Line 593 is in error. You use "greater than or equal to". What if the minimum is greater than 4096? If this is true, the next line is wrong.
Line 604 writes a zero to the CC.EN bit whether it was already zero or not. As you later state, you probably should check this first. If it is already zero, no need to write to it. If it is one, you need to wait for the CSTS.RDY bit to also become zero, possibly timing out after so long if it doesn't.
What is the intent of line 606? "self.cc.write(cc);" Does this read from itself, then write the same value back? This would be my assumption, but there are no "Write Clear" bits in CC, so I am confused by its use.
I would put a timeout in the loop at Line 608. If the bit never becomes clear, you won't get very far.
Just asking a question and then providing some thoughts.
Thanks,
Ben
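The suggestions about lines 604 and 608 (only clear CC.EN if it is set, then wait for CSTS.RDY with a bound) could look roughly like the sketch below. Everything here is an assumption for illustration: the `NvmeRegs` trait, the `MockRegs` stand-in, and the use of a poll count as the bound. A real driver would use volatile MMIO accessors and derive a time limit from CAP.TO.

```rust
use std::cell::Cell;

const CC_EN: u32 = 1 << 0; // CC bit 0: Enable
const CSTS_RDY: u32 = 1 << 0; // CSTS bit 0: Ready

/// Hypothetical register access abstraction so the logic can run off-target.
pub trait NvmeRegs {
    fn read_cc(&self) -> u32;
    fn write_cc(&mut self, value: u32);
    fn read_csts(&self) -> u32;
}

#[derive(Debug, PartialEq, Eq)]
pub enum ResetError {
    Timeout,
}

/// Clears CC.EN only if it is currently set, then polls CSTS.RDY
/// at most `max_polls` times before giving up.
pub fn disable_controller<R: NvmeRegs>(regs: &mut R, max_polls: u32) -> Result<(), ResetError> {
    let cc = regs.read_cc();
    if (cc & CC_EN) != 0 {
        regs.write_cc(cc & !CC_EN);
    }
    for _ in 0..max_polls {
        if (regs.read_csts() & CSTS_RDY) == 0 {
            return Ok(());
        }
        std::hint::spin_loop();
    }
    Err(ResetError::Timeout)
}

/// Toy controller: reports RDY=1 for `busy_reads` more CSTS reads, then RDY=0.
pub struct MockRegs {
    pub cc: u32,
    pub busy_reads: Cell<u32>,
}

impl NvmeRegs for MockRegs {
    fn read_cc(&self) -> u32 {
        self.cc
    }
    fn write_cc(&mut self, value: u32) {
        self.cc = value;
    }
    fn read_csts(&self) -> u32 {
        let left = self.busy_reads.get();
        if left > 0 {
            self.busy_reads.set(left - 1);
            CSTS_RDY
        } else {
            0
        }
    }
}
```

Returning an error instead of spinning forever means a controller that never clears RDY shows up as a reported failure rather than a hung boot.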
Re: Qemu: NVMe controller gets stuck during controller confi
BenLunt wrote:
Ethin wrote:My code is available over here. Initialization code for the NVMe driver is on lines 564-746 (I need to break this file up into chunks). I'm doing my best to follow the NVMe article as well as the NVMe base specification, revision 1.4a.
Hi Ethin,
I have recently started my NVMe driver as well, so it is nice to see someone else posting on this subject. May I ask what article you are referring to above?
I am referring to the NVMe article, https://wiki.osdev.org/NVMe
BenLunt wrote:Also, may I make a few comments about your code? Don't take them as criticism, since first I am not like that :-), and second, I don't know RUST at all, so I may be all wet here.
Using the latest code you have posted:
Starting with line 578, if bit 37 or bit 44 in the CAPS register is set, line 582 is redundant. In fact, line 583 will never be executed if bit 37 or bit 44 is set.
Good point, thanks!
BenLunt wrote:Line 593 is in error. You use "greater than or equal to". What if the minimum is greater than 4096? If this is true, the next line is wrong.
I'll definitely fix that. Thanks!
BenLunt wrote:Line 604 writes a zero to the CC.EN bit whether it was already zero or not. As you later state, you probably should check this first. If it is already zero, no need to write to it. If it is one, you need to wait for the CSTS.RDY bit to also become zero, possibly timing out after so long if it doesn't.
I don't have a timeout mechanism set up yet; though I have access to the RTC and such, waiting exactly 500ms is something I'm not quite sure about. My ACPI library, I don't believe, lets me access the HPET yet (I'll check). My kernel does have a sleep function but it's a bit convoluted.
BenLunt wrote:What is the intent of line 606? "self.cc.write(cc);" Does this read from itself, then write the same value back? This would be my assumption, but there are no "Write Clear" bits in CC, so I am confused by its use.
self.cc.write() is a VolAddress<> from the voladdress crate. It makes it easy to read and write MMIO values in a volatile manner without having to mess with pointers (pointers are managed internally).
BenLunt wrote:I would put a timeout in the loop at Line 608. If the bit never becomes clear, you won't get very far.
Just asking a question and then providing some thoughts.
Thanks,
Ben
Again, thanks for all your suggestions!
Re: Qemu: NVMe controller gets stuck during controller confi
Ethin wrote:I can rely on cap.MQES, but the problem is that that's a 16-bit integer, whereas these only want four bits.
The size is specified as a power of 2, so you need to write the binary logarithm of the size you want. It just so happens that four bits is enough to represent it in that form.
Ethin wrote:SeaBIOS does this for its initialization:
[...]
NVME_CQE_SIZE_LOG is 4 and NVME_SQE_SIZE_LOG is 6, but this seems... really arbitrary and I don't understand how they get that.
They took the binary logarithm of 16 and 64.
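To make the arithmetic concrete: a completion queue entry is 16 bytes and a submission queue entry is 64 bytes, so the fields hold log2(16) = 4 and log2(64) = 6. The sketch below builds a CC value the same way as the SeaBIOS line quoted earlier in the thread. The function names are hypothetical, and leaving every other CC field (MPS, AMS, CSS) at zero is an assumption that a real driver would need to check against the controller's capabilities.

```rust
/// Entry sizes in bytes for the NVM command set.
pub const SQE_BYTES: u32 = 64;
pub const CQE_BYTES: u32 = 16;

/// Binary logarithm of `size`, or None if `size` is not a power of two.
pub fn log2_exact(size: u32) -> Option<u32> {
    if size.is_power_of_two() {
        Some(size.trailing_zeros())
    } else {
        None
    }
}

/// Builds a CC value with EN set, IOCQES in bits 23:20, and IOSQES in
/// bits 19:16. Returns None if a size is not a power of two or its
/// logarithm does not fit in the four-bit field.
pub fn cc_enable_value(sqe_bytes: u32, cqe_bytes: u32) -> Option<u32> {
    let iosqes = log2_exact(sqe_bytes)?;
    let iocqes = log2_exact(cqe_bytes)?;
    if iosqes > 0xF || iocqes > 0xF {
        return None;
    }
    Some((iocqes << 20) | (iosqes << 16) | 1)
}
```

Calling `cc_enable_value(SQE_BYTES, CQE_BYTES)` yields 460001h. Leaving IOCQES at zero is what produces the `log2size=0` in the QEMU trace.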