NVMe: NVM capacity and unallocated capacity are zero?
So I'm kinda confused about this. I have a disk that's 100 GiB in virtual size, but QEMU doesn't seem to be accurately reporting this. The identify controller data structure lists 0 for NVM capacity as well as unallocated capacity. (Also, I don't think the controller has *any* namespaces.) I don't have any physical NVMe hardware on hand -- which would be nice to have -- so I'm not sure of a good way of testing all of the functionality I'll need to add. Thoughts?
Re: NVMe: NVM capacity and unallocated capacity are zero?
TNVMCAP and UNVMCAP are mandatory for any NVMe controller that supports the Namespace Management capability, which QEMU claims to support, but I couldn't find any code to set them to appropriate values. I'd say that's a bug in QEMU.
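For reference, both fields live in the Identify Controller data structure: TNVMCAP occupies bytes 295:280 and UNVMCAP bytes 311:296, each a 128-bit little-endian byte count. A minimal sketch of pulling them out of the returned 4 KiB buffer (identify_buf is a hypothetical name for the DMA buffer, not anything from the code in this thread):
Code: Select all
fn read_u128_le(buf: &[u8], off: usize) -> u128 {
    let mut bytes = [0u8; 16];
    bytes.copy_from_slice(&buf[off..off + 16]);
    u128::from_le_bytes(bytes)
}

let tnvmcap = read_u128_le(&identify_buf, 280); // total NVM capacity, in bytes
let unvmcap = read_u128_le(&identify_buf, 296); // unallocated NVM capacity, in bytes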
Re: NVMe: NVM capacity and unallocated capacity are zero?
Octocontrabass wrote: TNVMCAP and UNVMCAP are mandatory for any NVMe controller that supports the Namespace Management capability, which QEMU claims to support, but I couldn't find any code to set them to appropriate values. I'd say that's a bug in QEMU.
The more I dig into this, the more I'm starting to wonder if QEMU's NVMe implementation was never programmed properly. Evidence:
- MDTS is 7, or 128 bytes, which is ridiculously tiny for data transfers -- that's not even a full sector.
- EDSTT is 0, suggesting that self-tests aren't supported
- Max NVM set ID and endurance group IDs are 0
- MAXCMD is 0
- NN is only 256 and MNAN is 0
- I can't retrieve a list of active namespaces. The system hangs when I try to do that. No interrupt is fired, suggesting an error condition.
-
- Member
- Posts: 5568
- Joined: Mon Mar 25, 2013 7:01 pm
Re: NVMe: NVM capacity and unallocated capacity are zero?
Ethin wrote: MDTS is 7, or 128 bytes, which is ridiculously tiny for data transfers -- that's not even a full sector.
MDTS is specified as a power-of-two count of MPSMIN-sized pages. QEMU sets MPSMIN to indicate 4 KiB, so 2^7 = 128 of those is 512 KiB.
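To make the arithmetic concrete, here's a small sketch of how the two fields combine; the helper name is made up, not something from QEMU or the spec:
Code: Select all
// Max transfer = 2^MDTS pages, where one page is 2^(12 + MPSMIN) bytes.
// An MDTS of 0 means the controller reports no limit.
fn max_transfer_bytes(mdts: u8, mpsmin: u8) -> Option<usize> {
    if mdts == 0 {
        return None;
    }
    let page = 1usize << (12 + mpsmin as usize);
    Some(page << mdts as usize)
}
// QEMU: mdts = 7, mpsmin = 0  =>  4096 << 7 = 524288 bytes (512 KiB)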
Ethin wrote: EDSTT is 0, suggesting that self-tests aren't supported
Device self-test support is indicated by OACS bit 4. QEMU doesn't set that bit.
Ethin wrote: Max NVM set ID and endurance group IDs are 0
Support for NVM sets is indicated by CTRATT bit 2. Support for endurance groups is indicated by CTRATT bit 4. QEMU doesn't set either bit.
Ethin wrote: MAXCMD is 0
MAXCMD is optional for NVMe attached to PCIe.
Ethin wrote: NN is only 256 and MNAN is 0
That sounds reasonable to me. MNAN is optional. Do you really need more than 256 namespaces?
Ethin wrote: The problem is that, though I can get a real NVMe disk, I don't know how to use USB passthrough but make it look like a PCIe device so I can test it, and I'd rather not build an entire USB stack just to test an NVMe driver. Talk about overkill...
You can do PCIe passthrough with an external Thunderbolt enclosure. I can't imagine it's a very cost-effective way to test your driver, but it's an option!
Re: NVMe: NVM capacity and unallocated capacity are zero?
Octocontrabass wrote: MDTS is specified as a power-of-two count of MPSMIN-sized pages. QEMU sets MPSMIN to indicate 4 KiB, so 2^7 = 128 of those is 512 KiB. [...]
Aha, thanks. Yeah, my NVMe controller is getting hung somewhere. Or my code is messing up. I can send the initial identify command, but it's getting hung when I ask it for active NSIDs. The code looks like:
Code: Select all
fn process_command(
&mut self,
req: Self::CommandRequest,
) -> Result<Self::Response, Self::Error> {
debug!("Processing command {:?}", req);
debug!("Waiting for controller to be ready...");
loop {
if self.read_csts().get_bit(0) {
break;
}
}
debug!("Writing request");
self.sqs[req.qid].queue_command(req.entry);
if req.qid == 0 {
debug!("Queue is admin queue, writing admin queue doorbell");
self.write_adm_sub_tail_queue_doorbell(self.sqs[req.qid].get_queue_tail());
} else {
debug!("Writing to queue {} doorbell", req.qid);
self.write_sub_tail_doorbell(req.qid, self.sqs[req.qid].get_queue_tail());
}
debug!("Waiting for response");
loop {
debug!("Attempting to acquire interrupt queue read lock");
if let Some(i) = INTRS.try_read() {
debug!("Checking to see if this interrupt was ours");
if i[self.intrspos].1 > 0 {
break;
}
}
debug!("No interrupts received, waiting for more");
hlt();
}
debug!("Disabling interrupts");
disable_interrupts();
debug!("Acquiring interrupt queue write lock");
let mut i = INTRS.write();
debug!("Reading new queue entries");
let mut entries: MiniVec<queues::CompletionQueueEntry> = MiniVec::new();
self.cqs[req.qid].read_new_entries(&mut entries);
debug!(
"Decreasing interrupt count from {} to {}",
i[self.intrspos].1,
i[self.intrspos].1 - 1
);
i[self.intrspos].1 -= 1;
if req.qid == 0 {
debug!("Writing to admin completion queue doorbell");
self.write_adm_comp_head_queue_doorbell(self.cqs[req.qid].get_queue_head());
} else {
debug!("Writing completion queue doorbell for queue {}", req.qid);
self.write_comp_head_doorbell(req.qid, self.cqs[req.qid].get_queue_head());
}
if entries.len() > 1 {
warn!(
"Retrieved {} responses; returning only first",
entries.len()
);
entries.truncate(1);
}
let entry = entries[0];
enable_interrupts();
if entry.status.sc != 0x00 {
Err(entry.status)
} else {
Ok(Response {
qid: req.qid,
entry,
})
}
}
Code: Select all
[DEBUG] [libk::nvme] Processing command Request { qid: 0, entry: SubmissionQueueEntry { cdw0: 6, nsid: 0, _rsvd: 0, mptr: 0, prps: [2077036544, 0], operands: [1, 0, 0, 0, 0, 0] } }
[DEBUG] [libk::nvme] Waiting for controller to be ready...
[DEBUG] [libk::nvme] Writing request
[DEBUG] [libk::nvme] Queue is admin queue, writing admin queue doorbell
[DEBUG] [libk::nvme] Writing to adm submission tail doorbel at memaddr FEBD5000h: 1h
[DEBUG] [libk::nvme] Waiting for response
[DEBUG] [libk::interrupts] Interrupt received for int 229
[DEBUG] [libk::interrupts] Acquired lock
[DEBUG] [libk::interrupts] Calling func 0
[DEBUG] [libk::nvme] Attempting to acquire interrupt queue read lock
[DEBUG] [libk::nvme] Checking to see if this interrupt was ours
[DEBUG] [libk::nvme] Disabling interrupts
[DEBUG] [libk::nvme] Acquiring interrupt queue write lock
[DEBUG] [libk::nvme] Reading new queue entries
[DEBUG] [libk::nvme] Decreasing interrupt count from 1 to 0
[DEBUG] [libk::nvme] Writing to admin completion queue doorbell
[DEBUG] [libk::nvme] Writing to adm completion head doorbel at memaddr FEBD5004h: 1h
[INFO] [libk::nvme] Vendor ID: 1B36, subsystem vendor ID: 1AF4
[INFO] [libk::nvme] Serial number: 0001
[INFO] [libk::nvme] Model number: QEMU NVMe Ctrl
[INFO] [libk::nvme] Firmware revision: 1.0
[INFO] [libk::nvme] RAB: 40h
[INFO] [libk::nvme] FRU GUID: 0
[INFO] [libk::nvme] NVM capacity total: 0; unallocated NVM capacity: 0
[INFO] [libk::nvme] NVMe qualified name: nqn.2019-08.org.qemu:0001
[INFO] [libk::nvme] MDTS is 7 pages (128 bytes)
[INFO] [libk::nvme] Extended device self-test time is 0 minutes
[INFO] [libk::nvme] Max NVM set identifier is 0
[INFO] [libk::nvme] Max endurance group ID is 0
[INFO] [libk::nvme] SQES is 6
[INFO] [libk::nvme] CQES is 4
[INFO] [libk::nvme] maxcmd is 0
[INFO] [libk::nvme] nn is 256, mnan is 0
[INFO] [libk::nvme] Checking for active namespaces
[DEBUG] [libk::nvme] Processing command Request { qid: 0, entry: SubmissionQueueEntry { cdw0: 6, nsid: 0, _rsvd: 0, mptr: 0, prps: [2046713856, 0], operands: [2, 0, 0, 0, 0, 0] } }
[DEBUG] [libk::nvme] Waiting for controller to be ready...
[DEBUG] [libk::nvme] Writing request
[DEBUG] [libk::nvme] Queue is admin queue, writing admin queue doorbell
[DEBUG] [libk::nvme] Writing to adm submission tail doorbel at memaddr FEBD5000h: 2h
[DEBUG] [libk::nvme] Waiting for response
[DEBUG] [libk::nvme] Attempting to acquire interrupt queue read lock
[DEBUG] [libk::nvme] Checking to see if this interrupt was ours
[DEBUG] [libk::nvme] No interrupts received, waiting for more
Update: I completely forgot about the QEMU logs that I'd generated. But the logs raise more questions than they answer:
Code: Select all
pci_nvme_admin_cmd cid 0 sqid 0 opc 0x0 opname 'NVME_ADM_CMD_DELETE_SQ'
pci_nvme_err_invalid_del_sq invalid submission queue deletion, sid=0
pci_nvme_enqueue_req_completion cid 0 cqid 0 status 0x4101
pci_nvme_err_req_status cid 0 nsid 4294967295 status 0x4101 opc 0x0
pci_nvme_irq_msix raising MSI-X IRQ vector 0
apic_deliver_irq dest 255 dest_mode 0 delivery_mode 0 vector 158 trigger_mode 0
apic_report_irq_delivered coalescing 9
apic_report_irq_delivered coalescing 9
apic_report_irq_delivered coalescing 9
apic_report_irq_delivered coalescing 9
apic_report_irq_delivered coalescing 9
apic_report_irq_delivered coalescing 9
apic_report_irq_delivered coalescing 9
apic_report_irq_delivered coalescing 9
On a different note, I managed to get my kernel booting in VirtualBox, but VirtualBox doesn't seem to support MSI-X, and I haven't implemented standard MSI.
Re: NVMe: NVM capacity and unallocated capacity are zero?
Ethin wrote: The confusing thing is that I'm not sending that opcode at all.
The opcode is zero, and the other fields in the log are also zero. The controller is probably reading uninitialized memory, which would happen if you've written the command at the wrong location in the submission queue or told the controller to read past the end of the queue.
Where in the queue are the two commands being written? If they're the wrong distance apart, that would explain why only the first command works.
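Since each submission queue entry is a fixed 64 bytes (16 dwords), command n has to start at byte offset n * 64 from the queue base. A one-line sketch, with hypothetical names:
Code: Select all
// Entry n starts 64 * n bytes past the start of the submission queue.
fn sq_entry_addr(sq_base: usize, tail: u16) -> usize {
    sq_base + (tail as usize) * 64
}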
Re: NVMe: NVM capacity and unallocated capacity are zero?
Octocontrabass wrote: Where in the queue are the two commands being written? If they're the wrong distance apart, that would explain why only the first command works.
Actually, I think you just helped me solve my problem. In theory. My submission code writes commands at (tail + i), where tail is the submission queue tail value, a 32-bit offset in the range 0-15. The writing code is this:
Code: Select all
pub(crate) fn queue_command(&mut self, entry: SubmissionQueueEntry) {
let addr: DynamicVolBlock<u32> =
unsafe { DynamicVolBlock::new(self.addr, (self.entries as usize) * 16) };
let mut entry = entry;
self.cid = self.cid.wrapping_add(1);
entry.cdw0.set_bits(16..32, self.cid as u32);
// Fill in array
let mut cmd = [0u32; 16];
// Dword 0 - CDW0 (command-specific)
cmd[0] = entry.cdw0;
// Dword 1 - Namespace ID
cmd[1] = entry.nsid;
// Dwords 2-3 reserved
cmd[2] = 0;
cmd[3] = 0;
// Dwords 4-5 - Metadata pointer
cmd[4] = entry.mptr.get_bits(0..32) as u32;
cmd[5] = entry.mptr.get_bits(32..64) as u32;
// Dwords 6-9 - PRP list
cmd[6] = entry.prps[0].get_bits(0..32) as u32;
cmd[7] = entry.prps[0].get_bits(32..64) as u32;
cmd[8] = entry.prps[1].get_bits(0..32) as u32;
cmd[9] = entry.prps[1].get_bits(32..64) as u32;
// Dwords 10-15 - command arguments
cmd[10] = entry.operands[0];
cmd[11] = entry.operands[1];
cmd[12] = entry.operands[2];
cmd[13] = entry.operands[3];
cmd[14] = entry.operands[4];
cmd[15] = entry.operands[5];
cmd.iter().enumerate().for_each(|(i, c)| {
addr.index((self.qtail as usize) + i).write(*c);
});
self.qtail = self.qtail.wrapping_add(1) % self.entries;
}
The buffer covering the whole queue is created like this:
Code: Select all
let addr: DynamicVolBlock<u32> =
    unsafe { DynamicVolBlock::new(self.addr, (self.entries as usize) * 16) };
whereas a C driver would index the queue by whole 64-byte entries, along the lines of:
Code: Select all
struct nvme_sqe *sqe = &sq->sqe[sq->tail];
Re: NVMe: NVM capacity and unallocated capacity are zero?
Aren't you just missing a multiply by 16 in the loop that writes to memory? Something like this:
Code: Select all
addr.index((self.qtail as usize) * 16 + i).write(*c);
Re: NVMe: NVM capacity and unallocated capacity are zero?
Yeah, I was. Now I'm just suffering from a lock issue; it shouldn't be too hard to fix that, I don't think. Here's the log now:
Code: Select all
pci_nvme_admin_cmd cid 2 sqid 0 opc 0x6 opname 'NVME_ADM_CMD_IDENTIFY'
pci_nvme_identify_nslist nsid 0
pci_nvme_map_prp trans_len 4096 len 4096 prp1 0x15e62000 prp2 0x0 num_prps 2
pci_nvme_map_addr addr 0x15e62000 len 4096
pci_nvme_enqueue_req_completion cid 2 cqid 0 status 0x0
pci_nvme_irq_msix raising MSI-X IRQ vector 0
apic_deliver_irq dest 255 dest_mode 0 delivery_mode 0 vector 234 trigger_mode 0
apic_report_irq_delivered coalescing 9
apic_report_irq_delivered coalescing 9
apic_report_irq_delivered coalescing 9
apic_report_irq_delivered coalescing 9
apic_report_irq_delivered coalescing 9
apic_report_irq_delivered coalescing 9
apic_report_irq_delivered coalescing 9
apic_report_irq_delivered coalescing 9
Re: NVMe: NVM capacity and unallocated capacity are zero?
Okay, so the lock issue I referenced is something I still haven't been able to solve. Currently, this is what happens:
- System sends identify command:
- Wait for controller to be ready
- Queue command
- Write submission doorbell
- Use hlt to wait for interrupts
- ISR is reached. In ISR:
- Attempt to acquire interrupt queue lock
- If successful, perform our work, release the mutex, and return. Otherwise, continuously attempt lock acquisition until we succeed.
- After interrupt finishes, we hang.
Edit: this happens even if identify succeeds. As soon as I attempt to check for active namespaces, this same hang happens. GDB is completely useless in helping me -- maybe I'm just not using it correctly.
Re: NVMe: NVM capacity and unallocated capacity are zero?
The HLT instruction waits for an interrupt. If you don't mess with the interrupt stack frame, the interrupt returns to whichever instruction follows the HLT.
Can the interrupt arrive before HLT? You could be waiting for an interrupt that already happened.
Are you using GDB to debug at source level or machine instruction level? Source-level debugging expects the CPU to eventually reach another line of source code, which might not be happening here.
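For what it's worth, the usual way to close that race on x86 looks something like the sketch below: STI doesn't take effect until after the next instruction, so an STI immediately followed by HLT can't lose the interrupt. (irq_pending() here stands in for whatever "has my interrupt arrived" check the driver uses.)
Code: Select all
fn wait_for_irq() {
    loop {
        // Check with interrupts disabled, so the IRQ can't fire between
        // the test and the HLT.
        unsafe { asm!("cli", options(nomem, nostack)) };
        if irq_pending() {
            unsafe { asm!("sti", options(nomem, nostack)) };
            return;
        }
        // STI enables interrupts only after the *next* instruction, so
        // the IRQ is delivered while we're halted, never before it.
        unsafe { asm!("sti; hlt", options(nomem, nostack)) };
    }
}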
Re: NVMe: NVM capacity and unallocated capacity are zero?
I'm debugging at source-level. I'm going to try one last way with MPMC queues.
Update: Okay, so the queue stuff does work, but again, *only* for the identify command. I get the feeling that the queue idea is sound, but this is really, really irritating: I can't have the kernel randomly waiting indefinitely because an interrupt is never fired. Granted, all of this code isn't in the order it's written in, but these are the important bits. The *only* thing I'm not doing is checking the fatal status bit and issuing a reset if that happens -- my QEMU logs are below, and I doubt that's what's happening. This is what my code looks like:
Code: Select all
// Initialization
static INTRS: SegQueue<Interrupt> = SegQueue::new();
// Type definitions
#[derive(Clone, Copy, Debug, Eq, PartialEq, Ord, PartialOrd, Hash)]
struct Interrupt(pub u128, pub usize);
// Wait on interrupt
let mut i: Interrupt;
loop {
if !INTRS.is_empty() {
i = INTRS.pop().unwrap();
if i.0 == self.id {
break;
} else {
INTRS.push(i);
}
}
}
// ISR
register_interrupt_handler(
dev.int_line,
Box::new(move |_| {
INTRS.push(Interrupt(dev.unique_dev_id, 1));
}),
);
Code: Select all
pci_nvme_admin_cmd cid 2 sqid 0 opc 0x6 opname 'NVME_ADM_CMD_IDENTIFY'
pci_nvme_identify_nslist nsid 0
pci_nvme_map_prp trans_len 4096 len 4096 prp1 0x69516000 prp2 0x0 num_prps 2
pci_nvme_map_addr addr 0x69516000 len 4096
pci_nvme_enqueue_req_completion cid 2 cqid 0 status 0x0
pci_nvme_irq_msix raising MSI-X IRQ vector 0
apic_deliver_irq dest 255 dest_mode 0 delivery_mode 0 vector 102 trigger_mode 0
apic_report_irq_delivered coalescing 9
apic_report_irq_delivered coalescing 9
apic_report_irq_delivered coalescing 9
apic_report_irq_delivered coalescing 9
apic_report_irq_delivered coalescing 9
apic_report_irq_delivered coalescing 9
apic_report_irq_delivered coalescing 9
apic_report_irq_delivered coalescing 9
Re: NVMe: NVM capacity and unallocated capacity are zero?
According to your QEMU log, the NVMe controller is processing your command successfully, returning the namespace list, and raising an interrupt.
Are interrupts enabled for the code that checks if the interrupt has arrived? If so, the interrupt may arrive after deciding to use HLT but before actually using HLT.
Have you tried instruction-level debugging to see what the CPU is doing when it hangs?
Re: NVMe: NVM capacity and unallocated capacity are zero?
Octocontrabass wrote: Are interrupts enabled for the code that checks if the interrupt has arrived? If so, the interrupt may arrive after deciding to use HLT but before actually using HLT.
I just did that, adding a cli and an sti in the code to try to force interrupts to occur when I want them to. This is what happens (in code) now:
Code: Select all
// At function entry
unsafe {
asm!("cli", options(nomem, nostack));
}
// When we're about to wait for a response
let mut i: Interrupt;
unsafe {
asm!("sti", options(nomem, nostack));
asm!("hlt", options(nomem, nostack));
}
// Now we're waiting...
loop {
if !INTRS.is_empty() {
i = INTRS.pop().unwrap();
if i.0 == self.id {
break;
} else {
INTRS.push(i);
unsafe {
asm!("pause", options(nomem, nostack));
}
}
} else {
unsafe {
asm!("pause", options(nomem, nostack));
}
continue;
}
}
let mut entries: MiniVec<queues::CompletionQueueEntry> = MiniVec::new();
Code: Select all
728 let mut i: Interrupt;
729 unsafe {
730 asm!("sti", options(nomem, nostack));
0x00000000003228dc <+2172>: sti
731 asm!("hlt", options(nomem, nostack));
0x00000000003228dd <+2173>: hlt # things get stuck here
732 }
733 loop {
=> 0x00000000003228de <+2174>: jmp 0x3228e0 <<libk::nvme::NvmeController as libk::disk::Disk>::process_command+2176>
734 if !INTRS.is_empty() {
0x00000000003228e0 <+2176>: mov 0x1282c9(%rip),%rdi # 0x44abb0
0x00000000003228e7 <+2183>: callq 0x316750 <crossbeam_queue::seg_queue::SegQueue<T>::is_empty>
0x00000000003228ec <+2188>: mov %al,0x93(%rsp)
0x00000000003228f3 <+2195>: mov 0x93(%rsp),%al
0x00000000003228fa <+2202>: xor $0xff,%al
0x00000000003228fc <+2204>: test $0x1,%al
0x00000000003228fe <+2206>: jne 0x322902 <<libk::nvme::NvmeController as libk::disk::Disk>::process_command+2210>
0x0000000000322900 <+2208>: jmp 0x322918 <<libk::nvme::NvmeController as libk::disk::Disk>::process_command+2232>
735 i = INTRS.pop().unwrap();
0x0000000000322902 <+2210>: lea 0x340(%rsp),%rdi
0x000000000032290a <+2218>: mov 0x12829f(%rip),%rsi # 0x44abb0
0x0000000000322911 <+2225>: callq 0x315620 <crossbeam_queue::seg_queue::SegQueue<T>::pop>
0x0000000000322916 <+2230>: jmp 0x32291f <<libk::nvme::NvmeController as libk::disk::Disk>::process_command+2239>
742 }
743 }
744 } else {
745 unsafe {
746 asm!("pause", options(nomem, nostack));
0x0000000000322918 <+2232>: pause
0x000000000032291a <+2234>: jmpq 0x3229d6 <<libk::nvme::NvmeController as libk::disk::Disk>::process_command+2422>
735 i = INTRS.pop().unwrap();
0x000000000032291f <+2239>: lea 0x117962(%rip),%rsi # 0x43a288
0x0000000000322926 <+2246>: lea 0x340(%rsp),%rdi
0x000000000032292e <+2254>: callq 0x255690 <core::option::Option<T>::unwrap>
0x0000000000322933 <+2259>: mov %rcx,0x78(%rsp)
0x0000000000322938 <+2264>: mov %rdx,0x80(%rsp)
0x0000000000322940 <+2272>: mov %rax,0x88(%rsp)
0x0000000000322948 <+2280>: mov 0x88(%rsp),%rax
0x0000000000322950 <+2288>: mov 0x80(%rsp),%rcx
0x0000000000322958 <+2296>: mov 0x100(%rsp),%rsi
0x0000000000322960 <+2304>: mov 0x78(%rsp),%rdx
0x0000000000322965 <+2309>: mov %rax,0x4f0(%rsp)
0x000000000032296d <+2317>: mov %rcx,0x4f8(%rsp)
0x0000000000322975 <+2325>: mov %rdx,0x500(%rsp)
736 if i.0 == self.id {
0x000000000032297d <+2333>: mov 0x108(%rsi),%rdx
0x0000000000322984 <+2340>: mov 0x110(%rsi),%rsi
0x000000000032298b <+2347>: xor %rsi,%rcx
0x000000000032298e <+2350>: xor %rdx,%rax
0x0000000000322991 <+2353>: or %rcx,%rax
0x0000000000322994 <+2356>: jne 0x3229a7 <<libk::nvme::NvmeController as libk::disk::Disk>::process_command+2375>
0x0000000000322996 <+2358>: jmp 0x322998 <<libk::nvme::NvmeController as libk::disk::Disk>::process_command+2360>
747 }
748 continue;
749 }
750 }
751 let mut entries: MiniVec<queues::CompletionQueueEntry> = MiniVec::new();
0x0000000000322998 <+2360>: callq 0x2b3c50 <minivec::MiniVec<T>::new>
0x000000000032299d <+2365>: mov %rax,0x360(%rsp)
0x00000000003229a5 <+2373>: jmp 0x3229db <<libk::nvme::NvmeController as libk::disk::Disk>::process_command+2427>
0x00000000003229a7 <+2375>: mov 0x78(%rsp),%rcx
0x00000000003229ac <+2380>: mov 0x80(%rsp),%rdx
0x00000000003229b4 <+2388>: mov 0x88(%rsp),%rsi
739 INTRS.push(i);
0x00000000003229bc <+2396>: mov 0x1281ed(%rip),%rax # 0x44abb0
0x00000000003229c3 <+2403>: mov 0x1281e6(%rip),%rdi # 0x44abb0
0x00000000003229ca <+2410>: callq 0x315e60 <crossbeam_queue::seg_queue::SegQueue<T>::push>
740 unsafe {
741 asm!("pause", options(nomem, nostack));
0x00000000003229cf <+2415>: pause
733 loop {
0x00000000003229d1 <+2417>: jmpq 0x3228e0 <<libk::nvme::NvmeController as libk::disk::Disk>::process_command+2176>
Update: I've updated it to use a while loop (kinda pointless, since I can just use a pure loop construct), but even that didn't help. I'd love for this driver to be completely interrupt-driven, but I don't really know how to do that, so I'm stuck with this hack of sorts. I don't know if my driver implementation is just wrong or what at this point. My current command-processing code for one command is this:
Code: Select all
impl Disk for NvmeController {
type CommandRequest = Request;
type Response = Response;
type Error = queues::Status;
fn process_command(
&mut self,
req: Self::CommandRequest,
) -> Result<Self::Response, Self::Error> {
debug!("Processing command {:?}", req);
debug!("Waiting for controller to be ready...");
loop {
if self.read_csts().get_bit(0) {
break;
}
}
debug!("Writing request");
self.sqs[req.qid].queue_command(req.entry);
if req.qid == 0 {
debug!("Queue is admin queue, writing admin queue doorbell");
self.write_adm_sub_tail_queue_doorbell(self.sqs[req.qid].get_queue_tail());
} else {
debug!("Writing to queue {} doorbell", req.qid);
self.write_sub_tail_doorbell(req.qid, self.sqs[req.qid].get_queue_tail());
}
debug!("Waiting for response");
while INTRS.is_empty() {
unsafe {
asm!("hlt", options(nomem, nostack));
}
}
let _ = INTRS.pop().unwrap();
let mut entries: MiniVec<queues::CompletionQueueEntry> = MiniVec::new();
self.cqs[req.qid].read_new_entries(&mut entries);
if req.qid == 0 {
debug!("Writing to admin completion queue doorbell");
self.write_adm_comp_head_queue_doorbell(self.cqs[req.qid].get_queue_head());
} else {
debug!("Writing completion queue doorbell for queue {}", req.qid);
self.write_comp_head_doorbell(req.qid, self.cqs[req.qid].get_queue_head());
}
if entries.len() > 1 {
warn!(
"Retrieved {} responses; returning only first",
entries.len()
);
entries.truncate(1);
}
let entry = entries[0];
if entry.status.sc != 0x00 {
Err(entry.status)
} else {
Ok(Response {
qid: req.qid,
entry,
})
}
}
}
Re: NVMe: NVM capacity and unallocated capacity are zero?
Ethin wrote: Also, the bug appears to be random: sometimes the bug happens on the first command, other times the bug happens on the second, and other times the interrupt fires but the loop is never terminated.
Getting stuck at the HLT instruction is only possible if the interrupt has already arrived, which can only happen if interrupts are enabled anywhere between the write to the command queue doorbell and the STI/HLT pair. The loop never terminating sounds like either a bug in the crate you're using for thread-safe data structures or memory corruption. (Is your stack big enough? Did you disable the red zone?)
Ethin wrote: I'd love for this driver to be completely interrupt driven, but I don't really know how to do that, so I'm stuck with this hack of sorts.
Have your scheduler provide a generic "sleep until my IRQ has arrived" function for driver threads to call, and a "wake my driver" function for ISRs to call. Hiding all of the ugly architecture-specific bits will make it feel much less like a hack. The only trick is that the IRQ may arrive before the driver tries to sleep.
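A minimal sketch of that idea, assuming a single device, no scheduler yet, and the same asm! style used in the posts above (all names here are hypothetical; a real version would block the calling thread instead of halting the CPU):
Code: Select all
use core::sync::atomic::{AtomicBool, Ordering};

// One "IRQ arrived" flag per device: set by the ISR, consumed by the driver.
static IRQ_ARRIVED: AtomicBool = AtomicBool::new(false);

// "Wake my driver": called from the ISR.
fn wake_driver() {
    IRQ_ARRIVED.store(true, Ordering::Release);
}

// "Sleep until my IRQ has arrived": called from the driver thread.
// Consuming the flag with interrupts disabled handles the case where
// the IRQ arrived before the driver tried to sleep.
fn sleep_until_irq() {
    loop {
        unsafe { asm!("cli", options(nomem, nostack)) };
        if IRQ_ARRIVED.swap(false, Ordering::Acquire) {
            unsafe { asm!("sti", options(nomem, nostack)) };
            return;
        }
        unsafe { asm!("sti; hlt", options(nomem, nostack)) };
    }
}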
Unfortunately, even looking at your code I can't spot anything wrong. That doesn't mean there are no obvious problems; I'm just totally lost reading Rust.