OSDev.org

Posted: **Mon Dec 21, 2020 3:43 pm**

Ethin wrote:I've tried lowest priority, fixed and ExtInt delivery modes.

You probably want to use fixed, since the whole idea is to assign interrupts to specific CPU cores, but lowest priority should also work and may be useful for slower hardware. Intel suggests that ExtInt is used for emulating legacy hardware, so it might not work correctly.

Ethin wrote:My code for PCIe/MSI-X is over here:

Does the destination ID match the ID of the target local APIC? You may also find useful information by looking at how to send inter-processor interrupts, since the basic functionality is the same.
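For reference, here is a minimal sketch of how the destination ID and delivery mode end up in an MSI/MSI-X message on x86, assuming physical destination mode and edge triggering. The names `DeliveryMode`, `msi_address`, and `msi_data` are made up for illustration and are not taken from the linked code.

```rust
/// Delivery modes as encoded in bits 10..8 of the MSI message data.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DeliveryMode {
    Fixed = 0b000,
    LowestPriority = 0b001,
    ExtInt = 0b111,
}

/// Message address for physical destination mode.
/// Bits 31..20 are always 0xFEE; bits 19..12 hold the target local APIC ID.
/// The redirection hint (bit 3) and destination mode (bit 2) are left at 0.
pub fn msi_address(dest_apic_id: u8) -> u32 {
    0xFEE0_0000 | ((dest_apic_id as u32) << 12)
}

/// Message data: vector in bits 7..0, delivery mode in bits 10..8.
/// The trigger mode (bit 15) is left at 0, which selects edge triggering.
pub fn msi_data(vector: u8, mode: DeliveryMode) -> u32 {
    (vector as u32) | ((mode as u32) << 8)
}
```

With this encoding, a destination ID of 0 addresses whichever core has local APIC ID 0 (usually the bootstrap processor), not "the current processor".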

Posted: **Mon Dec 21, 2020 4:03 pm**

Octocontrabass wrote:
Ethin wrote:I've tried lowest priority, fixed and ExtInt delivery modes.
You probably want to use fixed, since the whole idea is to assign interrupts to specific CPU cores, but lowest priority should also work and may be useful for slower hardware. Intel suggests that ExtInt is used for emulating legacy hardware, so it might not work correctly.

Yeah, I figured it was something like that, and I only tried ExtInt to see if it was a problem with the emulator, though I doubted that to begin with. (I also discovered that I *hadn't* enabled the APIC globally, so I fixed that, but still no go.)

Octocontrabass wrote:
Ethin wrote:My code for PCIe/MSI-X is over here:
Does the destination ID match the ID of the target local APIC? You may also find useful information by looking at how to send inter-processor interrupts, since the basic functionality is the same.

I've read that section of the manual, but I might go re-read it. I've tried with destination IDs 0xFF and 0x00. I read somewhere (I think on here) that 0x00 meant the currently running processor, and the Intel manuals hint that 0xFF links to all processors in the system (though it calls them "agents")... But I'll extract the processor ID and try that and post an update on my status.

Posted: **Mon Dec 21, 2020 4:44 pm**

Okay, so the interrupt *is* being received. I understand the problem now and am fixing it. I added a logging call in my interrupt handler (I hadn't thought about doing that) and it fired successfully, so we're good. The handler tries to acquire a lock though, but it's not an acquisition it waits on, and so if it fails to acquire a read lock it just bails out. I'm not really sure what the best way to go about handling this is, especially since I need an interrupt handler to execute as fast as possible. But thanks for all the help -- I greatly appreciate it!

Posted: **Mon Dec 21, 2020 6:32 pm**

Ethin wrote:The handler tries to acquire a lock though, but it's not an acquisition it waits on, and so if it fails to acquire a read lock it just bails out. I'm not really sure what the best way to go about handling this is, especially since I need an interrupt handler to execute as fast as possible.

That depends on what the lock is protecting. You might just disable interrupts while the lock is held so the ISR will never attempt to acquire the lock when the lock is already held. Or, you might use a lock-free algorithm to ensure the ISR can make progress even when something else is operating on the shared data. Or, you might rearrange the way resources are shared so that the resource is locked by the existence of an ISR that might need to use it at any time.

Sounds like a fun research project to me.
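As one illustration of the lock-free option, the handler can just record that something happened and leave the real work to normal kernel context. This is only a sketch under assumptions: `PendingEvents` and its methods are hypothetical names, and it assumes at most 32 event sources (for example, one bit per completion queue).

```rust
use core::sync::atomic::{AtomicU32, Ordering};

/// One bit per event source. The ISR sets bits; the driver drains them later.
pub struct PendingEvents(AtomicU32);

impl PendingEvents {
    pub const fn new() -> Self {
        PendingEvents(AtomicU32::new(0))
    }

    /// Called from the interrupt handler. A single atomic OR, so it never
    /// blocks and never fails, regardless of what the interrupted code held.
    pub fn signal(&self, source: u32) {
        self.0.fetch_or(1 << (source % 32), Ordering::Release);
    }

    /// Called from normal kernel context. Atomically takes every pending
    /// bit and clears the set, so no signal is lost or seen twice.
    pub fn take(&self) -> u32 {
        self.0.swap(0, Ordering::Acquire)
    }
}
```

The driver would then call `take()` from a context where it is safe to block on the lock and process each queue whose bit is set.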

Posted: **Mon Dec 21, 2020 7:06 pm**

Octocontrabass wrote:
Ethin wrote:The handler tries to acquire a lock though, but it's not an acquisition it waits on, and so if it fails to acquire a read lock it just bails out. I'm not really sure what the best way to go about handling this is, especially since I need an interrupt handler to execute as fast as possible.
That depends on what the lock is protecting. You might just disable interrupts while the lock is held so the ISR will never attempt to acquire the lock when the lock is already held. Or, you might use a lock-free algorithm to ensure the ISR can make progress even when something else is operating on the shared data. Or, you might rearrange the way resources are shared so that the resource is locked by the existence of an ISR that might need to use it at any time.

Sounds like a fun research project to me.

I know, right? One last problem and then I (should) be ready to get rolling. I'm getting a strange capability ID (0xFF) in the capabilities list for the AHCI controller. It implements MSI (0x5) and my kernel notes that, but then I come across this weird one that's not in the PCI Code and ID Assignment Specification, Rev. 1.11. I'm able to read it, so here's what happens:

Kernel enumerates capabilities list, discovers MSI (0x5).
Kernel finds unknown capability with ID 0xFF and next pointer 0xFF.
Kernel reads in that capability, then jumps to the next pointer.
Kernel tries to read capability -> exception (invalid opcode).

(It seems like QEMU favors the invalid opcode exception quite a bit for a lot of different things...)
I'm not really sure what this ID means. What should I have my kernel do? I thought that 0x00 with the Next pointer meant the end of the list but...

Posted: **Mon Dec 21, 2020 7:22 pm**

Ethin wrote:What should I have my kernel do?

I'd start by dumping the AHCI controller's PCI configuration space to see if it really does have garbage in its capabilities list. If the list looks fine when parsed by hand, there's probably a bug in how you're interpreting the pointer to the next capability.
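A rough sketch of parsing the list by hand from a dump, assuming the dump is a 256-byte snapshot of the function's configuration space. `walk_capabilities` is an illustrative name, and the iteration cap is an arbitrary safety limit rather than anything the PCI specification prescribes.

```rust
/// Walks the capability list in a 256-byte configuration space snapshot and
/// returns (capability ID, offset) pairs in list order.
pub fn walk_capabilities(cfg: &[u8; 256]) -> Vec<(u8, u8)> {
    let mut found = Vec::new();

    // Status register (offset 0x06), bit 4: the function has a capability list.
    let status = u16::from_le_bytes([cfg[0x06], cfg[0x07]]);
    if status & (1 << 4) == 0 {
        return found;
    }

    // Capabilities pointer at 0x34. The low two bits are reserved, so mask them.
    let mut ptr = cfg[0x34] & 0xFC;

    // Capabilities live above the 64-byte header and are dword aligned, so 48
    // entries is a generous cap that also protects against a looping list.
    let mut budget = 48;
    while ptr >= 0x40 && budget > 0 {
        let id = cfg[ptr as usize];
        let next = cfg[ptr as usize + 1];
        if id == 0xFF {
            // All-ones usually means the read went somewhere that does not exist.
            break;
        }
        found.push((id, ptr));
        ptr = next & 0xFC; // a next pointer of 0 ends the list
        budget -= 1;
    }
    found
}
```

If the hand-parsed dump yields sane IDs and offsets while the kernel still sees 0xFF, the pointer arithmetic or the config-space read itself is the likely culprit.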

Ethin wrote:(It seems like QEMU favors the invalid opcode exception quite a bit for a lot of different things...)

That sounds more like a Rust thing. Doesn't it have runtime bounds checking?

Posted: **Mon Dec 21, 2020 7:48 pm**

Octocontrabass wrote:
Ethin wrote:What should I have my kernel do?
I'd start by dumping the AHCI controller's PCI configuration space to see if it really does have garbage in its capabilities list. If the list looks fine when parsed by hand, there's probably a bug in how you're interpreting the pointer to the next capability.

I'm reading words (u16s) for the capabilities list. I could read dwords, too.

Ethin wrote:(It seems like QEMU favors the invalid opcode exception quite a bit for a lot of different things...)
That sounds more like a Rust thing. Doesn't it have runtime bounds checking?

It does, but I'm not using anything that requires bounds checking. I'm not indexing the entire address space as an array, so it's not going to check that for me. I'm using my own functions for reading and writing to the address space, so it expects me to do those checks.

Posted: **Tue Dec 22, 2020 4:31 pm**

Okay, so I'm not done with my errors yet. I've gotten to the point where reading everything works... Almost, anyway. I get a response. I ask my completion queue to start reading entries. It gets to the 256th one and then faults. I've altered my code to allocate enough space for the entire queue -- in this instance 2047 entries for both -- and it does that. But it's failing to get any further than entry 256. (Obviously, there's only one response, but I can't know that for sure, so I just have it blast through the queue and ignore all non-new entries.) Here's how I go about doing that:

    pub(crate) fn read_new_entries(
        &mut self,
        entry_storage_queue: &mut MiniVec<CompletionQueueEntry>,
    ) {
        let addr: DynamicVolBlock<u128> =
            unsafe { DynamicVolBlock::new(self.addr, self.entries as usize) };
        (0..self.entries as usize).for_each(|i| {
            info!("Reading entry {}", i);
            let entry = addr.index((self.qhead as usize) + i).read();
            if entry.get_bit(112) == self.phase {
                self.qhead = self.qhead.wrapping_add(1) % self.entries;
                let cqe = CompletionQueueEntry {
                    cmdret: entry.get_bits(0..32) as u32,
                    _rsvd: 0,
                    sqhd: entry.get_bits(64..80) as u16,
                    sqid: entry.get_bits(80..96) as u16,
                    cid: entry.get_bits(96..112) as u16,
                    phase: entry.get_bit(112),
                    status: entry.get_bits(113..128) as u16,
                };
                entry_storage_queue.push(cqe);
            }
        });
        self.phase = !self.phase;
    }

I'm not really sure what's wrong here. I allocate my memory like this:

            let asqaddr = get_aligned_free_addr((size_of::<queues::SubmissionQueueEntry>() as u64)*asqsize, 4096);
            let acqaddr = get_aligned_free_addr((size_of::<queues::CompletionQueueEntry>() as u64)*acqsize, 4096);

Both those variables -- asqsize and acqsize -- are calculated like so:

            let asqsize = if self.read_cap().get_bits(0..16) > 4095 {
                0x3FFC0
            } else {
                self.read_cap().get_bits(0..16)
            };
            let acqsize = if self.read_cap().get_bits(0..16) > 4095 {
                0xFFF0
            } else {
                self.read_cap().get_bits(0..16)
            };

Any ideas as to what the problem with this is? Is my math incorrect?

Posted: **Tue Dec 22, 2020 6:58 pm**

Your math is incorrect in two ways: the MQES field contains one less than the number of entries, so you must add one to its value to find the number of entries, and the values you return when it indicates more than 4096 entries are significantly larger than 4096. (The ASQS and ACQS fields work the same way, so keep that in mind if you change how you read MQES.)

But that may not be the only problem, since you didn't specify exactly how it fails when it attempts to go beyond 256 entries.
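To make the arithmetic concrete, here is a small sketch of the corrected sizing. The function names are hypothetical; the field positions (CAP.MQES in bits 15:0, AQA.ASQS in bits 11:0, AQA.ACQS in bits 27:16, all zero-based) follow the NVMe specification.

```rust
/// Admin queues are limited to 4096 entries because AQA.ASQS and AQA.ACQS
/// are 12-bit, zero-based fields.
pub const ADMIN_QUEUE_MAX_ENTRIES: u32 = 4096;

/// Number of entries to use for an admin queue, given the raw CAP register.
pub fn admin_queue_entries(cap: u64) -> u32 {
    // CAP.MQES is zero-based: a raw value of 0x7FF means 2048 entries.
    let mqes = (cap & 0xFFFF) as u32;
    (mqes + 1).min(ADMIN_QUEUE_MAX_ENTRIES)
}

/// Value to program into AQA. Both sizes are entry counts of at least 1 and
/// are converted back to the zero-based encoding here.
pub fn aqa_value(asq_entries: u32, acq_entries: u32) -> u32 {
    ((acq_entries - 1) << 16) | (asq_entries - 1)
}
```

The allocation size in bytes is then the entry count times the entry size: 64 bytes per submission queue entry and 16 bytes per completion queue entry.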

Posted: **Tue Dec 22, 2020 8:07 pm**

Octocontrabass wrote:Your math is incorrect in two ways: the MQES field contains one less than the number of entries, so you must add one to its value to find the number of entries, and the values you return when it indicates more than 4096 entries are significantly larger than 4096. (The ASQS and ACQS fields work the same way, so keep that in mind if you change how you read MQES.)

But that may not be the only problem, since you didn't specify exactly how it fails when it attempts to go beyond 256 entries.

Well... I know why it's faulting (page fault). I'm calculating the index and adding that to qhead, which is causing it to read the entire page -- not just the address I mapped. I also wasn't reading the number of entries properly -- there should be about 128 iterations because each entry is 16 bytes. (I think, anyway.)
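For comparison, a sketch of a completion queue reader that stops at the first stale entry, wraps the head index, and flips the expected phase only on wrap-around. It is not the code from this thread: `CompletionQueue` here is a simplified stand-in that reads from a plain slice, whereas a real driver would use volatile reads and write the head doorbell afterwards. Note that a 4 KiB page holds 4096 / 16 = 256 completion entries, which would be consistent with a fault at entry 256 if only one page were mapped.

```rust
/// Tracks the consumer side of a completion queue.
pub struct CompletionQueue {
    head: usize,
    /// Phase tag expected on new entries. Queue memory starts zeroed and the
    /// controller posts its first pass with phase = 1, so this starts true.
    phase: bool,
}

impl CompletionQueue {
    pub fn new() -> Self {
        CompletionQueue { head: 0, phase: true }
    }

    /// Returns the raw 16-byte entries posted since the previous call.
    pub fn read_new_entries(&mut self, ring: &[u128]) -> Vec<u128> {
        let mut out = Vec::new();
        // Never consume more than one full lap of the ring per call.
        while out.len() < ring.len() {
            let entry = ring[self.head];
            // The phase tag is bit 16 of dword 3, i.e. bit 112 of the entry.
            let entry_phase = (entry >> 112) & 1 == 1;
            if entry_phase != self.phase {
                break; // first stale entry: nothing newer lies behind it
            }
            out.push(entry);
            self.head += 1;
            if self.head == ring.len() {
                // Wrapped around: the next lap uses the inverted phase tag.
                self.head = 0;
                self.phase = !self.phase;
            }
        }
        out
    }
}
```

Because the loop indexes with `head` alone and wraps it before the next read, it never reads past the end of the ring.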

Posted: **Tue Dec 22, 2020 9:45 pm**

Okay, so not quite done. Damn. Seems like I keep struggling with this.
So I'm able to identify the controller: the model number is "QEMU NVMe Ctrl" and the serial number is 0001, as I specified in my QEMU args. However, I'm getting some really weird values for other things. I (believe) that this disk is formatted with an FS -- though I'd need to check. But, either way, the controller indicates that the NVM capacity, both total and unallocated, is zero; that the completion and submission queue entry sizes are zero; and that the controller can have zero outstanding commands. It also says that the controller type is zero and that I have zero namespaces. I'm really confused, especially because, according to the spec, 0 for controller type is reserved, and I definitely shouldn't be getting 0 for the sector usage; the disk is at least 50 MB in size. Is the implementation of QEMU defective or is it my code?

Posted: **Tue Dec 22, 2020 10:08 pm**

Ethin wrote:the completion and submission queue entry sizes are zero

I know for sure that QEMU reports nonzero values for the queue entry sizes, so there must be a bug in your code. Some of the other values you're looking at are optional, and those ones might be zero in QEMU.

Posted: **Tue Dec 22, 2020 10:34 pm**

Well, changing the representation to C and packed fixed the problem, I think. I also reviewed the spec and learned that some of my knowledge on the struct was wrong, so I've updated my code accordingly. My code officially works now. Now I can finish the initialization process. Woohoo! Thanks for all your time -- I'm glad to have finally gotten this working!
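For anyone hitting the same symptom: Rust's default struct representation may reorder fields, so a struct overlaid on device data needs an explicit layout. The sketch below is not the code from this thread. `IdentifyHeader` covers only the first 72 bytes of the Identify Controller data, and `required_entry_sizes` is a hypothetical helper that reads SQES and CQES at byte offsets 512 and 513, as given in the NVMe specification.

```rust
/// First 72 bytes of the Identify Controller data structure.
/// `repr(C)` fixes the field order; `packed` rules out padding.
#[repr(C, packed)]
#[derive(Clone, Copy)]
pub struct IdentifyHeader {
    pub vid: u16,     // bytes 1:0, PCI vendor ID
    pub ssvid: u16,   // bytes 3:2, PCI subsystem vendor ID
    pub sn: [u8; 20], // bytes 23:4, serial number (ASCII, space padded)
    pub mn: [u8; 40], // bytes 63:24, model number (ASCII, space padded)
    pub fr: [u8; 8],  // bytes 71:64, firmware revision
}

impl IdentifyHeader {
    /// Copies the header out of a raw Identify buffer.
    pub fn from_bytes(buf: &[u8]) -> Option<Self> {
        if buf.len() < core::mem::size_of::<Self>() {
            return None;
        }
        // SAFETY: the length was checked, every bit pattern is valid for
        // these integer fields, and read_unaligned needs no alignment.
        Some(unsafe { core::ptr::read_unaligned(buf.as_ptr() as *const Self) })
    }

    /// Packed fields are copied out by value; NVMe data is little-endian.
    pub fn vendor_id(&self) -> u16 {
        u16::from_le(self.vid)
    }

    pub fn model(&self) -> String {
        String::from_utf8_lossy(&self.mn).trim_end().to_string()
    }
}

/// Required submission and completion queue entry sizes in bytes.
/// The low nibble of SQES and of CQES holds log2 of the required size.
pub fn required_entry_sizes(buf: &[u8]) -> Option<(usize, usize)> {
    if buf.len() < 514 {
        return None;
    }
    Some((1usize << (buf[512] & 0xF), 1usize << (buf[513] & 0xF)))
}
```

A controller that follows the specification reports 64 and 16 bytes here, so zeros in these fields point at the parsing code rather than the device.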

NVMe confusion
