Just curious, what is the ICR set to on the first/'good' interrupt?
According to the 8254x manual, an interrupt is generated each time a bit in the ICR is set to 1b AND that interrupt is enabled in the IMS. All bits that are set at the time the ICR is read are cleared by that read. A race between software reading the ICR and hardware setting a bit is resolved by leaving the bit set (and I assume, although I haven't found explicit confirmation of this, that the bit won't appear in the value you read during the race but will instead cause another interrupt). This means that no interrupt cause bits can ever be lost: you either read-and-clear them, or possibly-don't-read-and-certainly-don't-clear them, but you can't clear-but-don't-read them.

In the code on your github the IMS is set to enable all interrupts in MODULE_INIT(), so there is the possibility of an interrupt being generated that you don't normally check for. Given that no bits can be missed, the number of interrupt-causing events should equal the number of set bits read (to be 100% correct, each bit is multiplied by how many times the corresponding event occurred, e.g. if you receive two packets before you read the ICR the bit would only be set once, but you would still find two packets in the receive buffer). So to receive two interrupts, either two bits must be set in the ICR, or one bit must be 'set twice' (the event occurred twice). I'm not sure whether it's possible to still receive a second interrupt whose cause you already acknowledged to the NIC (by reading the ICR) before the interrupt itself was delivered, but that is the only reasonable explanation I can come up with, other than the emulator being bugged. The event causing the second interrupt can't have happened after the first time you read the ICR, because bits are only cleared by software reads, and an interrupt-causing event will always set its bit if it is enabled, so zero bits set means zero events occurred since the last read. Another explanation could be that your interrupt handling code is wrong and this has nothing to do with the NIC itself, but that wouldn't explain why this doesn't happen with every interrupt, or at least with some interrupts from every (PCI) device.
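Just to be able to see this, it might be worth logging the raw ICR value and any bits you don't explicitly handle, given that your IMS enables everything. A minimal sketch, assuming the ICR sits at offset 0xC0 as in the manual's register map, with mmio_read32()/log() standing in for whatever your driver actually uses:

Code: Select all
#define REG_ICR        0x00C0     /* Interrupt Cause Read, offset per the 8254x register map */
#define HANDLED_CAUSES (1 << 7)   /* e.g. only RXT0 (receiver timer) is handled for now */

/* in the IRQ handler: */
uint32_t icr = mmio_read32(REG_ICR);   /* read-to-clear: acknowledges every bit that was set */
log("ICR = %08x", icr);                /* also answers the question above for the 'good' interrupt */
if(icr & ~HANDLED_CAUSES)
    log("unexpected ICR bits: %08x", icr & ~HANDLED_CAUSES);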
If I understand it correctly, the 'nonsense' interrupts only arrive immediately after/when a packet is received (indicated by bit 7 in the ICR), and the only things that are likely to generate an interrupt as an (indirect) result of receiving a packet are: a Small Receive Packet Detect (which as far as I can see you haven't enabled, in fact you haven't touched the RSRPD at all and these interrupts are disabled by default), reaching the Receive Descriptor Minimum Threshold (you've set it to 00b in the RCTL, meaning the interrupt is only generated each time the number of free receive descriptors drops to exactly half the total number of descriptors), or a Receiver FIFO Overrun, which means there were either no descriptors available or the PCI bus was too slow, so the packet was dropped (in which case bit 7 would not be set because no new packet was written to memory, so the first interrupt in this case wouldn't have been a 'good' one either). The first one is impossible unless the emulated hardware has a different default state (which would be incorrect), or something else has messed with the NIC. The other two, especially the last one, are very unlikely considering you mentioned in another thread that your OS works fine on other emulators and on real hardware.
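If you want to narrow it down further, you could log exactly which of those receive-related causes is set whenever an interrupt arrives. The bit positions below are what I get from the ICR description in the manual (worth double-checking), and log() is again just a placeholder:

Code: Select all
#define ICR_RXDMT0 (1 << 4)    /* Receive Descriptor Minimum Threshold reached */
#define ICR_RXO    (1 << 6)    /* Receiver FIFO Overrun (packet dropped) */
#define ICR_RXT0   (1 << 7)    /* Receiver Timer Interrupt: new packet written to memory */
#define ICR_SRPD   (1 << 16)   /* Small Receive Packet Detected */

void dump_rx_causes(uint32_t icr)
{
    if(icr & ICR_SRPD)   log("SRPD");    /* should never happen: disabled by default, RSRPD untouched */
    if(icr & ICR_RXDMT0) log("RXDMT0");  /* free descriptors dropped to half the ring */
    if(icr & ICR_RXO)    log("RXO");     /* overrun: RXT0 would not be set for the dropped packet */
    if(icr & ICR_RXT0)   log("RXT0");    /* the normal 'good' cause */
}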
Also note that bit 7 in ICR is only set and the interrupt only generated each time a new packet is stored in memory, so the sequence: "IRQ -> read ICR -> packet still not handled -> IRQ -> packet was handled and ICR was quickly set to zero" is not possible.
8254x Family of Gigabit Ethernet Controllers Software Developer’s Manual, 13.4.18, ITR wrote:Software can use this register to pace (or even out) the delivery of interrupts to the host CPU. This register provides a guaranteed inter-interrupt delay between interrupts asserted by the Ethernet controller, regardless of network traffic conditions.
If this feature is supported by VirtualBox, you could set the delay to the highest value possible to determine whether the second interrupt actually came from the NIC (in which case there would be a significant delay of about 1/60 seconds) or from something else (in which case there would be no delay at all). Additionally you could use the ITR to limit the interrupt rate if you are really worried about IRQ floods or IRQs arriving before the previous one has been handled. Also note that this register makes loops like the one quoted below unnecessary/inefficient:
Brendan wrote:If you sent EOI at the right time (after your thread has told the kernel it has finished handling the cause of the IRQ) it'd be impossible for a second/unnecessary IRQ to occur before your thread starts.
Note: For gigabit Ethernet (where something else might cause an IRQ while it's handling an IRQ) I'd be tempted to do something like:
Code: Select all
do {
    wait_or_whatever();          // block this thread (until something unblocks this thread)
    if(IRQ_occured) {            // if the thread was unblocked because an IRQ occured
        do {
            handle_cause(lastICR);
            lastICR = ICR;
        } while(lastICR != 0);
        tell_kernel_finished(status); // Tell kernel I finished handling the cause of the IRQ
    } else {
        // Handle other things that could've unblocked the thread
    }
} while(running);                // Go back to waiting (unless driver was terminated somewhere)
Instead of spending valuable CPU time on making sure you handle fast-arriving interrupts in the minimum number of IRQs/EOIs, you could just use the ITR to make sure interrupts don't arrive at such a fast rate, but are combined instead. On top of that, doing all this work before sending an EOI throws away the main advantage of the 8254x interrupt mechanism. The whole thing is designed such that the minimum required work (in terms of communicating with the device) to handle an IRQ is as low as possible: a single read from the ICR, that's it. They specifically made the ICR clear-on-read and made it contain all interrupt conditions so that you wouldn't have to do any more MMIO reads or writes within an IRQ handler / before sending EOI. All other work could (but doesn't have to) be done on a separate thread, allowing you to control the priority of that thread without fear of not sending EOIs in time.
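A rough sketch of what I mean, with the kernel-side primitives (wake_nic_thread(), wait_for_wakeup(), send_eoi(), handle_causes(), running) as placeholders for whatever your OS provides, and REG_ICR/mmio_read32() as above:

Code: Select all
#include <stdint.h>
#include <stdatomic.h>

static _Atomic uint32_t pending_icr = 0;

/* IRQ context: the only device access is the single read-to-clear of the ICR */
void nic_irq_handler(void)
{
    atomic_fetch_or(&pending_icr, mmio_read32(REG_ICR)); /* accumulate causes for the thread */
    wake_nic_thread();  /* e.g. release a semaphore the driver thread waits on */
    send_eoi();         /* safe already: the causes were acknowledged by the read */
}

/* thread context: all descriptor/buffer handling happens here, at whatever priority you like */
void nic_thread(void)
{
    while(running) {
        wait_for_wakeup();
        uint32_t icr = atomic_exchange(&pending_icr, 0);
        if(icr != 0)
            handle_causes(icr);
    }
}

Whether you accumulate causes like this or hand over each raw value separately is up to you; the point is that everything touching descriptors and buffers runs after the EOI, outside the IRQ handler.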
EDIT:
sleephacker wrote:The device has a maximum interrupt rate, making two interrupts immediately after each other very unlikely:
8254x Family of Gigabit Ethernet Controllers Software Developer’s Manual, 13.4.18, ITR wrote:Software can use this register to pace (or even out) the delivery of interrupts to the host CPU. This register provides a guaranteed inter-interrupt delay between interrupts asserted by the Ethernet controller, regardless of network traffic conditions. [...] The maximum observable interrupt rate from the Ethernet controller must never exceed 7813 interrupts/sec.
I don't know if that last sentence is taken into consideration by the emulator, but at least on real hardware the NIC would never send an interrupt immediately after another, and given the many millions of instructions CPUs can execute per second, you would probably be way past your IRET before the next interrupt is sent.
Actually, I misinterpreted that: there is no maximum interrupt rate if you haven't enabled the ITR, so the NIC could send one interrupt immediately after another. The rate of 7813 ints/sec was from an example setting, but because they didn't say "in this case / with this setting, the interrupt rate must never exceed...", I thought it applied at all times.
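For reference, the interval field of the ITR is in 256 ns increments (assuming I'm reading the register description correctly), so the two numbers mentioned above work out as follows; the 7813 figure presumably corresponds to an example interval of 500, i.e. 128 µs:

Code: Select all
#include <math.h>
#include <stdint.h>

/* maximum interrupt rate for a given ITR interval (in 256 ns units) */
double max_irq_rate(uint16_t interval)
{
    if(interval == 0)
        return INFINITY;               /* 0 = throttling disabled, back-to-back interrupts possible */
    return 1.0 / (interval * 256e-9);
}
/* max_irq_rate(500)    ~= 7812.5 ints/sec  -> the manual's 7813 figure                  */
/* max_irq_rate(0xFFFF) ~= 59.6 ints/sec    -> ~16.8 ms between interrupts, the ~1/60 s above */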