Intel PRO/1000 spams interrupts in VirtualBox
Posted: Tue Apr 18, 2017 3:22 am
by mariuszp
Hello.
My OS works everywhere (VBox, Bochs, real hardware...) except when my friend boots it in VBox with the Intel PRO/1000 card enabled. Disabling the card allows it to work. I'm not sure if the problem is with my interrupt code, my driver, or VirtualBox itself.
My interrupt code does the following for PCI:
1. When it is triggered, interrupts get disabled (do they have to stay disabled?)
2. EOI is sent while the interrupts are disabled.
3. When the interrupt handler thread(s) for that device are woken up, there is a bug where interrupts are unnecessarily enabled. I know I need to fix it, but:
As soon as interrupts are enabled, it immediately triggers ANOTHER PCI interrupt! And an endless loop of interrupts occurs on that stack as shown in the stack trace. When the NIC is disabled, this does not happen.
Should interrupts be disabled while the waking occurs? Is it normal for such an amount of interrupts to arrive from the NIC? Did anyone else have a similar problem? Thank you.
Re: Intel PRO/1000 spams interrupts in VirtualBox
Posted: Tue Apr 18, 2017 4:48 am
by Korona
I don't know how Intel Pro/1000 cards work so here is some generic advice:
- PCI interrupts are level-triggered. That means that the IRQ is not triggered when there is a low-high (or high-low) transition on the interrupt line. Instead the IRQ is repeatedly triggered while the line is high. If your interrupt handler does not immediately deassert the IRQ you have to mask the interrupt (before sending EOI) in the PIC or I/O APIC. Note that this has nothing to do with disabling interrupt acceptance via the IF flag in RFLAGS (aka CLI/STI). The reason why you want IF cleared in your interrupt handler is that this prevents stack overflows/corruption due to nested IRQs. CLI at the start of the handler is not sufficient to achieve that; you need to use an interrupt gate (not a trap gate) in the IDT to prevent races. Similarly there is no explicit STI necessary as interrupts should be reenabled by IRET.
- Virtually all devices have some "interrupt status" register containing a bit that needs to be cleared to deassert the interrupt line. Your driver should check this bit to determine if the IRQ originated from its device (and ignore the IRQ if it did not!) and clear it. After that IRQs can be unmasked again. This enables IRQ sharing and prevents devices from generating excessive IRQs. Note that most devices use a "write-one-to-clear" mode for this register to prevent race conditions.
- Legacy devices that share interrupt lines with PCI devices can be a PITA. For example in compatibility mode ATA controllers will assert ISA IRQ 14 or 15. Unfortunately ATA controllers also enable interrupts by default (which is fundamentally broken for any device as it renders the IRQ line unusable if you don't have a driver); if you don't clear those IRQs, PCI IRQs on lines 14 and 15 won't work correctly (as they will always be asserted).
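The "check the status bit, ignore the IRQ if it is not yours, then clear it" advice can be sketched as follows. This is only an illustration: `DEMO_IRQ_PENDING` and `demo_irq_handler` are made-up names, and the register layout (a single write-one-to-clear pending bit) is an assumption, not the layout of any particular device.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device: bit 0 of its status register means "this device is
 * asserting the interrupt line". The register is assumed to be
 * write-one-to-clear. */
#define DEMO_IRQ_PENDING 0x1u

/* Returns true if this device raised the IRQ (and acknowledges it),
 * false if the IRQ belongs to another device sharing the line. */
bool demo_irq_handler(volatile uint32_t *status_reg)
{
    uint32_t status = *status_reg;

    if ((status & DEMO_IRQ_PENDING) == 0)
        return false;   /* not ours: do not touch the device */

    /* Write back only the bit we actually observed, so an event that
     * arrives between the read and the write is not acknowledged by
     * accident. */
    *status_reg = status & DEMO_IRQ_PENDING;
    return true;
}
```

The boolean return value is what makes IRQ sharing workable: the kernel can ask every driver on the line and see whether anyone claimed the interrupt.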
Re: Intel PRO/1000 spams interrupts in VirtualBox
Posted: Tue Apr 18, 2017 7:23 am
by sleephacker
I assume all pro/1000 are 8254x compatible (I know at least some are).
Did you read the Interrupt Cause Read register?
Interrupts are only acknowledged to the card if you read the ICR.
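A minimal sketch of that read, assuming the card's registers are memory-mapped and `mmio_base` (a hypothetical parameter) points at the start of that mapping. The 0x00C0 offset for ICR is taken from the 8254x register map; `e1000_read_icr` is an illustrative name, not part of any existing driver.

```c
#include <stdint.h>

/* Interrupt Cause Read register offset on 8254x-family NICs. */
#define E1000_REG_ICR 0x00C0u

/* Reads ICR. On real hardware the read itself clears the cause bits and
 * deasserts the interrupt; a result of 0 means this card did not raise
 * the IRQ. */
uint32_t e1000_read_icr(volatile uint8_t *mmio_base)
{
    return *(volatile uint32_t *)(mmio_base + E1000_REG_ICR);
}
```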
Re: Intel PRO/1000 spams interrupts in VirtualBox
Posted: Tue Apr 18, 2017 11:15 am
by mariuszp
My interrupt handling works by waking up a handler thread, and it is that thread that reads the ICR. I thought this was how it is normally solved.
Does that mean I have to mask that IRQ until the driver reports that it acknowledged the IRQ?
Note that it works everywhere else, even in VirtualBox on my PC. The problem only occurred in that one instance (which I suppose is possible if there is a race condition).
The ICR was never read because the handler thread had no time to run, since the IRQs have overwhelmed the system.
Re: Intel PRO/1000 spams interrupts in VirtualBox
Posted: Tue Apr 18, 2017 11:19 am
by Korona
mariuszp wrote:Does that mean I have to mask that IRQ until the driver reports that it acknowledged the IRQ?
Yes. This is true for all level-triggered (including PCI) IRQs. If it works on some emulators/hardware those emulators are inaccurate/buggy. It might only work because the CPU is faster than the PIC on real hardware and thus probably able to execute instructions between EOI and receiving the next IRQ from the PIC.
EDIT: But note that if you get a stack overflow due to nested IRQs that is a problem in your code (i.e. you're not using an interrupt gate when you should be doing that).
Re: Intel PRO/1000 spams interrupts in VirtualBox
Posted: Tue Apr 18, 2017 11:30 am
by SpyderTL
mariuszp wrote:Note that it works everywhere else, even in VirtualBox on my PC. The problem only occurred in that one instance (which I suppose is possible if there is a race condition).
Are you saying that you have your VirtualBox virtual machine configured exactly the same, and it works on your machine, and not on your friend's machine?
I would take a little time to verify that this is the case, but if so, then it sounds like a bug in VirtualBox, or at least two different versions of VirtualBox.
If you can reproduce the issue on your machine using the exact same settings, then the problem is probably somewhere in your code.
Are you using the PIC or the APIC? You can acknowledge the IRQ with the PIC/APIC, but not touch the network card registers until your interrupt handler gets a chance to run. This should solve your problem, but if not, you'll either need to have your interrupt handler call your network card interrupt handler directly, or you'll have to have your network card interrupt handler acknowledge the IRQ with the PIC/APIC.
I've seen similar behavior from VirtualBox, and I just chalked it up to the fact that it is less of a "simulator" and more of an "emulator", meaning that it is not designed to act exactly like a physical machine; it is designed to run Windows as quickly as possible. As long as Windows works, that's all that really matters.
Let us know what you find out.
Re: Intel PRO/1000 spams interrupts in VirtualBox
Posted: Tue Apr 18, 2017 11:51 am
by Korona
SpyderTL wrote:Are you using the PIC or the APIC? You can acknowledge the IRQ with the PIC/APIC, but not touch the network card registers until your interrupt handler gets a chance to run. This should solve your problem, but if not, you'll either need to have your interrupt handler call your network card interrupt handler directly, or you'll have to have your network card interrupt handler acknowledge the IRQ with the PIC/APIC.
Sadly, that is not true. It does not matter if you're using the PIC or I/O APIC. There are basically two ways to handle level-triggered IRQs (e.g. all PCI IRQs and also things like SCI, at least on my hardware) sanely:
- Clear hardware (i.e. read/write a device register) -> EOI (has to be done after clearing)
- Mask -> EOI (has to be done after masking) -> Switch to handler thread -> Clear hardware -> Unmask
The following is broken:
- EOI -> Clear hardware. The IRQ might get resent before you clear the hardware register (and starve the CPU).
- Switch to handler thread -> Clear hardware -> EOI. This will block all other IRQs until the handler thread gets a chance to run.
All this logic is independent of RFLAGS.IF. But if you're getting a stack overflow then your IF handling is buggy.
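The second sane ordering (mask, EOI, switch to the handler thread, clear hardware, unmask) might look like the sketch below. Every function here is a hypothetical stand-in that only records which step ran; a real kernel would program the I/O APIC, write the local APIC EOI register and touch the device instead.

```c
#include <stddef.h>

/* Trace buffer so the ordering can be inspected; purely for illustration. */
static char irq_trace[8];
static size_t irq_trace_len;

static void trace(char step)
{
    if (irq_trace_len < sizeof irq_trace - 1)
        irq_trace[irq_trace_len++] = step;
}

/* Hypothetical stand-ins for the real operations. */
static void ioapic_mask(int irq)     { (void)irq; trace('M'); }
static void ioapic_unmask(int irq)   { (void)irq; trace('U'); }
static void lapic_eoi(void)          { trace('E'); }
static void wake_irq_thread(int irq) { (void)irq; trace('W'); }
static void device_clear(int irq)    { (void)irq; trace('C'); }

/* Runs in interrupt context (entered through an interrupt gate). */
void level_irq_entry(int irq)
{
    ioapic_mask(irq);      /* the line may stay asserted, but is not delivered */
    lapic_eoi();           /* other IRQs can be delivered again */
    wake_irq_thread(irq);  /* defer the device work */
}

/* Runs later, in the driver's handler thread. */
void level_irq_thread(int irq)
{
    device_clear(irq);     /* deassert the line at the device */
    ioapic_unmask(irq);    /* only now is the IRQ allowed to fire again */
}
```

Calling `level_irq_entry` followed by `level_irq_thread` leaves the steps in the order M, E, W, C, U in `irq_trace`, which is the sequence described in the list above.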
Re: Intel PRO/1000 spams interrupts in VirtualBox
Posted: Tue Apr 18, 2017 12:18 pm
by SpyderTL
Korona wrote:Clear hardware (i.e. read/write a device register) -> EOI (has to be done after clearing)
I thought that this was the problem. As soon as you tell the hardware that the interrupt has been handled, it can immediately trigger another interrupt. So now the PIC thinks that you are still handling interrupt 1, and the hardware thinks that interrupt 1 is handled, and it is waiting for interrupt 2 to be handled.
Maybe the PIC is smart enough to handle this situation, but I thought that the whole problem was that the network card interrupt flag was being cleared too early.
With all that said, you are probably right, as I just have all of my interrupts/IRQs incrementing counters, and I have all of my software/drivers polling those counters for changes. So I'm probably not the best person to ask.
Re: Intel PRO/1000 spams interrupts in VirtualBox
Posted: Tue Apr 18, 2017 12:40 pm
by Korona
SpyderTL wrote:I thought that this was the problem. As soon as you tell the hardware that the interrupt has been handled, it can immediately trigger another interrupt. So now the PIC thinks that you are still handling interrupt 1, and the hardware thinks that interrupt 1 is handled, and it is waiting for interrupt 2 to be handled.
Maybe the PIC is smart enough to handle this situation, but I thought that the whole problem was that the network card interrupt flag was being cleared too early.
The PIC is pretty dumb, but there is not much it has to do here. What's happening in the situation that you described (i.e. a level-triggered active-low interrupt) is the following sequence:
- The device pulls its interrupt line low (for some electrical reason that I'm oblivious of, PCI uses low voltage to signal active IRQs). For conventional PCI this is a real wire on the expansion slot (and the reason that we need level-triggered and not edge-triggered IRQs is that if two devices concurrently pulled the same wire high we would still only get one edge and lose one of the IRQs); for PCIe this wire is emulated by the protocol layer. The device does not (and cannot) wait until the IRQ is handled.
- The PIC (or I/O APIC) is connected to this wire. It detects the "wire is low" condition and sends an IRQ to the CPU (or more correctly the local APIC). It remembers that the IRQ is pending and won't issue another IRQ until you send EOI (for the I/O APIC this "there is a pending IRQ"-bit is actually split between the local and I/O APIC; the I/O APIC has a "i sent an IRQ to the CPU"-bit and the local APIC has a "i received a level-triggered IRQ"-bit).
- The CPU waits until RFLAGS.IF is set and enters the IRQ handler that is specified in the IDT.
- You send EOI to the PIC. The PIC clears its "there is a pending IRQ"-bit and will thus resume sending IRQs to the CPU. If you did not clear the IRQ at the device level the device's IRQ line is still low. The PIC continues at step 2.
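The sequence above can be turned into a toy model to see why EOI without masking or clearing produces an interrupt storm. This is a simplified simulation, not real APIC code: `struct irq_sim` and all function names are invented for the example, and one loop iteration stands for "the controller looks at the line once".

```c
#include <stdbool.h>

/* Toy model of one level-triggered interrupt line. */
struct irq_sim {
    bool line_asserted;  /* device is still pulling the line */
    bool in_service;     /* controller is waiting for EOI */
    bool masked;         /* mask bit in the controller */
    unsigned delivered;  /* IRQs delivered to the CPU */
};

/* Controller step: deliver if asserted, unmasked, and nothing is in service. */
static bool sim_poll(struct irq_sim *s)
{
    if (s->line_asserted && !s->masked && !s->in_service) {
        s->in_service = true;
        s->delivered++;
        return true;
    }
    return false;
}

static void sim_eoi(struct irq_sim *s) { s->in_service = false; }

/* Broken: EOI is sent while the device is still asserting the line. */
unsigned deliveries_eoi_only(unsigned rounds)
{
    struct irq_sim s = { .line_asserted = true };
    for (unsigned i = 0; i < rounds; i++)
        if (sim_poll(&s))
            sim_eoi(&s);
    return s.delivered;
}

/* Sane: mask before EOI, clear the device later, then unmask. */
unsigned deliveries_mask_then_eoi(unsigned rounds)
{
    struct irq_sim s = { .line_asserted = true };
    for (unsigned i = 0; i < rounds; i++)
        if (sim_poll(&s)) {
            s.masked = true;
            sim_eoi(&s);
        }
    s.line_asserted = false;  /* handler thread finally clears the device */
    s.masked = false;
    sim_poll(&s);
    return s.delivered;
}
```

In this model the first function delivers one IRQ per round, while the second delivers exactly one in total, regardless of how long the handler thread takes to run.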
Re: Intel PRO/1000 spams interrupts in VirtualBox
Posted: Tue Apr 18, 2017 1:47 pm
by mariuszp
SpyderTL wrote:mariuszp wrote:Note that it works everywhere else, even in VirtualBox on my PC. The problem only occurred in that one instance (which I suppose is possible if there is a race condition).
Are you saying that you have your VirtualBox virtual machine configured exactly the same, and it works on your machine, and not on your friend's machine?
I would take a little time to verify that this is the case, but if so, then it sounds like a bug in VirtualBox, or at least two different versions of VirtualBox.
If you can reproduce the issue on your machine using the exact same settings, then the problem is probably somewhere in your code.
Are you using the PIC or the APIC? You can acknowledge the IRQ with the PIC/APIC, but not touch the network card registers until your interrupt handler gets a chance to run. This should solve your problem, but if not, you'll either need to have your interrupt handler call your network card interrupt handler directly, or you'll have to have your network card interrupt handler acknowledge the IRQ with the PIC/APIC.
I've seen similar behavior from VirtualBox, and I just chalked it up to the fact that it is less of a "simulator" and more of an "emulator", meaning that it is not designed to act exactly like a physical machine; it is designed to run Windows as quickly as possible. As long as Windows works, that's all that really matters.
Let us know what you find out.
I'm using the APIC. And that is what I do; I send the EOI before touching any hardware registers at all; I just wake up a thread created by the driver (I believe Linux does it the same way, so technically that should work).
I am certain that the configuration is identical for both VirtualBox VMs.
Re: Intel PRO/1000 spams interrupts in VirtualBox
Posted: Tue Apr 18, 2017 1:57 pm
by Korona
mariuszp wrote:I'm using the APIC. And that is what I do; I send the EOI before touching any hardware registers at all; I just wake up a thread created by the driver (I believe Linux does it the same way, so technically that should work).
No. Linux usually touches the hardware registers inside the IRQ handler (and then usually delegates to a thread that handles the IRQ). However it also implements the mask -> EOI -> ack -> unmask sequence for interrupts where that does not suffice. The relevant code is handle_level_irq() in kernel/irq/chip.c.
If you're using the I/O APIC: You have to additionally ensure that you program the I/O APIC to detect level-triggered active-low IRQs.
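As a rough sketch of what "level-triggered active-low" means in an I/O APIC redirection entry: the low 32 bits carry the vector in bits 0-7, the polarity in bit 13, the trigger mode in bit 15 and the mask in bit 16. The helper name `ioapic_redir_low` is made up, and the sketch assumes fixed delivery mode and physical destination mode (all other bits zero).

```c
#include <stdbool.h>
#include <stdint.h>

#define IOAPIC_REDIR_ACTIVE_LOW (1u << 13)  /* polarity: 1 = active low */
#define IOAPIC_REDIR_LEVEL      (1u << 15)  /* trigger mode: 1 = level */
#define IOAPIC_REDIR_MASKED     (1u << 16)  /* 1 = interrupt masked */

/* Builds the low dword of a redirection entry (fixed delivery mode,
 * physical destination mode). The destination APIC ID lives in the high
 * dword and is not handled here. */
uint32_t ioapic_redir_low(uint8_t vector, bool level, bool active_low,
                          bool masked)
{
    uint32_t value = vector;

    if (active_low)
        value |= IOAPIC_REDIR_ACTIVE_LOW;
    if (level)
        value |= IOAPIC_REDIR_LEVEL;
    if (masked)
        value |= IOAPIC_REDIR_MASKED;
    return value;
}
```

For a PCI interrupt on vector 0x30 this gives 0xA030 unmasked and 0x1A030 masked.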
Re: Intel PRO/1000 spams interrupts in VirtualBox
Posted: Tue Apr 18, 2017 2:13 pm
by mariuszp
Korona wrote:mariuszp wrote:I'm using the APIC. And that is what I do; I send the EOI before touching any hardware registers at all; I just wake up a thread created by the driver (I believe Linux does it the same way, so technically that should work).
No. Linux usually touches the hardware registers inside the IRQ handler (and then usually delegates to a thread that handles the IRQ). However it also implements the mask -> EOI -> ack -> unmask sequence for interrupts where that does not suffice. The relevant code is handle_level_irq() in kernel/irq/chip.c.
If you're using the I/O APIC: You have to additionally ensure that you program the I/O APIC to detect level-triggered active-low IRQs.
I set the level-triggered/active-low flags based on the ACPI tables.
Would this work:
1. if an IRQ handler thread exists, it is woken up. if not, an EOI is sent and the interrupt ignored.
2. the IRQ handler thread is responsible for sending the EOI.
Also, there may be multiple PCI devices connected to the same IRQ. Does that mean a driver should check whether its device has triggered the interrupt, and only send the EOI if so? Isn't there, therefore, a race condition, where the device may trigger an interrupt on the same line after another device has issued its own, and then the driver for device A will send the EOI even though it was actually device B sending the interrupt?
Re: Intel PRO/1000 spams interrupts in VirtualBox
Posted: Tue Apr 18, 2017 2:28 pm
by Korona
mariuszp wrote:Would this work:
1. if an IRQ handler thread exists, it is woken up. if not, an EOI is sent and the interrupt ignored.
2. the IRQ handler thread is responsible for sending the EOI.
Yes, that works. But per-IRQ (or per-device) masking is much better than sending EOI after the driver thread ran. If you mask the IRQ you can still receive IRQs from other devices. If you delay EOI then all IRQs are blocked. I'd really encourage you to mask the IRQ. It's not really hard to do; it's just a simple bit in the I/O APIC.
mariuszp wrote:Also, there may be multiple PCI devices connected to the same IRQ. Does that mean a driver should check whether its device has triggered the interrupt, and only send the EOI if so? Isn't there, therefore, a race condition, where the device may trigger an interrupt on the same line after another device has issued its own, and then the driver for device A will send the EOI even though it was actually device B sending the interrupt?
Yes, that can happen. But it is not a problem because both IRQs are level-triggered. So device B is still pulling its IRQ line low (as you did not touch its hardware registers yet) and the PIC will issue another IRQ once you send EOI and unmask.
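A tiny model of why that race is harmless, assuming two hypothetical devices A and B on one line (the struct and function names are invented for illustration): the shared line is effectively the OR of both devices' pending states, so servicing A alone leaves the line asserted.

```c
#include <stdbool.h>

/* Two devices sharing one level-triggered line. */
struct shared_line {
    bool dev_a_pending;
    bool dev_b_pending;
};

/* The line stays asserted as long as any device still wants service. */
bool line_asserted(const struct shared_line *line)
{
    return line->dev_a_pending || line->dev_b_pending;
}

/* Each driver only clears its own device. */
void service_dev_a(struct shared_line *line) { line->dev_a_pending = false; }
void service_dev_b(struct shared_line *line) { line->dev_b_pending = false; }
```

After `service_dev_a` the line is still asserted if B is pending, so the controller raises the IRQ again once it is unmasked and EOI has been sent.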
mariuszp wrote:I set the level-triggered/active-low flags based on the ACPI tables.
That is nice. But I have to warn you that there is some hardware that does not report PCI interrupts in the MADT. Here is a dump of my OS running on one of my PCs:
Code:
thor: Dumping MADT
Local APIC id: 0
Local APIC id: 1
Local APIC id: 130 (disabled)
Local APIC id: 131 (disabled)
I/O APIC id: 2, sytem interrupt base: 0
Int override: ISA IRQ 0 is mapped to GSI 2 (Polarity: default, trigger mode: default)
Int override: ISA IRQ 9 is mapped to GSI 9 (Polarity: high, trigger mode: level)
thor: Configuring I/O APICs.
thor: I/O APIC supports 24 pins
[...]
thor: Configuring ISA IRQs.
thor: Configuring IRQ io-apic.2 to trigger mode: 1, polarity: 1
thor: Configuring IRQ io-apic.1 to trigger mode: 1, polarity: 1
thor: Configuring IRQ io-apic.4 to trigger mode: 1, polarity: 1
thor: Configuring IRQ io-apic.9 to trigger mode: 2, polarity: 1
thor: Configuring IRQ io-apic.12 to trigger mode: 1, polarity: 1
[...]
thor: Installing handler for ACPI IRQ 9
thor: Found PCI host bridge
Route for slot 1, pin 0: \_SB_.LNKA[0]
Route for slot 1, pin 1: \_SB_.LNKB[0]
Route for slot 1, pin 2: \_SB_.LNKC[0]
Route for slot 1, pin 3: \_SB_.LNKD[0]
Route for slot 6, pin 0: \_SB_.LNKA[0]
Route for slot 6, pin 1: \_SB_.LNKB[0]
Route for slot 6, pin 2: \_SB_.LNKC[0]
Route for slot 6, pin 3: \_SB_.LNKD[0]
Route for slot 31, pin 0: \_SB_.LNKC[0]
Route for slot 31, pin 1: \_SB_.LNKD[0]
Route for slot 29, pin 0: \_SB_.LNKH[0]
Route for slot 30, pin 0: \_SB_.LNKB[0]
Route for slot 30, pin 1: \_SB_.LNKE[0]
Route for slot 27, pin 0: \_SB_.LNKA[0]
Route for slot 28, pin 0: \_SB_.LNKA[0]
Route for slot 28, pin 1: \_SB_.LNKB[0]
Route for slot 28, pin 2: \_SB_.LNKC[0]
Route for slot 28, pin 3: \_SB_.LNKD[0]
Route for slot 2, pin 0: \_SB_.LNKA[0]
Route for slot 29, pin 1: \_SB_.LNKD[0]
Route for slot 29, pin 2: \_SB_.LNKC[0]
Route for slot 29, pin 3: \_SB_.LNKA[0]
Configurating link device \_SB_.LNKA.
Resource: Irq 10, trigger mode: 2, polarity: 2
Configurating link device \_SB_.LNKB.
Device is not enabled.
Configurating link device \_SB_.LNKC.
Resource: Irq 11, trigger mode: 2, polarity: 2
Configurating link device \_SB_.LNKD.
Resource: Irq 3, trigger mode: 2, polarity: 2
Configurating link device \_SB_.LNKH.
Resource: Irq 5, trigger mode: 2, polarity: 2
Configurating link device \_SB_.LNKE.
Device is not enabled.
thor: Configuring IRQ io-apic.10 to trigger mode: 2, polarity: 2
thor: Configuring IRQ io-apic.11 to trigger mode: 2, polarity: 2
thor: Configuring IRQ io-apic.3 to trigger mode: 2, polarity: 2
thor: Configuring IRQ io-apic.5 to trigger mode: 2, polarity: 2
That means that it is necessary to run AML code (the _CRS method of the LNK{A-H} devices) to get the correct IRQs. Hopefully your hardware is not as nasty as this one but it still makes sense to check if the MADT contains all PCI interrupts.
Re: Intel PRO/1000 spams interrupts in VirtualBox
Posted: Tue Apr 18, 2017 4:25 pm
by Brendan
Hi,
Korona wrote:mariuszp wrote:I set the level-triggered/active-low flags based on the ACPI tables.
That is nice. But I have to warn you that there is some hardware that does not report PCI interrupts in the MADT.
The ACPI MADT/APIC table is only supposed to provide "interrupt overrides" for legacy/ISA IRQs, and isn't supposed to mention PCI IRQs at all.
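For reference, a sketch of decoding the flags field of a MADT Interrupt Source Override entry. The bit layout (polarity in bits 0-1, trigger mode in bits 2-3, where 00 means "conforms to the bus", 01 is active high / edge and 11 is active low / level) follows the ACPI specification; the function names are made up. Because overrides describe ISA IRQs, "conforms to the bus" is treated here as edge-triggered, active high.

```c
#include <stdbool.h>
#include <stdint.h>

/* True only if the override explicitly requests level triggering. */
bool madt_override_is_level(uint16_t flags)
{
    return ((flags >> 2) & 0x3u) == 0x3u;
}

/* True only if the override explicitly requests active-low polarity. */
bool madt_override_is_active_low(uint16_t flags)
{
    return (flags & 0x3u) == 0x3u;
}
```

An entry with flags 0x000D (active high, level) is therefore level-triggered but not active low.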
mariuszp wrote:Would this work:
1. if an IRQ handler thread exists, it is woken up. if not, an EOI is sent and the interrupt ignored.
2. the IRQ handler thread is responsible for sending the EOI.
For level triggered IRQs it should be more like:
- If there are no IRQ handling threads, do a kernel panic to inform the user that there's been a failure (kernel unmasked an IRQ when there's no IRQ handler threads, IDT is corrupt, faulty hardware allowed a masked IRQ to be delivered to CPU, etc).
- Otherwise (there are one or more IRQ handling threads):
- Notify all IRQ handling threads (for the IRQ) that the IRQ occurred; and wait for all of them to return some kind of status indicating that they've checked/serviced their device that contains some kind of "my device did/didn't cause the IRQ" status flag.
- If all of the IRQ handling threads returned "my device didn't cause the IRQ"; then something is wrong. In this case kernel might mask the IRQ and terminate all the device drivers that were using it (to keep running), or do a kernel panic to inform user (and stop running).
- If one or more IRQ handling threads returned "my device did cause an IRQ"; then everything is fine and you can send the EOI.
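The decision logic in these steps can be sketched like this. It is a simplification: handlers are called synchronously instead of being notified as threads and waited for, and all type and function names (including the two demo handlers) are hypothetical.

```c
#include <stddef.h>

enum irq_status { IRQ_NOT_MINE, IRQ_HANDLED };

typedef enum irq_status (*irq_handler_fn)(void *ctx);

struct irq_handler {
    irq_handler_fn fn;
    void *ctx;
};

enum dispatch_result {
    DISPATCH_NO_HANDLERS,  /* IRQ arrived with nobody registered: panic */
    DISPATCH_UNCLAIMED,    /* every driver said "not mine": mask or panic */
    DISPATCH_OK            /* at least one driver serviced it: send EOI */
};

/* Asks every handler registered for the IRQ and summarises the answers. */
enum dispatch_result dispatch_level_irq(const struct irq_handler *handlers,
                                        size_t count)
{
    size_t claimed = 0;

    if (count == 0)
        return DISPATCH_NO_HANDLERS;

    /* Ask all of them: more than one device may be asserting the line. */
    for (size_t i = 0; i < count; i++)
        if (handlers[i].fn(handlers[i].ctx) == IRQ_HANDLED)
            claimed++;

    return claimed > 0 ? DISPATCH_OK : DISPATCH_UNCLAIMED;
}

/* Demo handlers used only to exercise the dispatcher. */
enum irq_status demo_claims(void *ctx)   { (void)ctx; return IRQ_HANDLED; }
enum irq_status demo_declines(void *ctx) { (void)ctx; return IRQ_NOT_MINE; }
```

The caller would map `DISPATCH_OK` to sending the EOI and the other two results to the error paths described above.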
Note that there are alternatives (like "mask then EOI early, then unmask at the end"). It is possible to do something more advanced, where kernel keeps track of how likely/unlikely it is that a device caused an IRQ and informs IRQ handling threads one at a time in order (to avoid the overhead of telling drivers for "unlikely to have caused IRQ" devices if some other device caused the IRQ). It's also possible to do this in groups (e.g. if there are N CPUs; then inform the first N IRQ handling threads, and if none of them say their device caused the IRQ inform the next N IRQ handling threads, etc).
Also note that there is a problem when 2 devices share an IRQ, and there is no device driver for the device that caused the IRQ. In this case you end up at "something is wrong, terminate drivers or kernel panic". To avoid this (and possibly for security reasons - malicious devices spying on you), disable all devices in PCI configuration space until/unless they do have a driver by writing 0x0000 to the Command Register (at offset 0x04 in the device's PCI configuration space) to "logically disconnect the device from the PCI bus".
This means that starting a device driver looks something like this:
- Device driver initialises itself (not the device)
- Device driver tells kernel about its IRQ handling thread. Kernel unmasks the IRQ in the PIC or IO APIC if it's not already unmasked, or sets up MSI for the device.
- Device driver tells kernel it's ready. Kernel enables the device by restoring the Command Register (at offset 0x04 in the device's PCI configuration space) to "logically re-connect the device to the PCI bus".
- Device driver begins to initialise the device
Stopping a device driver would be the reverse of this ("logically disconnect from bus", then mask IRQ if nothing else uses it).
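To reach the Command Register through the legacy I/O-port configuration mechanism, the kernel first has to build the value written to the CONFIG_ADDRESS port (0xCF8). A minimal sketch of that calculation, with made-up names; the actual port I/O (writing the address to 0xCF8 and accessing the data at 0xCFC) is left out because it depends on the kernel's own port helpers.

```c
#include <stdint.h>

#define PCI_COMMAND_OFFSET 0x04u  /* Command Register in configuration space */

/* Builds the CONFIG_ADDRESS value for one dword of a function's
 * configuration space: enable bit, bus, device, function, aligned offset. */
uint32_t pci_config_address(uint8_t bus, uint8_t device, uint8_t function,
                            uint8_t offset)
{
    return 0x80000000u
         | ((uint32_t)bus << 16)
         | ((uint32_t)(device & 0x1Fu) << 11)
         | ((uint32_t)(function & 0x07u) << 8)
         | (offset & 0xFCu);
}
```

A kernel following the scheme above would save the 16-bit Command Register value before disconnecting a device, so that the same value can be restored when a driver declares itself ready.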
WARNING: Doing the "logically disconnect all PCI devices from the bus" during boot needs to be done with extreme care. Specifically, you need to disable any legacy emulation (e.g. in USB controllers) before "logically disconnecting" to ensure that firmware's SMM doesn't cause problems; and you also need to make sure you don't "logically disconnect" something you're actually using (e.g. video cards that are being used for "raw framebuffer") or interfere with any bridges (including "PCI to LPC bridge").
Cheers,
Brendan
Re: Intel PRO/1000 spams interrupts in VirtualBox
Posted: Wed Apr 19, 2017 2:38 pm
by mariuszp
I've done as you said but there is a problem. I have a function called irqMask() which masks interrupts, and irqUnmask() which unmasks them:
Code:
void irqMask(uint8_t intNo)
{
    uint64_t flags = getFlagsRegister();
    cli();
    uint32_t volatile* regsel = (uint32_t volatile*) 0xFFFF808000002000;
    uint32_t volatile* iowin = (uint32_t volatile*) 0xFFFF808000002010;
    int i;
    for (i=0; i<24; i++)
    {
        *regsel = (0x10+i*2);
        __sync_synchronize();
        uint64_t entry = (uint64_t) (*iowin);
        __sync_synchronize();
        *regsel = (0x10+i*2+1);
        __sync_synchronize();
        entry |= ((uint64_t)(*iowin) << 32);
        uint32_t apicID = (entry >> 56) & 0xF;
        uint8_t intvec = (entry & 0xFF);
        if ((intvec == intNo) && (apicID == apic->id))
        {
            // that's the one; mask it!
            *regsel = (0x10+i*2);
            __sync_synchronize();
            *iowin = ((*iowin) | (1 << 16)) & ~(1 << 12);
            __sync_synchronize();
            setFlagsRegister(flags);
            return;
        };
    };
    panic("irqMask(0x%02hhX) failed!", intNo);
};

void irqUnmask(uint8_t intNo)
{
    uint64_t flags = getFlagsRegister();
    cli();
    uint32_t volatile* regsel = (uint32_t volatile*) 0xFFFF808000002000;
    uint32_t volatile* iowin = (uint32_t volatile*) 0xFFFF808000002010;
    int i;
    for (i=0; i<24; i++)
    {
        *regsel = (0x10+i*2);
        __sync_synchronize();
        uint64_t entry = (uint64_t) (*iowin);
        __sync_synchronize();
        *regsel = (0x10+i*2+1);
        __sync_synchronize();
        entry |= ((uint64_t)(*iowin) << 32);
        uint32_t apicID = (entry >> 56) & 0xF;
        uint8_t intvec = (entry & 0xFF);
        if ((intvec == intNo) && (apicID == apic->id))
        {
            // that's the one; mask it!
            *regsel = (0x10+i*2);
            __sync_synchronize();
            *iowin &= ~((1 << 16) | (1 << 12));
            __sync_synchronize();
            setFlagsRegister(flags);
            return;
        };
    };
    panic("irqUnmask(0x%02hhX) failed!", intNo);
};
Now when I test it on his VirtualBox installation - ONLY THAT ONE - the OS is once again stuck, handling interrupts indefinitely as they arrive (it seems). If you send an NMI, it is ALWAYS on some instruction within irqMask() or irqUnmask(), which comes right after reading from (*iowin). Is it a problem with those 2 functions (as shown above) or could it still be something else?