
Extraneous IRQ

Posted: Mon Mar 01, 2010 3:54 pm
by FlashBurn
I don't know if it would be better to spawn a new thread.

What would you do if you get an IRQ but no driver claims it for its device? Should I mask the IRQ so that its source cannot slow down the system, or should I just do nothing? The problem with masking is that every other device sharing the same IRQ would become unusable.

So how is this solved when you do not have a driver for a device which is sending irqs on and on?

Re: synchronous ipc

Posted: Tue Mar 02, 2010 10:13 am
by Combuster
FlashBurn wrote:So how is this solved when you do not have a driver for a device which is sending irqs on and on?
While this is a whole different subject, you can tell which devices are connected to an interrupt based on the PCI configuration. You can dedicate an interrupt line for unknown devices, and then assign the ones you do know to one of the other IRQs, although it shouldn't be necessary as devices don't normally send interrupts until you configure them.

Even so, if you find an IRQ to have no origin, you can determine which subset of physical devices are responsible and by elimination determine the offending device (pci is normally level triggered, so an unhandled IRQ will keep coming back), and restart the driver or lock down the device where it won't bother you.
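
As a rough illustration of narrowing down the suspects from PCI configuration data, the sketch below filters a list of functions by the Interrupt Line (offset 0x3C) and Interrupt Pin (offset 0x3D) registers of the standard header. The `pci_function` struct and `pci_candidates_for_irq` are hypothetical names, and the config bytes are assumed to have been read into memory already. Note that Interrupt Line only holds what the firmware wrote for legacy PIC routing; with an I/O APIC the routing has to come from elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_INTERRUPT_LINE 0x3C /* IRQ number written by firmware */
#define PCI_INTERRUPT_PIN  0x3D /* 0 = no pin, 1..4 = INTA#..INTD# */

/* Hypothetical snapshot of one PCI function's first 64 config bytes. */
struct pci_function {
    uint8_t bus, dev, fn;
    uint8_t config[64];
};

/*
 * Store in out[] the indices of all functions that use an interrupt pin
 * and are routed to `irq`. Returns the number of indices stored, which
 * never exceeds out_cap.
 */
size_t pci_candidates_for_irq(const struct pci_function *fns, size_t n,
                              uint8_t irq, size_t *out, size_t out_cap)
{
    size_t count = 0;

    assert(fns != NULL && out != NULL);
    for (size_t i = 0; i < n && count < out_cap; i++) {
        if (fns[i].config[PCI_INTERRUPT_PIN] == 0)
            continue; /* function cannot raise INTx# at all */
        if (fns[i].config[PCI_INTERRUPT_LINE] == irq)
            out[count++] = i;
    }
    return count;
}
```

The returned subset is the list to work through by elimination when an IRQ arrives that no driver claims.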

Re: Extraneous IRQ

Posted: Tue Mar 02, 2010 11:43 am
by Owen
Of course, a misbehaved device can hold a PCI interrupt line low and you can't do anything to stop it, and, really, is it worth your time to prevent a problem which is pretty much non-existent?

Re: Extraneous IRQ

Posted: Wed Mar 03, 2010 2:56 am
by FlashBurn
So I can ignore this problem or move all devices for which I have no driver into one irq which I can then mask.

Another question: which is better, waiting until all drivers have checked whether it was their device and then sending the EOI, or sending the EOI as soon as I reach my IRQ handler?

Re: Extraneous IRQ

Posted: Wed Mar 03, 2010 6:59 am
by Combuster
Owen wrote:Of course, a misbehaved device can hold a PCI interrupt line low and you can't do anything to stop it, and, really, is it worth your time to prevent a problem which is pretty much non-existent?
Obviously, a device is broken if it short-circuits the interrupt line. A device that signals that it wants to interrupt the host is not broken by definition, however, but it does cause the problem mentioned.

Re: Extraneous IRQ

Posted: Wed Mar 03, 2010 10:14 am
by Owen
Combuster wrote:
Owen wrote:Of course, a misbehaved device can hold a PCI interrupt line low and you can't do anything to stop it, and, really, is it worth your time to prevent a problem which is pretty much non-existent?
Obviously, a device is broken if it short-circuits the interrupt line. A device that signals that it wants to interrupt the host is not broken by definition, however, but it does cause the problem mentioned.
Who said anything about short circuiting? A PCI device requests an interrupt from the processor by holding one of the interrupt lines low. The motherboard has a pull-up resistor in order to hold it high when no interrupt is being requested. A device holding the line low is just requesting an interrupt.

Of course, it's still broken if it does it when the host hasn't configured it.

Re: Extraneous IRQ

Posted: Wed Mar 03, 2010 10:54 am
by Combuster
Oh well, must have misread somewhere when I thought that PCI interrupts were level triggered, active high... My bad. #-o

Anyway, care saves you from the following cases (in order of probability):
  • Broken or crashed drivers that can't turn off the interrupt signal
  • Devices configured by the firmware giving off an interrupt later, or have the pending interrupt line masked off
  • Devices raising an interrupt without intervention
  • Faulty hardware
So even if we wipe out the points covered by Owen's and my previous arguments, we're still left with the most probable cause still in place... :wink:

Re: Extraneous IRQ

Posted: Wed Mar 03, 2010 11:24 am
by Brendan
Hi,

Did anyone notice that there's an "interrupt disable" flag in the "device control register" (in the device's PCI configuration space)?

My suggestion is, when enumerating devices disable all of them that you can (including setting the "interrupt disable" flag and clearing the bits that control the device's ability to respond to I/O space accesses, respond to memory space accesses and act as a bus master). Then, when you start device drivers you re-enable these things for each device.

You could also disable the device if the device driver crashes or causes IRQ floods or is "unloaded" for any other reason.
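
A minimal sketch of the bit manipulation involved, assuming the flag in question is bit 10 ("Interrupt Disable") of the 16-bit Command register at offset 0x04 of the configuration header, next to the I/O Space, Memory Space and Bus Master enable bits. The function names are hypothetical, and actually reading and writing the register (port I/O or memory-mapped config access) is left out.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_COMMAND          0x04   /* config space offset of the register */
#define PCI_COMMAND_IO       0x0001 /* respond to I/O space accesses */
#define PCI_COMMAND_MEMORY   0x0002 /* respond to memory space accesses */
#define PCI_COMMAND_MASTER   0x0004 /* may act as a bus master */
#define PCI_COMMAND_INTX_OFF 0x0400 /* "interrupt disable" flag */

#define PCI_COMMAND_DECODE \
    (PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER)

/* Command value for a device that is found during enumeration but has
 * no running driver: no decoding, no bus mastering, INTx# disabled. */
uint16_t pci_command_quiesce(uint16_t cmd)
{
    cmd &= (uint16_t)~PCI_COMMAND_DECODE;
    cmd |= PCI_COMMAND_INTX_OFF;
    return cmd;
}

/* Command value for a device whose driver is starting. `decode_bits`
 * selects which of the three enable bits the driver needs. */
uint16_t pci_command_enable(uint16_t cmd, uint16_t decode_bits)
{
    assert((decode_bits & ~PCI_COMMAND_DECODE) == 0);
    cmd |= decode_bits;
    cmd &= (uint16_t)~PCI_COMMAND_INTX_OFF;
    return cmd;
}
```

Both helpers leave every other bit of the register untouched, so they can be used in a read-modify-write sequence. On hardware without the flag, setting it is not expected to have any effect.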

In this case, if you get an IRQ but none of the device drivers claims it, then I'd be tempted to ignore the IRQ (maybe the IRQ disappeared for some reason). If the IRQ didn't disappear for some reason then ignoring the IRQ would cause an IRQ flood, and your "IRQ flood detection" could notice and kill the device drivers using that IRQ, one by one (while disabling the devices) until the IRQ flood stops (until you know which device driver was borked). Then you'd be able to restart the "innocent" device drivers again.
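
The elimination procedure described above can be sketched roughly as follows. Everything here is a simulation under stated assumptions: `shared_dev`, `irq_flood_note` and `isolate_flooding_device` are hypothetical names, the `asserting` field stands in for real hardware state, and "stopping a driver" is reduced to clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_CULPRIT ((size_t)-1)

struct irq_flood { unsigned unclaimed; };

/* Count consecutive unclaimed interrupts on one line. Returns true once
 * `threshold` of them arrived in a row (the threshold is arbitrary). */
bool irq_flood_note(struct irq_flood *f, bool claimed, unsigned threshold)
{
    assert(f != NULL);
    if (claimed) {
        f->unclaimed = 0;
        return false;
    }
    f->unclaimed++;
    return f->unclaimed >= threshold;
}

struct shared_dev {
    bool enabled;   /* driver running, device allowed to interrupt */
    bool asserting; /* simulation only: device holds the line low */
    bool suspended; /* stopped by the flood handler, may be restarted */
};

/* Simulated shared line: asserted if any enabled device asserts it. */
static bool line_asserted(const struct shared_dev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (devs[i].enabled && devs[i].asserting)
            return true;
    return false;
}

/* Stop devices one by one until the flood ends. The device whose removal
 * ended it stays disabled; the ones stopped before it are restarted.
 * Returns the culprit's index, or NO_CULPRIT if the line is quiet. */
size_t isolate_flooding_device(struct shared_dev *devs, size_t n)
{
    assert(devs != NULL);
    if (!line_asserted(devs, n))
        return NO_CULPRIT;
    for (size_t i = 0; i < n; i++) {
        if (!devs[i].enabled)
            continue;
        devs[i].enabled = false;
        devs[i].suspended = true;
        if (!line_asserted(devs, n)) {
            for (size_t j = 0; j < i; j++) {
                if (devs[j].suspended) {
                    devs[j].suspended = false;
                    devs[j].enabled = true; /* restart the innocent */
                }
            }
            return i;
        }
    }
    return NO_CULPRIT;
}
```

If more than one device is misbehaving, restarting the "innocent" ones brings the flood back, and calling the function again removes the next offender.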

Also note that my normal "device detection" advice involves manually probing for old ISA devices (if necessary) after all PCI devices have been disabled (but before any of them have been enabled again), to minimise the chance of conflicts/problems during the manual probing.


Cheers,

Brendan

Re: Extraneous IRQ

Posted: Wed Mar 03, 2010 11:50 am
by Combuster
Brendan wrote:Did anyone notice that there's an "interrupt disable" flag in the "device control register" (in the device's PCI configuration space)?
No I didn't, and in my copy (v2.2), it isn't there - neither the flag nor the register, which makes that approach unusable if the hardware isn't compatible.

Re: Extraneous IRQ

Posted: Wed Mar 03, 2010 1:27 pm
by Owen
Combuster wrote:Oh well, must have misread somewhere when I thought that PCI interrupts were level triggered, active high... My bad. #-o

Anyway, care saves you from the following cases (in order of probability):
  • Broken or crashed drivers that can't turn off the interrupt signal
  • Devices configured by the firmware giving off an interrupt later, or have the pending interrupt line masked off
  • Devices raising an interrupt without intervention
  • Faulty hardware
So even if we wipe out the points covered by Owen's and my previous arguments, we're still left with the most probable cause still in place... :wink:
If you look at the pinouts, they tend to be called "INTA#", etc. A signal name postfixed by a hash or a lower case n, prefixed with a /, or with an overbar is probably active low. As you can tell, there are lots of different conventions for this!

As for why active low: in general, an output transistor has historically been able to pull low more strongly than it can pull high.

Re: Extraneous IRQ

Posted: Thu Mar 04, 2010 12:52 am
by Brendan
Hi,
Combuster wrote:
Brendan wrote:Did anyone notice that there's an "interrupt disable" flag in the "device control register" (in the device's PCI configuration space)?
No I didn't, and in my copy (v2.2), it isn't there - neither the flag nor the register, which makes that approach unusable if the hardware isn't compatible.
You're right - the "interrupt disable" flag didn't exist in the PCI Local Bus Specification Revision 2.2 (December 18, 1998), or in any older version of the specification. It does exist in PCI Local Bus Specification Revision 2.3 (March 29, 2002) and all newer versions of the specification (as far as I can tell).

This would imply that:
  • For PCI 2.3 or later you can disable the device's ability to generate IRQs.
  • For PCI 2.2, if the device supports MSI and you're using I/O APIC you can enable MSI and configure it to generate a very low priority interrupt (which disables the device's ability to generate IRQs using its "INTx#" pin) and then never send an EOI for the MSI (which prevents the device's MSI from generating more than one interrupt).
  • For PCI 2.2 devices that don't support MSI (or devices that do support MSI when an I/O APIC isn't being used), and for older devices (which don't support MSI or the interrupt disable), you'd need to mask the IRQ line at the PIC or I/O APIC and kill all devices that share that IRQ line.
Of course it's possible to have a mixture of (old, newer and new) PCI cards sharing the same interrupt line. This complicates things a little more. If you assume that the OS uses MSI to avoid IRQ sharing whenever possible, then you can disable (non-MSI) IRQs on newer devices first to see if they're causing the IRQ flood, and only mask the IRQ in the PIC or I/O APIC if it wasn't a newer device causing the IRQ flood.
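
The three cases in the list can be written down as a small decision function. This is only a sketch of the classification as described in this post; `dev_caps`, `quench_method` and `choose_quench` are hypothetical names, and how the capabilities are detected is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum quench_method {
    QUENCH_INTX_DISABLE, /* set the "interrupt disable" flag (PCI 2.3+) */
    QUENCH_MSI_REDIRECT, /* enable MSI so the INTx# pin goes quiet */
    QUENCH_MASK_LINE     /* mask the line at the PIC or I/O APIC */
};

/* Hypothetical per-device capability summary. */
struct dev_caps {
    bool has_intx_disable; /* device implements the PCI 2.3 flag */
    bool has_msi;          /* device has an MSI capability */
};

/* Pick the least disruptive way to silence one device. Only the last
 * method also takes down the other devices sharing the line. */
enum quench_method choose_quench(const struct dev_caps *caps,
                                 bool using_ioapic)
{
    assert(caps != NULL);
    if (caps->has_intx_disable)
        return QUENCH_INTX_DISABLE;
    if (caps->has_msi && using_ioapic)
        return QUENCH_MSI_REDIRECT;
    return QUENCH_MASK_LINE;
}
```

With a mixture of devices on one line, a flood handler would try the devices that get one of the first two answers before falling back to masking the whole line.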

Also, it should be possible to have a "device manager" that handles all of this (without needing each device driver to support it) - the device driver/s would only need a more generic "unload the device driver" feature. This is important because dodgy device drivers are probably the most common reason for IRQ floods.


Cheers,

Brendan