PCI devices and irq flooding

Post by MollenOS »

Hey everyone,

I'm having an issue on real hardware only (I can't reproduce it in emulators): as soon as I unmask the IRQs belonging to the INTA, INTB, INTC and INTD pins, I get flooded with IRQs and my kernel basically locks up (everything takes forever to process because it gets interrupted all the time).

At first I thought: hey, I don't disable IRQs in the PCI command register. So I went ahead and made sure I disabled the PCI devices' ability to generate interrupts by writing 0x400 to the PCI command register during device enumeration, and then only unmasking devices for which I have drivers. However, this made no difference; the devices don't seem to respond to the disabling. So now I'm at a loss. I use ACPICA for device enumeration (and PCI routing), so I was looking into the ACPI method _DIS (disable), but I see no way to enable them again?

Anyone have any thoughts?
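
For reference, a minimal sketch of the command-register manipulation described above. The helper names are hypothetical, and the sketch assumes the legacy 0xCF8/0xCFC configuration mechanism; the actual port I/O is left out so only the values are computed. Note that writing a plain 0x400 sets Interrupt Disable but also clears the I/O Space, Memory Space and Bus Master enable bits, so a read-modify-write is probably what is wanted.

```c
#include <assert.h>
#include <stdint.h>

/* Bits of the 16-bit PCI Command register (configuration offset 0x04). */
#define PCI_COMMAND_OFFSET    0x04u
#define PCI_CMD_IO_SPACE      (1u << 0)
#define PCI_CMD_MEMORY_SPACE  (1u << 1)
#define PCI_CMD_BUS_MASTER    (1u << 2)
#define PCI_CMD_INTX_DISABLE  (1u << 10)  /* 0x400, defined since PCI 2.3 */

/* Hypothetical helper: value to write to CONFIG_ADDRESS (port 0xCF8)
 * before accessing CONFIG_DATA (port 0xCFC). */
static inline uint32_t pci_config_address(uint8_t bus, uint8_t dev,
                                          uint8_t func, uint8_t offset)
{
    assert(dev < 32 && func < 8 && (offset & 3u) == 0);
    return 0x80000000u                 /* enable bit */
         | ((uint32_t)bus  << 16)
         | ((uint32_t)dev  << 11)
         | ((uint32_t)func << 8)
         | offset;
}

/* Set Interrupt Disable while preserving every other command bit. */
static inline uint16_t pci_cmd_disable_intx(uint16_t command)
{
    return (uint16_t)(command | PCI_CMD_INTX_DISABLE);
}

/* Clear Interrupt Disable while preserving every other command bit. */
static inline uint16_t pci_cmd_enable_intx(uint16_t command)
{
    return (uint16_t)(command & ~PCI_CMD_INTX_DISABLE);
}
```

A driver would read the current command value, pass it through one of these helpers, and write the result back, rather than storing a constant.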

Post by Combuster »

PCI interrupts are level-triggered by default, so if you don't silence the device causing them, they will retrigger the moment you acknowledge the interrupt. PCI 2.3 has an option to force-disable interrupts; in other cases you can try to disable the device or alter the interrupt line to see if you can identify the device in question.

Post by MollenOS »

Yes, exactly, you are correct. However, I do force-disable interrupts by writing 0x400 to the command register (bit 10 is interrupt disable), and they still seem to occur. I should mention I'm using the I/O APIC, so the interrupt_line should not be relevant, right?

My problem is that I don't know which device is producing the interrupts.

Post by Combuster »

That bit was only added in PCI 2.3 and isn't universally supported. You can change a device around INTA-INTD and see if the interrupt number changes and isolate it that way.

Post by MollenOS »

I'll try to isolate the interrupt when I get home, but I don't think the problem is missing PCI 2.3 support: I accidentally disabled PCI interrupts for the VGA device through bit 10 of its PCI command register, and my drawing routine stopped working :p

Post by Brendan »

Hi,
MollenOS wrote: "Yes exactly, you are correct, however I do force disable interrupts by writing 0x400 to the command register (bit 10 is interrupt disable), and they still seem to occur, I should mention im using the I/O apic and thus the interrupt_line should not be relevant, right?"
The "interrupt line" field in PCI configuration space is irrelevant when using IO APICs (it's only relevant when using PIC chips).

If you've disabled the ability to generate IRQs in the devices themselves and still get an IRQ flood; then maybe you've got them configured as "level triggered active high" instead of "level triggered active low" in the IO APIC (which would cause the IO APIC to think there's an IRQ whenever there isn't one).
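
To make the polarity point concrete, here is a minimal sketch of building an IO APIC redirection table entry for a PCI interrupt as "level triggered, active low". The function names are hypothetical; the bit positions follow the usual IO APIC redirection entry layout, and the sketch assumes fixed delivery mode and physical destination mode. Firmware tables (e.g. the ACPI MADT interrupt source overrides) may specify different settings for particular inputs, so treat the defaults here as an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define IOAPIC_REG_REDTBL_BASE   0x10u          /* index of the first redirection register */
#define IOAPIC_REDIR_ACTIVE_LOW  (1ull << 13)   /* polarity: 0 = active high, 1 = active low */
#define IOAPIC_REDIR_LEVEL       (1ull << 15)   /* trigger: 0 = edge, 1 = level */
#define IOAPIC_REDIR_MASKED      (1ull << 16)   /* 1 = input masked */

/* Register index of the low 32 bits of an input's entry; the high half is at +1. */
static inline uint32_t ioapic_redtbl_low_reg(unsigned input)
{
    return IOAPIC_REG_REDTBL_BASE + 2u * input;
}

/* Build a 64-bit entry for a PCI INTx input: level triggered, active low,
 * fixed delivery mode (bits 8-10 = 0), physical destination (bit 11 = 0). */
static inline uint64_t ioapic_pci_redir_entry(uint8_t vector,
                                              uint8_t dest_apic_id,
                                              int masked)
{
    assert(vector >= 0x10);  /* vectors 0x00-0x0F are not valid for the IO APIC */

    uint64_t entry = vector;
    entry |= IOAPIC_REDIR_ACTIVE_LOW | IOAPIC_REDIR_LEVEL;
    if (masked)
        entry |= IOAPIC_REDIR_MASKED;
    entry |= (uint64_t)dest_apic_id << 56;  /* destination field, bits 56-63 */
    return entry;
}
```

Leaving bit 13 clear (active high) on a line that idles high is the misconfiguration described above: the IO APIC then sees the idle state as an asserted interrupt.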

For disabling devices; I'd recommend that early during boot you:
  • Mask every "PIC IRQ" and disable every "IO APIC input" when first configuring the PIC and IO APIC/s; and install the "spurious IRQ" handlers (2 for PIC and one for each IO APIC)
  • Write 0x00000000 to every device's Device Control register. This should disable the devices completely, regardless of whether they're PCI 2.3 (with the "interrupt disable" flag) or not.
Later during boot, when installing/initialising the device's device driver and not before; you'd re-enable the device (in its Device Control register) and enable the device's IRQ/s in the IO APIC if necessary (e.g. if it needs one and/or isn't using MSI instead, and if the IO APIC input wasn't already enabled in the IO APIC due to a different device driver and PCI IRQ sharing). Of course this also means that if there are no device drivers then IRQs are impossible, and if there's only one device driver (for one device) it's obvious which device is causing problems if there's IRQ flooding.
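
The "enable the IO APIC input only if no other driver already did" part could be tracked with a per-input use count. This is a minimal sketch under stated assumptions: the names are hypothetical, a single 24-input IO APIC is assumed, and the actual mask/unmask register writes are left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IOAPIC_MAX_INPUTS 24u  /* assumption: one IO APIC with 24 inputs */

struct ioapic_inputs {
    uint16_t users[IOAPIC_MAX_INPUTS];  /* drivers currently using each input */
};

/* A driver starts using `input`.
 * Returns true if it is the first user, i.e. the caller should unmask the input. */
static inline bool ioapic_input_acquire(struct ioapic_inputs *s, unsigned input)
{
    assert(input < IOAPIC_MAX_INPUTS);
    return s->users[input]++ == 0;
}

/* A driver stops using `input`.
 * Returns true if it was the last user, i.e. the caller should mask the input again. */
static inline bool ioapic_input_release(struct ioapic_inputs *s, unsigned input)
{
    assert(input < IOAPIC_MAX_INPUTS && s->users[input] > 0);
    return --s->users[input] == 0;
}
```

With this scheme an input stays masked until the first driver on a shared PCI IRQ loads, which keeps the "no drivers, no IRQs" property described above.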

Finally, I'd be tempted to put sanity checks in place such that, in case of hardware failures (and/or driver bugs), the OS will detect IRQ flooding and (if/when detected) forcibly disable the affected IO APIC input's IRQ, disable affected device/s (write 0x00000000 to the devices' Device Control register), unload/terminate any affected device's driver, mark the device/s as "potentially faulty" (in whatever the OS uses to track the state of each PCI device) and inform the user. To detect flooding, you'd probably need to track the amount of time between sending the EOI and receiving the next IRQ; and either increment a "back to back IRQs" counter (if the time was below a threshold) or zero the "back to back IRQs" counter (if the time was greater than the threshold). Then, if the "back to back IRQs" counter exceeds a maximum value (e.g. 1000 "back to back IRQs" in a row) you've detected an IRQ flood.
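
The detection scheme just described might be sketched as follows. The structure and function names are hypothetical, timestamps are assumed to come from any monotonic counter (their unit is up to the caller), and the threshold values are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct irq_flood_state {
    uint64_t last_eoi_time;     /* timestamp of the most recent EOI */
    uint64_t min_gap;           /* gaps shorter than this count as "back to back" */
    uint32_t back_to_back;      /* consecutive IRQs that arrived too soon after an EOI */
    uint32_t max_back_to_back;  /* flood is reported once the counter exceeds this */
    bool     eoi_seen;          /* false until the first EOI has been recorded */
};

static inline void irq_flood_init(struct irq_flood_state *s,
                                  uint64_t min_gap, uint32_t max_back_to_back)
{
    assert(max_back_to_back > 0);
    s->last_eoi_time = 0;
    s->min_gap = min_gap;
    s->back_to_back = 0;
    s->max_back_to_back = max_back_to_back;
    s->eoi_seen = false;
}

/* Call right after sending the EOI for this IRQ. */
static inline void irq_flood_note_eoi(struct irq_flood_state *s, uint64_t now)
{
    s->last_eoi_time = now;
    s->eoi_seen = true;
}

/* Call on entry to the IRQ handler. Returns true when a flood is detected. */
static inline bool irq_flood_note_irq(struct irq_flood_state *s, uint64_t now)
{
    if (!s->eoi_seen)
        return false;               /* no previous EOI to measure against */

    if (now - s->last_eoi_time < s->min_gap)
        s->back_to_back++;          /* arrived too soon after the last EOI */
    else
        s->back_to_back = 0;        /* normal spacing resets the streak */

    return s->back_to_back > s->max_back_to_back;
}
```

When `irq_flood_note_irq` returns true, the kernel would mask the IO APIC input and disable the affected devices as outlined above; one state structure per IO APIC input is assumed.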


Cheers,

Brendan

Post by MollenOS »

Thank you very much, Brendan, for that detailed post. You gave me quite a few pointers and I'll make sure I follow them. I will try writing 0x00000000 to all device registers during enumeration (except the video device and bridges?), and make sure that my PCI interrupts are not installed as level-triggered active high. I'll post a follow-up later, as I'm at work. I have masked the PIC and IO APICs and installed the spurious IRQ handlers, so hopefully that is not the problem.

Post by MollenOS »

Yes, you were correct, Brendan: I had installed them as level-triggered active high instead of level-triggered active low. That caused the issue; my driver now works perfectly :-)