Too many IRQs from RTL8139
- NickJohnson
- Member
- Posts: 1249
- Joined: Tue Mar 24, 2009 8:11 pm
- Location: Sunnyvale, California
Too many IRQs from RTL8139
Hi,
I'm writing an RTL8139 NIC driver, and currently I'm able to transmit packets without any difficulty if I disable IRQs on transmit (i.e. don't set the TOK flag in the IMR) and instead poll the transmit status descriptor register. However, if I enable those IRQs and do a transmit operation, the system hangs (and not because it's waiting for an IRQ.) I've found that the RTL8139 is repeatedly (and very rapidly) sending IRQs after performing the transmit, and this is causing the problems. Is it possible to make the chip only send a single IRQ to notify the system that the transmit has completed?
I'm not directly handling these IRQs, and am instead putting them in a queue and waking up threads waiting for them. The latency between receiving an IRQ and handling it is quite high, which I'm okay with at the moment and can optimize later, but seems to be part of the problem. I'm hoping for a solution that does not require making my IRQ handling mechanism faster.
Also, this is not on real hardware. It's QEMU's emulated RTL8139 that I'm dealing with here.
I'm writing an RTL8139 NIC driver, and currently I'm able to transmit packets without any difficulty if I disable IRQs on transmit (i.e. don't set the TOK flag in the IMR) and instead poll the transmit status descriptor register. However, if I enable those IRQs and do a transmit operation, the system hangs (and not because it's waiting for an IRQ.) I've found that the RTL8139 is repeatedly (and very rapidly) sending IRQs after performing the transmit, and this is causing the problems. Is it possible to make the chip only send a single IRQ to notify the system that the transmit has completed?
I'm not directly handling these IRQs, and am instead putting them in a queue and waking up threads waiting for them. The latency between receiving an IRQ and handling it is quite high, which I'm okay with at the moment and can optimize later, but seems to be part of the problem. I'm hoping for a solution that does not require making my IRQ handling mechanism faster.
Also, this is not on real hardware. It's QEMU's emulated RTL8139 that I'm dealing with here.
- Combuster
- Member
- Posts: 9301
- Joined: Wed Oct 18, 2006 3:45 am
- Libera.chat IRC: [com]buster
- Location: On the balcony, where I can actually keep 1½m distance
- Contact:
Re: Too many IRQs from RTL8139
Since you're probably dealing with a native PCI device, you should take care not to EOI the interrupt until you've actually dealt with it. If you just send the message to userland, the interrupt line will remain high and the next thing the processor does after completing the interrupt is restarting it - essentially because it thinks you dealt with one device and since the line hasn't gone down another device sharing that interrupt line wants attention too.
- NickJohnson
- Member
- Posts: 1249
- Joined: Tue Mar 24, 2009 8:11 pm
- Location: Sunnyvale, California
Re: Too many IRQs from RTL8139
Ah, okay. I was EOI-ing the IRQ immediately after storing it in the queue. I suppose I'm going to need to figure out a mechanism to perform EOIs from userspace properly.
Re: Too many IRQs from RTL8139
You can reprogram related I/O APIC pin from level-triggered to edge-triggered. But you may miss interrupts from another devices if this line is shared by several devices.
- NickJohnson
- Member
- Posts: 1249
- Joined: Tue Mar 24, 2009 8:11 pm
- Location: Sunnyvale, California
Re: Too many IRQs from RTL8139
So, in general, does sending an EOI to the PIC or masking an IRQ ever cause a change in the state of the interrupt line coming from the corresponding device driver? In addition, if I receive an IRQ, then mask it, then send an EOI, will that cause any issues?
I'm basically trying to make the IRQ system level-triggered with edge-triggered notifications (and with the falling edge determined by a driver-originated reset.) Sort of like this pseudocode:
I'm just trying to make sure that this sort of thing will work before I start ripping out chunks of event code.
Edit: also, I'm not currently using the APIC (just the old PIC) but I'd like to be able to implement something like this on any type of interrupt source.
I'm basically trying to make the IRQ system level-triggered with edge-triggered notifications (and with the falling edge determined by a driver-originated reset.) Sort of like this pseudocode:
Code: Select all
// driver (in userspace)
send request
wait_for_irq(irq)
do interrupty stuff
reset_irq(irq)
// called from userspace (but in kernelspace)
wait_for_irq(irq):
if fired[irq]:
return
else:
put self on wait queue for irq
reset_irq(irq):
fired[irq] = false
unmask irq
send EOI for irq
// in kernelspace
irq_handler(irq):
if some thread waiting for irq:
wake up thread on wait queue for irq
else:
fired[irq] = true
mask irq
send EOI for irq
Edit: also, I'm not currently using the APIC (just the old PIC) but I'd like to be able to implement something like this on any type of interrupt source.
- NickJohnson
- Member
- Posts: 1249
- Joined: Tue Mar 24, 2009 8:11 pm
- Location: Sunnyvale, California
Re: Too many IRQs from RTL8139
I hacked together an implementation of this system, and it appears to work (at least for interrupts from the keyboard and NIC.)
Re: Too many IRQs from RTL8139
No, sending and EOI doesn't affect the line between the device and the PIC. Masking the interrupt (in the PIC) doesn't affect it either.So, in general, does sending an EOI to the PIC or masking an IRQ ever cause a change in the state of the interrupt line coming from the corresponding device driver?
It depends. If the interrupt is level triggering and if you haven't serviced all the 'things' that caused the interrupt then the PIC will just think that the interrupt has happened again. Now of course you've masked the interrupt, but as soon as you unmask it you will receive the interrupt again.In addition, if I receive an IRQ, then mask it, then send an EOI, will that cause any issues?
My feeling is that there isn't a simple 100% reliable way to convert interrupts to events. If you try to solve the problem by allowing the userspace to send EOI then support for shared interrupts is difficult because individual tasks don't know when all the interrupts events have been handled. Perhaps userspace should have an INT_ACK() API and the kernel could track the events that are sent and when all the events for a particular interrupt are ACK'ed it could safely send the EOI.The PIC manual wrote:The interrupt request must be removed before the EOI command is issued or the CPU interrupts is enabled to prevent a second interrupt from occurring
In the end I relented in my microkernel and allowed my userspace programs to install ISRs.
If a trainstation is where trains stop, what is a workstation ?
- NickJohnson
- Member
- Posts: 1249
- Joined: Tue Mar 24, 2009 8:11 pm
- Location: Sunnyvale, California
Re: Too many IRQs from RTL8139
That's the behavior I wanted/expected.gerryg400 wrote:It depends. If the interrupt is level triggering and if you haven't serviced all the 'things' that caused the interrupt then the PIC will just think that the interrupt has happened again. Now of course you've masked the interrupt, but as soon as you unmask it you will receive the interrupt again.In addition, if I receive an IRQ, then mask it, then send an EOI, will that cause any issues?
Under the assumption that my new code is actually working the way I think it is, I think my mechanism is a reasonable solution to that problem. It also does not hang if an IRQ is waited on multiple times, and does not waste time on IRQs that are not being used.gerryg400 wrote:My feeling is that there isn't a simple 100% reliable way to convert interrupts to events. If you try to solve the problem by allowing the userspace to send EOI then support for shared interrupts is difficult because individual tasks don't know when all the interrupts events have been handled. Perhaps userspace should have an INT_ACK() API and the kernel could track the events that are sent and when all the events for a particular interrupt are ACK'ed it could safely send the EOI.
Re: Too many IRQs from RTL8139
If another interrupt of higher priority comes along while your user-space is servicing the current interrupt, then the EOI that it issues will clear the new interrupt, not the original one. Won't that be an issue ?
[edit] I think you'll have to EOI in the kernel before the event is sent to the user process. You'll need to mask the interrupt at that point. And then the userspace can unmask the interrupt when its done. That might work.....
[edit again] Ooops, I now see that's what you are doing. My bad.
[edit] I think you'll have to EOI in the kernel before the event is sent to the user process. You'll need to mask the interrupt at that point. And then the userspace can unmask the interrupt when its done. That might work.....
[edit again] Ooops, I now see that's what you are doing. My bad.
If a trainstation is where trains stop, what is a workstation ?
Re: Too many IRQs from RTL8139
Sorry, I got distracted at work and couldn't make a proper reply. Your code looks okay to me but I recommend that you don't do the EOI from userspace. It's just too difficult with nested interrupts. And they are unavoidable in your design. I think the best thing to do it to
Code: Select all
// driver (in userspace)
send request
wait_for_irq(irq)
do interrupty stuff
reset_irq(irq)
// called from userspace (but in kernelspace)
wait_for_irq(irq):
if fired[irq]:
return
else:
put self on wait queue for irq
reset_irq(irq):
fired[irq] = false
unmask irq /********** And when the counter reached zero actually unmask the interrupt */
/* send EOI for irq <<<<<<<<<<<<<<<<<<********** remove this */
// in kernelspace
irq_handler(irq):
if some thread waiting for irq:
wake up thread on wait queue for irq
else:
fired[irq] = true
mask irq /********** with a counter equal to the number of events that will be sent to shared ints */
fi
send EOI for irq /********** Do this unconditionally */
If a trainstation is where trains stop, what is a workstation ?
- NickJohnson
- Member
- Posts: 1249
- Joined: Tue Mar 24, 2009 8:11 pm
- Location: Sunnyvale, California
Re: Too many IRQs from RTL8139
But if you perform an EOI in the kernel's interrupt handler and the device hasn't been reset by the driver (like my issue with the RTL8139) then it will keep generating interrupts. There also isn't a problem with shared interrupts in this system, because waiting for an IRQ is idempotent. I think the fix is something more like this:
This is also nice because it has an additional invariant. When a thread is doing interrupt-related activities (between watiting and resetting), fired[irq] is always true, and the IRQ is always masked. Basically, it's impossible to distinguish between the cases where the thread waited for the IRQ and when the IRQ had already fired and the wait call just returned directly.
Code: Select all
// driver (in userspace)
send request
wait_for_irq(irq)
do interrupty stuff
reset_irq(irq)
// called from userspace (but in kernelspace)
wait_for_irq(irq):
if fired[irq]:
return
else:
put self on wait queue for irq
reset_irq(irq):
fired[irq] = false
unmask irq
// in kernelspace
irq_handler(irq):
if some thread waiting for irq:
wake up thread on wait queue for irq
fired[irq] = true
mask irq
send EOI for irq
Re: Too many IRQs from RTL8139
Isn't that the same as my suggestion ?
If a trainstation is where trains stop, what is a workstation ?
- NickJohnson
- Member
- Posts: 1249
- Joined: Tue Mar 24, 2009 8:11 pm
- Location: Sunnyvale, California
Re: Too many IRQs from RTL8139
No: it also masks the IRQ and sets the fired[irq] flag unconditionally, along with doing the EOI.
Re: Too many IRQs from RTL8139
Oh, okay. I was only really reading the part of your code that masks the interrupt and does the EOI. I figured you had the other bit under control so didn't look at it.
Also, you will need to have a counter in the "reset_irq" routine so that it knows when to really unmask the interrupt. i.e. when all the events generated for the interrupt have been acted on.
[Edit] I just looked up idempotent. I mean interrupts that might be shared between 2 devices. Say the kernel sends the event to 2 drivers because it doesn't know which one actually caused the interrupt. It shouldn't EOI until each has acked its event.
Also, you will need to have a counter in the "reset_irq" routine so that it knows when to really unmask the interrupt. i.e. when all the events generated for the interrupt have been acted on.
[Edit] I just looked up idempotent. I mean interrupts that might be shared between 2 devices. Say the kernel sends the event to 2 drivers because it doesn't know which one actually caused the interrupt. It shouldn't EOI until each has acked its event.
If a trainstation is where trains stop, what is a workstation ?
- NickJohnson
- Member
- Posts: 1249
- Joined: Tue Mar 24, 2009 8:11 pm
- Location: Sunnyvale, California
Re: Too many IRQs from RTL8139
Each driver should be able to determine whether its device was the one that generated the interrupt, and then only reset the IRQ if it is supposed to. Even in the event of rogue IRQ resets, the interrupt line should stay high because one of the devices hasn't been dealt with, so the IRQ will be immediately regenerated.gerryg400 wrote:I mean interrupts that might be shared between 2 devices. Say the kernel sends the event to 2 drivers because it doesn't know which one actually caused the interrupt. It shouldn't EOI until each has acked its event.