Hey,
Are there any recommended methods of servicing IRQs?
I'm having a lot of trouble getting my system to run across different types of machines. It all appears to be caused by the IRQs, and it seems to happen on the slow machines. So this leads me to believe my IRQ handling is not fast enough.
Basically, these are my steps:
1. IRQ fires and the current thread jumps into the kernel IRQ handling routine - non-interruptible
2. kernel ACKs the IRQ with the PIC.
3. kernel masks the IRQ.
4. kernel locates the thread which is supposed to handle the IRQ.
5. kernel queues the handling thread to run.
6. kernel does a sched_yield() .. triggering the handling thread to run.
7. system runs as normal, except the handling thread gets to run on every scheduler call.
8. handling thread finishes and then tells the kernel to unmask the IRQ and deschedule itself.
9. kernel deschedules and unmasks the IRQ.
If there are multiple IRQs firing at once, they are all ACKed, masked, and then queued for running. The handling threads run sequentially in FIFO (w.r.t. IRQ firing) order.
Seems straightforward... The IRQ is acknowledged and masked almost immediately. It just takes its time to be handled.
However, my problem might not be an IRQ issue. I'm just trying to figure stuff out.
Durand.
IRQ acks/masks
Re:IRQ acks/masks
Hi, durand!
I have the handler merely send a message, or set some variable to have the timer handler send a message (which might be removed/altered depending on what the timer handler needs to know about the mouse or keyboard ...).
The IRQ stubs send the EOI to the PIC after calling the handler.
IRQ handlers are expected to be short and straight. The message wakes up the driver thread which has registered the IRQ handler. The driver thread runs at a higher priority than services or user tasks, but can be preempted by other driver threads of the same priority. This makes mouse handling pretty smooth. *gg*
As long as no other driver gets scheduled, the driver thread can run as long as it needs to fulfill its task (which is per se required to be short and straight). In particular, the driver takes care of its device and makes sure it can generate an interrupt again if necessary.
Hope this helps.
stay safe
... the osdever formerly known as beyond infinity ...
BlueillusionOS iso image
Re:IRQ acks/masks
hey
It sounds like the only difference between the two is the method of invoking the real IRQ handler (apart from the masking). Which is quite cool, because it kind of verifies my methods as well.
By the way, I discovered that some IRQs keep firing until they're handled - specifically the network card's. If an IRQ fires, gets ACKed, and is not handled before the EOI, the IRQ fires again - regardless of whether or not something is servicing it.
That's why I mask. ( "K, I got it. Be quiet until you're serviced." )
Re:IRQ acks/masks
Hi,
durand wrote:Basically, these are my steps:
1. IRQ fires and the current thread jumps into the kernel IRQ handling routine - non-interruptible
2. kernel ACKs the IRQ with the PIC.
3. kernel masks the IRQ.
4. kernel locates the thread which is supposed to handle the IRQ.
5. kernel queues the handling thread to run.
6. kernel does a sched_yield() .. triggering the handling thread to run.
7. system runs as normal, except the handling thread gets to run on every scheduler call.
8. handling thread finishes and then tells the kernel to unmask the IRQ and deschedule itself.
9. kernel deschedules and unmasks the IRQ.
If there are multiple IRQs firing at once, they are all ACKed, masked, and then queued for running. The handling threads run sequentially in FIFO (w.r.t. IRQ firing) order.
If interrupts are left enabled, you could get the same IRQ again after its EOI is sent but before you've had time to mask it. In that case it'd make more sense to mask it and then send the EOI.
For PCI IRQs, several devices may share the same IRQ line. In this case you'd need to try one IRQ handler at a time until one of them says "It was me!". For performance reasons it'd be best if the IRQ handlers are tried in order of how often they receive an IRQ (and for multi-CPU it could be good to try N IRQ handlers at a time, where N is the number of CPUs).
I've also never quite understood why anyone would mask the IRQ and send the EOI, and then unmask the IRQ again later - it's almost the same as just sending the EOI later (without masking/unmasking), except for the PIC/APIC IRQ priorities (which could be very useful if they aren't messed up like they are when the PIC chips are used, but don't do any real harm in any case).
durand wrote:That's why I mask. ( "K, I got it. Be quiet until you're serviced" )
That's what EOI is meant for...
durand wrote:I'm having a lot of trouble getting my system to run across different types of machines. It all appears to be caused by the IRQ's and seems to happen on the slow machines. So, this leads me to believe my IRQ handling is not fast enough.
It may not be your IRQ handling - instead you might have interrupts disabled for too long in something that has nothing to do with IRQ handling, or perhaps there's a timer (e.g. PIT) that's set for a frequency that older/slower computers can't handle (or perhaps a combination of both, or something else entirely)...
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re:IRQ acks/masks
Brendan wrote:For PCI IRQ's several devices may be sharing the same IRQ line. In this case you'd need to try one IRQ handler at a time until one of them says "It was me!". For performance reasons it'd be best if the IRQ handlers are tried in order of how often they receive an IRQ (and for multi-CPU it could be good to try N IRQ handlers at a time, where N is the number of CPUs).
I think that multi-CPU approach is only going to be good for people who would use Windows anyway. It'd wreck your caches, utterly screw up TLBs, and only increase the response time of the interrupt if some bit of the detection code was badly programmed. And even then, only slightly.
Brendan wrote:It may not be your IRQ handling - instead you might have interrupts disabled for too long in something that has nothing to do with IRQ handling, or perhaps there's a timer (e.g. PIT) that's set for a frequency that older/slower computers can't handle (or perhaps a combination of both, or something else entirely)...
The one I've seen was my interrupt handler for the timer being nailed in Bochs, since ips was set to 500000 and the timer was set to 1000Hz. That just overshot the allocated time slice, so it kept building up stack (somebody screwed up IRQ handling too...) until it overflowed and then threw a stack exception.
Does your OS work in Bochs?
Re:IRQ acks/masks
Brendan wrote:If interrupts are left enabled ... more sense to mask it then send the EOI.
Fortunately, my IRQ events are interrupt-free. But I used to do it like that until about yesterday. I remember reading somewhere (a long time ago) that on some PICs, the effect of masking a "raised" IRQ event before EOI'ing it was undefined. So, I recalled that yesterday and swapped my MASK/EOI steps around.
I don't know how true that is, but in a non-interruptible state (as my handler is), it doesn't matter that I EOI first and then mask.
Brendan wrote:I've also never quite understood why anyone would mask the IRQ and send the EOI, and then unmask the IRQ again later.
Also, I started doing it this way when I added my first network card. I seem to recall that the IRQ kept firing despite not being EOI'd. I'll need to confirm that again, though.
Candy wrote:Does your OS work in bochs?
Nope, I don't support Bochs. Good point, though. The slower machines might not support the smaller PIT intervals. I'll check that out.
Re:IRQ acks/masks
Thanks for the help everyone.
I found some silly race conditions which could produce the same problems people have been experiencing. Tsk. Basically a deadlock: task switching becomes disabled and the active thread tries to acquire a lock that's already held.
There are a few of them in the code, so I know what I can spend the next few hours doing... oh well.
Re:IRQ acks/masks
Hi,
Candy wrote:I think that multi-cpu approach is only going to be good for people that would use Windows anyway. It'd wreck your caches, utterly screw up TLB's and only increase the response time of the interrupt if some bit of the detection code was badly programmed. And even then, only slightly.
I remember doing a pile of theory on this a while back...
Half of the problem is that, for my OS, when a thread becomes ready to run (e.g. when it receives an "IRQ occurred" message) the thread will be scheduled on the CPU that has the least load. When one IRQ handler tells the kernel that it wasn't the device responsible for the IRQ, its thread will still be running, so when the kernel schedules the next "IRQ handling thread" it's extremely likely that it'll run on a different CPU (i.e. a CPU that, at that exact moment, has less load).
For example, for a dual CPU computer where 4 devices share an IRQ, if the fourth device caused the IRQ then the first IRQ handler may run on CPU #0, the second on CPU #1, the third on CPU #0 and the fourth on CPU #1. This gives a pattern like this (where 'I' is an IRQ handler and '-' is some other task):
CPU #0: -I-I--
CPU #1: --I-I-
This is bad for interrupt latency and bad for caches/TLBs. In the same situation, if the kernel runs N IRQ handlers at a time you'd get this:
CPU #0: -II---
CPU #1: -II---
This is good for interrupt latency and average for caches/TLBs. One solution would be to allow the scheduler to ignore its load balancing when an IRQ handler becomes ready to run. In this case you'd get:
CPU #0: -IIII-
CPU #1: ------
This is bad for interrupt latency and good for caches/TLBs.
I'd have to admit, I haven't found a "perfect" solution. However, keeping track of how many IRQs are resolved by each IRQ handler and using this "probability rating" to sort the list of IRQ handlers is a good start (so that the most likely IRQ handler is the first one tried).
As for working out when it's best to run multiple IRQ handlers on multiple CPUs, it depends on a large number of things (actual probability ratings, messaging & thread switch delays, what sort of interrupt latency is acceptable and the maximum frequency of IRQs, how many devices share the IRQ, how many CPUs are present, how important the work the CPUs are doing is, etc).
That's why I wrote "it could be good to try N IRQ handlers at a time" (which is IMHO both accurate and non-specific) rather than "it is best to try N IRQ handlers at a time" (which IMHO isn't accurate at all), or "it is best to try N IRQ handlers at a time if A and B and C, depending on D and E and F" (which IMHO would be accurate and specific, but also rather lengthy and difficult to quantify).
BTW - I'm not sure what you mean by "only increase the response time of the interrupt if some bit of the detection code was badly programmed"?
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Colonel Kernel
Re:IRQ acks/masks
Brendan wrote:I've also never quite understood why anyone would mask the IRQ and send the EOI, and then unmask the IRQ again later - it's almost the same as just sending the EOI later (without masking/unmasking), except for the PIC/APIC IRQ priorities (which could be very useful if they aren't messed up like they are if the PIC chips are used, but don't do any real harm in any case).
Apparently we've had this discussion before, and I think we implicitly agreed to disagree.
I'd link directly to the first relevant post, but I'm not sure how...
Top three reasons why my OS project died:
- Too much overtime at work
- Got married
- My brain got stuck in an infinite loop while trying to design the memory manager