Hey,
Are there any recommended methods of servicing IRQs?
I'm having a lot of trouble getting my system to run across different types of machines. It all appears to be caused by the IRQs, and it seems to happen on the slow machines. So this leads me to believe my IRQ handling is not fast enough.
Basically, these are my steps:
1. IRQ fires and the current thread jumps into the kernel IRQ handling routine - non-interruptible
2. kernel ACKs the IRQ with the PIC.
3. kernel masks the IRQ.
4. kernel locates the thread which is supposed to handle the IRQ.
5. kernel queues the handling thread to run.
6. kernel does a sched_yield() .. triggering the handling thread to run.
7. system runs as normal, except the handling thread gets to run on every scheduler call.
8. handling thread finishes and then tells the kernel to unmask the IRQ and deschedule itself.
9. kernel deschedules and unmasks the IRQ.
If there are multiple IRQs firing at once, they are all ACKed, masked, and then queued for running. The handling threads run sequentially in FIFO (w.r.t. IRQ firing) order.
Seems straightforward... The IRQ is acknowledged and masked almost immediately. It just takes its time to be handled.
However, my problem might not be an IRQ issue. I'm just trying to figure stuff out.
Durand.
IRQ acks/masks
Re:IRQ acks/masks
Hi, durand!
I have the handler merely send a message, or set some variable to have the timer handler send a message (which might be removed/altered depending on what the timer handler needs to know about the mouse or keyboard ...).
The IRQ stubs send the EOI to the PIC after calling the handler.
IRQ handlers are expected to be short and straight. The message wakes up the driver thread which has registered the IRQ handler. The driver thread runs at a higher priority than services or user tasks, but can be preempted by other driver threads of the same priority. This makes mouse handling pretty smooth. *gg*
As long as no other driver gets scheduled, the driver thread can run as long as it needs to fulfill its task (which is per se required to be short and straight). In particular, the driver takes care of its device and makes sure it can generate an interrupt again if necessary.
Hope this helps.
stay safe
... the osdever formerly known as beyond infinity ...
BlueillusionOS iso image
Re:IRQ acks/masks
hey
It sounds like the only difference between the two is the method of invoking the real IRQ handler (apart from the masking). Which is quite cool, because it kind of verifies my methods as well.
By the way, I discovered that some IRQs keep firing until they're handled - specifically the network card's. If an IRQ fires, gets ACKed, and is not handled before the EOI, the IRQ fires again - regardless of whether or not something is servicing it.
That's why I mask. ( "K, I got it. Be quiet until you're serviced." )
Re:IRQ acks/masks
Hi,
durand wrote:Basically, these are my steps:
1. IRQ fires and the current thread jumps into the kernel IRQ handling routine - non-interruptible
2. kernel ACKs the IRQ with the PIC.
3. kernel masks the IRQ.
4. kernel locates the thread which is supposed to handle the IRQ.
5. kernel queues the handling thread to run.
6. kernel does a sched_yield() .. triggering the handling thread to run.
7. system runs as normal, except the handling thread gets to run on every scheduler call.
8. handling thread finishes and then tells the kernel to unmask the IRQ and deschedule itself.
9. kernel deschedules and unmasks the IRQ.
If there are multiple IRQs firing at once, they are all ACKed, masked, and then queued for running. The handling threads run sequentially in FIFO (w.r.t. IRQ firing) order.
If interrupts are left enabled, you could get the same IRQ again after its EOI is sent but before you've had time to mask it. In that case it'd make more sense to mask it and then send the EOI.
For PCI IRQs, several devices may share the same IRQ line. In this case you'd need to try one IRQ handler at a time until one of them says "It was me!". For performance reasons it'd be best if the IRQ handlers are tried in order of how often they receive an IRQ (and for multi-CPU it could be good to try N IRQ handlers at a time, where N is the number of CPUs).
I've also never quite understood why anyone would mask the IRQ and send the EOI, and then unmask the IRQ again later - it's almost the same as just sending the EOI later (without masking/unmasking), except for the PIC/APIC IRQ priorities (which could be very useful if they aren't messed up like they are when the PIC chips are used, but don't do any real harm in any case).
durand wrote:That's why I mask. ( "K, I got it. Be quiet until you're serviced" )
That's what EOI is meant for...
durand wrote:I'm having a lot of trouble getting my system to run across different types of machines. It all appears to be caused by the IRQ's and seems to happen on the slow machines. So, this leads me to believe my IRQ handling is not fast enough.
It may not be your IRQ handling - instead you might have interrupts disabled for too long in something that has nothing to do with IRQ handling, or perhaps there's a timer (e.g. PIT) that's set for a frequency that older/slower computers can't handle (or perhaps a combination of both, or something else entirely)...
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re:IRQ acks/masks
Brendan wrote:For PCI IRQ's several devices may be sharing the same IRQ line. In this case you'd need to try one IRQ handler at a time until one of them says "It was me!". For performance reasons it'd be best if the IRQ handlers are tried in order of how often they receive an IRQ (and for multi-CPU it could be good to try N IRQ handlers at a time, where N is the number of CPUs).
I think that multi-CPU approach is only going to be good for people who would use Windows anyway. It'd wreck your caches, utterly screw up TLBs, and only increase the response time of the interrupt if some bit of the detection code was badly programmed. And even then, only slightly.
Brendan wrote:It may not be your IRQ handling - instead you might have interrupts disabled for too long in something that has nothing to do with IRQ handling, or perhaps there's a timer (e.g. PIT) that's set for a frequency that older/slower computers can't handle (or perhaps a combination of both, or something else entirely)...
The one I've seen was my interrupt handler for the timer being nailed in Bochs, since ips was set to 500000 and the timer was set to 1000Hz. That just overshot the allocated time slice, so it kept building up stack (somebody screwed up IRQ handling too...) until it overflowed and then threw a stack exception.
Does your OS work in Bochs?
Re:IRQ acks/masks
Brendan wrote:If interrupts are left enabled ... more sense to mask it then send the EOI.
Fortunately, my IRQ events are interrupt-free. But I used to do it like that until about yesterday. I remember reading somewhere (a long time ago) that on some PICs, the effect of masking a "raised" IRQ event before EOI'ing it was undefined. So, I recalled that yesterday and swapped my MASK/EOI steps around.
I don't know how true that is, but in a non-interruptible state (as my handler is), it doesn't matter that I EOI first and then mask.
Brendan wrote:I've also never quite understood why anyone would mask the IRQ and send the EOI, and then unmask the IRQ again later.
Also, I started doing it this way when I added my first network card. I seem to recall that the IRQ kept firing despite not being EOI'd. I'll need to confirm that again, though.
Candy wrote:Does your OS work in bochs?
Nope, I don't support Bochs. Good point, though. The slower machines might not support the smaller PIT intervals. I'll check that out.
Re:IRQ acks/masks
Thanks for the help everyone.
I found some silly race conditions which could produce the same problems people have been experiencing. Tsk. Basically a deadlock: task switching becomes disabled and the active thread tries to acquire a lock that's already held.
There are a few of them in the code, so I know what I can spend the next few hours doing... oh well.
Re:IRQ acks/masks
Hi,
Candy wrote:I think that multi-cpu approach is only going to be good for people that would use Windows anyway. It'd wreck your caches, utterly screw up TLB's and only increase the response time of the interrupt if some bit of the detection code was badly programmed. And even then, only slightly.
I remember doing a pile of theory on this a while back...
Half of the problem is that, for my OS, when a thread becomes ready to run (e.g. when it receives an "IRQ occurred" message) the thread will be scheduled on the CPU that has the least load. When one IRQ handler tells the kernel that it wasn't the device responsible for the IRQ, its thread will still be running, so when the kernel schedules the next "IRQ handling thread" it's extremely likely that it'll run on a different CPU (i.e. a CPU that, at that exact moment, has less load).
For example, for a dual CPU computer where 4 devices share an IRQ, if the fourth device caused the IRQ then the first IRQ handler may run on CPU #0, the second on CPU #1, the third on CPU #0 and the fourth on CPU #1. This gives a pattern like this (where 'I' is an IRQ handler and '-' is some other task):
CPU #0: -I-I--
CPU #1: --I-I-
This is bad for interrupt latency and bad for caches/TLBs. In the same situation, if the kernel runs N IRQ handlers at a time you'd get this:
CPU #0: -II---
CPU #1: -II---
This is good for interrupt latency and average for caches/TLBs. One solution would be to allow the scheduler to ignore its load balancing when an IRQ handler becomes ready to run. In this case you'd get:
CPU #0: -IIII-
CPU #1: ------
This is bad for interrupt latency and good for caches/TLBs.
I'd have to admit, I haven't found a "perfect" solution. However, keeping track of how many IRQs are resolved by each IRQ handler and using this "probability rating" to sort the list of IRQ handlers is a good start (so that the most likely IRQ handler is the first one tried).
As for working out when it's best to run multiple IRQ handlers on multiple CPUs, it depends on a large number of things (actual probability ratings, messaging & thread switch delays, what sort of interrupt latency is acceptable and the maximum frequency of IRQs, how many devices share the IRQ, how many CPUs are present, how important the work the CPUs are doing is, etc).
That's why I wrote "it could be good to try N IRQ handlers at a time" (which is IMHO both accurate and non-specific) rather than "it is best to try N IRQ handlers at a time" (which IMHO isn't accurate at all), or "it is best to try N IRQ handlers at a time if A and B and C, depending on D and E and F" (which IMHO would be accurate and specific, but also rather lengthy and difficult to quantify).
BTW - I'm not sure what you mean by "only increase the response time of the interrupt if some bit of the detection code was badly programmed"?
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Colonel Kernel
Re:IRQ acks/masks
Brendan wrote:I've also never quite understood why anyone would mask the IRQ and send the EOI, and then unmask the IRQ again later - it's almost the same as just sending the EOI later (without masking/unmasking), except for the PIC/APIC IRQ priorities (which could be very useful if they aren't messed up like they are if the PIC chips are used, but don't do any real harm in any case).
Apparently we've had this discussion before, and I think we implicitly agreed to disagree.
I'd link directly to the first relevant post, but I'm not sure how...
Top three reasons why my OS project died:
- Too much overtime at work
- Got married
- My brain got stuck in an infinite loop while trying to design the memory manager