OSDev.org

Posted: **Wed Dec 02, 2009 7:25 pm**

Hi all,

Yesterday on IRC I asked whether, on SMP systems, at most one instance of a given IRQ handler runs at any point in time. Travis kindly replied that

it's the responsibility of the interrupt controller(s) to not trigger the same irq simultaneously on different cpus, at least global, hardware interrupts.

So I can happily code external interrupt handlers with no need to make each one concurrency-safe with respect to itself. Searching the I/O APIC datasheet and the Intel manuals though, I found that when a local APIC receives an End-of-Interrupt (EOI), it informs the I/O APICs about this EOI only for level-triggered interrupts. Upon receiving an EOI for an edge-triggered interrupt, the local APIC does not inform the I/O APIC of this event. Quoting from Intel's chapter:

The act of writing to the EOI register causes the local APIC to delete the interrupt from its ISR queue and (for level-triggered interrupts) send a message on the bus indicating that the interrupt handling has been completed.

And quoting another paragraph from the same chapter which says it more explicitly:

Upon acceptance of an interrupt into the IRR, the corresponding TMR [Trigger Mode Register] bit is cleared for edge-triggered interrupts and set for level-triggered interrupts. If a TMR bit is set when an EOI cycle for its corresponding interrupt vector is generated, an EOI message is sent to all I/O APICs.
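To make the two Intel quotes concrete, here is a minimal sketch in C of the bookkeeping they describe. It is a toy model, not hardware access: `struct lapic_model` and the `lapic_*` helpers are hypothetical names, and CPU priority rules are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one local APIC: 256 vectors, one bit per vector. */
struct lapic_model {
    uint8_t irr[32];  /* Interrupt Request Register: accepted, not yet dispatched */
    uint8_t isr[32];  /* In-Service Register: dispatched, EOI not yet written */
    uint8_t tmr[32];  /* Trigger Mode Register: 1 = level, 0 = edge */
};

static bool get_bit(const uint8_t *reg, unsigned v)
{
    return (reg[v / 8] >> (v % 8)) & 1u;
}

static void set_bit(uint8_t *reg, unsigned v, bool on)
{
    if (on)
        reg[v / 8] |= (uint8_t)(1u << (v % 8));
    else
        reg[v / 8] &= (uint8_t)~(1u << (v % 8));
}

/* Acceptance into the IRR also records the trigger mode in the TMR. */
static void lapic_accept(struct lapic_model *a, unsigned vector, bool level)
{
    assert(vector < 256);
    set_bit(a->irr, vector, true);
    set_bit(a->tmr, vector, level);
}

/* Dispatch to the CPU: the vector moves from the IRR to the ISR. */
static void lapic_dispatch(struct lapic_model *a, unsigned vector)
{
    assert(get_bit(a->irr, vector));
    set_bit(a->irr, vector, false);
    set_bit(a->isr, vector, true);
}

/* EOI write: clear the highest in-service vector. Returns true when an
 * EOI message would also be sent to the I/O APICs (TMR bit set). */
static bool lapic_eoi(struct lapic_model *a)
{
    for (int v = 255; v >= 0; v--) {
        if (get_bit(a->isr, (unsigned)v)) {
            set_bit(a->isr, (unsigned)v, false);
            return get_bit(a->tmr, (unsigned)v);
        }
    }
    return false;  /* nothing was in service */
}
```

In this model an edge-triggered vector never produces an EOI message, which is exactly why the I/O APIC cannot tell when an edge handler has finished.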

So in the time between ACKing a hardware device and sending an EOI, how does the IOAPIC know that this edge-triggered handler is still executing, and avoid forwarding the device's next interrupt to another core?

Thanks a lot!

Edit: per forum rules, avoid using colored text.

Posted: **Thu Dec 03, 2009 2:08 am**

Hi,

Darwish wrote: So in the time between ACKing a hardware device and sending an EOI, how does the IOAPIC know that this edge-triggered handler is still executing, and avoid forwarding the device's next interrupt to another core?

As I understand things (and not necessarily how things actually are)...

If a second edge-triggered IRQ occurs (while the first is being handled), then the I/O APIC would send the IRQ to the local APIC/s. If the local APIC is already handling the same IRQ then it'll set the corresponding IRR bit, and if the IRR bit is set when you do the EOI for the first IRQ then the local APIC will send the second IRQ to the CPU (and you'd get 2 IRQs, one after the other). If the local APIC is not already handling the same IRQ (e.g. the first IRQ is being handled by a different CPU) then the local APIC will send the second IRQ to the CPU as soon as it can (e.g. before the other CPU has sent the EOI to its local APIC for the first IRQ). This means that for "broadcast to many" edge-triggered interrupts you need to handle re-entrancy.

For "send to lowest priority" the actual behaviour depends on a few things. P6 and older CPUs support the "focus processor" feature. If this is enabled, then the second IRQ will be sent to the same CPU as the first IRQ (even if that CPU isn't the lowest priority CPU anymore), and you won't need to worry about re-entrancy (you'd get 2 IRQs on the same CPU, one after the other). Also, once the IRR bit is set then setting it again won't make any difference, which means that a third (or fourth, fifth, etc) IRQ will be ignored by the local APIC. To be honest, I like the "focus processor" idea (mostly because the IRQ handler and its data will still be in the CPU's cache, and there's less chance of lock contention), so I'd recommend using this feature if it's present.

For newer CPUs (and older CPUs with the "focus processor" feature disabled) it's likely that the CPU that's handling the first IRQ will not be the lowest priority CPU anymore, and therefore it's likely that the second IRQ will be handled by a different CPU.
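The behaviour described in this post can be sketched as a small C model of one vector in one local APIC. This is only an illustration of the explanation above under simplifying assumptions (no priorities, interrupts always enabled); `struct vec_state`, `irq_arrives` and `eoi` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* State of ONE vector in ONE local APIC (toy model). */
struct vec_state {
    bool irr;       /* pending: accepted but not yet given to the CPU */
    bool isr;       /* in service: handler entered, EOI not yet written */
    int delivered;  /* how many times the CPU's handler was entered */
};

/* Hand the vector to the CPU if it is pending and not already in service. */
static void try_dispatch(struct vec_state *s)
{
    if (s->irr && !s->isr) {
        s->irr = false;
        s->isr = true;
        s->delivered++;
    }
}

/* An interrupt message for this vector reaches the local APIC. */
static void irq_arrives(struct vec_state *s)
{
    s->irr = true;  /* setting an already-set bit changes nothing: IRQs merge */
    try_dispatch(s);
}

/* The handler writes the EOI register. */
static void eoi(struct vec_state *s)
{
    assert(s->isr);
    s->isr = false;
    try_dispatch(s);  /* a latched second IRQ is delivered right after */
}
```

With two `struct vec_state` instances standing in for two CPUs, an IRQ that arrives at the second one while the first is still in service is delivered immediately, which is the re-entrancy case; on a single instance, any number of extra arrivals before the EOI collapse into one follow-up delivery.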

Cheers,

Brendan

Posted: **Thu Dec 03, 2009 6:19 pm**

Hi Brendan,

Brendan wrote: As I understand things (and not necessarily how things actually are)...

If a second edge-triggered IRQ occurs (while the first is being handled), then the I/O APIC would send the IRQ to the local APIC/s. If the local APIC is already handling the same IRQ then it'll set the corresponding IRR bit, and if the IRR bit is set when you do the EOI for the first IRQ then the local APIC will send the second IRQ to the CPU (and you'd get 2 IRQs, one after the other).

Great, so the local APICs forward interrupts to their cores according to their IRR and ISR bits, as expected.

For "send to lowest priority" the actual behaviour depends on a few things. P6 and older CPUs support the "focus processor" feature. If this is enabled, then the second IRQ will be sent to the same CPU as the first IRQ (even if that CPU isn't the lowest priority CPU anymore), and you won't need to worry about re-entrancy (you'd get 2 IRQs on the same CPU, one after the other). Also, once the IRR bit is set then setting it again won't make any difference, which means that a third (or fourth, fifth, etc) IRQ will be ignored by the local APIC. To be honest, I like the "focus processor" idea (mostly because the IRQ handler and its data will still be in the CPU's cache, and there's less chance of lock contention), so I'd recommend using this feature if it's present.

For newer CPUs (and older CPUs with the "focus processor" feature disabled) it's likely that the CPU that's handling the first IRQ will not be the lowest priority CPU anymore, and therefore it's likely that the second IRQ will be handled by a different CPU.

This is exactly the delivery mode I'm setting up in the IOAPIC entries, and you've indeed confirmed my fears about this re-entrancy issue. For the sake of avoiding useless locking complexities, I guess I'll set up the legacy edge-triggered ISA IOAPIC routing entries with 'Fixed' delivery mode to the bootstrap core. AFAIK, they're already very slow devices, and won't skew the IRQ handling balance very much. Any further suggestions?

From the Intel quotes in the first post, I guess I do not need to worry about re-entrancy for level-triggered IRQ handlers. It's not stated very explicitly, but my understanding is that even in lowest-priority delivery mode, the IOAPIC won't forward another IRQ until it knows that the first handler has sent an EOI (i.e. until the 'Remote IRR' bit in the IRQ's IOAPIC routing entry is back to 0). Am I missing something critical?
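The understanding stated above can be written down as a toy C model of a single I/O APIC pin. It is a sketch of the assumed behaviour, not a register-accurate emulation; `struct ioapic_pin`, `pin_assert` and `ioapic_eoi_broadcast` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one I/O APIC redirection entry. */
struct ioapic_pin {
    uint8_t vector;
    bool level_triggered;
    bool remote_irr;    /* only meaningful for level-triggered entries */
    int messages_sent;  /* interrupt messages sent to local APICs */
};

/* The device signals an interrupt. Returns true if a message goes out. */
static bool pin_assert(struct ioapic_pin *p)
{
    assert(p);
    if (p->level_triggered) {
        if (p->remote_irr)
            return false;      /* previous one not yet EOI'd: hold off */
        p->remote_irr = true;  /* set once a local APIC accepts it */
    }
    p->messages_sent++;        /* edge: every edge produces a message */
    return true;
}

/* A local APIC broadcasts an EOI message carrying a vector number. */
static void ioapic_eoi_broadcast(struct ioapic_pin *p, uint8_t vector)
{
    assert(p);
    if (p->level_triggered && p->vector == vector)
        p->remote_irr = false;
}
```

For a level-triggered pin the second `pin_assert` call is held back until the matching EOI arrives; for an edge-triggered pin nothing holds it back, which is the asymmetry this thread is about.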

Cheers,

Superb help as always; thanks a lot!

Posted: **Thu Dec 03, 2009 8:42 pm**

Hi,

Darwish wrote: For the sake of avoiding useless locking complexities, I guess I'll set up the legacy edge-triggered ISA IOAPIC routing entries with 'Fixed' delivery mode to the bootstrap core. AFAIK, they're already very slow devices, and won't skew the IRQ handling balance very much. Any further suggestions?

One more thing to be aware of is that MSI (Message Signalled Interrupts) is edge-triggered. This means that it's not just older/slower devices, but also modern high speed devices (e.g. all PCI express devices support MSI).

You could share the load a little - e.g. use "fixed delivery" to send different edge-triggered IRQs to different CPUs. You could even reprogram the IRQ while the OS is running and change which CPU the IRQ is sent to (for power management, or for "hot-plug CPUs" support, or for IRQ load balancing).
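As a rough illustration of the load-sharing idea, the following C sketch picks a fixed destination for each edge-triggered IRQ in round-robin order. The function name and both arrays are hypothetical; a real kernel would take the APIC IDs from the MP or ACPI tables and then write each choice into the matching I/O APIC entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Choose a fixed destination APIC ID for each edge-triggered IRQ,
 * cycling through the available CPUs. irq_dest[i] receives the APIC ID
 * that IRQ number i should be routed to. */
static void assign_fixed_destinations(const uint8_t *cpu_apic_ids, size_t ncpus,
                                      uint8_t *irq_dest, size_t nirqs)
{
    assert(ncpus > 0);
    for (size_t i = 0; i < nirqs; i++)
        irq_dest[i] = cpu_apic_ids[i % ncpus];
}
```

Since each IRQ still goes to exactly one CPU, every handler stays serialised by its own local APIC; rerunning the assignment with a different CPU list would be one way to rebalance at run time.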

There are other ways of handling it too. For my OS, the kernel's IRQ handlers just send message/s to the device driver/s that use the IRQ. The kernel's IRQ handlers are re-entrant (as they need to handle many different IRQs at the same time anyway), and the device driver only receives one message at a time. Basically the IRQs are "serialised" by my IPC/messaging.

A simpler way to serialise the IRQs would be something like:

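```
; send_EOI is assumed to preserve all registers
IRQ_count:  dd 0xFFFFFFFF          ; -1 = no CPU is inside the handler

IRQ_handler:
    lock add dword [IRQ_count],1   ; ZF is set only if the count went from -1 to 0
    je .handle_IRQ                 ; first CPU in: do the work
    call send_EOI                  ; another CPU is handling it and will loop for us,
    iretd                          ; but this CPU's local APIC still needs its EOI

.handle_IRQ:
    ; * stuff *  (register save/restore omitted)
    lock sub dword [IRQ_count],1   ; CF is set only if the count went from 0 back to -1
    jnc .handle_IRQ                ; more IRQs arrived in the meantime: handle them too
    call send_EOI                  ; one EOI for the one interrupt this CPU accepted
    iretd
```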

This would serialise the IRQs (so you don't need to care about re-entrancy within "* stuff *"). It also emulates the behaviour of the "focus processor" feature, so you'd get better cache locality (fewer cache misses)...
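For readers more comfortable with C, the same counting idea can be sketched with C11 atomics. This is an illustration under assumptions, not the code above translated one-to-one: `send_eoi` and `device_work` are hypothetical stand-ins, the `inject` variable exists only to simulate IRQs that arrive on other CPUs while the work is in progress, and every entry sends exactly one EOI to its own local APIC.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int irq_count = -1;  /* -1 = no CPU is inside the handler */
static int handled = 0;            /* how many times the work was done */
static int eois = 0;               /* how many EOIs were sent */
static int inject = 0;             /* simulation hook: IRQs arriving meanwhile */

static void irq_entry(void);

/* Hypothetical stand-in for writing the local APIC EOI register. */
static void send_eoi(void) { eois++; }

/* Hypothetical device-specific work; never runs concurrently with itself. */
static void device_work(void)
{
    assert(atomic_load(&irq_count) >= 0);  /* we own the handler here */
    handled++;
    while (inject > 0) {  /* pretend other CPUs take the same IRQ now */
        inject--;
        irq_entry();
    }
}

static void irq_entry(void)
{
    /* Old value -1 means nobody was inside: this CPU becomes the owner. */
    if (atomic_fetch_add(&irq_count, 1) != -1) {
        send_eoi();  /* the owner will redo the work on our behalf */
        return;
    }
    do {
        device_work();
        /* Old value 0 means no further IRQ was counted while we worked. */
    } while (atomic_fetch_sub(&irq_count, 1) != 0);
    send_eoi();
}
```

Setting `inject = 2` and calling `irq_entry()` once runs `device_work` three times in sequence on the owning side and leaves the counter back at -1.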

Cheers,

Brendan

Posted: **Sun Mar 28, 2010 6:02 am**

I don't know if this is the right thread for my problem.

I'm trying to get the PIT to work with the I/O APIC, where it's edge-triggered. The problem is that I only get one interrupt. I send the EOI to the local APIC and thought that would be enough, but apparently it isn't.

I also read that newer I/O APICs have an EOI register and that I need to send the EOIs there instead of to the local APIC.

So what has to be done to get the edge-triggered interrupts working?

Posted: **Wed Mar 31, 2010 6:46 am**

Unfortunately I can't test my code on a real SMP machine at the moment. So the only result I have so far is that it works on Bochs with 8 CPUs and doesn't work on QEMU (I tested with up to 12 CPUs). "Doesn't work" means my PIT handler code is called only once.

As it works on Bochs with 8 CPUs, I would say it's not a problem with my I/O APIC code. So what could it be?

Posted: **Wed Mar 31, 2010 4:49 pm**

FlashBurn wrote: Unfortunately I can't test my code on a real SMP machine at the moment. So the only result I have so far is that it works on Bochs with 8 CPUs and doesn't work on QEMU (I tested with up to 12 CPUs). "Doesn't work" means my PIT handler code is called only once. As it works on Bochs with 8 CPUs, I would say it's not a problem with my I/O APIC code. So what could it be?

Few details are given, but I can speculate:

* Don't depend on the BIOS setting up the PIT for you
* Make sure you've programmed it correctly in a periodic mode (mode 3, the square wave generator)
* Remember that in mode 3, the OUT0 pin actually starts high, stays high for half the programmed period, goes low for the other half, then returns high. Thus, it's necessary to set it up as edge-triggered
* Test your IRQ setup with the PICs first, since it's hard to program the PIC wrongly
* Make sure you've correctly read where the PIT is connected to the IOAPICs from either the MP or ACPI tables
* Make sure you enable the local APIC using both the APIC software-enable bit in the Spurious Interrupt Vector Register and the global enable bit in the IA32_APIC_BASE MSR
* Just in case, set the destination core's local APIC 'Task Priority Register' (TPR) to zero
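The two enable steps and the TPR setting from the checklist above come down to a few bit operations. The C sketch below only computes the values; actually reading and writing the MSR and the memory-mapped registers is left out, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define IA32_APIC_BASE_MSR   0x1Bu         /* MSR number */
#define APIC_BASE_GLOBAL_EN  (1ull << 11)  /* APIC global enable bit */
#define LAPIC_REG_TPR        0x080u        /* Task Priority Register offset */
#define LAPIC_REG_SVR        0x0F0u        /* Spurious Interrupt Vector Register offset */
#define SVR_SOFTWARE_EN      (1u << 8)     /* APIC software enable bit */

/* Value to write back to IA32_APIC_BASE so the APIC is globally enabled. */
static uint64_t apic_base_with_enable(uint64_t msr_value)
{
    return msr_value | APIC_BASE_GLOBAL_EN;
}

/* Value to write to the SVR: set the spurious vector and software-enable. */
static uint32_t svr_with_enable(uint32_t svr, uint8_t spurious_vector)
{
    assert(spurious_vector >= 0x10);  /* vectors below 16 are not usable here */
    return (svr & ~0xFFu) | spurious_vector | SVR_SOFTWARE_EN;
}

/* Value to write to the TPR so that no vector is blocked by priority. */
static uint32_t tpr_accept_all(void)
{
    return 0;
}
```

A kernel would read the MSR with `rdmsr`, pass the result through `apic_base_with_enable`, write it back with `wrmsr`, and then store the other two values at the listed offsets from the local APIC base address.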

Finally, try to set up your PIT IRQ with the simplest possible I/O APIC entry, namely:

```
vector = PIT_VECTOR;
delivery_mode = FIXED;
destination_mode = PHYSICAL;
polarity = HIGH;
trigger = EDGE_TRIGGERED;
mask = UNMASK;
destination = bootstrap core APIC ID
```

And enable the bootstrap core local APIC using the two methods outlined above.
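The entry described above can be packed into the 64-bit redirection table format roughly as follows. This is a sketch assuming the 82093AA bit layout; `struct rte_fields` and `rte_pack` are hypothetical names, and the vector number used in the usage note is an arbitrary example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum delivery_mode { DELIVERY_FIXED = 0, DELIVERY_LOWEST_PRIORITY = 1 };

/* Software-writable fields of one I/O APIC redirection table entry. */
struct rte_fields {
    uint8_t vector;            /* bits 0-7 */
    enum delivery_mode delivery; /* bits 8-10 */
    bool logical_destination;  /* bit 11: false = physical */
    bool active_low;           /* bit 13: false = active high */
    bool level_triggered;      /* bit 15: false = edge */
    bool masked;               /* bit 16: false = unmasked */
    uint8_t destination;       /* bits 56-63: APIC ID in physical mode */
};

/* Pack the fields into the 64-bit entry. The low dword goes to register
 * index 0x10 + 2 * pin, the high dword to index 0x11 + 2 * pin. */
static uint64_t rte_pack(const struct rte_fields *f)
{
    assert(f->vector >= 0x10);  /* lower vectors are not valid for the APIC */
    uint64_t e = f->vector;
    e |= (uint64_t)f->delivery << 8;
    e |= (uint64_t)f->logical_destination << 11;
    e |= (uint64_t)f->active_low << 13;
    e |= (uint64_t)f->level_triggered << 15;
    e |= (uint64_t)f->masked << 16;
    e |= (uint64_t)f->destination << 56;
    return e;
}
```

With vector 0x20 and a bootstrap core APIC ID of 0, the simplest entry packs to just 0x20: every other field in the recommended configuration is a zero bit.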

Best of luck

Posted: **Wed Mar 31, 2010 11:39 pm**

OK, I set up the PIT myself (the same way I set it up for use in my bootloader), so it should be programmed right (ICW = 0x34). The local APIC is enabled; I use it for my scheduler and that is working.

But I will check whether I set up the IO-APIC right and whether it really is the PIT's IRQ (the thing is, the fact that it works with Bochs and 8 CPUs should mean it works under some conditions).

Posted: **Thu Apr 01, 2010 1:40 am**

FlashBurn wrote: OK, I set up the PIT myself (the same way I set it up for use in my bootloader), so it should be programmed right (ICW = 0x34). The local APIC is enabled; I use it for my scheduler and that is working.

So, is it working with the PIC? Are you sure the PIT is set up in a periodic mode? You mention the PIT, then jump to talking about the PIC's ICW register.

(the thing is, that it works with Bochs and 8 cpus should mean it works under some conditions).

Assuming things work nicely with the PIC, test the APICs by directing the PIT IRQ only to the bootstrap core's local APIC; the number of CPUs will then become irrelevant.

Finally, testing IRQs is simple (there aren't many dependencies): if you're completely stuck, strip your kernel down to the smallest part possible and test it with a manually set up PIT handler. Things should be clearer then.

(Also try to organize your thoughts a bit before posting)

Posted: **Thu Apr 01, 2010 11:26 am**

Darwish wrote: (Also try to organize your thoughts a bit before posting)

Sorry, I was in a hurry.

In my bootloader I use the PIT with the PIC for timing things, so I know that it works with the PIC. I also use the PIT with the PIC for my scheduler when there is no APIC and no IO-APIC, so I know that works too.
The symptom is that if I run my OS in Bochs with 2 to 7 CPUs, the PIT IRQ fires only once and then never again, but if I run Bochs with 8 CPUs it fires all the time. So from this I know that it works under some conditions.

So I hope to find some time over the weekend to check whether my IO-APIC is initialized right and whether anything changes if I set it up so that the IRQ is sent only to the BSP.

Darwish wrote: Are you sure the PIT is set up in a periodic mode? You mention the PIT, then jump to talking about the PIC's ICW register.

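Yeah, I programmed it right: I select counter 0, write the low byte first and then the high byte, and use binary counting (= 0x34). Strictly speaking, 0x34 encodes mode 2 (rate generator) rather than mode 3 (that would be 0x36), but both modes are periodic. I mixed up the ICW and the init byte for the PIT.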
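Decoding the command byte field by field is a quick way to check a value such as 0x34. The C sketch below assumes the standard 8253/8254 control word layout; `struct pit_command`, `pit_decode` and `pit_encode` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of the PIT mode/command byte written to port 0x43. */
struct pit_command {
    unsigned channel;  /* bits 7-6: counter number */
    unsigned access;   /* bits 5-4: 3 = low byte, then high byte */
    unsigned mode;     /* bits 3-1: 2 = rate generator, 3 = square wave */
    unsigned bcd;      /* bit 0: 0 = binary counting, 1 = BCD */
};

static struct pit_command pit_decode(uint8_t cmd)
{
    struct pit_command c = {
        (cmd >> 6) & 3u, (cmd >> 4) & 3u, (cmd >> 1) & 7u, cmd & 1u
    };
    return c;
}

static uint8_t pit_encode(unsigned channel, unsigned access,
                          unsigned mode, unsigned bcd)
{
    assert(channel <= 2 && access <= 3 && mode <= 5 && bcd <= 1);
    return (uint8_t)((channel << 6) | (access << 4) | (mode << 1) | bcd);
}
```

Under this layout 0x34 decodes to counter 0, low/high byte access, mode 2, binary, while mode 3 with the same other fields encodes to 0x36. Both modes reload automatically, so either gives a periodic IRQ.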

Posted: **Sun Apr 04, 2010 12:48 pm**

Shame on me.

It was my fault (as always). The IO-APIC was set up right and the code was working; the only mistake I made was that "<< 16" is not "* 16".

So now it is working as planned!
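For anyone hitting the same mix-up: multiplying by 16 moves a value up by only 4 bits, not 16. The post does not say which field was affected, so the C sketch below uses a generic, hypothetical `place_field` helper just to show the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Move a value to a given bit position inside a 32-bit register image. */
static uint32_t place_field(uint32_t value, unsigned shift)
{
    assert(shift < 32);  /* shifting a 32-bit value by 32 or more is undefined */
    return value << shift;
}

/* The mistaken version: "* 16" is the same as "<< 4". */
static uint32_t place_field_wrong(uint32_t value)
{
    return value * 16u;
}
```

So `place_field(1, 16)` yields 0x10000, while `place_field_wrong(1)` yields 0x10, which sets a bit in an entirely different part of the register.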

SMP: how APICs force serial execution of an edge irq handler
