Hi,
This looks a bit backwards to me. Typically you'd:
- Make sure the CPU has finished sending the previous IPI using the "Delivery Status" flag (with special attention to race conditions - you don't want to see "Delivery Status = idle" and be interrupted by something that sends an IPI and causes the flag to become set before you've sent your IPI)
- Write to the ICR, making sure that you set the high dword first and then the low dword (writing the low dword is what actually sends the IPI). Note: There's no need to read the old high dword from the ICR (e.g. just do "ICR.high = dest << 24;" and don't bother with "ICR.high = (dest << 24) | (ICR.high & 0x00FFFFFF);").
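As a rough sketch of those two steps in C (assuming the local APIC is in xAPIC memory-mapped mode, where the ICR low dword is at offset 0x300, the high dword is at offset 0x310 and "Delivery Status" is bit 12 of the low dword; the function and macro names are just ones I made up for illustration):

```c
#include <stdint.h>

/* xAPIC register offsets, in bytes from the local APIC base address */
#define LAPIC_ICR_LOW        0x300u
#define LAPIC_ICR_HIGH       0x310u
#define ICR_DELIVERY_PENDING (1u << 12)  /* "Delivery Status" flag */
#define ICR_LEVEL_ASSERT     (1u << 14)

static inline uint32_t lapic_read(volatile uint32_t *base, uint32_t offset)
{
    return base[offset / 4];
}

static inline void lapic_write(volatile uint32_t *base, uint32_t offset, uint32_t value)
{
    base[offset / 4] = value;
}

/* Hypothetical helper: send a fixed-delivery IPI to one CPU (physical destination).
 * The caller is assumed to have IRQs disabled, so nothing can sneak in and send
 * its own IPI between the "idle" check and the ICR writes. */
void lapic_send_ipi(volatile uint32_t *base, uint8_t dest_apic_id, uint8_t vector)
{
    /* Wait until the previous IPI has left the local APIC */
    while (lapic_read(base, LAPIC_ICR_LOW) & ICR_DELIVERY_PENDING)
        ;

    /* High dword first; no need to read the old value */
    lapic_write(base, LAPIC_ICR_HIGH, (uint32_t)dest_apic_id << 24);

    /* Writing the low dword is what triggers the send */
    lapic_write(base, LAPIC_ICR_LOW, ICR_LEVEL_ASSERT | vector);
}
```

Note there's no EOI anywhere in the sending path.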
Never send EOI when sending an IPI. The CPU that receives the IPI does EOI after it has finished handling the IPI.
When the other CPU's local APIC receives the IPI, it sets the corresponding flag in its IRR register and starts trying to deliver the interrupt to the CPU (according to interrupt priority rules, if the CPU has interrupts enabled, etc). If/when the local APIC delivers the interrupt to the CPU it clears that flag in the IRR and sets the flag in the ISR. The EOI clears the flag in the ISR.
If the other CPU's local APIC receives a second IPI for the same interrupt vector while the corresponding flag in the IRR is already set, then the second IPI is ignored.
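To make that concrete, here's a toy model of the IRR/ISR bookkeeping in C. It's only an illustration of the behaviour described above, not real APIC code: it ignores priorities and the interrupt enable flag, the EOI takes a vector number (a real EOI clears the highest priority flag in the ISR), and all the names are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* One flag per interrupt vector (256 vectors) in each register */
struct lapic_model {
    uint8_t irr[32];  /* received, not yet delivered to the CPU */
    uint8_t isr[32];  /* delivered, waiting for EOI */
};

static bool flag_test(const uint8_t *map, uint8_t vector)
{
    return (map[vector / 8] >> (vector % 8)) & 1u;
}

static void flag_set(uint8_t *map, uint8_t vector)
{
    map[vector / 8] |= (uint8_t)(1u << (vector % 8));
}

static void flag_clear(uint8_t *map, uint8_t vector)
{
    map[vector / 8] &= (uint8_t)~(1u << (vector % 8));
}

/* Local APIC receives an IPI. Returns false if the IPI is lost because the
 * IRR flag for this vector was already set. */
bool model_receive_ipi(struct lapic_model *m, uint8_t vector)
{
    if (flag_test(m->irr, vector))
        return false;
    flag_set(m->irr, vector);
    return true;
}

/* Local APIC delivers the interrupt to the CPU: IRR flag moves to the ISR */
void model_deliver(struct lapic_model *m, uint8_t vector)
{
    if (flag_test(m->irr, vector)) {
        flag_clear(m->irr, vector);
        flag_set(m->isr, vector);
    }
}

/* Handler sends EOI: ISR flag is cleared */
void model_eoi(struct lapic_model *m, uint8_t vector)
{
    flag_clear(m->isr, vector);
}
```

In this model a second model_receive_ipi() for the same vector returns false until model_deliver() has run, which is the window where a real second IPI would be lost.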
This means that you have to wait until after the first IPI has been delivered to the CPU (and not just received by the CPU's local APIC) before you send a second IPI for the same interrupt vector. In practice the sending CPU has no way to see that directly; so the interrupt handler has to set some sort of "interrupt being serviced" flag in memory (e.g. a volatile variable or something), and the sending CPU has to wait until this variable has been modified by the other CPU's interrupt handler before it can send another IPI for the same interrupt vector.
More specifically; in practice typically you've got some other data to go along with the IPI (e.g. an address to invalidate) and/or multiple CPUs may send the IPI to the same CPU at the same time; and you have to use some sort of lock where you acquire the lock and send the IPI, then wait until the other CPU does something to indicate it has finished handling your IPI before you release the lock.
For an example, multi-CPU TLB shootdown code might look more like this:
sendTLBinvalidation:
    acquire spinlock for the interrupt vector
    set the address to invalidate in a global variable
    set "number of CPUs still handling IPI" counter to the number of CPUs that will be receiving the IPI
    wait until "Delivery Status" flag says local APIC is idle (and disable IRQs to avoid race conditions)
    send the IPI (then re-enable IRQs)
    wait for "number of CPUs still handling IPI" counter to become zero
    release the spinlock for the interrupt vector

TLBinvalidationIPIhandler:
    get the address to invalidate from the global variable
    invalidate that TLB entry
    atomically decrement the "number of CPUs still handling IPI" counter
    send EOI to local APIC
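If it helps, here's roughly how that pseudocode could map onto C11 atomics. Treat it as a sketch under assumptions rather than drop-in kernel code: invalidate_page(), send_eoi() and send_ipi_to_receivers() are stand-ins I invented so it compiles outside a kernel (the last one just runs the handler inline once per "remote CPU"), and a real kernel would replace them with INVLPG, a write to the local APIC's EOI register and a real ICR write done with IRQs disabled.

```c
#include <stdatomic.h>
#include <stdint.h>

static atomic_flag shootdown_lock = ATOMIC_FLAG_INIT; /* spinlock for this vector */
static uintptr_t shootdown_address;                   /* address to invalidate */
static atomic_int cpus_still_handling;                /* "CPUs still handling IPI" */

/* --- Stand-ins for hardware access (assumptions, not real APIC code) --- */
static uintptr_t last_invalidated;
static int eoi_count;

static void invalidate_page(uintptr_t address) { last_invalidated = address; }
static void send_eoi(void) { eoi_count++; }

/* Runs on each CPU that receives the IPI */
void tlb_invalidation_ipi_handler(void)
{
    invalidate_page(shootdown_address);
    atomic_fetch_sub(&cpus_still_handling, 1);
    send_eoi();  /* the receiver sends EOI, never the sender */
}

/* Stand-in for "wait for Delivery Status idle, then write the ICR".
 * Here the remote CPUs are simulated by calling the handler directly. */
static void send_ipi_to_receivers(int receiver_count)
{
    for (int i = 0; i < receiver_count; i++)
        tlb_invalidation_ipi_handler();
}

void send_tlb_invalidation(uintptr_t address, int receiver_count)
{
    /* Acquire the spinlock for the interrupt vector */
    while (atomic_flag_test_and_set_explicit(&shootdown_lock, memory_order_acquire))
        ;

    shootdown_address = address;
    atomic_store(&cpus_still_handling, receiver_count);

    send_ipi_to_receivers(receiver_count);

    /* Wait until every receiving CPU has finished handling the IPI */
    while (atomic_load(&cpus_still_handling) != 0)
        ;

    atomic_flag_clear_explicit(&shootdown_lock, memory_order_release);
}
```

The counter is set before the IPI is sent, so a fast receiver can't decrement it before it has been initialised.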
Note that for decent scalability you're probably going to want to use multiple interrupt vectors for IPIs (each with their own "data" variables, their own "number of CPUs still handling IPI" counters and their own locks).
Also, for convenience (especially for the less frequent uses of IPIs) you can generalise it. For example, you can have a routine that takes an "address of code I want remote CPUs to execute" input parameter and stores it in a global variable, and the other CPUs' IPI handler calls whatever that variable points to.
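A minimal sketch of that generalised version, using the same lock and counter pattern as the TLB shootdown. Everything here is hypothetical naming on my part; send_call_ipi() is a stand-in that simulates the remote CPUs by running the handler inline, and count_cpu() is just an example of code you might ask the other CPUs to run.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef void (*remote_fn)(void *arg);

static atomic_flag call_lock = ATOMIC_FLAG_INIT; /* spinlock for this vector */
static remote_fn call_fn;                        /* code the remote CPUs should run */
static void *call_arg;                           /* argument passed to that code */
static atomic_int call_pending;                  /* CPUs still handling the IPI */

/* Runs on each CPU that receives the IPI */
void remote_call_ipi_handler(void)
{
    call_fn(call_arg);
    atomic_fetch_sub(&call_pending, 1);
    /* a real handler would send EOI to its local APIC here */
}

/* Stand-in for the real ICR write (assumption): simulate the remote CPUs */
static void send_call_ipi(int receiver_count)
{
    for (int i = 0; i < receiver_count; i++)
        remote_call_ipi_handler();
}

void run_on_remote_cpus(remote_fn fn, void *arg, int receiver_count)
{
    while (atomic_flag_test_and_set_explicit(&call_lock, memory_order_acquire))
        ;

    call_fn = fn;
    call_arg = arg;
    atomic_store(&call_pending, receiver_count);

    send_call_ipi(receiver_count);

    /* Don't release the lock until every receiver is done with fn and arg */
    while (atomic_load(&call_pending) != 0)
        ;

    atomic_flag_clear_explicit(&call_lock, memory_order_release);
}

/* Example payload: each CPU bumps a shared counter */
static void count_cpu(void *arg)
{
    atomic_fetch_add((atomic_int *)arg, 1);
}
```

For example, "atomic_int hits = 0; run_on_remote_cpus(count_cpu, &hits, 4);" would leave hits equal to the number of CPUs that ran the handler.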
Cheers,
Brendan