Simultaneous IPIs to same target
Posted: Fri Jul 15, 2011 2:45 pm
Hello, OSDevers!
I've run into a question: what should happen when (in my case) 3 CPUs/cores send an IPI to the same LAPIC and the same interrupt vector simultaneously (or almost simultaneously)? To my understanding, the target CPU should receive the interrupt (and call the corresponding interrupt handler) 3 times in a row. My kernel, however, disagrees, and I receive only one interrupt. I tried it on QEMU and on a real quad-core computer; the results are the same. When I add a slight delay (around 1 ms, different for each core), the problem disappears.
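To make the question concrete, here is a toy model of the symptom. It assumes the LAPIC tracks a pending interrupt with a single bit per vector (the IRR), so several requests that arrive before the handler runs would collapse into one. That is a hypothesis, not a confirmed diagnosis, and the names `toy_lapic`, `toy_receive_ipi` and `toy_dispatch` are made up for illustration.

```c
#include <stdbool.h>

/* Toy model of ONE vector in a local APIC (assumption: one pending bit). */
struct toy_lapic {
    bool irr;     /* a request for this vector is pending */
    bool isr;     /* the handler for this vector is in service */
    int  handled; /* how many times the handler actually ran */
};

/* An incoming IPI can only set the pending bit; it cannot count. */
static void toy_receive_ipi(struct toy_lapic *l)
{
    l->irr = true;
}

/* The CPU accepts the interrupt: IRR -> ISR, run handler, then EOI. */
static void toy_dispatch(struct toy_lapic *l)
{
    while (l->irr && !l->isr) {
        l->irr = false;
        l->isr = true;
        l->handled++;   /* stands in for the real interrupt handler */
        l->isr = false; /* stands in for the EOI write */
    }
}
```

In this model, three calls to `toy_receive_ipi` followed by one `toy_dispatch` run the handler once, while alternating receive and dispatch runs it once per IPI, which matches what the delay seems to do.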
Basically it's the AP wake-up process: the BSP sets up the trampoline environment, sends the usual INIT-SIPI-SIPI sequence (in my case INIT1-INIT2-INIT3-delay-SIPI1-SIPI2-SIPI3, no second SIPI), and then the APs rush to Long Mode. Once they are there and have initialized enough, they send an IPI back to the BSP, telling it "I'm alive". The BSP waits for those IPIs and, when all 3 of them have been received, proceeds to clean up the trampoline and moves on to the scheduler.
AFAIK, the other cores wake up and are set up correctly: they are able to print their LAPIC IDs, have unique stacks, and can use their own LAPIC timers (to add the delay I mentioned before). It looks like a race condition, so I tried protecting the whole IPI sending routine with a spinlock. No effect.
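For reference, a minimal sketch of a fixed-delivery IPI send in xAPIC (memory-mapped) mode. This is not the actual routine from the kernel discussed here; the function name and the `lapic` parameter are hypothetical, and the base pointer is passed in so the sketch does not hard-code an address. Each core writes to its own LAPIC's ICR, so a lock around this routine serializes the senders but presumably does not change how the target LAPIC records the requests.

```c
#include <stdint.h>

#define LAPIC_ICR_LOW   0x300u      /* ICR bits 0-31: vector, mode, status */
#define LAPIC_ICR_HIGH  0x310u      /* ICR bits 32-63: destination field */
#define ICR_DELIVERY_PENDING (1u << 12)

/* Send a fixed-delivery IPI to one physical APIC ID.
 * `lapic` points at the sending CPU's own memory-mapped LAPIC registers. */
void lapic_send_fixed_ipi(volatile uint32_t *lapic,
                          uint8_t dest_apic_id, uint8_t vector)
{
    /* The destination APIC ID lives in bits 24-31 of the high dword. */
    lapic[LAPIC_ICR_HIGH / 4] = (uint32_t)dest_apic_id << 24;

    /* Writing the low dword triggers the send. Delivery mode 000 (fixed),
     * physical destination, no shorthand: only the vector is non-zero. */
    lapic[LAPIC_ICR_LOW / 4] = vector;

    /* Wait until the local APIC reports the message as sent. */
    while (lapic[LAPIC_ICR_LOW / 4] & ICR_DELIVERY_PENDING) {
        /* spin */
    }
}
```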
Of course I could come up with a different way to detect when my APs are up: spin on a (spinlock-protected) variable, use a delay on the BSP, or keep those debugging delays on the APs. But my current approach raises a few questions: is my assumption of 3 interrupts in a row wrong, and are those IPIs somehow aggregated into one? Or is there a bug somewhere in my code?
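The "spin on a variable" alternative could look roughly like the sketch below. It uses a C11 atomic counter instead of a spinlock-protected variable, which is a substitution on my part; `AP_COUNT`, `ap_report_alive` and `bsp_wait_for_aps` are hypothetical names. The point is that a counter cannot lose updates the way a single pending bit might.

```c
#include <stdatomic.h>

#define AP_COUNT 3u  /* assumption: three APs, as on the quad-core above */

/* Number of APs that have finished their early initialization. */
static atomic_uint aps_alive;

/* Called once by each AP when it is ready. */
void ap_report_alive(void)
{
    /* Release ordering: the AP's earlier setup writes become visible
     * to whoever observes the incremented counter with acquire. */
    atomic_fetch_add_explicit(&aps_alive, 1u, memory_order_release);
}

/* Called by the BSP before it tears down the trampoline. */
void bsp_wait_for_aps(void)
{
    while (atomic_load_explicit(&aps_alive, memory_order_acquire) < AP_COUNT) {
        /* spin; a real kernel would likely issue a PAUSE hint here */
    }
}
```

Each AP would call `ap_report_alive()` where it currently sends the IPI, and the BSP would call `bsp_wait_for_aps()` instead of counting interrupts.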
I have read the APIC chapters of Intel's manuals a couple of times and still have no clue. Please advise.