Hi,
Venkatesh wrote:
Do any of you know how long an IPI takes on any relatively modern (P2 or greater) x86 system? Specifically, from CPU0 firing an IPI to CPU1 executing the first instruction of the IPI ISR.
There are two costs and one delay here: telling the local APIC to send the IPI, waiting for the IPI to be transferred to the other CPU, and handling the received IPI.
Telling the local APIC to send the IPI is fast. It's one or two MOV instructions that don't reference anything outside the CPU (and don't involve the front-side bus), so that probably costs about 4 cycles.
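For reference, those MOVs are writes to the local APIC's Interrupt Command Register (ICR). The following is a minimal sketch, assuming an xAPIC in physical destination mode whose registers are memory-mapped at `lapic` (0xFEE00000 by default); `send_ipi` and the macro names are hypothetical. ICR high (offset 0x310) holds the destination APIC ID, and the write to ICR low (offset 0x300) carries the vector and delivery mode and is what actually triggers the send, so ICR high has to be written first.

```c
#include <stdint.h>

/* xAPIC register byte offsets from the local APIC base */
#define LAPIC_ICR_LOW        0x300u
#define LAPIC_ICR_HIGH       0x310u

/* ICR low fields */
#define ICR_DELIVERY_FIXED   (0u << 8)   /* delivery mode 000 = fixed */
#define ICR_LEVEL_ASSERT     (1u << 14)  /* level = assert */

/* Send a fixed-delivery IPI with `vector` to the CPU whose local APIC ID is
   `dest_apic_id`. Registers are 32 bits wide, so byte offsets are divided by 4
   to index the uint32_t array. */
void send_ipi(volatile uint32_t *lapic, uint8_t dest_apic_id, uint8_t vector)
{
    /* MOV #1: destination APIC ID goes in bits 24-31 of ICR high */
    lapic[LAPIC_ICR_HIGH / 4] = (uint32_t)dest_apic_id << 24;

    /* MOV #2: writing ICR low sends the IPI */
    lapic[LAPIC_ICR_LOW / 4] = ICR_DELIVERY_FIXED | ICR_LEVEL_ASSERT | vector;
}
```

Real code would usually also check the ICR's delivery-status bit (bit 12) before reusing it, which adds a read on top of the two stores.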
How long it takes for the hardware to transfer the IPI to the receiver is hard to predict. I'm guessing it depends on the front-side bus speed or HyperTransport link speed (for modern CPUs) or the "APIC bus" (for Pentium III and older?), and on how busy the bus is. As a very rough estimate, I'd expect it to take around 50 cycles.
The overhead of receiving the IPI would be no different to the overhead of receiving any other interrupt: about 20-ish cycles (longer if there are cache misses or TLB misses involved, or if the receiver has interrupts disabled).
That gives a (very rough) total of around 75 cycles (4 + 50 + 20) from just before you send the IPI to the first instruction in the IPI interrupt handler.
Note: I did (sort of) test IPI overhead once, but at the time I was trying to measure the overhead of receiving an IRQ. What I did was send an IPI to the same CPU and measure (with RDTSC) the time from sending the IPI to the first instruction in the interrupt handler, which (from memory) worked out to 22 cycles on a Pentium 4 with no cache misses, etc. IIRC I also found that the IPI is sent and received almost immediately (e.g. I wasn't able to execute more than one instruction after sending the IPI before the code got interrupted), but I assume that sending an IPI to the same CPU is "fast-tracked" inside the local APIC (as it doesn't need to be sent to another CPU or anywhere else).
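For a same-CPU test like that, the ICR's destination shorthand field (bits 18-19) can target the sending CPU directly, so only ICR low needs to be written. A minimal sketch, assuming an xAPIC whose registers are mapped at `lapic`; `icr_self_ipi` and `send_self_ipi` are hypothetical names.

```c
#include <stdint.h>

#define LAPIC_ICR_LOW        0x300u      /* byte offset of ICR bits 0-31 */
#define ICR_LEVEL_ASSERT     (1u << 14)  /* level = assert */
#define ICR_SHORTHAND_SELF   (1u << 18)  /* destination shorthand 01 = self */

/* Build the ICR low value for a fixed-delivery IPI to the current CPU.
   With the "self" shorthand the destination field in ICR high is ignored. */
static inline uint32_t icr_self_ipi(uint8_t vector)
{
    return ICR_SHORTHAND_SELF | ICR_LEVEL_ASSERT | vector;
}

/* A single store to ICR low sends the IPI to this CPU. */
void send_self_ipi(volatile uint32_t *lapic, uint8_t vector)
{
    lapic[LAPIC_ICR_LOW / 4] = icr_self_ipi(vector);
}
```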
Of course, most of the above is just guesswork; you could measure it properly. The way I'd do this is to mask all IRQs in the PIC or I/O APIC (so you can use HLT and know that an IPI is the only thing that will interrupt it), then use something like:
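IPIhandler:
    send_EOI();                 // tell the local APIC the interrupt is done
    iretd

second_CPU:
    hlt                         // wait for the IPI from the first CPU
    sendIPI();                  // reply with an IPI back to the first CPU
    jmp second_CPU

first_CPU:
    cli                         // stop the reply being handled before HLT
    start_time = RDTSC();
    sendIPI();
    sti                         // STI delays interrupts by one instruction,
    hlt                         // so the reply can only arrive once HLT starts
    time = (RDTSC() - start_time) / 2;
    return time;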
The idea is to measure the time it takes to send an IPI and get an IPI back, and then halve it to find the time taken by one IPI. That way you don't need to make sure both CPUs' time stamp counters are synchronized, and you don't need to worry about shared cache lines.
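A single round trip can be inflated by cache misses or TLB misses, so one option (an assumption here, not part of the method described above) is to repeat the measurement many times and halve the fastest round trip. The sketch below does only that post-processing step; `one_way_ipi_estimate` is a hypothetical name, and the samples are raw RDTSC deltas collected on the first CPU. Note that each round trip also includes the second CPU's handler and reply, so the halved figure covers send, transfer and receive rather than the transfer alone.

```c
#include <stddef.h>
#include <stdint.h>

/* Estimate one-way IPI latency (in TSC ticks) from several round-trip
   measurements. Taking the minimum discards runs slowed by cache or TLB
   misses; halving assumes both directions cost the same.
   Returns 0 if there are no samples. */
uint64_t one_way_ipi_estimate(const uint64_t *round_trips, size_t count)
{
    if (count == 0)
        return 0;

    uint64_t best = round_trips[0];
    for (size_t i = 1; i < count; i++) {
        if (round_trips[i] < best)
            best = round_trips[i];
    }
    return best / 2;
}
```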
Cheers,
Brendan