Colonel Kernel wrote: In the PowerPC architecture, the tlbie instruction sends a special "invalidate" message over the address bus to guarantee that all CPUs flush the right entry immediately. This is the kind of thing that should be hardware-accelerated, not A/D stuff.

Brendan wrote: That sounds like a race condition to me - on 80x86 you'd need to atomically change the page table entry and invalidate at the same time (in which case you could probably send a TLB correction rather than a TLB invalidation).

It turns out that this is only a problem in the "clearing dirty bit" case, just like on x86. Imagine you've implemented your TLB shootdown scheme by modifying the PTE, then sending an IPI, instead of sending the IPI first. In this case, the other CPUs don't need to spin -- they just get the IPI, do invlpg, then decrement the "ack counter". This is the moral equivalent of changing the PTE and using tlbie on PowerPC. The race condition inherent in this scheme has different consequences depending on the case:
- For marking a page as "not present" in the case where a page is going to be freed, a few writes to that page by other threads might happen before the TLB shootdown. In the case of freeing a user page, this is a logic error in the other threads. The OS need not make any guarantees about when its "VMFree()" syscall is going to fully take effect. In the case of freeing kernel pages, I think some external synchronization is needed, like your proposed "maintenance mode".
- For marking a page as "not present" in the case where a page is going to be swapped out (either written out to disk if it's dirty, or re-allocated to another process if it's clean), any extra writes by other threads will not be lost because the swapping thread (the one that marked it as "not present") isn't going to do anything until the TLB shootdown is complete anyway. The only problem is the possibility of a lost dirty bit in this case.
- For marking a page as "read-only" that was formerly "read-write", I don't see why the OS needs to make any guarantees to other threads about when this change takes effect. User code should be using external synchronization in cases like this. A few extra writes snuck in by another thread in the same process aren't going to hurt the kernel, as far as I can tell.
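To make the ordering concrete, here's a rough user-space sketch of the "modify the PTE, then IPI, then wait on the ack counter" scheme for the "not present" cases. Everything in it is made up for illustration: the names (unmap_and_shootdown, shootdown_ipi_handler, send_ipi), the one-entry-per-CPU "TLB", and the fact that the IPI is delivered by calling the handler directly. It does not model writes through stale TLB entries, so it only shows the sequence, not the race itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NCPUS       4
#define PTE_PRESENT 0x01u
#define PTE_DIRTY   0x40u

typedef _Atomic uint64_t pte_t;

/* Simulated TLBs: each CPU caches at most one translation (assumption). */
static bool tlb_valid[NCPUS];

/* Number of remote CPUs that have not yet acknowledged the shootdown. */
static atomic_int ack_counter;

/* What a remote CPU does when the IPI arrives: no spinning needed,
 * just drop the stale entry and acknowledge. */
static void shootdown_ipi_handler(int cpu)
{
    tlb_valid[cpu] = false;            /* stands in for invlpg */
    atomic_fetch_sub(&ack_counter, 1); /* decrement the "ack counter" */
}

/* Hypothetical stand-in for real IPI delivery: runs the handler
 * synchronously instead of interrupting another CPU. */
static void send_ipi(int cpu)
{
    shootdown_ipi_handler(cpu);
}

/* Initiator side: change the PTE first, then send the IPIs, then wait.
 * Returns the old PTE so the caller can look at the dirty bit before
 * deciding whether the page must be written to disk. */
static uint64_t unmap_and_shootdown(pte_t *pte, int self)
{
    assert(self >= 0 && self < NCPUS);

    /* Mark "not present" and fetch the old flags in one atomic step,
     * so there is no separate read-then-write window on the PTE. */
    uint64_t old = atomic_exchange(pte, 0);

    tlb_valid[self] = false;           /* local invlpg */
    atomic_store(&ack_counter, NCPUS - 1);

    for (int cpu = 0; cpu < NCPUS; cpu++)
        if (cpu != self)
            send_ipi(cpu);

    /* Only the initiator waits; nothing touches the page until
     * every other CPU has acknowledged. */
    while (atomic_load(&ack_counter) != 0)
        ;

    return old;
}
```

The swapping thread would call unmap_and_shootdown() and only then check (old & PTE_DIRTY) to decide what to do with the page. Whether a write through a stale TLB entry during the window can still go unnoticed depends on hardware behaviour that this sketch doesn't capture.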
What do you think? Did I miss any cases? Analyzing these things hurts my head...