Hi,
atagunov wrote:- how likely do you think is AMD to introduce PCID?
I'd assume that it's very likely that AMD (and VIA) will also implement PCID when they can.
atagunov wrote:- it's not coming in the Bulldozers is it?
It takes a relatively long time (several years) for a CPU to go from "concept" to "initial design", then through all the testing/validation, into reference chips, and then end up in production. It's cheap and easy to make changes in the beginning, and hard and expensive to make changes near the end.
I wouldn't be too surprised if AMD didn't find out about PCID soon enough, and by the time they did find out it was too late (and too expensive) to add it to Bulldozer. I'd also assume that if it's not in "1st Generation Bulldozer" it will be in "2nd Generation Bulldozer".
However...
TLBs aren't infinitely large. If one process does enough work, then the "least recently used" eviction algorithm will cause TLB entries for other processes to be evicted (despite PCID). The advantages of PCID may not be as large as you're expecting. It really depends on how many TLB entries each process uses, the order that processes use the same CPU, the size of the TLB, etc, so it's hard to predict how much it might help under various workloads.
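To get a feel for the eviction effect, here's a rough toy model (a sketch only). It assumes a fully associative TLB with strict LRU replacement, which real CPUs don't necessarily use, and the names ToyTLB and surviving_entries are made up for illustration. Process A (PCID 1) touches some pages, then process B (PCID 2) runs, and the function reports how many of A's entries are still cached when A gets the CPU back.

```python
from collections import OrderedDict


class ToyTLB:
    """Toy fully associative TLB with LRU eviction; entries are tagged (pcid, page)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # (pcid, virtual page) -> True, oldest first

    def access(self, pcid, page):
        """Touch a page; return True on a TLB hit, False on a miss."""
        key = (pcid, page)
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return True
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        self.entries[key] = True
        return False

    def cached_pages(self, pcid):
        """Count how many entries currently belong to the given PCID."""
        return sum(1 for (p, _) in self.entries if p == pcid)


def surviving_entries(tlb_size, pages_a, pages_b):
    """A touches pages_a pages, then B touches pages_b pages.

    Returns how many of A's entries are still in the TLB afterwards.
    """
    tlb = ToyTLB(tlb_size)
    for page in range(pages_a):
        tlb.access(1, page)
    for page in range(pages_b):
        tlb.access(2, page)
    return tlb.cached_pages(1)
```

With a 64-entry TLB, surviving_entries(64, 16, 16) keeps all 16 of A's entries, but surviving_entries(64, 16, 64) keeps none of them. In the second case PCID bought nothing for A, even though no explicit flush happened.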
When there are multiple CPUs, you need to keep the TLBs synchronised. If one CPU changes the paging structures it needs to invalidate the affected TLB entries on that CPU, but it also needs to tell other CPUs to invalidate the affected TLB entries. This is called "multi-CPU TLB shootdown", and is typically done with IPIs. It's also expensive. There are a lot of ways to avoid it in certain situations. For example (without PCID), if you change a page table entry for a single-threaded process that is currently running on one CPU, then you know that no other CPU can have that single-threaded process' TLB entries (they were flushed when those CPUs switched to a different address space), and therefore you can safely avoid the "multi-CPU TLB shootdown". With PCID, the number of situations where "multi-CPU TLB shootdown" can be avoided is reduced. For the same "single-threaded process" example, you can't assume that other CPUs don't have the affected TLB entries (even though you know that the process can't be running on other CPUs), because entries from the last time the process ran on those CPUs may have survived the task switch. Therefore you can't easily avoid the expensive "multi-CPU TLB shootdown".
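The difference can be sketched as a small decision function. This is a simplified model, not how any particular kernel does it. It assumes that without PCID every address space switch flushes the old process' (non-global) entries, and that the OS tracks a hypothetical per-process set of CPUs that have run the process since its entries were last flushed there. The name shootdown_targets and its parameters are made up for illustration.

```python
def shootdown_targets(current_cpu, running_on, ran_on_since_flush, pcid_enabled):
    """Return the set of *other* CPUs that must be sent an invalidation IPI.

    current_cpu        -- CPU that changed the page table entry
    running_on         -- CPUs where a thread of the process is running right now
    ran_on_since_flush -- CPUs that ran the process since its entries were
                          last flushed there (assumed to be tracked by the OS)
    pcid_enabled       -- whether TLB entries survive address space switches
    """
    if pcid_enabled:
        # Entries tagged with this process' PCID may still be cached on any
        # CPU that ran the process earlier, even if it runs something else now.
        maybe_stale = set(ran_on_since_flush) | set(running_on)
    else:
        # Assumption: switching address spaces flushed the old entries, so
        # only CPUs currently running the process can hold them.
        maybe_stale = set(running_on)
    return maybe_stale - {current_cpu}
```

For a single-threaded process running on CPU 0 that previously ran on CPUs 1 and 2, shootdown_targets(0, {0}, {0, 1, 2}, False) is empty (no IPIs needed), while shootdown_targets(0, {0}, {0, 1, 2}, True) is {1, 2}.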
Also, the "process context identifiers" are 12-bit numbers, so you get 4096 process context IDs. If the OS supports more than 4096 processes at the same time, then it has to have some sort of dynamic ID management (e.g. use the IDs for the most recently used processes, where less recently used processes have no ID and the IDs are reassigned when a less recently used process is given some CPU time). Even if the OS doesn't support more than 4096 processes at the same time, it would still need to track which IDs are currently in use, because processes are created and terminated over time and their IDs have to be recycled. This "ID management" adds some overhead to the OS somewhere.
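As a minimal sketch of what that "ID management" might look like, here's a toy allocator that hands out IDs from a free list and, when none are left, steals the ID of the least recently run process. The class PcidAllocator and its methods are hypothetical names for illustration. The needs_flush result is a conservative assumption: whenever an ID is newly handed out, stale entries tagged with it may still be cached, so the caller would have to flush them.

```python
from collections import OrderedDict

PCID_BITS = 12
PCID_COUNT = 1 << PCID_BITS  # 4096 possible process context IDs


class PcidAllocator:
    """Toy PCID manager: free list first, then recycle from the LRU process."""

    def __init__(self, count=PCID_COUNT):
        self.free = list(range(count - 1, -1, -1))  # pop() hands out 0 first
        self.owner = OrderedDict()  # process -> pcid, least recently run first

    def assign(self, process):
        """Return (pcid, needs_flush) for a process that is about to run."""
        if process in self.owner:
            self.owner.move_to_end(process)  # mark as most recently run
            return self.owner[process], False
        if self.free:
            pcid = self.free.pop()
        else:
            # No free IDs: take the ID of the least recently run process,
            # which now has no ID until it is scheduled again.
            _, pcid = self.owner.popitem(last=False)
        self.owner[process] = pcid
        return pcid, True  # entries tagged with this ID may be stale

    def release(self, process):
        """Give back the ID of a terminated process, if it has one."""
        pcid = self.owner.pop(process, None)
        if pcid is not None:
            self.free.append(pcid)
```

Every task switch now goes through assign(), and every stolen ID costs a flush for that ID, which is the kind of overhead that has to be weighed against whatever PCID saves.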
Basically (depending on a very large number of things), the disadvantages of PCID might outweigh the advantages, and PCID may actually make performance worse in some situations.
The best way to avoid overhead is to avoid task switches.
atagunov wrote:wanted:
- supa-fast IPC avoiding TLB flushing if possible
"Supa-fast IPC" (e.g. rendezvous/synchronous messaging) typically doesn't avoid task switches. Slower IPC (e.g. asynchronous, where the message data is only put onto the receiver's queue) can avoid task switches. Therefore an OS that uses "Supa-fast IPC" may be slower than an OS that uses "slower IPC".
Cheers,
Brendan