Process-Context Identifiers (PCIDs)

JasonBond
Posts: 2
Joined: Sun Dec 27, 2015 6:23 am

Process-Context Identifiers (PCIDs)

Post by JasonBond »

I read about Process-Context Identifiers (PCIDs) for the TLB/paging-structure caches in Intel's manual but don't understand exactly how they should be used. For one thing, is there any real-life OS (Windows 10?) that actually uses them?

I suppose the point is to avoid flushing some TLB entries when we switch to a new CR3, and to re-use the same TLB entries when switching back to a previous CR3 value. But the processor operation outlined in Intel's manual does not seem to support this. Exactly how does it benefit performance?

Is there anything similar on AMD?
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Process-Context Identifiers (PCIDs)

Post by Brendan »

Hi,
JasonBond wrote:I read about Process-Context Identifiers (PCIDs) for the TLB/paging-structure caches in Intel's manual but don't understand exactly how they should be used. For one thing, is there any real-life OS (Windows 10?) that actually uses them?
I don't know if any OS (Windows, OS X, Linux, *BSD, ...) supports it yet. It would be a relatively difficult thing to retrofit into an existing kernel design (without breaking corner cases, etc).
JasonBond wrote:I suppose the point is to avoid flushing some TLB entries when we switch to a new CR3, and to re-use the same TLB entries when switching back to a previous CR3 value. But the processor operation outlined in Intel's manual does not seem to support this. Exactly how does it benefit performance?
Imagine the same CPU is rapidly switching between 5 different processes. In this case the performance benefit should be obvious - instead of blowing away all of a process' TLB entries every time you switch between processes, you don't (and should get a huge decrease in the number of TLB misses caused by task switching).
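
As a rough illustration of the mechanism, here is a minimal C sketch of how a CR3 value could be composed once CR4.PCIDE is set. It assumes the layout described in Intel's manual (PCID in bits 0-11, and bit 63 of the value written to CR3 meaning "don't invalidate the entries tagged with this PCID"). The helper name make_cr3 and the macro names are made up for this example and are not taken from any real kernel.

```c
#include <stdint.h>

/* Layout of the value written to CR3 when CR4.PCIDE = 1:
 *   bits 11:0  - PCID
 *   bits 51:12 - physical address of the top-level paging structure
 *   bit  63    - if set, cached translations tagged with this PCID are
 *                NOT invalidated by the write */
#define CR3_PCID_MASK 0x0000000000000FFFull
#define CR3_ADDR_MASK 0x000FFFFFFFFFF000ull
#define CR3_NOFLUSH   (1ull << 63)

/* Hypothetical helper: build the value to load into CR3 on a task switch. */
static inline uint64_t make_cr3(uint64_t pml4_phys, uint16_t pcid, int keep_tlb)
{
    uint64_t cr3 = (pml4_phys & CR3_ADDR_MASK) | (pcid & CR3_PCID_MASK);

    if (keep_tlb)
        cr3 |= CR3_NOFLUSH;   /* re-use whatever this PCID still has cached */
    return cr3;
}
```

Switching back to a process whose PCID is still valid on this CPU would use keep_tlb = 1, which is where the saving in TLB misses comes from.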

The problem is multi-CPU TLB invalidation, which can get expensive even without PCID (the more CPUs you have the worse it gets, in an exponential way). With PCID you can't assume that a CPU which is no longer running a process has no TLB entries left for that process; so PCID (if implemented in a simple/bad way) can make multi-CPU TLB invalidation overhead significantly worse.

To avoid making multi-CPU TLB invalidation overhead significantly worse you need something clever/complex; and it's this "clever/complex" that would make it hard to retro-fit into existing kernels that were never designed for it.
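
One possible shape for the "clever/complex" part is lazy invalidation with a generation counter: CPUs that are not currently running the address space get no IPI, and instead notice on their next switch to it that their tagged entries are stale. The C sketch below only shows the bookkeeping. All names (address_space, tlb_gen, seen_gen, MAX_CPUS) are invented for the example, and a real kernel would need atomics and memory barriers that are left out here.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_CPUS 4   /* assumption: small fixed CPU count for the sketch */

struct address_space {
    uint64_t tlb_gen;             /* bumped whenever mappings are changed */
    uint64_t seen_gen[MAX_CPUS];  /* generation each CPU's TLB last caught up to */
};

static void as_init(struct address_space *as)
{
    as->tlb_gen = 1;              /* start above seen_gen so the first switch flushes */
    for (unsigned i = 0; i < MAX_CPUS; i++)
        as->seen_gen[i] = 0;
}

/* Called after changing page tables. Only CPUs currently running this
 * address space would need an IPI; the others catch up lazily. */
static void as_mappings_changed(struct address_space *as)
{
    as->tlb_gen++;
}

/* Called when 'cpu' switches to 'as'. Returns true if the entries tagged
 * with this address space's PCID must be flushed (CR3 bit 63 clear),
 * false if they can be kept (CR3 bit 63 set). */
static bool as_switch_needs_flush(struct address_space *as, unsigned cpu)
{
    bool stale = as->seen_gen[cpu] != as->tlb_gen;

    as->seen_gen[cpu] = as->tlb_gen;
    return stale;
}
```

The trade-off is that a CPU may flush more than strictly necessary (a whole PCID instead of a single page), in exchange for not interrupting CPUs that aren't running the process.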
JasonBond wrote:Is there anything similar on AMD?
That depends what you mean by "similar". AMD's virtualisation extensions have had "Address Space IDs" (ASIDs) for a long time, but they can only be used for guests running inside VMs.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.

Post by JasonBond »

Also, the Intel manual says bits 0-11 of CR3 are used as the PCID. Is it somehow related to the usual process ID that user-mode code sees? If yes, does that mean it imposes a limit on the number of user processes allowed (4096)?

Post by Brendan »

Hi,
JasonBond wrote:Also, the Intel manual says bits 0-11 of CR3 are used as the PCID. Is it somehow related to the usual process ID that user-mode code sees? If yes, does that mean it imposes a limit on the number of user processes allowed (4096)?
There are 3 alternatives:
  • Have a limit of 4095 processes, and use "PCID = OS process ID". This is probably fine for small systems (embedded?)
  • Have some sort of PCID recycling (e.g. so that only the 4095 most recently used processes have one of the CPU's PCIDs and the others don't), plus some way to determine "CPU's PCID" from "OS process ID" where PCIDs are global (same on all CPUs). This is probably fine for medium systems (e.g. typical "single 4-core chip").
  • Have some sort of PCID recycling, plus some way to determine "CPU's PCID on CPU #N" from "OS process ID" where the same process uses a different PCID on different CPUs. This might be the only sane option for large/huge/NUMA systems.
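
The third alternative could look roughly like the C sketch below: each CPU keeps its own small table saying which OS process currently owns each PCID, and recycles slots round-robin when it runs out. This is only an assumption about one way to do it. The names (cpu_pcids, pcid_for_process, NUM_SLOTS) are invented, the table is kept tiny for readability (hardware allows 4096 PCIDs), and a linear search would be replaced by something faster in practice.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SLOTS 4   /* hardware allows 4096 PCIDs; kept small for the sketch */

/* One of these per CPU. Assumption: OS process ID 0 is never used,
 * so a zeroed table means "every slot is free". */
struct cpu_pcids {
    uint32_t owner[NUM_SLOTS];  /* OS process ID that owns each PCID on this CPU */
    unsigned next_victim;       /* round-robin recycling cursor */
};

/* Returns the PCID to use for 'pid' on this CPU. *must_flush is set when
 * the PCID was just taken over from another process, so anything still
 * cached under it is stale and must be invalidated. */
static unsigned pcid_for_process(struct cpu_pcids *c, uint32_t pid, bool *must_flush)
{
    for (unsigned i = 0; i < NUM_SLOTS; i++) {
        if (c->owner[i] == pid) {
            *must_flush = false;   /* still ours: old TLB entries are usable */
            return i;
        }
    }

    unsigned slot = c->next_victim;
    c->next_victim = (c->next_victim + 1) % NUM_SLOTS;
    c->owner[slot] = pid;
    *must_flush = true;
    return slot;
}
```

Because the table is per-CPU, the same process can end up with a different PCID on each CPU, and no cross-CPU locking is needed to allocate one.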

Cheers,

Brendan
Owen
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom

Post by Owen »

Linux, Windows and OS X (well, XNU) supported "PCID"s (ASIDs) long before x86 did. While the x86 PCID extension is new, ASIDs are old hat to other architectures; for example, ARM has supported them for close to two decades. Linux implemented support for the PCID extension before Intel shipped it (this is of course usual - CPU vendors upstream feature support to Linux before they ship features).

The performance characteristics vary depending upon the architecture. For example, ARM architecture CPUs support broadcast TLB invalidate instructions, and therefore the overhead of multi-core TLB invalidations is orders of magnitude lower than for x86, where interrupts are required.

Post by Brendan »

Hi,
Owen wrote:Linux, Windows and OS X (well, XNU) supported "PCID"s (ASIDs) long before x86 did. While the x86 PCID extension is new, ASIDs are old hat to other architectures; for example, ARM has supported them for close to two decades. Linux implemented support for the PCID extension before Intel shipped it (this is of course usual - CPU vendors upstream feature support to Linux before they ship features).
I haven't been able to find a single thing online that suggests Linux supports PCID on 80x86. The closest I found is emails on the Linux kernel mailing list (from April 2015) talking about maybe implementing support for it one day, where (as far as I can tell) they all forgot about it and didn't implement anything.

Apparently Intel did add support for it, but it didn't help performance and nobody ever saw the patch.
Owen wrote:The performance characteristics vary depending upon the architecture. For example, ARM architecture CPUs support broadcast TLB invalidate instructions, and therefore the overhead of multi-core TLB invalidations is orders of magnitude lower than for x86, where interrupts are required.
I'd assume that making the TLBs cache coherent (e.g. checking all TLB entries whenever any CPU does any write) would be a massive performance disaster (but once you've already got that massive performance disaster, there's no additional pain involved in adding support for ASID/PCID).


Cheers,

Brendan

Post by Owen »

Brendan wrote:I'd assume that making the TLBs cache coherent (e.g. checking all TLB entries whenever any CPU does any write) would be a massive performance disaster (but once you've already got that massive performance disaster, there's no additional pain involved in adding support for ASID/PCID).
TLBs aren't cache coherent, but the TLBI instruction has variants which broadcast a TLB invalidation to all CPUs in a cache coherency domain; for example, "TLBI VAE1IS" invalidates the TLB entry for the given ASID and VA, and "TLBI ASIDE1IS" invalidates all entries for the given ASID, on all CPUs in the same inner shareable domain (all cores running a given OS must be in the same inner shareable domain).
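
To make the broadcast forms a bit more concrete, here is a small C sketch of how the register operand for them is packed, assuming the AArch64 encoding with the ASID in bits 63:48 and, for the by-address form, VA[55:12] in bits 43:0. The helper names are invented for the example, and only the operand arithmetic is shown so that it compiles on any host.

```c
#include <stdint.h>

/* Operand for "TLBI VAE1IS, Xt" (invalidate by VA and ASID, EL1,
 * inner shareable): ASID in bits 63:48, VA[55:12] in bits 43:0. */
static inline uint64_t tlbi_va_operand(uint64_t va, uint16_t asid)
{
    return ((uint64_t)asid << 48) | ((va >> 12) & 0x00000FFFFFFFFFFFull);
}

/* Operand for "TLBI ASIDE1IS, Xt" (invalidate everything tagged with
 * an ASID, EL1, inner shareable): only the ASID field is used. */
static inline uint64_t tlbi_asid_operand(uint16_t asid)
{
    return (uint64_t)asid << 48;
}
```

On real hardware the value would be handed to the instruction from inline assembly, typically with a DSB before it so the page table write is visible first, and a DSB plus ISB after it to wait for the invalidation to complete on the other cores. No IPI is involved, which is the difference from x86.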