using Process Context Identifiers on x86-64

Post by xmm15 »

Hi,

I'm trying to implement the PCID feature in my OS. My CPU does support PCID but doesn't support INVPCID. In my mind, that doesn't make any sense. How would that work exactly? My initial thought was that, when a process is created, I should invalidate all TLB entries associated with its PCID in case the PCID was in use by another, now dead, process.

Am I misinterpreting something? Is it really possible to have support for PCID and not INVPCID?

But what am I supposed to do with no such instruction? Flush the entire TLB? Is this really the way to do it?

Thank you.

Post by Brendan »

Hi,
xmm15 wrote:I'm trying to implement the PCID feature in my OS. My CPU does support PCID but doesn't support INVPCID. In my mind, that doesn't make any sense. How would that work exactly? My initial thought was that, when a process is created, I should invalidate all TLB entries associated with its PCID in case the PCID was in use by another, now dead, process.

Am I misinterpreting something? Is it really possible to have support for PCID and not INVPCID?
It doesn't make much sense to me either, but Intel gave PCID and INVPCID separate CPUID feature flags, so you have to check for each one independently.
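As a rough sketch of checking the two flags separately: PCID is reported in CPUID leaf 01H (ECX bit 17) and INVPCID in CPUID leaf 07H, sub-leaf 0 (EBX bit 10). The `cpuid_regs` struct and the two helper names below are made up for illustration; how the CPUID instruction itself gets executed (inline asm, a compiler intrinsic) is left out on purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Register values as returned by one CPUID invocation.
 * Hypothetical container; fill it in however your kernel runs CPUID. */
struct cpuid_regs {
    uint32_t eax, ebx, ecx, edx;
};

/* PCID support: CPUID leaf 01H, ECX bit 17. */
static bool cpu_has_pcid(const struct cpuid_regs *leaf1)
{
    assert(leaf1 != NULL);
    return (leaf1->ecx >> 17) & 1u;
}

/* INVPCID support: CPUID leaf 07H, sub-leaf 0, EBX bit 10. */
static bool cpu_has_invpcid(const struct cpuid_regs *leaf7_0)
{
    assert(leaf7_0 != NULL);
    return (leaf7_0->ebx >> 10) & 1u;
}
```

A CPU like the one described in the question would return true from the first helper and false from the second.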
xmm15 wrote:But what am I supposed to do with no such instruction? Flush the entire TLB? Is this really the way to do it?
Fortunately...
Intel wrote:INVLPG. This instruction takes a single operand, which is a linear address. The instruction invalidates any TLB entries that are for a page number corresponding to the linear address and that are associated with the current PCID. It also invalidates any global TLB entries with that page number, regardless of PCID (see Section 4.10.2.4). INVLPG also invalidates all entries in all paging-structure caches associated with the current PCID, regardless of the linear addresses to which they correspond.
Also...
Intel wrote:MOV to CR3. The behavior of the instruction depends on the value of CR4.PCIDE:
  • If CR4.PCIDE = 0, the instruction invalidates all TLB entries associated with PCID 000H except those for global pages. It also invalidates all entries in all paging-structure caches associated with PCID 000H.
  • If CR4.PCIDE = 1 and bit 63 of the instruction’s source operand is 0, the instruction invalidates all TLB entries associated with the PCID specified in bits 11:0 of the instruction’s source operand except those for global pages. It also invalidates all entries in all paging-structure caches associated with that PCID. It is not required to invalidate entries in the TLBs and paging-structure caches that are associated with other PCIDs.
  • If CR4.PCIDE = 1 and bit 63 of the instruction’s source operand is 1, the instruction is not required to invalidate any TLB entries or entries in paging-structure caches.
This means you've basically got 3 choices:
  • Modify CR4 (e.g. toggle either global pages or PCID) and invalidate everything for all PCIDs (including TLB entries for global pages)
  • Use INVLPG to invalidate the TLB entries for one page for the current PCID (plus any global entries for that page, and all paging-structure cache entries for the current PCID)
  • Reload CR3 (with bit 63 clear, not set) to invalidate all TLB entries (except for global pages) for the PCID being loaded (and not other PCIDs)
Of course the INVPCID instruction gives you even more options, which would be better for performance (if it was supported).
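To make the CR3 option concrete, here is a minimal sketch of building the value for MOV to CR3 when CR4.PCIDE = 1, going only by the Intel text quoted above. `make_cr3`, `CR3_NOFLUSH` and `CR3_PCID_MASK` are names invented for this example, and the actual privileged write to CR3 is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR3_PCID_MASK 0xFFFull      /* bits 11:0 hold the PCID when CR4.PCIDE = 1 */
#define CR3_NOFLUSH   (1ull << 63)  /* bit 63 set: no invalidation is required    */

/* Build a source operand for MOV to CR3 (CR4.PCIDE = 1 assumed).
 * pml4_phys: physical address of the top-level page table, 4 KiB aligned.
 * keep_tlb:  true  -> bit 63 set, cached entries for `pcid` may survive;
 *            false -> bit 63 clear, non-global entries for `pcid` are invalidated. */
static uint64_t make_cr3(uint64_t pml4_phys, uint16_t pcid, bool keep_tlb)
{
    assert((pml4_phys & CR3_PCID_MASK) == 0);  /* low 12 bits are reused for the PCID */
    assert((pml4_phys >> 52) == 0);            /* stay clear of the high bits         */
    assert(pcid <= CR3_PCID_MASK);

    uint64_t value = pml4_phys | pcid;
    if (keep_tlb)
        value |= CR3_NOFLUSH;
    return value;
}
```

For example, `make_cr3(0x1000, 5, false)` gives a value that switches to PCID 5 and flushes its non-global entries, while passing `true` gives the same switch without the flush.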


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.

Post by xmm15 »

Hmmm, I missed the part about bit 63 in CR3. That should work just fine. So basically, an INVPCID could be emulated with:

mov $TARGET_PCID | (1<<63),%rbx
mov %cr3,%rax
cli
mov %rbx,%cr3
mov %rax,%cr3
sti

Assuming that the currently executing code resides in a global page.
I could probably emulate that in the #UD handler.

Post by Brendan »

Hi,
xmm15 wrote: Hmmm, I missed the part about bit 63 in CR3. That should work just fine. So basically, an INVPCID could be emulated with:

mov $TARGET_PCID | (1<<63),%rbx
mov %cr3,%rax
cli
mov %rbx,%cr3
mov %rax,%cr3
sti

Assuming that the currently executing code resides in a global page.
I could probably emulate that in the #UD handler.
Don't forget that INVPCID has 4 different types:
  • Type 0: Invalidate one address for one PCID (unless that address is in a global page). Can't be emulated exactly. Best case might be INVLPG, which also invalidates global entries for that page and all paging-structure cache entries for the current PCID.
  • Type 1: Invalidate all TLB entries for one PCID (except for global pages). Could be emulated by reloading CR3 with bit 63 clear.
  • Type 2: Invalidate all TLB entries for all PCIDs (including global pages). Could be emulated by modifying CR4.
  • Type 3: Invalidate all TLB entries for all PCIDs (except global pages). Can't be emulated exactly. Best case would be to reload CR3 with bit 63 clear once for each PCID in use (each load only wipes the non-global TLB entries of the PCID being loaded) and then switch back to the original virtual address space, or to fall back to the type 2 approach and lose the global entries as well.
Note that for some of these (type = 0 and type = 1) you need to switch virtual address spaces whenever the current PCID isn't the one in the "INVPCID Descriptor" (e.g. load CR3 with that PCID, with bit 63 set for type 0 so that nothing else gets flushed) and then switch back after (with bit 63 set, so the current PCID's entries survive).
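The switch-and-switch-back idea for types 0 and 1 can be sketched as a small "plan" of operations. This assumes the reading of the quoted Intel text where bit 63 clear flushes the PCID being loaded and bit 63 set does not; `struct op`, `plan_flush_pcid` and `plan_flush_page` are hypothetical names, and the sketch only computes what to do rather than touching CR3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CR3_PCID_MASK 0xFFFull
#define CR3_NOFLUSH   (1ull << 63)

enum op_kind { OP_WRITE_CR3, OP_INVLPG };
struct op { enum op_kind kind; uint64_t arg; };

/* Type 1 approximation: flush the non-global entries of the PCID in target_cr3.
 * Both arguments are "page-table base | PCID" with bit 63 clear.
 * Returns the number of steps written to out (at most 2). */
static size_t plan_flush_pcid(uint64_t cur_cr3, uint64_t target_cr3, struct op out[2])
{
    assert(!(cur_cr3 & CR3_NOFLUSH) && !(target_cr3 & CR3_NOFLUSH));
    if ((cur_cr3 & CR3_PCID_MASK) == (target_cr3 & CR3_PCID_MASK)) {
        out[0] = (struct op){ OP_WRITE_CR3, cur_cr3 };            /* reload, bit 63 clear */
        return 1;
    }
    out[0] = (struct op){ OP_WRITE_CR3, target_cr3 };             /* flushes target PCID  */
    out[1] = (struct op){ OP_WRITE_CR3, cur_cr3 | CR3_NOFLUSH };  /* back, keep our TLB   */
    return 2;
}

/* Type 0 approximation: INVLPG one page while the target PCID is current.
 * Returns the number of steps written to out (at most 3). */
static size_t plan_flush_page(uint64_t cur_cr3, uint64_t target_cr3,
                              uint64_t linear, struct op out[3])
{
    assert(!(cur_cr3 & CR3_NOFLUSH) && !(target_cr3 & CR3_NOFLUSH));
    if ((cur_cr3 & CR3_PCID_MASK) == (target_cr3 & CR3_PCID_MASK)) {
        out[0] = (struct op){ OP_INVLPG, linear };
        return 1;
    }
    out[0] = (struct op){ OP_WRITE_CR3, target_cr3 | CR3_NOFLUSH };  /* switch, no flush */
    out[1] = (struct op){ OP_INVLPG, linear };
    out[2] = (struct op){ OP_WRITE_CR3, cur_cr3 | CR3_NOFLUSH };     /* switch back      */
    return 3;
}
```

Executing such a plan would presumably need interrupts disabled and the running code and stack mapped identically in both address spaces; that part is an assumption about the kernel layout, not something the plan checks.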


Cheers,

Brendan

Post by xmm15 »

Yes, thank you. I only need one mode for now so I'll keep it simple.

Thanks

Post by xmm15 »

For those interested, I documented my experience with PCID here: http://www.dumaisnet.ca/index.php?artic ... 6b3fbe37e7

Post by wangt13 »

xmm15 wrote: Hmmm, I missed the part about bit 63 in CR3. That should work just fine. So basically, an INVPCID could be emulated with:

mov $TARGET_PCID | (1<<63),%rbx
mov %cr3,%rax
cli
mov %rbx,%cr3
mov %rax,%cr3
sti

Assuming that the currently executing code resides in a global page.
I could probably emulate that in the #UD handler.
Sorry, I don't follow your and Brendan's points below, even after reading Intel's SDM.
"If CR4.PCIDE = 1 and bit 63 of the instruction’s source operand is 0, the instruction invalidates all TLB entries associated with the PCID specified in bits 11:0 of the instruction’s source operand except those for global pages. It also invalidates all entries in all paging-structure caches associated with that PCID. It is not required to invalidate entries in the TLBs and paging-structure caches that are associated with other PCIDs.
If CR4.PCIDE = 1 and bit 63 of the instruction’s source operand is 1, the instruction is not required to invalidate any TLB entries or entries in paging-structure caches."

My understanding is "If CR4.PCIDE=1, and bit 63 of CR3 is 0 (not 1), it will invalidate all TLB entries associated with the PCID specified in bits 11:0."

If my understanding is correct, the code above should be changed so that it does not set bit 63 of CR3 when the goal is to invalidate the TLB entries of the switched-out process.
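A toy model of the two quoted rules may make the difference concrete. It tracks only one flag per PCID ("non-global entries may be cached") and is not a description of real hardware, which is allowed to invalidate more than the SDM requires. `tlb_model` and `model_mov_cr3` are names invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_PCIDS   4096
#define CR3_NOFLUSH (1ull << 63)

/* Toy model: per-PCID flag meaning "non-global TLB entries may still be cached". */
struct tlb_model {
    bool     has_entries[NUM_PCIDS];
    uint16_t current_pcid;
};

/* MOV to CR3 with CR4.PCIDE = 1, following only the two quoted rules. */
static void model_mov_cr3(struct tlb_model *m, uint64_t value)
{
    assert(m != NULL);
    uint16_t pcid = (uint16_t)(value & 0xFFF);

    if (!(value & CR3_NOFLUSH))
        m->has_entries[pcid] = false;  /* bit 63 = 0: entries for that PCID are invalidated */
    /* bit 63 = 1: nothing is required to be invalidated */
    m->current_pcid = pcid;
}
```

In this model, loading the target PCID with bit 63 set leaves its entries in place. The second write in the posted sequence uses a value read back from CR3, where bit 63 reads as zero, so it flushes the current PCID rather than the target. Loading the target with bit 63 clear and then returning with bit 63 set gives the intended result.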

Thanks,
-Tao