As far as I know, there are several methods:
- Provide different page table mappings for different cores, store a different value at the same virtual address on each core, then read that value. This should be fast, but I've read that with Hyper-Threading, the two threads of the same core share the same page tables, so it wouldn't work in that case.
- CPUID with eax=1 returns the initial APIC ID in bits 31-24 of ebx. But I've read that the CPUID instruction is quite slow; could it take 100 cycles or more?
- Read from the local APIC's memory-mapped registers: APIC_BASE (usually 0xFEE00000) + 0x20 is the APIC ID Register, and should return the same value as CPUID. For this to work, all cores would have to share the same APIC_BASE address (as obtained from the corresponding bits of rdmsr(0x1B), the IA32_APIC_BASE MSR). Is that guaranteed? Is there much latency when reading from that memory-mapped area?
- The RDTSCP instruction, which also loads IA32_TSC_AUX into ecx (that MSR could be used to hold a per-CPU value).
- Some value stored in the GDT.
- Some other processor specific register that can be quickly checked.
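To make the CPUID and RDTSCP options concrete, here is a minimal sketch in C, assuming GCC or Clang on x86 with `<cpuid.h>` and `<x86intrin.h>` available. The function names (`apic_id_from_cpuid`, `have_rdtscp`, `tsc_aux`) are just illustrative names I made up. Note that the value RDTSCP returns in ecx is only meaningful if the OS has previously written something to IA32_TSC_AUX (MSR 0xC0000103) on each CPU; the sketch only reads it.

```c
#include <cpuid.h>      /* __get_cpuid (GCC/Clang helper) */
#include <stdint.h>
#include <x86intrin.h>  /* __rdtscp */

/* Initial APIC ID from CPUID leaf 1: bits 31..24 of EBX.
 * Returns -1 if leaf 1 is not supported. */
static int apic_id_from_cpuid(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1;
    return (int)(ebx >> 24);
}

/* RDTSCP support is reported in CPUID.80000001H:EDX bit 27. */
static int have_rdtscp(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((edx >> 27) & 1u);
}

/* Executes RDTSCP and returns the IA32_TSC_AUX value it loads into ECX.
 * The timestamp itself is discarded. Call only if have_rdtscp() is true;
 * what the value means depends entirely on what the OS stored in the MSR. */
static uint32_t tsc_aux(void)
{
    unsigned int aux;

    (void)__rdtscp(&aux);
    return (uint32_t)aux;
}
```

A caller would check `have_rdtscp()` once at startup and then use `tsc_aux()` on the hot path, falling back to `apic_id_from_cpuid()` otherwise. Both functions can return a stale answer if the thread migrates to another core right after the instruction executes, so this only makes sense with preemption or migration disabled.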