I have something to share about delays used in SMP initialization.
The delays given in CPU manuals are longer than necessary: modern CPUs complete the INIT much faster than old ones. Shortening the delays on modern CPUs makes startup faster (both boot and resume from ACPI sleep states). There is also another speed-up approach: the bootstrap CPU (BSP) does not initialize all application CPUs (APs) itself but only a few of them, and those activated APs in turn activate further APs (a principle similar to an avalanche, a nuclear chain reaction, or a branching tree).
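To give a feeling for how much the chain-reaction approach can save, here is a minimal sketch that only counts wake-up rounds. It is a simplified model, not code from any real kernel or firmware: the function names and the `fanout` parameter (how many further CPUs each running CPU wakes per round) are my own assumptions for illustration.

```c
#include <assert.h>

/*
 * Hypothetical model: number of rounds until all CPUs run when every
 * already-running CPU wakes `fanout` further CPUs in each round.
 * `n_aps` is the number of application CPUs; the BSP runs from the start.
 */
unsigned int wakeup_rounds_tree(unsigned int n_aps, unsigned int fanout)
{
    unsigned int total = n_aps + 1;   /* APs plus the BSP */
    unsigned int running = 1;         /* only the BSP at the beginning */
    unsigned int rounds = 0;

    assert(fanout > 0);
    while (running < total) {
        unsigned int woken = running * fanout;

        /* Do not wake more CPUs than are still sleeping. */
        if (woken > total - running)
            woken = total - running;
        running += woken;
        rounds++;
    }
    return rounds;
}

/* Classic approach: the BSP wakes the APs one after another. */
unsigned int wakeup_rounds_serial(unsigned int n_aps)
{
    return n_aps;
}
```

In this model, 63 APs need 63 rounds when the BSP wakes them one by one, but only 6 rounds with `fanout = 1` (the number of running CPUs doubles each round). Real hardware adds per-round costs that the model ignores.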
I have been a hypervisor developer for about the past 10 years, which is somewhat similar to OS development. The first versions of the hypervisor were loaded from a running OS; later I developed loading before the OS (using UEFI / BIOS). So the hypervisor is loaded first (this includes using the UEFI MP protocols to start code on the APs, or the good old way of sending INIT-SIPI if UEFI fails), and this always ran flawlessly. Then the OS is loaded, which initializes the APs again early in its boot. This also ran flawlessly, e.g. on Fedora 22 (so this experience of mine is a few years old). But later, when testing Fedora 25, the OS ended up running on a single CPU: the APs failed to activate, and during OS startup I saw error messages like "smpboot: do_boot_cpu failed(-1) to wakeup CPU#1" (with a delay of something like 20 seconds for each AP).
So I compared the kernels used in the two versions, and this important thing changed. The older kernel had a fixed 10 ms delay:
apic_icr_write(APIC_INT_LEVELTRIG | APIC_INT_ASSERT | APIC_DM_INIT, phys_apicid);
mdelay(10);
apic_icr_write(APIC_INT_LEVELTRIG | APIC_DM_INIT, phys_apicid);
while the newer kernel uses a configurable delay:
apic_icr_write(APIC_INT_LEVELTRIG | APIC_INT_ASSERT | APIC_DM_INIT, phys_apicid);
udelay(init_udelay);
apic_icr_write(APIC_INT_LEVELTRIG | APIC_DM_INIT, phys_apicid);
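For readers who do not have the APIC bit definitions in their head, here is a small sketch of what the two writes put into the low half of the ICR. The constants have the values used in the Linux apicdef.h header; the two helper functions are hypothetical names of mine and only exist for illustration.

```c
#include <stdint.h>

/* ICR bit values (same values as in the Linux apicdef.h header). */
#define APIC_DM_INIT        0x00500u  /* delivery mode: INIT */
#define APIC_INT_ASSERT     0x04000u  /* level: assert */
#define APIC_INT_LEVELTRIG  0x08000u  /* trigger mode: level */

/* Hypothetical helper: ICR low value of the INIT assert IPI. */
uint32_t init_assert_icr_low(void)
{
    return APIC_INT_LEVELTRIG | APIC_INT_ASSERT | APIC_DM_INIT;
}

/* Hypothetical helper: ICR low value of the INIT deassert IPI. */
uint32_t init_deassert_icr_low(void)
{
    return APIC_INT_LEVELTRIG | APIC_DM_INIT;
}
```

These are the familiar 0xC500 (assert) and 0x8500 (deassert) values; the delay discussed here sits exactly between those two writes.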
and also this important code:
/*
 * The Multiprocessor Specification 1.4 (1997) example code suggests
 * that there should be a 10ms delay between the BSP asserting INIT
 * and de-asserting INIT, when starting a remote processor.
 * But that slows boot and resume on modern processors, which include
 * many cores and don't require that delay.
 *
 * Cmdline "init_cpu_udelay=" is available to over-ride this delay.
 * Modern processor families are quirked to remove the delay entirely.
 */
#define UDELAY_10MS_DEFAULT 10000

static unsigned int init_udelay = UINT_MAX;

static int __init cpu_init_udelay(char *str)
{
	get_option(&str, &init_udelay);

	return 0;
}
early_param("cpu_init_udelay", cpu_init_udelay);

static void __init smp_quirk_init_udelay(void)
{
	/* if cmdline changed it from default, leave it alone */
	if (init_udelay != UINT_MAX)
		return;

	/* if modern processor, use no delay */
	if (((boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) && (boot_cpu_data.x86 == 6)) ||
	    ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) && (boot_cpu_data.x86 >= 0xF))) {
		init_udelay = 0;
		return;
	}

	/* else, use legacy delay */
	init_udelay = UDELAY_10MS_DEFAULT;
}
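To make the decision logic easier to try out, the quirk can be paraphrased as a pure user-space function. This is only a sketch of the kernel code quoted above, not kernel code itself: the enum, the `UDELAY_UNSET` macro and the function name are assumptions of mine, and the vendor and family are passed in as plain arguments instead of being read from boot_cpu_data.

```c
#include <limits.h>

#define UDELAY_10MS_DEFAULT 10000u
#define UDELAY_UNSET        UINT_MAX   /* hypothetical name: no cmdline value given */

/* Hypothetical stand-in for the kernel's X86_VENDOR_* constants. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

/*
 * Returns the delay (in microseconds) between INIT assert and INIT
 * deassert that the quoted quirk would end up with.
 */
unsigned int effective_init_udelay(unsigned int cmdline_udelay,
                                   enum cpu_vendor vendor,
                                   unsigned int family)
{
    /* A value from the command line always wins, including 0. */
    if (cmdline_udelay != UDELAY_UNSET)
        return cmdline_udelay;

    /* "Modern" processors: Intel family 6, AMD family 0xF and newer. */
    if ((vendor == VENDOR_INTEL && family == 6) ||
        (vendor == VENDOR_AMD && family >= 0xF))
        return 0;

    /* Everything else keeps the legacy 10 ms. */
    return UDELAY_10MS_DEFAULT;
}
```

For an Intel family 6 CPU without any boot parameter this gives 0, which is exactly the case that bit me; passing 10 on the command line gives 10 regardless of the CPU.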
So the fix was trivial: setting cpu_init_udelay=10 as a boot parameter.
Even cpu_init_udelay=1 always worked on all test PCs.
Setting cpu_init_udelay=0 again caused the legendary error message "smpboot: do_boot_cpu failed(-1) to wakeup CPU#1".
While cpu_init_udelay=0 ran flawlessly on bare metal, the AP did not manage to process the INIT in time when running under virtualization: a VM exit slowed down the initialization on the AP, so the BSP did not wait long enough and fired the INIT deassert while the AP had still not finished handling the INIT assert. The INIT deassert was necessary only on very old CPUs, but it still persists in CPU manuals and therefore in OS kernels.
So yes, there are possible improvements for modern CPUs, which are much faster, but such improvements can cause unexpected behavior in specific situations.
And also one old curiosity, which is not very useful today because it only suits an OS loaded in real mode by the BIOS via the MBR and is unsuitable for UEFI:
I saw an example where SMP initialization was done not with INIT-SIPI but with NMI. The INIT-SIPI is done by the firmware during the boot phase, when it counts and initializes the CPUs, excluding defective CPUs/cores or activating only the CPUs enabled by the user in the setup menu (this information is stored in CMOS, or in NVRAM for UEFI with CSM). After this is done, the BSP usually puts the APs into a HLT state with interrupts disabled, from which they can be woken not only by INIT-SIPI but also by NMI.
So the halted APs were activated from real mode (that's why it is not suitable for UEFI) by hooking interrupt 2 (#NMI) in the real-mode IDT and then sending an NMI from the BSP to the APs using the APIC ICR.
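I no longer have that example, so the following is only a sketch of how the ICR values for such an NMI IPI could be composed on an xAPIC. The bit values are the architectural ones (as in the Linux apicdef.h header); the struct and function names are hypothetical. The sketch only computes the values and leaves out the actual register writes (high half at APIC base + 0x310 first, then low half at APIC base + 0x300, which sends the IPI).

```c
#include <stdint.h>

/* ICR bit values (same values as in the Linux apicdef.h header). */
#define APIC_DM_NMI       0x00400u  /* delivery mode: NMI, vector field is ignored */
#define APIC_INT_ASSERT   0x04000u  /* level: assert (the manual asks for 1 unless it is an INIT deassert) */
#define APIC_DEST_ALLBUT  0xC0000u  /* destination shorthand: all excluding self */

/* Hypothetical container for the two 32-bit halves of the xAPIC ICR. */
struct icr_value {
    uint32_t high;  /* destination APIC ID in bits 31..24 */
    uint32_t low;   /* delivery mode, level, shorthand */
};

/* NMI to a single AP, addressed by its physical APIC ID. */
struct icr_value nmi_ipi_to(uint8_t apic_id)
{
    struct icr_value v;

    v.high = (uint32_t)apic_id << 24;
    v.low = APIC_DM_NMI | APIC_INT_ASSERT;
    return v;
}

/* NMI to all CPUs except the sender, using the destination shorthand. */
struct icr_value nmi_ipi_all_but_self(void)
{
    struct icr_value v;

    v.high = 0;  /* destination field is not used with a shorthand */
    v.low = APIC_DEST_ALLBUT | APIC_DM_NMI | APIC_INT_ASSERT;
    return v;
}
```

With the shorthand variant a single ICR write would reach all halted APs at once, so the hooked interrupt 2 handler has to cope with several CPUs entering it at the same time.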