Adding SMP support

CristiV · Post by **CristiV** » Fri May 14, 2021 11:10 am

Hello!
I'm currently trying to implement SMP support in an x86_64 operating system. I managed to find and parse the ACPI tables, and I wrote the wake-up sequence and the trampoline code, but the APs do not wake up. I should mention that the OS is loaded via Multiboot and runs on KVM.

I use the APIC in x2APIC mode. When I enable x2APIC mode, the EN bit in the Spurious Interrupt Vector Register is already set, as seen in the debug messages.

This is the wake-up sequence, with the values of the ICR. As I've read, in x2APIC mode I don't need to check the Delivery Status bit in the ICR, but I do, for debug purposes.

Stepping through with gdb, I see that the processors never even enter the trampoline code; they stay in the state the boot process leaves them in.

Also, one weird thing: when I send an INIT IPI with the destination shorthand 11b ("All Excluding Self"), the bootstrap processor reboots.

What am I missing?

Octocontrabass · Post by **Octocontrabass** » Sat May 15, 2021 12:33 am

Please don't post pictures of code. They're very difficult to work with.

CristiV wrote:This is the wake up sequence, with the values of the ICR. As I've read, in the x2APIC mode, I don't need to check the Delivery Bit in ICR, but I do, for debug purposes.

There's no delivery status bit in the ICR in x2APIC mode.

Don't read the ICR at all, just write zeroes to the reserved bits.

In x2APIC mode, the destination field is 32 bits, not 8 bits. Your IPIs are probably never reaching the target processors in the first place. Also, the destination is a logical x2APIC ID, which is not the same as the local x2APIC ID. The Intel SDM volume 3A section 10.12.10.2 explains how to find the logical x2APIC ID if you only have the local x2APIC ID.

In all modes, the vector field must be 0 for an INIT IPI.

CristiV wrote:Also, one weird thing: when I send an INIT IPI with the Destination Shorthand 11, "All Excluding Self", the bootstrapping processor reboots.

Actual OSes never do this, since it might affect APs that were disabled by firmware, so I'd guess KVM doesn't support it.

CristiV · Post by **CristiV** » Sat May 15, 2021 7:22 am

I calculated the logical ID from the local APIC ID, because the MADT doesn't have x2APIC entries. The ID I compute for the BSP matches the one in the LDR. After the INIT-SIPI sequence, the OS restarts.

[ 0.248968] dbg: [libkvmplat] <smp.c @ 88> eax: 0x4d00, edx: 0x2
[ 0.254468] dbg: [libkvmplat] <smp.c @ 94> eax: 0xd00, edx: 0x2
[ 0.277321] dbg: [libkvmplat] <smp.c @ 109> eax: 0x4e08, edx: 0x2

I enabled 2 cores through KVM.

Ethin · Post by **Ethin** » Sat May 15, 2021 1:26 pm

Could you post your code here if possible?

Octocontrabass · Post by **Octocontrabass** » Sun May 16, 2021 12:43 am

CristiV wrote:I calculated the Logical Address, using the Local APIC ID, because the MADT table doesn't have x2APIC entries.

Even if it did have x2APIC entries, the MADT still only gives you local APIC IDs and not logical APIC IDs. The x2APIC entries are just for local APIC IDs that don't fit in 8 bits.

I realize now, looking at the spec again, that x2APIC mode still supports physical destinations, so you don't actually have to use logical x2APIC IDs if you don't want to; physical ones should work fine.

CristiV wrote:After the INIT - SIPI sequence, the OS restarts.

A triple fault on the AP could cause this.

CristiV wrote:[ 0.248968] dbg: [libkvmplat] <smp.c @ 88> eax: 0x4d00, edx: 0x2
[ 0.254468] dbg: [libkvmplat] <smp.c @ 94> eax: 0xd00, edx: 0x2

INIT IPIs should have the trigger mode set to level. I'm sure it works fine either way, but the MP spec says to use level and not edge.

CristiV · Post by **CristiV** » Sun May 16, 2021 3:07 am

Ethin wrote:Could you post your code here if possible?

The whole code can be found in my git repo, if you want to test it yourself:
https://github.com/cristian-vijelie/unikraft/tree/smp

To configure it: make menuconfig -> Platform Configuration -> KVM guest, Platform Interface Options -> SMP support
Then, to enable debug messages, in the main config menu: Library Configuration -> ukdebug -> Enable debug messages, and Kernel message level -> Show all types of messages
Finally, to run it:
make && sudo qemu-system-x86_64 -smp 2 -enable-kvm -m 128 -cpu host -kernel build/unikraft_kvm-x86_64 -serial stdio

To use gdb: sudo qemu-system-x86_64 -s -S -smp 2 -enable-kvm -m 128 -cpu host -kernel build/unikraft_kvm-x86_64 -serial stdio
and, in another terminal: gdb --eval-command="target remote :1234" ./build/unikraft_kvm-x86_64.dbg

I'll also post parts of the code:
Defines:

#define IA32_APIC_BASE 0x1b
#define x2APIC_BASE 0x800
#define x2APIC_SPUR 0x80F
#define x2APIC_ESR 0x828
#define x2APIC_ICR 0x830

#define x2APIC_BASE_EXTD 10
#define x2APIC_BASE_EN 11
#define x2APIC_CPUID_BIT 21
#define x2APIC_SPUR_EN 8

#define x2APIC_ICR_DMODE_SMI 0x200
#define x2APIC_ICR_DMODE_NMI 0x400
#define x2APIC_ICR_DMODE_INIT 0x500
#define x2APIC_ICR_DMODE_SUP 0x600

#define x2APIC_ICR_DESTMODE_LOGICAL 0x800
#define x2APIC_ICR_LEVEL_ASSERT     0x4000
#define x2APIC_ICR_TRIGGER_LEVEL    0x8000

#define x2apic_logical_dest(x) ((((x) & 0xfff0) << 16) | (1 << ((x) & 0x000f)))

The function that enables the APs:

void enable_cores(__u8 numcores)
{
	__u8 bspid, ret;
	int i, j;
	__u32 ecx, eax, edx;

	bspid = ukplat_lcpu_id();
	uk_pr_info("Bootstrapping processor has the ID %d\n", bspid);

	if (numcores > smp_numcores) {
		uk_pr_info("Too many cores have been selected to be enabled. "
			   "Truncating to %d!\n",
			   smp_numcores);
		numcores = smp_numcores;
	}

	memcpy((void *)0x8000, &_lcpu_start16, 4096);
	uk_pr_info("Copied AP boot code to 0x8000\n");

	uk_pr_debug("Computed logical ID for core 0: %d\n",
		    ((lapic_ids[0] & 0xff00) << 16)
			| (1 << (lapic_ids[0] & 0x00ff)));
	rdmsr(0x80D, &eax, &edx);
	uk_pr_debug("Logical ID from LDR: %d\n", eax);

	for (i = 0; i < numcores; i++) {
		if (i == bspid)
			continue;

		/* clear APIC errors */
		wrmsr(x2APIC_ESR, 0, 0);

		/* select AP and trigger INIT IPI */
		eax = x2APIC_ICR_LEVEL_ASSERT | x2APIC_ICR_DESTMODE_LOGICAL
		      | x2APIC_ICR_DMODE_INIT;
		edx = x2apic_logical_dest(lapic_ids[i]);
		uk_pr_debug("eax: 0x%x, edx: 0x%x\n", eax, edx);
		wrmsr(x2APIC_ICR, eax, edx);

		/* deassert */
		eax = x2APIC_ICR_DESTMODE_LOGICAL | x2APIC_ICR_DMODE_INIT;
		edx = x2apic_logical_dest(lapic_ids[i]);
		uk_pr_debug("eax: 0x%x, edx: 0x%x\n", eax, edx);
		wrmsr(x2APIC_ICR, eax, edx);

		/* wait 10 msec */
		mdelay(10);

		for (j = 0; j < 2; j++) {
			/* clear APIC errors */
			wrmsr(x2APIC_ESR, 0, 0);

			/* select AP and trigger STARTUP IPI for 0x8000 */
			eax = x2APIC_ICR_TRIGGER_LEVEL | x2APIC_ICR_LEVEL_ASSERT
			      | x2APIC_ICR_DESTMODE_LOGICAL
			      | x2APIC_ICR_DMODE_SUP | 0x08;
			edx = x2apic_logical_dest(lapic_ids[i]);
			uk_pr_debug("eax: 0x%x, edx: 0x%x\n", eax, edx);
			wrmsr(x2APIC_ICR, eax, edx);

			/* wait 200 usec */
			udelay(200);
		}

		mdelay(10);
	}

	bspdone = 1;
}

Initialization:

__u8 smp_init()
{
	__u8 ret;
	__u32 eax, edx;

	ret = enable_x2apic();
	if (ret) {
		uk_pr_err("x2APIC could not be enabled!\n");
		return -1;
	}

	rdmsr(x2APIC_SPUR, &eax, &edx);
	uk_pr_debug(
	    "Spurious Interrupt Register has the values %x; EN bit: %d\n", eax,
	    (eax & (1 << x2APIC_SPUR_EN)) != 0);

	if ((eax & (1 << x2APIC_SPUR_EN)) == 0) {
		eax |= (1 << x2APIC_SPUR_EN);
		wrmsr(x2APIC_SPUR, eax, edx);
		uk_pr_debug("Spurious interrupt enabled\n");
	}

	find_madt();
	if (madt == NULL)
		return -1;

	get_lapicid();

	return 0;
}

Enabling the x2APIC mode:

static __u8 enable_x2apic(void)
{
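	__u32 eax, ebx, ecx, edx;

	/* CPUID leaf 1; declare every register CPUID writes as an output */
	__asm__ __volatile__("cpuid"
			     : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
			     : "a"(1), "c"(0));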
	if (ecx & (1 << x2APIC_CPUID_BIT))
		uk_pr_debug("x2APIC is supported; enabling\n");
	else {
		uk_pr_info("x2APIC is not supported\n");
		return 1;
	}

	rdmsr(IA32_APIC_BASE, &eax, &edx);
	uk_pr_debug(
	    "IA32_APIC_BASE has the value %x; EN bit: %d, EXTD bit: %d\n", eax,
	    (eax & (1 << x2APIC_BASE_EN)) != 0,
	    (eax & (1 << x2APIC_BASE_EXTD)) != 0);

	/* set the x2APIC enable bit */
	eax |= (1 << x2APIC_BASE_EXTD);
	wrmsr(IA32_APIC_BASE, eax, edx);
	uk_pr_info("x2APIC is enabled\n");

	return 0;
}

The trampoline code, which doesn't seem to be reached:

#define ENTRY(x) .globl x; .type x,%function; x:
#define END(x)   .size x, . - x

.code16
ENTRY(_lcpu_start16)
r_base = .
	cli
	cld
	wbinvd
	mov	%cs, %ax
	mov	%ax, %ds
	mov	%ax, %es
	mov	%ax, %ss

	movw	$(trampoline_stack_end - r_base), %sp

	movl	%cr0, %eax
	orl	$1, %eax
	movl	%eax, %cr0
	ljmpl	*(_lcpu_start32_vector - r_base)
END(_lcpu_start16)

.code32
.align 32
ENTRY(_lcpu_start32)
    cld

	/* 1: enable pae */
	movl    %cr4, %eax
	orl     $X86_CR4_PAE, %eax
	movl    %eax, %cr4

	/* 2: enable long mode */
	movl    $0xc0000080, %ecx
	rdmsr
	orl     $X86_EFER_LME, %eax
	orl     $X86_EFER_NXE, %eax
	wrmsr

	/* 3: load pml4 pointer */
	movl $cpu_pml4, %eax
	movl %eax, %cr3

	/* 4: enable paging */
	movl    %cr0, %eax
	orl     $X86_CR0_PG, %eax
	movl    %eax, %cr0

	jmp     _lcpu_start64

	/* NOTREACHED */
haltme2:
	cli
	hlt
	jmp     haltme2
END(_lcpu_start32)

.align 64
gdt64:
	.quad 0x0000000000000000
gdt64_cs:
	.quad GDT_DESC_CODE_VAL		/* 64bit CS		*/
gdt64_ds:
	.quad GDT_DESC_DATA_VAL		/* DS			*/
	.quad 0x0000000000000000	/* TSS part 1 (via C)	*/
	.quad 0x0000000000000000	/* TSS part 2 (via C)	*/
gdt64_end:
.align 64

.type gdt64_ptr, @object
gdt64_ptr:
	.word gdt64_end-gdt64-1
	.quad gdt64

.type mxcsr_ptr, @object
mxcsr_ptr:
	.long 0x1f80			/* Intel SDM power-on default */

#include "pagetable.S"

.code64
.align 32
ENTRY(_lcpu_start64)
	lgdt (gdt64_ptr)
	/* let lret jump just one instruction ahead, but set %cs
	 * to the correct GDT entry while doing that.
	 */
	pushq $(gdt64_cs-gdt64)
	pushq $1f
	lretq

1:
	/* Set up the remaining segment registers */
	movq $(gdt64_ds-gdt64), %rax
	movq %rax, %ds
	movq %rax, %es
	movq %rax, %ss
	xorq %rax, %rax
	movq %rax, %fs
	movq %rax, %gs

	/* spin-wait until the BSP sets bspdone */
spin:
	pause
	cmpb	$0, bspdone
	jz	spin
	lock incb	smp_aprunning

	movq	$_lcpu_entry_default, %rax
	jmp	*%rax
END(_lcpu_start64)

.align 32
_lcpu_start32_vector:
	.long	_lcpu_start32 - r_base
	.word	8, 0

.align 32
_lcpu_start64_vector:
	.long	_lcpu_start64 - r_base
	.word	16, 0

trampoline_stack:
	.space 0x1000
trampoline_stack_end:

CristiV · Post by **CristiV** » Sun May 16, 2021 4:50 am

Octocontrabass wrote:A triple fault on the AP could cause this.

Shouldn't KVM report an error or something if this happened?

Ethin · Post by **Ethin** » Sun May 16, 2021 12:24 pm

CristiV wrote:
Octocontrabass wrote:A triple fault on the AP could cause this.
Shouldn't kvm send me an error, or anything, if this happened?

If I remember right, no, it doesn't. I'd recommend setting up an IDT before you do SMP initialization so you can figure out the problem; until you do, debugging this is going to be painful. With an IDT in place, you should at least be able to narrow the problem down from which exception fires.

Octocontrabass · Post by **Octocontrabass** » Sun May 16, 2021 10:02 pm

CristiV wrote:The trampoline code, which doesn't seem to be reached:

If it's reached, it will probably triple fault because it switches to protected mode without setting the GDTR.

CristiV wrote:Shouldn't kvm send me an error, or anything, if this happened?

I would expect it to do the same thing real hardware will do, unless you specifically configure otherwise. Real hardware will usually reboot if any CPU triple faults.

Ethin wrote:I'd recommend you set up an IDT before you do SMP initialization so you can figure out the problem.

An IDT on the BSP won't help you when it's an AP triple faulting.

You might try disabling KVM and using "-d int" and "-no-reboot" to see if it's really a triple fault. (Unfortunately, it seems "-d int" isn't reliable with KVM.)

Ethin · Post by **Ethin** » Sun May 16, 2021 11:56 pm

Octocontrabass wrote:
Ethin wrote:I'd recommend you set up an IDT before you do SMP initialization so you can figure out the problem.
An IDT on the BSP won't help you when it's an AP triple faulting.

I should have clarified that I meant an IDT on each AP, as well as a GDT.

CristiV · Post by **CristiV** » Mon May 17, 2021 12:03 am

Octocontrabass wrote: You might try disabling KVM and using "-d int" and "-no-reboot" to see if it's really a triple fault. (Unfortunately, it seems "-d int" isn't reliable with KVM.)

I was afraid of this. Without the "-cpu host" option, which requires KVM, I have to use xAPIC mode, and I'll have to rework the page tables. I'll update you when I manage to do this.

CristiV · Post by **CristiV** » Mon May 17, 2021 1:48 am

I've done some more digging and found that the AP does start, but right after the Startup IPI it begins executing code at address 0x1, even though the vector field specifies the address 0x8000. Any idea why this happens?

gdb-peda$ info threads
Id Target Id Frame
* 1 Thread 1.1 (CPU#0 [running]) rdmsr (hi=<synthetic pointer>, lo=<synthetic pointer>, msr=0x828)
at cpu.h:175
2 Thread 1.2 (CPU#1 [running]) 0x0000000000000019 in ?? ()

Octocontrabass · Post by **Octocontrabass** » Mon May 17, 2021 6:48 am

Are you sure the problem isn't GDB? Last I checked, it assumes the CS base is always 0, so it displays nonsense when the CPU is in real mode with CS set to any nonzero value.

CristiV · Post by **CristiV** » Mon May 17, 2021 1:14 pm

Octocontrabass wrote:Are you sure the problem isn't GDB? Last I checked, it assumes the CS base is always 0, so it displays nonsense when the CPU is in real mode with CS set to any nonzero value.

I didn't know this.

I've now loaded a GDT, using an example from this forum (no IDT yet), but it still breaks somewhere.

.section .text
.code16
ENTRY(_lcpu_start16)
r_base = .
    cli
    cld
    ljmp    $0, $0x8040
.align 16
_L8010_GDT_table:
    .long 0, 0
    .long 0x0000FFFF, 0x00CF9A00    /* flat code */
    .long 0x0000FFFF, 0x008F9200    /* flat data */
    .long 0x00000068, 0x00CF8900    /* tss */
_L8030_GDT_value:
    .word _L8030_GDT_value - _L8010_GDT_table - 1
    .long 0x8010
    .long 0, 0
    .align 64
_L8040:
    xorw    %ax, %ax
    movw    %ax, %ds
    lgdtl   0x8030
    movw    $(trampoline_stack_end - r_base), %sp
    movl    %cr0, %eax
    orl     $1, %eax
    movl    %eax, %cr0
    ljmp    *(_lcpu_start32_vector - r_base)
END(_lcpu_start16)

Ethin · Post by **Ethin** » Mon May 17, 2021 2:59 pm

What vector are you sending? Remember that the SIPI vector determines where the processor begins execution: the start address is 000VV000H, where VV is the vector. So if you didn't send a vector, or specified 0 for it, the AP would start at 00000000H. The code there can jump to your actual init code if you want it to. (Section 8.4 of the Intel SDM covers MP initialization.)

OSDev.org
