AP Trampoline Crashing After Setting the PG Bit

jteerice
Posts: 1
Joined: Tue Dec 19, 2023 3:58 pm

Post by jteerice

Hello everyone! This is my first time diving into SMP. I have the initialization and startup sequence working, and the APs jumping into the trampoline code. I am using jmp $ instructions as pseudo-breakpoints in gdb, but there seems to be a bug somewhere that prevents the code from executing past the instruction to set the PG bit.

While I can't say for certain that the APs are crashing, gdb indicates they are "running" but are stuck at:

0x0000000000000000 in ?? ()
I haven't been able to work out definitively why this is occurring, but with my limited experience, I believe it is a page fault. It is especially puzzling because the PML4 being loaded into cr3 in the trampoline code (called pm14 in the code) is the same PML4 being used by the BSP. For context, I am copying the relevant pointers to a location under 1MB with the following prep function:

void prepare_core(uint64_t stack, uint64_t pm14, uint64_t ap_entry, uint64_t idt, uint64_t gdt) {
    uint64_t* copy_addr = (uint64_t*)(0x500 + KERNEL_VIRT_BASE_ADDR);
    copy_addr[0] = 0;
    copy_addr[1] = stack;
    copy_addr[2] = pm14;
    copy_addr[3] = ap_entry;
    copy_addr[4] = idt;
    copy_addr[5] = gdt; 
}
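
The slot layout that prepare_core writes, and that the trampoline reads back as 0x500 + offset, can be written down as a struct so the offsets are checked at compile time. This is only a sketch: the names ap_boot_info, fill_boot_info and the field name flag are hypothetical and not part of the kernel shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct mirroring the six 8-byte slots written by prepare_core(). */
struct ap_boot_info {
    uint64_t flag;      /* copy_addr[0]; the trampoline writes 1 to 0x500 */
    uint64_t stack;     /* copy_addr[1]; read at 0x500 + 8  */
    uint64_t pml4;      /* copy_addr[2]; read at 0x500 + 16 (pm14 in the post) */
    uint64_t ap_entry;  /* copy_addr[3]; read at 0x500 + 24 */
    uint64_t idt;       /* copy_addr[4]; read at 0x500 + 32 */
    uint64_t gdt;       /* copy_addr[5]; read at 0x500 + 40 */
};

/* Fail the build if the C layout drifts from the offsets the assembly uses. */
static_assert(offsetof(struct ap_boot_info, stack) == 8, "stack offset");
static_assert(offsetof(struct ap_boot_info, pml4) == 16, "pml4 offset");
static_assert(offsetof(struct ap_boot_info, ap_entry) == 24, "ap_entry offset");
static_assert(offsetof(struct ap_boot_info, idt) == 32, "idt offset");
static_assert(offsetof(struct ap_boot_info, gdt) == 40, "gdt offset");

/* Same stores as prepare_core(), but through named fields. */
void fill_boot_info(struct ap_boot_info *info, uint64_t stack, uint64_t pml4,
                    uint64_t ap_entry, uint64_t idt, uint64_t gdt) {
    info->flag = 0;
    info->stack = stack;
    info->pml4 = pml4;
    info->ap_entry = ap_entry;
    info->idt = idt;
    info->gdt = gdt;
}
```

In the kernel, the pointer passed in would be (struct ap_boot_info *)(0x500 + KERNEL_VIRT_BASE_ADDR), assuming that address is mapped as in prepare_core.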
And here is the trampoline code. Execution seems to stop right after the instruction to set the PG bit.

[bits 16]

global ap_trampoline

ap_trampoline:
    cli
    cld
    jmp 0x0:(0x1000 + init_32 - ap_trampoline)

gdt_start:
gdt_null:
    dd 0x0
    dd 0x0

gdt_code:
    dw 0xffff
    dw 0
    db 0
    db 0x9a
    db 11001111b
    db 0

gdt_data:
    dw 0xffff
    dw 0
    db 0
    db 0x92
    db 11001111b
    db 0
gdt_end:

gdt_ptr:
    dw (0x1000 + gdt_end - ap_trampoline) - (0x1000 + gdt_start - ap_trampoline) - 1
    dd (0x1000 + gdt_start - ap_trampoline)

init_32:
    mov word [0x500], 1
    mov eax, cr0
    or eax, 1
    mov cr0, eax
    lgdt [0x1000 + gdt_ptr - ap_trampoline]
    jmp 0x08:(0x1000 + setup_32 - ap_trampoline)

[bits 32]
setup_32:
    mov eax, dword [0x500 + 16] ; pm14
    mov cr3, eax

    mov eax, cr4
    or eax, 1 << 5              ; PAE bit
    mov cr4, eax

    mov ecx, 0xc0000080
    rdmsr
    or eax, 1 << 8              ; LM bit
    wrmsr
    mov eax, cr0
    or eax, 1 << 31             ; PG bit
    mov cr0, eax
    jmp $
    
    jmp 0x08:(0x1000 + setup_64 - ap_trampoline)

[bits 64]
setup_64:
    lgdt [0x500 + 40]
    lidt [0x500 + 32]
    mov ax, 0x10
    mov ds, ax
    mov es, ax
    mov ss, ax
    mov ax, 0x20
    mov gs, ax
    mov fs, ax

    mov rsp, qword [0x500 + 8]  ; Stack
    mov rax, qword [0x500 + 24] ; AP entry point

    jmp rax
Any insight would be greatly appreciated! Thank you!
nullplan
Member
Posts: 1789
Joined: Wed Aug 30, 2017 8:24 am

Re: AP Trampoline Crashing After Setting the PG Bit

Post by nullplan

So I see you load the GDT only after setting protected mode. That is pretty sus to me, as are those address calculations. Using fixed addresses limits your design such that you can only start one CPU at a time. I have a self-contained trampoline design, so I can start as many CPUs as there are free pages in low memory.

One thing I notice directly is that the PML4 address is a 64-bit number. However, with the legacy SMP startup method, you must ensure that the new CPU's PML4 is in the low 4GB, as you can initially only set 32 bits of CR3. Is your initial PML4 in the low 4GB? Because setting PG and then having all hell break loose is normally a sign that something didn't work out right in the page table. Is the trampoline identity mapped in the initial PML4? My SMP booting code creates a copy of the PML4 for use in the booting CPU, and sets up an identity map in it by setting its first entry equal to the 256th (because that is where I put the linear memory map). The AP startup code must then undo this hack and flush all global pages (which means clearing CR4.PGE and setting it again). Your code must also identity map the first page as that is where you've hidden all the data.
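
A minimal sketch of those two checks in C, not my actual code: the function names and the PML4_ENTRIES constant are made up for illustration, and the slot index 256 is only an assumption about where a linear memory map might sit. Aliasing entry 0 to that slot only yields an identity map if the slot maps physical address 0 at its base. The CR4.PGE toggle is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PML4_ENTRIES 512  /* a PML4 holds 512 eight-byte entries */

/* 32-bit trampoline code can only load the low 32 bits of CR3, so the
 * PML4 must sit below 4 GiB and be page aligned. */
bool pml4_usable_from_32bit(uint64_t pml4_phys) {
    return pml4_phys < (1ULL << 32) && (pml4_phys & 0xFFF) == 0;
}

/* Build a temporary PML4 for the booting AP: a copy of the kernel PML4
 * whose entry 0 aliases the linear-map slot, so low addresses such as the
 * trampoline and the data page at 0x500 resolve through that mapping. */
void make_ap_boot_pml4(uint64_t *boot_pml4, const uint64_t *kernel_pml4,
                       unsigned linear_map_slot) {
    memcpy(boot_pml4, kernel_pml4, PML4_ENTRIES * sizeof(uint64_t));
    boot_pml4[0] = kernel_pml4[linear_map_slot];
}

/* Undo the alias once the AP runs at its high address. The caller still has
 * to flush the TLB afterwards, including global pages. */
void remove_identity_alias(uint64_t *boot_pml4) {
    boot_pml4[0] = 0;
}
```

The kernel's own PML4 is left untouched, so the BSP never sees the low alias.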

One more thing: Your 64-bit code is misaligning the stack. If the function pointer given as "ap_entry" is a C function, you are violating the ABI. But this can be easily fixed by replacing the jmp at the end with a call. Nobody expects that function to return, anyway, but you can put a ud2 as terminator afterwards if you're scared.
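
The alignment arithmetic behind that, as a small C model rather than kernel code. It assumes the System V x86-64 ABI (at function entry, rsp + 8 is a multiple of 16) and a 16-byte aligned stack top; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* System V x86-64: rsp is 16-byte aligned just before a call, so at function
 * entry, after the return address was pushed, (rsp + 8) % 16 == 0. */
bool entry_rsp_is_abi_aligned(uint64_t rsp_at_entry) {
    return ((rsp_at_entry + 8) & 0xF) == 0;
}

/* jmp leaves rsp untouched. */
uint64_t rsp_after_jmp(uint64_t rsp) {
    return rsp;
}

/* call pushes an 8-byte return address first. */
uint64_t rsp_after_call(uint64_t rsp) {
    return rsp - 8;
}
```

With a 16-byte aligned stack top, entering through jmp fails the check and entering through call passes it.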
Why, Octo, if we wanted things to be easy, we wouldn't be at this hobby. Using Limine to do our dirty work will hardly teach us anything.
Carpe diem!