[AArch64 / Bare metal] Need help with CPU communication


Post by dublevsky »

I've been trying to solve this for a week and decided to reach out for help.

I have code running on bare metal (Raspberry Pi 3 Model B+).
I'm trying to run some general initialization on every CPU and then have the secondary CPUs wait for CPU0 to zero out the BSS (and, in the future, do things like MMU setup).
Once CPU0 has finished its own initialization, it is supposed to release the secondary CPUs, and then every CPU jumps into the kernel by calling kmain.
kmain contains a very primitive waiting function (for now, just to check whether the other CPUs get there) and prints out every CPU's id.
The problem is that only CPU0 gets to kmain.

start.S

Code: Select all

#include "asm/macros.h"
#include "arch/arch.h"
#include "board/spec.h"

.section .bss.stack
        .align 8
        .skip ARCH_STACK_SIZE * BOARD_NUM_CPUS
DATA(___stack_end)
    
.section .data
        .align 8
DATA(cpu_barrier)
        .quad 1         // 64-bit slot: accessed below with 64-bit ldr/str (x registers)

.section .text
cpuid .req x9

FUNCTION(_start)
        // ----------------------------------------
        // Initialization to carry out on every CPU
        // ----------------------------------------

        // Find out which CPU we are running at
        mrs cpuid, mpidr_el1
        and cpuid, cpuid, #0xff

        // Set up the stack
        adr x0, ___stack_end
        ldr x1, =ARCH_STACK_SIZE
        mul x1, x1, cpuid
        sub sp, x0, x1

        // -----------------------------------
        // Initialization to carry out on CPU0
        // -----------------------------------
        cbnz cpuid, .Lwait_for_primary_cpu

        // Zero out the bss section
        // Note: relies on ___bss and ___bss_end being 16 byte aligned
        adr x0, ___bss
        adr x1, ___bss_end
        sub x1, x1, x0
        cbz x1, .Lbss_init_done
.Lbss_init_loop:
        stp xzr, xzr, [x0], #16
        sub x1, x1, #16
        cbnz x1, .Lbss_init_loop
.Lbss_init_done:

        // Release secondary cpus
        adr x0, cpu_barrier
        str xzr, [x0]
        b .Lwait_for_primary_cpu_done

        // Wait for primary cpu
.Lwait_for_primary_cpu:
        adr x0, cpu_barrier
.Lwait_for_primary_cpu_loop:
        ldr x1, [x0]
        cbnz x1, .Lwait_for_primary_cpu_loop
.Lwait_for_primary_cpu_done:

        // Jump into the kernel
.Lkernel_entry:
        mov x0, cpuid
        bl kmain
    
.Lhang:
        wfe
        b .Lhang
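
The per-CPU stack setup in start.S above can be restated in C as a sanity check. This is only a sketch: the ARCH_STACK_SIZE value and the stack_top_for helper are assumptions made for illustration and are not taken from the project's headers.

```c
#include <stdint.h>

/* Assumed value for illustration; the real one comes from arch/arch.h. */
#define ARCH_STACK_SIZE 0x4000u

/* Mirrors "sub sp, x0, x1": CPU0 gets the very top of the region and each
 * further CPU starts ARCH_STACK_SIZE bytes lower (stacks grow downwards). */
static uintptr_t stack_top_for(uintptr_t stack_end, unsigned cpuid)
{
    return stack_end - (uintptr_t)ARCH_STACK_SIZE * cpuid;
}
```

Since the region reserves ARCH_STACK_SIZE * BOARD_NUM_CPUS bytes, the stack of the last CPU bottoms out exactly at the start of .bss.stack, so the stacks do not overlap as long as each CPU stays within its own ARCH_STACK_SIZE bytes.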

asm/macros.h

Code: Select all

#ifndef INCLUDE_ASM_MACROS_H
#define INCLUDE_ASM_MACROS_H
 
#define FUNCTION(x)             .global x; .type x, STT_FUNC; x:
#define DATA(x)                 .global x; .type x, STT_OBJECT; x:
    
#define LOCALFUNCTION(x)        .type x, STT_FUNC; x:
#define LOCALDATA(x)            .type x, STT_OBJECT; x:
    
#endif /*INCLUDE_ASM_MACROS_H*/

kmain.c

Code: Select all

#include <stdint.h>
#include "peripherals/mu.h"
    
static void wait(const uint64_t c)
{
        for (uint64_t i = 0; i < c; ++i) {
                __asm__ volatile("nop");
        }
}

void kmain(const uint64_t cpuid)
{
        if (cpuid) {
                wait(1000000 * cpuid);
        } else {
                mu_init(9600);
        }
        mu_putc((char)cpuid + '0');
}

link64.ld

Code: Select all

ENTRY(_start)

SECTIONS
{
	. = 0x80000;

	.text : {
		*(.text)
	}

	.rodata : {
		*(.rodata)
	}

	.data : {
		*(.data)
	}

	.bss : {
		. = ALIGN(8);
		*(.bss.stack)
		. = ALIGN(16);
		___bss = .;
		*(.bss)
		. = ALIGN(16);
		___bss_end = .;
	}
}

Would love some help, because I'm going crazy at this point.

Post by Octocontrabass »

How are you telling all of the CPUs to jump to _start?

Post by zaval »

I honestly haven't even touched multiprocessing yet, so I can hardly be helpful, but seriously, looking at your code, I wonder: why do you think the secondary CPUs are even running? Where do you see that? They won't run just because your bootstrap CPU writes 0 into some variable; you need to wake them up first! And it all comes down to the way it's done on the RPi. With all that VC stuff... who knows. But I guess your secondary CPUs aren't running. The firmware starts on CPU0, your code takes over control on it too, and that's all. No secondary CPUs on the scene. Read up on secondary CPU bring-up for the RPi.
ANT - NT-like OS for x64 and arm64.
efify - UEFI for a couple of boards (mips and arm). suspended due to lost of all the target park boards (russians destroyed our town).

Post by dublevsky »

Octocontrabass wrote: How are you telling all of the CPUs to jump to _start?
zaval wrote: I honestly haven't even touched multiprocessing yet, so I can hardly be helpful, but seriously, looking at your code, I wonder: why do you think the secondary CPUs are even running? Where do you see that? They won't run just because your bootstrap CPU writes 0 into some variable; you need to wake them up first! And it all comes down to the way it's done on the RPi. With all that VC stuff... who knows. But I guess your secondary CPUs aren't running. The firmware starts on CPU0, your code takes over control on it too, and that's all. No secondary CPUs on the scene. Read up on secondary CPU bring-up for the RPi.

The RPi bootloader does what it needs to do, and then every single core enters _start.

Post by Octocontrabass »

Which bootloader are you using that sends every CPU to _start? The official bootloaders only start one CPU and leave the others halted.

Post by dublevsky »

Octocontrabass wrote: Which bootloader are you using that sends every CPU to _start? The official bootloaders only start one CPU and leave the others halted.

I'm using the official RPi one. I'm pretty sure every single CPU is awake, as this simple code works.

Post by nullplan »

Cache coherency problems? Is it possible the other cores never see the update to cpu_barrier? Do you need a barrier in that loop and at the point where you write the variable?
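
For reference, the barrier placement being asked about (one at the point where the variable is written, one in the waiting loop) could be sketched in C11 atomics as below. This is a hedged illustration only: the names barrier_flag, barrier_release and barrier_wait are hypothetical, and the sketch says nothing about how such accesses behave with the MMU and caches disabled.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical C counterpart of cpu_barrier: 1 = hold, 0 = released. */
static _Atomic uint32_t barrier_flag = 1;

/* Writer side (CPU0): the release store orders all earlier writes,
 * such as the BSS clearing, before the flag update. */
static void barrier_release(void)
{
    atomic_store_explicit(&barrier_flag, 0, memory_order_release);
}

/* Reader side (secondary CPUs): the acquire load keeps later accesses
 * from being reordered before the flag is observed as 0. */
static void barrier_wait(void)
{
    while (atomic_load_explicit(&barrier_flag, memory_order_acquire) != 0) {
        /* spin */
    }
}
```

On AArch64 a compiler would typically lower these to store-release and load-acquire instructions rather than a separate dsb, which is one reason to prefer them over hand-placed barriers where they apply.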
Carpe diem!

Post by dublevsky »

nullplan wrote: Cache coherency problems? Is it possible the other cores never see the update to cpu_barrier? Do you need a barrier in that loop and at the point where you write the variable?

Neither the MMU nor the I/D caches are enabled yet.

Post by Octocontrabass »

dublevsky wrote: I'm using the official RPi one. Pretty sure every single CPU is awake as this simple code works.

That code uses "kernel_old=1" in config.txt to bypass the boot stub.

You are not using "kernel_old=1" in your config.txt, so the firmware's default boot stub is running (or armstub8.bin from your SD card), and that boot stub is halting all but one of the CPUs.

Post by dublevsky »

Octocontrabass wrote:
dublevsky wrote: I'm using the official RPi one. Pretty sure every single CPU is awake as this simple code works.
That code uses "kernel_old=1" in config.txt to bypass the boot stub.

You are not using "kernel_old=1" in your config.txt, so the firmware's default boot stub is running (or armstub8.bin from your SD card), and that boot stub is halting all but one of the CPUs.

You have a point. BRB, checking this out.

EDIT: It's midnight for me, but I checked some resources and Octocontrabass's answer seems to be right. I will work on this tomorrow and will reply with a full solution if it works.

Post by bzt »

Hi,

Your code will be executed on all cores no matter what you do in config.txt. This is the case even if config.txt does not exist (recommended).

The memory cache is wired per core, but you have one RAM. Therefore, if you change the memory from one core, you need to refresh the cache in the other cores. To do that, either map the memory as non-cacheable, outer-shareable or explicitly use a data barrier (dsb).

Cheers,
bzt

Post by dublevsky »

bzt wrote: Hi,

Your code will be executed on all cores no matter what you do in config.txt. This is the case even if config.txt does not exist (recommended).

The memory cache is wired per core, but you have one RAM. Therefore, if you change the memory from one core, you need to refresh the cache in the other cores. To do that, either map the memory as non-cacheable, outer-shareable or explicitly use a data barrier (dsb).

Cheers,
bzt

Hi,

I/D caches are not enabled yet.

I'm currently working on Octocontrabass's answer. I revisited this answer on /r/asm and decided to google for 'raspberry pi cpu-release-addr', which led me to Device Tree Blobs. After decompiling bcm2710-rpi-3-b-plus.dtb back to .dts format and looking into it, I found that there is indeed a cpu-release-addr parameter for every CPU. I'm currently writing a quick and dirty mailbox interface implementation to check this, so no progress yet.

EDIT: OK, I checked the code with kernel_old=1 and disable_commandline_tags=1 in config.txt and it still doesn't work, so I'm going to try using dsb and report back.
EDIT2: Wrapping every single load and store with 'dsb sy' didn't work either.

Post by dublevsky »

OK. After being a complete idiot for ~1 week I finally got it working.

Big thanks to Octocontrabass and /u/TNorthover.

Solution:
If you have no custom boot options in config.txt, the RPi bootloader will load your image at 0x8000 for kernel7.img (32-bit kernel) or 0x80000 for kernel8.img (64-bit kernel).
The stubs used for loading in that case are armstub7.S and armstub8.S. As I'm writing a 64-bit kernel for AArch64, I looked into the boot process in armstub8.S.

After some minimal CPU initialization, the stub on CPU0 loads the Device Tree Blob (Flattened Device Tree) address into x0 and the kernel entry address (_start in my case) into x4, and CPU0 jumps to that address.
CPU[1:3], on the other hand, sit in a loop that consists of two steps: waiting for an event (WFE), then loading x4 from their respective barrier slot and checking it for a non-zero value.

The slot that x4 is loaded from is at x5 + (x6 << 3), where
x5 = address of spin_cpu0, basically the base address of the CPU 'barriers'; it equals 0xd8
x6 = CPU id

So by writing the value 0x80000 (or &_start) to 0xe0, 0xe8 and 0xf0 and then sending an event (SEV) from CPU0, CPU[1:3] wake up and jump to _start.
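
The release step described above could look roughly like this in C. It is a sketch under assumptions, not the code from the screenshot: spin_slot_addr and release_secondary_cpus are hypothetical helpers, the 0xd8 base is the value reported in this post and should be checked against the stub actually in use, and the table is passed in as a pointer so the logic can also be exercised off-target.

```c
#include <stdint.h>

/* Address of spin_cpu0 as reported in the post; treat it as an assumption
 * and verify it against the armstub8 build you actually boot with. */
#define SPIN_CPU0_ADDR 0xd8u

/* Address of the slot the stub polls for a given CPU:
 * x5 + (x6 << 3) in the post's notation. */
static uintptr_t spin_slot_addr(unsigned cpu)
{
    return (uintptr_t)SPIN_CPU0_ADDR + ((uintptr_t)cpu << 3);
}

/* Hypothetical helper: write the entry address into the slots of CPUs
 * 1..ncpus-1, then signal an event so those cores leave their WFE. */
static void release_secondary_cpus(volatile uint64_t *spin_table,
                                   uint64_t entry, unsigned ncpus)
{
    for (unsigned cpu = 1; cpu < ncpus; ++cpu) {
        spin_table[cpu] = entry;
    }
#if defined(__aarch64__)
    /* Complete the stores before waking the other cores. */
    __asm__ volatile("dsb sy\n\tsev" ::: "memory");
#endif
}
```

On the target, the call would be something like release_secondary_cpus((volatile uint64_t *)(uintptr_t)SPIN_CPU0_ADDR, 0x80000, 4), made by CPU0 only after the BSS has been cleared.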
Attachments
Screenshot from 2019-01-19 16-56-01.png