Hello. I've been working through https://github.com/s-matyukevich/raspberry-pi-os to learn bare metal programming on the Raspberry Pi and have run into an issue. One of the exercises for lesson 01 is to get your code running on all 4 cores, which I've done (with some help from the exercise solutions), but I ran into an odd problem.
I ran the code and it worked fine on all 4 cores, giving the correct output:
❯ qemu-system-aarch64 -M raspi3b -kernel ./build/kernel8.elf -smp 4 -serial null -serial stdio
primary: 0
secondary: 1
secondary: 2
secondary: 3
I should note that I'm passing the ELF file to QEMU:
-kernel kernel8.elf
I later went to make small changes and noticed that adding/removing opcodes (even nop or mov x0, x0) would cause the code to get caught in an infinite loop and never send output to the uart. I've done some digging, but am not sure where else to look. While adding/removing nops in my code base might be one way to work around this, it feels like I've done something wrong and figuring this out is a great learning opportunity.
The multi-core uart printing works with this code at the top of `_start`:
.globl _start
_start:
nop
nop
mrs x0, mpidr_el1
It fails when a third `nop` is added:
.globl _start
_start:
nop
nop
nop // Including this line will cause an exception
mrs x0, mpidr_el1
The README in /lesson01/ has the commands I'm using to build, run, and debug the code. boot.S has three `nop` instructions beneath `_start`, one of which is commented out. None of those nops are required, but I've been using them to test various conditions. Uncommenting the third `nop` results in the unexpected (to me) behavior. It also doesn't matter where the nops are: the failure occurs even when I add or remove opcodes elsewhere in the file.
There are two cases:
1. Failure case (uncommenting //nop)
2. Success case (leaving //nop commented out)
I've done some digging with lldb and have found a few things:
1. In all cases, when lldb connects and the qemu execution is stopped, the cores are all waiting at `uart_send()`, which is the first code in the `.text` section. It shouldn't surprise me that execution starts there, but I expected it to start at 0x00, where the `.text.boot` code is. The code in `.text.boot` does eventually get executed in the success case, and breakpoints get hit in that code.
2. In the failure case, the cores get to the `ret` opcode at the bottom of `uart_init()` and jump to an unknown address (x30 is updated right before by `ldp x29, x30, [sp], #32`), which then causes an exception, and the next opcode executed is a `UDF` at 0x0000000000000000.
3. In the success case, the cores get to the `ret` opcode at the bottom of `uart_init()` and jump to an unknown address (for the same reason as before), but the next step is that the PC jumps to 0x200, which is back in the `uart_init()` code. I'm not sure why the behavior is different.
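One check that fits these observations can be sketched in shell. A64 instructions are 4 bytes each, so a single extra `nop` shifts every later address by 4, and AArch64 expects `sp` to be 16-byte aligned when it is used as a base for loads and stores such as the `ldp` above (when SP alignment checking is enabled). Whether alignment is actually involved here is unknown; this is only a way to rule it out. The helper name `align16` and the example addresses are made up for illustration, and the real values would come from something like lldb's `register read sp` in each case.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): report how far an
# address is from a 16-byte boundary.
align16() {
    addr=$(( $1 ))          # accepts decimal or 0x-prefixed hex
    rem=$(( addr % 16 ))
    if [ "$rem" -eq 0 ]; then
        echo "aligned"
    else
        echo "off by $rem"
    fi
}

# Example values only, not taken from the actual build.
align16 0x80000   # prints: aligned
align16 0x80004   # prints: off by 4 (same address shifted by one instruction)
```

Running it against `sp` and against any label-derived address in both the success and failure builds shows whether the one-instruction shift changes alignment anywhere that matters.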
I've included a bunch of other debugging notes in boot.S to help me stay organized, so perhaps someone with more experience in this will be able to see the problem immediately.
At this point, I'm not sure how to track down why adding a nop instruction changes the behavior to this extent. My suspicion is that the solution isn't actually working properly even in the success case, and that the extra instruction just makes the problem visible, but I'd love some help figuring out where to go from here.
Thanks!