
Weird AARCH64 behaviours

Posted: Mon Oct 24, 2022 7:09 pm
by cris10293
So I've been trying to develop a kernel myself to refresh the OS concepts I saw back in school. Plus I like systems programming :)
Side note: I tried the x86 architecture first, but getting interrupts working has been hellish, so I moved on to ARM.

I wrote a minimalist hello world kernel for AARCH64 and it works fine. However, when I try to implement variable arguments for a custom printf using va_list, things start misbehaving:
* If I compile my code with -O2, it works.
* If I compile with -O0, it doesn't.
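For readers without the gist at hand, here is a minimal sketch of the kind of va_list-based printf being described. It is not OP's code: the names `kvformat` and `kformat` are hypothetical, only %s, %c, %d, %x and %% are handled, and output goes into a caller-supplied buffer so the logic can be tried on a host before wiring it to a UART. Note that it never calls va_arg with a floating-point type.

```c
#include <stdarg.h>
#include <stddef.h>

/* Store one character if it fits (leaving room for the NUL); always count it. */
static void emit(char *buf, size_t size, size_t *pos, char c)
{
    if (*pos + 1 < size)
        buf[*pos] = c;
    (*pos)++;
}

/* Print an unsigned value in the given base (10 or 16), most significant digit first. */
static void emit_uint(char *buf, size_t size, size_t *pos, unsigned int u, unsigned int base)
{
    char tmp[32];
    int n = 0;
    do {
        tmp[n++] = "0123456789abcdef"[u % base];
        u /= base;
    } while (u != 0);
    while (n > 0)
        emit(buf, size, pos, tmp[--n]);
}

/* Hypothetical minimal formatter: handles %s %c %d %x %% and pulls only
 * int, unsigned int and pointer arguments out of the va_list. */
size_t kvformat(char *buf, size_t size, const char *fmt, va_list ap)
{
    size_t pos = 0;
    for (; *fmt != '\0'; fmt++) {
        if (*fmt != '%') {
            emit(buf, size, &pos, *fmt);
            continue;
        }
        if (*++fmt == '\0')
            break;                      /* lone '%' at the end: stop */
        switch (*fmt) {
        case 's': {
            const char *s = va_arg(ap, const char *);
            if (s == NULL)
                s = "(null)";
            while (*s != '\0')
                emit(buf, size, &pos, *s++);
            break;
        }
        case 'c':
            emit(buf, size, &pos, (char)va_arg(ap, int));
            break;
        case 'd': {
            int v = va_arg(ap, int);
            if (v < 0)
                emit(buf, size, &pos, '-');
            /* Negate in unsigned arithmetic so INT_MIN is handled. */
            emit_uint(buf, size, &pos, v < 0 ? 0u - (unsigned int)v : (unsigned int)v, 10);
            break;
        }
        case 'x':
            emit_uint(buf, size, &pos, va_arg(ap, unsigned int), 16);
            break;
        case '%':
            emit(buf, size, &pos, '%');
            break;
        default:                        /* unknown specifier: echo it */
            emit(buf, size, &pos, '%');
            emit(buf, size, &pos, *fmt);
            break;
        }
    }
    if (size > 0)
        buf[pos < size ? pos : size - 1] = '\0';
    return pos;                         /* length the full output would have had */
}

/* Variadic front end: this is the function whose prologue builds the va_list. */
size_t kformat(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    size_t n = kvformat(buf, size, fmt, ap);
    va_end(ap);
    return n;
}
```

In a kernel, the filled buffer would then be handed to whatever routine writes characters to the UART.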

Here is what I've been working with: https://gist.github.com/cmaruan/de60bf7 ... f42235f9fb

I don't understand ARM assembly well enough to make sense of what the compiler generates that makes it work when optimisation is enabled.

I'd appreciate help here. Thanks!

Re: Weird AARCH64 behaviours

Posted: Wed Oct 26, 2022 12:00 am
by Octocontrabass
It's hard to say exactly what's going on without stepping through it in a debugger, but I did spot one problem: your stack may not be aligned correctly. The AArch64 System V psABI requires the stack pointer to be 16-byte aligned at all times.

Re: Weird AARCH64 behaviours

Posted: Wed Oct 26, 2022 12:22 am
by klange
Octocontrabass wrote:It's hard to say exactly what's going on without stepping through it in a debugger, but I did spot one problem: your stack may not be aligned correctly. The AArch64 System V psABI requires the stack pointer to be 16-byte aligned at all times.
Yeah, it does look like the stack is getting only 8-byte alignment out of its configuration in the linker script, so there's a coin toss on whether that ends up 16-byte aligned, depending on the size of the data before it.
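The coin toss is plain arithmetic: an address that is a multiple of 8 is a multiple of 16 only half the time. A small sketch of the usual round-up helpers; `align_up` and `is_aligned` are hypothetical names, and the sample addresses in the checks are illustrative, not taken from OP's build.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power of two). */
static inline uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* True when addr is a multiple of align (align must be a power of two). */
static inline bool is_aligned(uintptr_t addr, uintptr_t align)
{
    return (addr & (align - 1)) == 0;
}
```

In a GNU ld script, the equivalent is putting `. = ALIGN(16);` before the stack symbol so the linker performs the same rounding.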

Re: Weird AARCH64 behaviours

Posted: Wed Oct 26, 2022 7:42 am
by cris10293
So, aligning the stack to 16 bytes (0x10) didn't change the behavior.

Here is my output:

$ make qemu
aarch64-elf-gcc -O0 -g -ffreestanding -c kernel.c -o kernel.o
aarch64-elf-as boot.s -o boot.o
aarch64-elf-ld -nostdlib -Tlinker.ld boot.o kernel.o -o kernel.elf
aarch64-elf-ld: warning: kernel.elf has a LOAD segment with RWX permissions
qemu-system-aarch64 -cpu cortex-a53 -kernel kernel.elf -M virt -nographic
Result: make: *** [qemu] Killed: 9
Nothing is printed after "Result: " except the message from my "kill -9" command

I have updated the gist repo to reflect my changes and the output above

Re: Weird AARCH64 behaviours

Posted: Wed Oct 26, 2022 9:16 am
by kzinti
Have you tried printing something using putchar() directly instead of print? Maybe the problem isn't with print(). Could it be that the access to the UART is being optimized out? I know you are using volatile...

One thing that strikes me as odd is that you are not waiting for the UART to be ready before writing characters to it. Normally you have to poll some status bit that tells you the UART is done with the last character.
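A minimal sketch of that polling, assuming the UART is the PL011 that QEMU's virt machine maps at 0x09000000: spin while TXFF (transmit FIFO full, bit 5 of the flag register UARTFR at offset 0x18) is set, then write the data register UARTDR at offset 0x00. The function names are hypothetical, and the base pointer is a parameter so the logic can be exercised against a fake register block on a host.

```c
#include <stdint.h>

/* PL011 register offsets, expressed as 32-bit word indices. */
#define PL011_DR      (0x000 / 4)   /* UARTDR: data register */
#define PL011_FR      (0x018 / 4)   /* UARTFR: flag register */
#define PL011_FR_TXFF (1u << 5)     /* transmit FIFO full */

/* UART0 on QEMU's "virt" machine (assumption: default memory map). */
#define QEMU_VIRT_UART0 ((volatile uint32_t *)(uintptr_t)0x09000000u)

/* Hypothetical polled putchar: wait until the TX FIFO has room, then write. */
static void pl011_putc(volatile uint32_t *uart, char c)
{
    while (uart[PL011_FR] & PL011_FR_TXFF)
        ;   /* busy-wait on the status bit */
    uart[PL011_DR] = (uint32_t)(unsigned char)c;
}

/* Write a NUL-terminated string one character at a time. */
static void pl011_puts(volatile uint32_t *uart, const char *s)
{
    while (*s != '\0')
        pl011_putc(uart, *s++);
}
```

In the kernel itself this would be called as `pl011_puts(QEMU_VIRT_UART0, "hello\n");`.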

Re: Weird AARCH64 behaviours

Posted: Wed Oct 26, 2022 1:51 pm
by cris10293
kzinti wrote:Have you tried printing something using putchar() directly instead of print? Maybe the problem isn't with print(). Could it be that the access to the UART is being optimized out? I know you are using volatile...

One thing that strikes me as odd is that you are not waiting for the UART to be ready before writing characters to it. Normally you have to poll some status bit that tells you the UART is done with the last character.

I have added two files to the gist: the disassembly of kernel.c built with -O0 and with -O2.
The -O0 build of printf has lots of str instructions pushing data onto the stack, whilst the -O2 build does not, as shown here https://gist.github.com/cmaruan/de60bf7 ... sm-L53-L73

Maybe that is the issue?

Re: Weird AARCH64 behaviours

Posted: Wed Oct 26, 2022 2:29 pm
by Octocontrabass
It could be, if all four cores are trying to run the same code.

Try adjusting your startup code to park three of the four cores:

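_start:
    mrs x30, mpidr_el1      // read this core's affinity register
    and x30, x30, #3        // keep Aff0 bits [1:0]: core number in the cluster
    cbz x30, 1f             // core 0 continues at label 1
0:  wfe                     // every other core sleeps here forever
    b 0b
1:  ldr x30, =stack_top     // address of the top of the stack
    mov sp, x30             // sp must stay 16-byte aligned
    bl kmain
    b 0b                    // park core 0 too if kmain ever returns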

Re: Weird AARCH64 behaviours

Posted: Wed Oct 26, 2022 3:44 pm
by cris10293
Octocontrabass wrote:It could be, if all four cores are trying to run the same code.

Try adjusting your startup code to park three of the four cores:

No change. Even using #896 (which is 1110000000b) to select the three bits for Aff1 in a cortex-a76 didn't change the outcome.
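For reference when picking a mask: MPIDR_EL1 holds Aff0 in bits [7:0], Aff1 in [15:8], Aff2 in [23:16] and Aff3 in [39:32], so Aff1 is selected by 0xff00 (bits 8 to 15), whereas 0x380 covers bits 7 to 9. The helpers below are a hypothetical sketch for decoding a raw MPIDR_EL1 value; the "boot core" check simply assumes the core to keep is the one whose affinity fields are all zero, and the sample values in the checks are made up.

```c
#include <stdint.h>

/* Extract affinity level 0..3 from a raw MPIDR_EL1 value. */
static inline unsigned mpidr_aff(uint64_t mpidr, unsigned level)
{
    static const unsigned shift[4] = { 0, 8, 16, 32 };
    return (unsigned)((mpidr >> shift[level & 3]) & 0xffu);
}

/* Hypothetical check: true when Aff0, Aff1, Aff2 and Aff3 are all zero.
 * Bits [31:24] (RES1, U, MT) are deliberately ignored by the mask. */
static inline int mpidr_is_boot_core(uint64_t mpidr)
{
    return (mpidr & 0xff00ffffffull) == 0;
}
```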

Re: Weird AARCH64 behaviours

Posted: Wed Oct 26, 2022 5:11 pm
by linuxyne
With -O0, the compiler saves FP registers q0 to q7. I think that might be causing an undefined instruction exception, if the FP unit is not yet set up. You may want to ask the compiler to not generate FP instructions: "-march=armv8-a+nofp", or similar.

Edit: Perhaps this behaviour is seen because printf is a variadic function: at -O0 its prologue saves q0 to q7 so that va_arg can later find any floating-point arguments.
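Some background on that edit. Under the AAPCS64, va_list is a structure (fields `__stack`, `__gr_top`, `__vr_top`, `__gr_offs`, `__vr_offs`) pointing into two register save areas that a variadic function's prologue fills: one for the x0-x7 argument registers and one for the q0-q7 argument registers not consumed by named parameters. For `printf(const char *fmt, ...)` that is 56 bytes of x registers and 128 bytes of q registers, which matches the eight `str q` stores in the -O0 listing. Presumably the -O2 build got away with it because the optimiser could see that no va_arg reads a floating-point type and dropped those stores. The struct below is a host-side mirror for illustration only, and the helper names are hypothetical.

```c
#include <stddef.h>

/* Host-side mirror of the AAPCS64 va_list layout, for illustration only:
 * real code must use <stdarg.h>, which provides this type as va_list. */
struct aapcs64_va_list {
    void *stack;   /* __stack:   next argument passed on the stack */
    void *gr_top;  /* __gr_top:  end of the saved x-register area */
    void *vr_top;  /* __vr_top:  end of the saved q-register area */
    int gr_offs;   /* __gr_offs: negative offset from gr_top to next x-reg argument */
    int vr_offs;   /* __vr_offs: negative offset from vr_top to next q-reg argument */
};

enum { GP_ARG_REGS = 8, VR_ARG_REGS = 8 };  /* x0-x7 and q0-q7 carry arguments */

/* Bytes saved for the x registers not used by named arguments. */
static size_t gr_save_bytes(int named_gp)
{
    if (named_gp > GP_ARG_REGS)
        named_gp = GP_ARG_REGS;
    return (size_t)(GP_ARG_REGS - named_gp) * 8;   /* 8 bytes per x register */
}

/* Bytes saved for the q registers not used by named arguments. */
static size_t vr_save_bytes(int named_vr)
{
    if (named_vr > VR_ARG_REGS)
        named_vr = VR_ARG_REGS;
    return (size_t)(VR_ARG_REGS - named_vr) * 16;  /* 16 bytes per q register */
}
```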

Re: Weird AARCH64 behaviours

Posted: Wed Oct 26, 2022 5:31 pm
by klange
linuxyne wrote:With -O0, the compiler saves FP registers q0 to q7. I think that that might be causing an undefined instruction exception, if the FP unit is not yet setup. You may want to ask the compiler to not generate FP instructions: "-march=armv8-a+nofp", or similar.
I recommend -mgeneral-regs-only as a more, well, general approach - also applicable on other architectures.

Re: Weird AARCH64 behaviours

Posted: Wed Oct 26, 2022 5:33 pm
by qookie
linuxyne wrote:With -O0, the compiler saves FP registers q0 to q7. I think that might be causing an undefined instruction exception, if the FP unit is not yet setup. You may want to ask the compiler to not generate FP instructions: "-march=armv8-a+nofp", or similar.
I don't think FP registers are a problem in this case. Passing an ELF file to -kernel will have QEMU put you in the highest supported EL (EL3 in this case), since it thinks you're giving it firmware instead of a kernel. And since you're at the highest EL possible, nothing has had the chance to set up trapping of the FP registers (per the Cortex-A53 TRM, the reset value in this case is not to trap).

EDIT: While technically true, getting to EL2 or EL3 would require adding a flag to the machine, and at EL1 FP access is indeed set to trap by default. My bad!
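That EL1 trap is controlled by CPACR_EL1.FPEN, bits [21:20]; setting both bits (0b11) stops FP/SIMD accesses from trapping at EL0 and EL1. This is the alternative to keeping the compiler away from those registers. The sketch below is an illustration under assumptions, not something anyone in the thread ran: the helper names are hypothetical, the `mrs`/`msr` part only compiles for AArch64 targets, and it would have to run at EL1 before any code touches the q registers (for example as the very first call from a non-variadic `kmain`).

```c
#include <stdint.h>

#define CPACR_EL1_FPEN_SHIFT 20
#define CPACR_EL1_FPEN_MASK  (UINT64_C(3) << CPACR_EL1_FPEN_SHIFT)

/* Return a CPACR_EL1 value with FPEN = 0b11 (no FP/SIMD traps at EL0/EL1). */
static inline uint64_t cpacr_with_fp_enabled(uint64_t cpacr)
{
    return cpacr | CPACR_EL1_FPEN_MASK;
}

#if defined(__aarch64__)
/* Hypothetical early-boot helper: read-modify-write CPACR_EL1, then ISB so
 * the new setting takes effect before any following FP/SIMD instruction. */
static inline void enable_fp_simd_el1(void)
{
    uint64_t v;
    __asm__ volatile("mrs %0, cpacr_el1" : "=r"(v));
    v = cpacr_with_fp_enabled(v);
    __asm__ volatile("msr cpacr_el1, %0\n\tisb" : : "r"(v) : "memory");
}
#endif
```

The trade-off: once FP/SIMD is live, exception and context-switch code must preserve those registers too, which is one reason kernels commonly build with -mgeneral-regs-only instead.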

Also, as klange mentioned, "-mgeneral-regs-only" works as well to disable the use of FP regs.

Also also, you can pass "-d int" to QEMU to have it log the exceptions that are taken, perhaps with the help of https://esr.arm64.dev/ to decode the ESR.
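The top-level split of an ESR is simple enough to do by hand: ESR_ELx holds the exception class (EC) in bits [31:26], the instruction-length bit (IL) in bit 25 and the instruction-specific syndrome (ISS) in bits [24:0]. A minimal decoding sketch follows; the struct and function names are hypothetical, and only a handful of EC values are named. Applied to the 0x1fe00000 in klange's log, it yields EC 0x7, a trapped SIMD/FP access.

```c
#include <stdint.h>

struct esr_fields {
    unsigned ec;   /* exception class, ESR[31:26] */
    unsigned il;   /* instruction length, ESR[25]: 1 = 32-bit instruction */
    uint32_t iss;  /* instruction-specific syndrome, ESR[24:0] */
};

/* Split a raw ESR_ELx value into its top-level fields. */
static struct esr_fields esr_decode(uint32_t esr)
{
    struct esr_fields f;
    f.ec  = (esr >> 26) & 0x3fu;
    f.il  = (esr >> 25) & 0x1u;
    f.iss = esr & 0x1ffffffu;
    return f;
}

/* Names for a few common exception classes; anything else is left unnamed. */
static const char *esr_ec_name(unsigned ec)
{
    switch (ec) {
    case 0x00: return "Unknown reason";
    case 0x07: return "Trapped SIMD/FP access";
    case 0x15: return "SVC from AArch64";
    case 0x21: return "Instruction abort, same EL";
    case 0x25: return "Data abort, same EL";
    default:   return "(see the ESR_ELx description in the Arm ARM)";
    }
}
```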

Re: Weird AARCH64 behaviours

Posted: Wed Oct 26, 2022 5:37 pm
by klange
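The first exception OP's code takes is what QEMU reports as an Undefined Instruction exception; the exception class in the ESR, 0x7, means a trapped access to SIMD/floating-point functionality: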

$ qemu-system-aarch64 -cpu cortex-a53 -kernel kernel.elf -M virt -nographic -d int -no-reboot
Result: Taking exception 1 [Undefined Instruction] on CPU 0
...from EL1 to EL1
...with ESR 0x7/0x1fe00000
...with ELR 0x400000b0
...to EL1 PC 0x200 PSTATE 0x3c5
And it is from trying to store q0:

    400000b0:   3d8017e0        str     q0, [sp, #80]
I can also confirm that adding -mgeneral-regs-only produced working code under -O0:

$ qemu-system-aarch64 -cpu cortex-a53 -kernel kernel.elf -M virt -nographic
Result: from printf

Re: Weird AARCH64 behaviours

Posted: Wed Oct 26, 2022 6:48 pm
by cris10293
Adding -mgeneral-regs-only did the trick. Thanks everyone!