
Qemu with -accel parameter: different program behaviour

Posted: Tue Jan 26, 2021 2:47 pm
by zdychaczech
Hello everyone,

I am writing a hobby 32-bit kernel, and a few days ago I encountered different behaviour when running it with the `-accel` parameter.

The kernel is working OK when run with:

Code:

qemu-system-x86_64 -drive file=release/disk.img,format=raw,index=1,media=disk
but when it is run with acceleration enabled ("hvf" on macOS or "kvm" on Linux):

Code:

qemu-system-x86_64 -drive file=release/disk.img,format=raw,index=1,media=disk -accel hvf
something goes wrong and QEMU starts resetting repeatedly.

I've found out that the problem is in the code after paging is set up, specifically the jump to the higher-half kernel entry point.

I am attaching my code, and I would be very happy if someone could tell me what is wrong and why the program behaves differently with and without acceleration.

I can provide more information if needed.

Thank you very much.

Ondrej

Code:

[bits 32]

STACKSIZE equ 0x4000

global _loader
extern kmain

KERNEL_VIRTUAL_BASE equ 0xc0000000 ; 3GB
KERNEL_PAGE_NUMBER equ (KERNEL_VIRTUAL_BASE >> 22) ; Page directory index of the kernel's 4MB page directory entry.

FRAMEBUFFER_VIRTUAL_BASE equ 0xff400000
FRAMEBUFFER_PAGE_NUMBER equ (FRAMEBUFFER_VIRTUAL_BASE >> 22) ; Page directory index of the framebuffer's first 4MB page directory entry.

section .data
align 0x1000
boot_page_directory:
  ; This page directory entry identity-maps the first 4MB of the 32-bit physical address space.
  ; This entry must be here -- otherwise the kernel will crash immediately after paging is
  ; enabled because it can't fetch the next instruction! It's ok to unmap this page later.
  dd 0x00000083
  times (KERNEL_PAGE_NUMBER - 1) dd 0 ; Pages before kernel space.
  ; This page directory entry defines a 4MB page containing the kernel.
  dd 0x00000083
  times (FRAMEBUFFER_PAGE_NUMBER - KERNEL_PAGE_NUMBER - 1) dd 0
  ; Two page directory entries define an 8MB space for the framebuffer.
  dd 0xfd000083
  dd 0xfd400083
  times (1024 - FRAMEBUFFER_PAGE_NUMBER - 2) dd 0 ; Pages after the framebuffer space.

; Physical address of the entry point, exported for the linker.
loader equ (_loader - KERNEL_VIRTUAL_BASE)
global loader

_loader:
  cli
  ; NOTE: Until paging is set up, the code must be position-independent and use physical addresses, not virtual ones!
  mov edx, (boot_page_directory - KERNEL_VIRTUAL_BASE)
  mov cr3, edx ; Load Page Directory Base Register.

  mov edx, cr4
  or edx, 0x00000010 ; Set PSE bit in CR4 to enable 4MB pages.
  mov cr4, edx

  mov edx, cr0
  or edx, 0x80000000 ; Set PG bit in CR0 to enable paging.
  mov cr0, edx

  ; Start fetching instructions in kernel space.
  ; At this point EIP holds the physical address of this instruction (approximately 0x00100000),
  ; so we need an absolute jump to the virtual address of start_in_higher_half (approximately 0xc0100000).
  lea edx, [start_in_higher_half]
  jmp edx ; ===================> HERE: after this jump the program crashes and QEMU keeps resetting

start_in_higher_half:
  mov esp, stack + STACKSIZE

  call kmain ; call kernel

.hang:
  hlt
  jmp .hang ; hlt can resume after an interrupt or NMI, so keep halting

section .bss
align 32
stack:
  resb STACKSIZE ; reserve 16k stack
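To make the page directory layout concrete, here is a small C sketch (not part of the original kernel) that rebuilds the same four 4 MiB entries and translates a virtual address through them. It assumes the constants from the listing and 32-bit non-PAE paging with CR4.PSE set; `build_directory`, `pd_index` and `translate` are hypothetical helper names. It shows that `start_in_higher_half` at about 0xC0100000 lands on the code loaded at physical 0x00100000, which is consistent with the fault coming from somewhere other than the page tables.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch; helper names are hypothetical.
 * Constants mirrored from the NASM listing. */
#define KERNEL_VIRTUAL_BASE      0xC0000000u
#define FRAMEBUFFER_VIRTUAL_BASE 0xFF400000u

/* PDE bits for a 4 MiB page: present (bit 0), writable (bit 1), page size (bit 7). */
#define PDE_PRESENT  0x001u
#define PDE_WRITABLE 0x002u
#define PDE_4MB      0x080u

static uint32_t boot_page_directory[1024]; /* zero-initialized: not present */

/* The top 10 bits of a 32-bit virtual address select the directory entry. */
static uint32_t pd_index(uint32_t vaddr) { return vaddr >> 22; }

/* Fill the same four entries as the assembly version. */
static void build_directory(void) {
    const uint32_t flags = PDE_PRESENT | PDE_WRITABLE | PDE_4MB; /* 0x83 */
    uint32_t fb = pd_index(FRAMEBUFFER_VIRTUAL_BASE);
    assert(fb + 1 < 1024); /* both framebuffer entries must fit */

    boot_page_directory[0] = 0x00000000u | flags;                              /* identity map */
    boot_page_directory[pd_index(KERNEL_VIRTUAL_BASE)] = 0x00000000u | flags; /* kernel */
    boot_page_directory[fb] = 0xFD000000u | flags;                             /* framebuffer */
    boot_page_directory[fb + 1] = 0xFD400000u | flags;
}

/* Walk a 4 MiB PDE: returns 1 and stores the physical address, or 0 if unmapped. */
static int translate(uint32_t vaddr, uint32_t *paddr) {
    uint32_t pde = boot_page_directory[pd_index(vaddr)];
    if (!(pde & PDE_PRESENT) || !(pde & PDE_4MB))
        return 0;
    *paddr = (pde & 0xFFC00000u) | (vaddr & 0x003FFFFFu);
    return 1;
}
```

The `times` counts in the assembly add up to 1 + 767 + 1 + 252 + 2 + 1 = 1024 entries, so the directory is exactly one 4 KiB page.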

Re: Qemu with -accel parameter: different program behaviour

Posted: Tue Jan 26, 2021 8:01 pm
by Octocontrabass
Try adding "-no-reboot -d int,cpu_reset" to your QEMU command line to log the CPU state when it reboots. The log can help you (or us) track down exactly what's going wrong.

Re: Qemu with -accel parameter: different program behaviour

Posted: Tue Jan 26, 2021 8:30 pm
by xeyes
Is _loader the first entry point into your code? If so, the problem may have to do with segmentation: even if you don't care about it, you still need a flat descriptor and proper selectors pointing to it.

One difference I've noticed:

QEMU's software emulation (TCG) does not enforce segment limits, so DS can point to a descriptor with a wrong limit and the code will still run fine.

But KVM and real hardware enforce segmentation checks in protected mode, and without exception handlers in place a violation ends in a triple fault.

QEMU's wiki also lists some unsupported features for HVF, so you might want to get your kernel working under KVM first; that way you don't have to keep second-guessing whether the problem is HVF or your code.
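The flat descriptor mentioned above packs a 32-bit base and a 20-bit limit into non-contiguous fields, and the top four limit bits sit in byte 6 next to the G and D/B flags, which makes them easy to miss. The C sketch below is an illustration rather than the poster's code; `gdt_entry` and the macro names are hypothetical, and the bit layout follows the segment descriptor format in the Intel SDM, Vol. 3A.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch; names are hypothetical.
 * Access byte: present, DPL 0, code/data segment, plus type bits. */
#define ACCESS_CODE_RX 0x9Au /* execute/read code */
#define ACCESS_DATA_RW 0x92u /* read/write data */
/* Flags nibble: G=1 (limit in 4 KiB units), D/B=1 (32-bit segment). */
#define FLAGS_4K_32 0xCu

/* Pack an 8-byte segment descriptor. `limit` is the raw 20-bit field. */
static uint64_t gdt_entry(uint32_t base, uint32_t limit, uint8_t access, uint8_t flags) {
    assert(limit <= 0xFFFFFu); /* only 20 limit bits exist */
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);            /* bits 0-15:  limit 15:0    */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;     /* bits 16-39: base 23:0     */
    d |= (uint64_t)access << 40;                 /* bits 40-47: access byte   */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48; /* bits 48-51: limit 19:16   */
    d |= (uint64_t)(flags & 0xFu) << 52;         /* bits 52-55: G, D/B, L, AVL */
    d |= (uint64_t)(base >> 24) << 56;           /* bits 56-63: base 31:24    */
    return d;
}
```

A flat 4 GiB code segment is `gdt_entry(0, 0xFFFFF, ACCESS_CODE_RX, FLAGS_4K_32)`, which yields 0x00CF9A000000FFFF; passing 0xFFFF as the limit instead yields 0x00C09A000000FFFF, a segment that ends at 256 MiB. With a conventional layout of null, code and data entries, the selectors would be 0x08 and 0x10 (an assumption; the actual values depend on the order of your GDT).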

Re: Qemu with -accel parameter: different program behaviour

Posted: Wed Jan 27, 2021 2:06 pm
by zdychaczech
Hi Octocontrabass and xeyes,

@Octocontrabass: I tried to run with parameters you recommended:

Code:

qemu-system-x86_64 -drive file=release/disk.img,format=raw,index=1,media=disk -accel hvf -no-reboot -d int,cpu_reset -D ./log.txt
and I can't see anything useful in the log when the `-no-reboot` option is specified :(. The registers contain only zeroes (log_no_reboot.txt). So I tried running without the `-no-reboot` option and captured more frames (log_without_no_reboot.txt). I am attaching the output I got.

@xeyes: "_loader" is not the first entry point into my code. I use my own boot code (legacy BIOS), a boot pipeline with stage 1 and stage 2 bootloaders.

If you want to look at my code, here is the link: https://github.com/zdychacek/my-os

My boot process starts by loading the stage 1.5 bootloader, which in turn loads the stage 2 bootloader, which in turn loads my kernel. The kernel entry point is `_loader`.

- `src/bootloader/bootsector/boots.asm` is the code called by the BIOS (located at 0x7c00); it loads stage 1.5 from the ext2 FS
- `src/bootloader/stage1/main.asm` is the stage 1.5 loader, which enters protected mode, gets the memory map, sets up the VESA framebuffer, etc., and then loads and calls the stage 2 loader from the ext2 FS
- `src/bootloader/stage2/main.c` is the stage 2 loader, which loads my kernel

I have attached my disk image, zipped as `disk.img.zip`, if you want to try running it.

Thank you, xeyes, for pointing out that HVF is not fully supported (the gdbstub really is a missing feature). I can't test my code with the KVM accelerator because I don't have a Linux machine at my disposal right now.

Any advice will be appreciated.

Thank you.

Ondrej

Re: Qemu with -accel parameter: different program behaviour

Posted: Wed Jan 27, 2021 3:16 pm
by Octocontrabass

Code:

CS =0008 00000000 0fffffff 00c09b00 DPL=0 CS32 [-RA]
Your CS limit is too low. It looks like you missed the upper four bits of the limit when you set up your GDT. (QEMU's software emulation ignores segment limits.)
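The register dump explains the reset. The logged CS limit is 0fffffff: with the granularity flag set, a 20-bit limit field of 0x0FFFF is scaled to 0x0FFFFFFF (256 MiB). The jump target around 0xC0100000 lies beyond that limit, so the jump raises #GP, and with no exception handlers installed that typically escalates to a triple fault, which resets the machine. A minimal C sketch of the limit arithmetic (helper names are hypothetical; it only models expand-up segments and ignores the access size):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sketch; helper names are hypothetical.
 * Effective byte limit of a segment: with G=1 the 20-bit field counts
 * 4 KiB units, so it is shifted left 12 bits and the low bits become ones. */
static uint32_t effective_limit(uint32_t limit20, bool granularity_4k) {
    assert(limit20 <= 0xFFFFFu);
    return granularity_4k ? (limit20 << 12) | 0xFFFu : limit20;
}

/* Expand-up segment: an offset is valid only if it does not exceed the limit.
 * (A real check also covers the access size; that is ignored here.) */
static bool offset_in_limit(uint32_t offset, uint32_t limit) {
    return offset <= limit;
}
```

With the full 0xFFFFF limit field, `effective_limit(0xFFFFF, true)` is 0xFFFFFFFF and every 32-bit offset, including the higher-half kernel, is inside CS. Software emulation skips this check, which is why the same image boots without `-accel`.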

Re: Qemu with -accel parameter: different program behaviour

Posted: Thu Jan 28, 2021 2:02 am
by xeyes
Keep in mind that QEMU doesn't behave in a fully predictable/controllable way with accelerators like KVM (HVF seems similar, especially on AMD64 hardware).

The cpu_reset flag used without acceleration is very detailed and will tell you a lot about the triple fault, but with acceleration, as you can see, QEMU wasn't even able to log the fact that there had been a triple fault.

An easy way to check is to put an infinite loop (while(1), or `jmp $` in assembly) before the jump to the higher half, and then use the QEMU monitor console (the "info registers" command) to see whether the CS limit is missing its upper part, as Octocontrabass pointed out.

About missing features: the unknown unknowns are the bigger concern here. What if they hadn't listed the gdbstub as unsupported and you kept trying to make it work?

Re: Qemu with -accel parameter: different program behaviour

Posted: Thu Jan 28, 2021 1:00 pm
by zdychaczech
Thank you, Octocontrabass! You are right, I forgot to set the upper part of the limit.

Problem solved :)

Re: Qemu with -accel parameter: different program behaviour

Posted: Thu Jan 28, 2021 1:02 pm
by zdychaczech
xeyes, thank you for your debugging advice! I will keep it in mind the next time I'm debugging.