QEMU hangs on M2 MacBook running Ventura
Hi,
I installed QEMU from brew on an M2 MacBook running Ventura 13.4.
The QEMU version is 8.0.2.
My OS is a 32-bit protected-mode OS for the x86 architecture. It is cross-compiled using i686-elf-gcc. I am running it using qemu-system-x86_64.
The same setup on WSL (Windows) works fine, but on macOS I traced that QEMU hangs when it encounters either of the FPU instructions "FILDLL" or "FDIVRP".
If I remove that floating-point math code, my OS boots fine. I confirmed that the generated assembly on WSL also has the same FILDLL and FDIVRP instructions, and it works fine there.
When I looked at the 'out_asm' debug logs of QEMU on macOS, the last few lines before the hang are a bunch of (I think around 8) .quad lines.
I have verified from the debug logs that there are no guest_errors or interrupts.
Does anyone know what could be wrong here?
My FPU initialisation is done correctly. I verified that the CPU has a built-in FPU, EM is cleared, and the MP and NE bits are set in CR0. SSE is also enabled.
This is my entire command line
qemu-system-x86_64 \
-serial file:serial_debug.log \
-pflash ./OVMF.fd \
-m 512 \
-smp 1 \
-usb \
-d guest_errors,int,out_asm \
-drive if=none,id=usbbootdrive,file=$BOOT_DRIVE \
-drive if=none,id=usbdrive1,file=$UPANIX_HOME/USBImage/300MUSB_ehci.img \
-device usb-ehci,id=ehci \
-device usb-storage,bus=ehci.0,drive=usbdrive1 \
-device nec-usb-xhci,id=xhci \
-device usb-storage,bus=xhci.0,port=1,drive=usbbootdrive \
-device usb-hub,bus=xhci.0,port=3
Regards
Prajwal
complexity is the core of simplicity
-
- Member
- Posts: 5560
- Joined: Mon Mar 25, 2013 7:01 pm
Re: QEMU hangs on M2 MacBook running Ventura
prajwal wrote: The same setup on WSL (windows) works fine but on MacOS, I traced that QEMU hangs when it encounters FPU instruction "FILDLL" or "FDIVRP" - either one of them
Did you run any other FPU instructions before these?
prajwal wrote: I verified that CPU has builtin FPU, EM is cleared, MP and NE bits are set in CR0.
How about CR0.TS?
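For reference, the CR0 bits under discussion can be checked with a few masks. This is only a sketch: the bit positions are the architectural ones (MP is bit 1, EM bit 2, TS bit 3, NE bit 5), but the helper names are made up for illustration, and reading CR0 itself is left to the guest kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural CR0 bit positions. */
#define CR0_MP (1u << 1) /* monitor coprocessor */
#define CR0_EM (1u << 2) /* emulation: x87 instructions raise #NM when set */
#define CR0_TS (1u << 3) /* task switched: x87/SSE instructions raise #NM when set */
#define CR0_NE (1u << 5) /* numeric error: native x87 error reporting */

/* Hypothetical helper: true when x87 instructions can run without
 * trapping to the #NM handler (EM and TS both clear). */
static inline bool cr0_allows_x87(uint32_t cr0)
{
    return (cr0 & (CR0_EM | CR0_TS)) == 0;
}

/* Hypothetical helper: true when CR0 matches the setup described in
 * the first post (EM clear, MP and NE set) and TS is clear as well. */
static inline bool cr0_matches_described_setup(uint32_t cr0)
{
    return cr0_allows_x87(cr0) && (cr0 & CR0_MP) != 0 && (cr0 & CR0_NE) != 0;
}

/* Usage example: 0x33 is PE | MP | ET | NE, and 0x3B is the same with TS set. */
static inline void cr0_self_check(void)
{
    assert(cr0_matches_described_setup(0x33u));
    assert(!cr0_allows_x87(0x3Bu));
}
```

If TS were left set, the first x87 instruction would fault with #NM instead of hanging, so this check mainly rules that case out.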
Re: QEMU hangs on M2 MacBook running Ventura
Yes, I have run other FPU instructions before this. I modified the code to do the same math operation using function-local variables instead of the values passed in as function parameters. In this case the compiler-generated instructions did not include FILDLL and FDIVRP but still included FSTPL and FLDL. (FYI: I am compiling without optimisation, passing -O0.)
Re: QEMU hangs on M2 MacBook running Ventura
Does the behavior change according to the operands to the FILD or FDIVRP instructions? Is there enough room in the x87 stack for FILD?
Can you turn on one-insn-per-tb and log out_asm where QEMU hangs?
Can you run QEMU itself under a debugger to see why it hangs?
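On the question of x87 stack room: FILD pushes a value, so the physical register just below the current TOP has to be tagged empty, otherwise the push signals a stack overflow. A minimal sketch of that check in C, assuming the guest has the status word and the full 16-bit tag word from an FNSTENV or FNSAVE image; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true if a push (FLD, FILD, ...) has a
 * free slot. TOP lives in bits 13:11 of the status word. The full tag
 * word holds two bits per physical register, and the value 3 (binary
 * 11) marks a register as empty. */
static inline bool x87_push_has_room(uint16_t status_word, uint16_t tag_word)
{
    unsigned top = (status_word >> 11) & 7u;
    unsigned dest = (top - 1u) & 7u; /* a push decrements TOP modulo 8 */
    unsigned tag = (tag_word >> (2u * dest)) & 3u;
    return tag == 3u;
}

/* Usage example: right after FNINIT the status word is 0 and the tag
 * word is 0xFFFF (all registers empty), so a push has room. */
static inline void x87_self_check(void)
{
    assert(x87_push_has_room(0x0000u, 0xFFFFu));
    assert(!x87_push_has_room(0x0000u, 0x0000u));
}
```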
Re: QEMU hangs on M2 MacBook running Ventura
I tried using LLDB but couldn't get single-stepping to work beyond the first breakpoint at the main function.
In any case, I tried a couple of other things. I downloaded and built qemu-7.2.3 on my Mac M2 (Ventura 13.4) successfully and then used qemu-system-x86_64 from that build. It froze at the same point.
one-insn-per-tb is not supported in qemu-8.0.2, so I used the -singlestep option with -d out_asm,in_asm, and the output before QEMU froze is shown below. Any clue from it on what could be going wrong?
Code: Select all
IN:
0x0019cadf: de f9 fdivrp %st(1)
OUT: [size=72]
-- guest addr 0x0000000000000adf + tb prologue
0x10ca72200: b85f0274 ldur w20, [x19, #-0x10]
0x10ca72204: 7100029f cmp w20, #0
0x10ca72208: 540001cb b.lt #0x10ca72240
0x10ca7220c: aa1303e0 mov x0, x19
0x10ca72210: 52800021 movz w1, #0x1
0x10ca72214: 9602645a bl #0x104b0b37c
0x10ca72218: aa1303e0 mov x0, x19
0x10ca7221c: 96026166 bl #0x104b0a7b4
0x10ca72220: b940d274 ldr w20, [x19, #0xd0]
0x10ca72224: 7905e674 strh w20, [x19, #0x2f2]
0x10ca72228: f9404274 ldr x20, [x19, #0x80]
0x10ca7222c: f9017e74 str x20, [x19, #0x2f8]
0x10ca72230: 91000a94 add x20, x20, #2
0x10ca72234: 2a1403f4 mov w20, w20
0x10ca72238: f9004274 str x20, [x19, #0x80]
0x10ca7223c: 16a3a77b b #0x10735c028
0x10ca72240: 70fff600 adr x0, #0x10ca72103
0x10ca72244: 16a3a77a b #0x10735c02c
Re: QEMU hangs on M2 MacBook running Ventura
In addition, if this information helps: my OS successfully boots (from USB) and runs on my real laptop, which has an x86_64 processor.
Re: QEMU hangs on M2 MacBook running Ventura
prajwal wrote: I tried using LLDB but couldn't succeed in single stepping beyond the first breakpoint at main function.
Instead of setting a breakpoint, hang QEMU and send SIGINT as your breakpoint. That should let you see what QEMU is doing when it hangs.
Code: Select all
0x0019cadf: de f9 fdivrp %st(1)
Code: Select all
0x10ca72200: b85f0274 ldur w20, [x19, #-0x10]
0x10ca72204: 7100029f cmp w20, #0
0x10ca72208: 540001cb b.lt #0x10ca72240
Code: Select all
0x10ca7220c: aa1303e0 mov x0, x19
0x10ca72210: 52800021 movz w1, #0x1
0x10ca72214: 9602645a bl #0x104b0b37c
Code: Select all
0x10ca72218: aa1303e0 mov x0, x19
0x10ca7221c: 96026166 bl #0x104b0a7b4
Code: Select all
0x10ca72220: b940d274 ldr w20, [x19, #0xd0]
0x10ca72224: 7905e674 strh w20, [x19, #0x2f2]
0x10ca72228: f9404274 ldr x20, [x19, #0x80]
0x10ca7222c: f9017e74 str x20, [x19, #0x2f8]
0x10ca72230: 91000a94 add x20, x20, #2
0x10ca72234: 2a1403f4 mov w20, w20
0x10ca72238: f9004274 str x20, [x19, #0x80]
Code: Select all
0x10ca7223c: 16a3a77b b #0x10735c028
0x10ca72240: 70fff600 adr x0, #0x10ca72103
0x10ca72244: 16a3a77a b #0x10735c02c
prajwal wrote: Any clue from above code on what could be going wrong ?
I don't see anything wrong in the generated code. I don't think the helper functions are doing anything crazy enough to cause a problem either, especially since they seem to work fine on other CPUs.
At this point you might have better luck coming up with the smallest program that replicates the problem and submitting a bug report to the QEMU developers.
Re: QEMU hangs on M2 MacBook running Ventura
Thank you. I looked at the QEMU crash report under /Library/Logs/DiagnosticReports and found the thread stack dump below.
The line in QEMU where it was hanging was "host-utils.h:576", which is a call to the function "__builtin_addcll".
I then modified "include/qemu/compiler.h" to force "#define __has_builtin(x) 0" and recompiled qemu-7.2.3. It threw a bunch of warnings about redefining __has_builtin(x), but compilation went through fine.
This time it all worked: the OS ran successfully! So, at this point, the issue points to __builtin_addcll().
I would appreciate it if the details shared so far help find the root cause, so that I don't have to rely on this hack/workaround.
Code: Select all
static inline uint64_t uadd64_carry(uint64_t x, uint64_t y, bool *pcarry)
{
#if __has_builtin(__builtin_addcll)
    unsigned long long c = *pcarry;
    x = __builtin_addcll(x, y, c, &c);  // This is the line with problem
    *pcarry = c & 1;
    return x;
#else
    bool c = *pcarry;
    /* This is clang's internal expansion of __builtin_addc. */
    c = uadd64_overflow(x, c, &x);
    c |= uadd64_overflow(x, y, &x);
    *pcarry = c;
    return x;
#endif
}
Code: Select all
Thread 0xecd2a 48 samples (1-48) priority 31 (base 31) cpu time 4.698s (16.4G cycles, 43.1G instructions, 0.38c/i)
<process frontmost, thread QoS default (requested default), process unclamped, process received importance donation from WindowServer [350], IO tier 0>
48 thread_start + 8 (libsystem_pthread.dylib + 7584) [0x18c196da0] 1-48
48 _pthread_start + 148 (libsystem_pthread.dylib + 28584) [0x18c19bfa8] 1-48
48 qemu_thread_start + 128 (qemu-thread-posix.c:505,9 in qemu-system-x86_64 + 5037844) [0x104d81f14] 1-48
48 rr_cpu_thread_fn + 480 (tcg-accel-ops-rr.c:223,21 in qemu-system-x86_64 + 3606504) [0x104c247e8] 1-48
48 tcg_cpus_exec + 44 (tcg-accel-ops.c:69,11 in qemu-system-x86_64 + 3603396) [0x104c23bc4] 1-48
48 cpu_exec + 1764 (cpu-exec.c:1032,13 in qemu-system-x86_64 + 3469648) [0x104c03150] 1-48
48 cpu_loop_exec_tb + 32 (cpu-exec.c:868,10 in qemu-system-x86_64 + 3469648) [0x104c03150] 1-48
48 cpu_tb_exec + 148 (cpu-exec.c:438,11 in qemu-system-x86_64 + 3467428) [0x104c028a4] 1-48
48 ??? [0x10ca72218] 1-48
48 helper_fdiv_STN_ST0 + 76 (fpu_helper.c:578,10 in qemu-system-x86_64 + 2454472) [0x104b0b3c8] 1-48
48 helper_fdiv + 16 (fpu_helper.c:159,20 in qemu-system-x86_64 + 2454472) [0x104b0b3c8] 1-48
1 floatx80_div + 496 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389588) [0x104bef894] 1
1 parts128_div + 288 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389588) [0x104bef894] 1
1 frac128_div + 240 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389588) [0x104bef894] 1
1 add192 + 0 (softfloat-macros.h:459,14 in qemu-system-x86_64 + 3389588) [0x104bef894] 1
1 uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389588) [0x104bef894] (running) 1
1 floatx80_div + 512 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389604) [0x104bef8a4] 2
1 parts128_div + 304 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389604) [0x104bef8a4] 2
1 frac128_div + 256 (softfloat.c:1052,5 in qemu-system-x86_64 + 3389604) [0x104bef8a4] (running) 2
1 floatx80_div + 496 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389588) [0x104bef894] 3
1 parts128_div + 288 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389588) [0x104bef894] 3
1 frac128_div + 240 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389588) [0x104bef894] 3
1 add192 + 0 (softfloat-macros.h:459,14 in qemu-system-x86_64 + 3389588) [0x104bef894] 3
1 uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389588) [0x104bef894] (running) 3
1 floatx80_div + 512 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389604) [0x104bef8a4] 4
1 parts128_div + 304 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389604) [0x104bef8a4] 4
1 frac128_div + 256 (softfloat.c:1052,5 in qemu-system-x86_64 + 3389604) [0x104bef8a4] (running) 4
3 floatx80_div + 496 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389588) [0x104bef894] 5-7
3 parts128_div + 288 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389588) [0x104bef894] 5-7
3 frac128_div + 240 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389588) [0x104bef894] 5-7
3 add192 + 0 (softfloat-macros.h:459,14 in qemu-system-x86_64 + 3389588) [0x104bef894] 5-7
3 uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389588) [0x104bef894] (running) 5-7
1 floatx80_div + 512 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389604) [0x104bef8a4] 8
1 parts128_div + 304 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389604) [0x104bef8a4] 8
1 frac128_div + 256 (softfloat.c:1052,5 in qemu-system-x86_64 + 3389604) [0x104bef8a4] (running) 8
3 floatx80_div + 496 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389588) [0x104bef894] 9-11
3 parts128_div + 288 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389588) [0x104bef894] 9-11
3 frac128_div + 240 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389588) [0x104bef894] 9-11
3 add192 + 0 (softfloat-macros.h:459,14 in qemu-system-x86_64 + 3389588) [0x104bef894] 9-11
3 uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389588) [0x104bef894] (running) 9-11
1 floatx80_div + 500 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389592) [0x104bef898] 12
1 parts128_div + 292 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389592) [0x104bef898] 12
1 frac128_div + 244 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389592) [0x104bef898] 12
1 add192 + 4 (softfloat-macros.h:460,14 in qemu-system-x86_64 + 3389592) [0x104bef898] 12
1 uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389592) [0x104bef898] (running) 12
3 floatx80_div + 496 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389588) [0x104bef894] 13-15
3 parts128_div + 288 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389588) [0x104bef894] 13-15
3 frac128_div + 240 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389588) [0x104bef894] 13-15
3 add192 + 0 (softfloat-macros.h:459,14 in qemu-system-x86_64 + 3389588) [0x104bef894] 13-15
3 uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389588) [0x104bef894] (running) 13-15
1 floatx80_div + 500 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389592) [0x104bef898] 16
1 parts128_div + 292 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389592) [0x104bef898] 16
1 frac128_div + 244 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389592) [0x104bef898] 16
1 add192 + 4 (softfloat-macros.h:460,14 in qemu-system-x86_64 + 3389592) [0x104bef898] 16
1 uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389592) [0x104bef898] (running) 16
1 floatx80_div + 512 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389604) [0x104bef8a4] 17
1 parts128_div + 304 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389604) [0x104bef8a4] 17
1 frac128_div + 256 (softfloat.c:1052,5 in qemu-system-x86_64 + 3389604) [0x104bef8a4] (running) 17
11 floatx80_div + 496 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389588) [0x104bef894] 18-28
11 parts128_div + 288 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389588) [0x104bef894] 18-28
11 frac128_div + 240 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389588) [0x104bef894] 18-28
11 add192 + 0 (softfloat-macros.h:459,14 in qemu-system-x86_64 + 3389588) [0x104bef894] 18-28
11 uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389588) [0x104bef894] (running) 18-28
1 floatx80_div + 508 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389600) [0x104bef8a0] 29
1 parts128_div + 300 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389600) [0x104bef8a0] 29
1 frac128_div + 252 (softfloat.c:1053,11 in qemu-system-x86_64 + 3389600) [0x104bef8a0] (running) 29
2 floatx80_div + 496 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389588) [0x104bef894] 30-31
2 parts128_div + 288 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389588) [0x104bef894] 30-31
2 frac128_div + 240 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389588) [0x104bef894] 30-31
2 add192 + 0 (softfloat-macros.h:459,14 in qemu-system-x86_64 + 3389588) [0x104bef894] 30-31
2 uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389588) [0x104bef894] (running) 30-31
1 floatx80_div + 508 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389600) [0x104bef8a0] 32
1 parts128_div + 300 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389600) [0x104bef8a0] 32
1 frac128_div + 252 (softfloat.c:1053,11 in qemu-system-x86_64 + 3389600) [0x104bef8a0] (running) 32
1 floatx80_div + 500 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389592) [0x104bef898] 33
1 parts128_div + 292 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389592) [0x104bef898] 33
1 frac128_div + 244 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389592) [0x104bef898] 33
1 add192 + 4 (softfloat-macros.h:460,14 in qemu-system-x86_64 + 3389592) [0x104bef898] 33
1 uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389592) [0x104bef898] (running) 33
11 floatx80_div + 496 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389588) [0x104bef894] 34-44
11 parts128_div + 288 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389588) [0x104bef894] 34-44
11 frac128_div + 240 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389588) [0x104bef894] 34-44
11 add192 + 0 (softfloat-macros.h:459,14 in qemu-system-x86_64 + 3389588) [0x104bef894] 34-44
11 uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389588) [0x104bef894] (running) 34-44
1 floatx80_div + 508 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389600) [0x104bef8a0] 45
1 parts128_div + 300 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389600) [0x104bef8a0] 45
1 frac128_div + 252 (softfloat.c:1053,11 in qemu-system-x86_64 + 3389600) [0x104bef8a0] (running) 45
3 floatx80_div + 496 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389588) [0x104bef894] 46-48
3 parts128_div + 288 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389588) [0x104bef894] 46-48
3 frac128_div + 240 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389588) [0x104bef894] 46-48
3 add192 + 0 (softfloat-macros.h:459,14 in qemu-system-x86_64 + 3389588) [0x104bef894] 46-48
3 uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389588) [0x104bef894] (running) 46-48
Re: QEMU hangs on M2 MacBook running Ventura
prajwal wrote: The line in qemu where it was hanging was "host-utils.h:576" - which is a call to function "__builtin_addcll"
What happens if you replace line 574 with "#if 0"? That should remove the call to __builtin_addcll without changing anything else.
If that fixes it, it narrows down the possibilities a bit: it could be QEMU's build scripts enabling unsupported instructions, it could be a bug in the compiler (Clang?), or it could be a bug in the CPU.
Re: QEMU hangs on M2 MacBook running Ventura
I narrowed the problem down to usub64_borrow().
The problem actually seems to be with the __builtin_subcll() call.
After adding some debug printf calls, I found that (in my case), when I start QEMU:
the first call to usub64_borrow is for x = 0, y = 0, carryin = 0, and the output is result = 0 and carryout = 0
the next call to usub64_borrow is for x = 7205759403792793600, y = 7205759402719051776, carryin = 0, and the output is result = 1073741824 and carryout = 1
The carryout must be 0 in this case (x is greater than y, so no borrow occurs), but instead it comes back as 1. If I mark the code as #if 0 and execute the alternate path, it works fine.
Because of the incorrect carryout = 1, the calculation gets into an infinite loop, and hence QEMU hangs.
I tried the above calculation in a sample C program, but it works fine there. The problem exists only as part of QEMU execution.
PS: The weird thing I observed is that if I pass carryin as a hardcoded "0" to __builtin_subcll() instead of passing *pborrow, then it works as expected, i.e. the carryout comes back as 0.
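To make the expected values concrete, here is a small reference version of subtract-with-borrow in plain C that avoids the builtin entirely. It is a sketch for cross-checking the numbers above, not QEMU's code; the name usub64_borrow_ref is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reference: computes x - y - *pborrow and stores the
 * borrow-out in *pborrow, using only unsigned comparisons. */
static inline uint64_t usub64_borrow_ref(uint64_t x, uint64_t y, bool *pborrow)
{
    uint64_t b_in = *pborrow ? 1u : 0u;
    uint64_t d = x - y;    /* wraps modulo 2^64 when x < y */
    bool b1 = x < y;       /* borrow from the first subtraction */
    uint64_t r = d - b_in;
    bool b2 = d < b_in;    /* borrow from subtracting the borrow-in */
    *pborrow = b1 || b2;
    return r;
}

/* Usage example with the inputs reported in the post above. */
static inline void usub64_borrow_ref_check(void)
{
    bool b = false;
    uint64_t r = usub64_borrow_ref(7205759403792793600ULL,
                                   7205759402719051776ULL, &b);
    assert(r == 1073741824ULL);
    assert(!b); /* x > y and borrow-in is 0, so there is no borrow-out */
}
```

Comparing the output of a function like this against usub64_borrow() inside QEMU, for the same inputs, is one way to catch the bad borrow-out at the call where it first appears.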
Re: QEMU hangs on M2 MacBook running Ventura
I think you'll have to either disassemble usub64_borrow() or step through it at the instruction level to see what's going on. That might be difficult if it's been inlined, but there are ways to track it down.
Re: QEMU hangs on M2 MacBook running Ventura
I suspect some compiler optimisation is happening around the bool to uint64_t conversions, so I have the patch below that makes it work properly. The patch was actually needed for usub64_borrow() alone; however, for consistency, I modified uadd64_carry() as well.
Code: Select all
diff --git a/include/qemu/host-utils.h b/include/qemu/host-utils.h
index 3ce62bf4a5..bc9955a3ad 100644
--- a/include/qemu/host-utils.h
+++ b/include/qemu/host-utils.h
@@ -571,8 +571,8 @@ static inline bool mulu128(uint64_t *plow, uint64_t *phigh, uint64_t factor)
 static inline uint64_t uadd64_carry(uint64_t x, uint64_t y, bool *pcarry)
 {
 #if __has_builtin(__builtin_addcll)
-    unsigned long long c = *pcarry;
-    x = __builtin_addcll(x, y, c, &c);
+    volatile uint64_t c = *pcarry;
+    x = __builtin_addcll(x, y, c, (uint64_t*)&c);
     *pcarry = c & 1;
     return x;
 #else
@@ -596,8 +596,8 @@ static inline uint64_t uadd64_carry(uint64_t x, uint64_t y, bool *pcarry)
 static inline uint64_t usub64_borrow(uint64_t x, uint64_t y, bool *pborrow)
 {
 #if __has_builtin(__builtin_subcll)
-    unsigned long long b = *pborrow;
-    x = __builtin_subcll(x, y, b, &b);
+    volatile uint64_t b = *pborrow;
+    x = __builtin_subcll(x, y, b, (uint64_t*)&b);
     *pborrow = b & 1;
     return x;
 #else
Re: QEMU hangs on M2 MacBook running Ventura
prajwal wrote: I suspect there is some compiler optimisation happening with bool to uint64_t conversions
Have you examined the code at the instruction level to see for sure?
prajwal wrote: I have below patch that makes it work properly.
That just hides the problem, it doesn't fix it.
Re: QEMU hangs on M2 MacBook running Ventura
I moved the inlined usub64_borrow() function from host-utils.h to fpu/softfloat.c as a non static, non inline function and then reproduced the problem with and without my workaround
Here's the disassembly output of the original (failing) code:
Code: Select all
0000000000016f94 <_usub64_borrow>:
; unsigned long long b = *pborrow;
16f94: 48 00 40 39 ldrb w8, [x2]
; x = __builtin_subcll(x, y, b, &b);
16f98: 09 00 01 eb subs x9, x0, x1
16f9c: ea 27 9f 1a cset w10, lo
16fa0: 20 01 08 eb subs x0, x9, x8
16fa4: e8 27 9f 1a cset w8, lo
16fa8: 48 01 08 2a orr w8, w10, w8
; *pborrow = b & 1;
16fac: 48 00 00 39 strb w8, [x2]
; return x;
16fb0: c0 03 5f d6 ret
Here's the disassembly output of the modified (working) code:
Code: Select all
; {
17980: ff 43 00 d1 sub sp, sp, #16
; volatile unsigned long long b = *pborrow;
17984: 48 00 40 39 ldrb w8, [x2]
17988: e8 07 00 f9 str x8, [sp, #8]
; x = __builtin_subcll(x, y, b, (uint64_t*)&b);
1798c: e8 07 40 f9 ldr x8, [sp, #8]
17990: 09 00 01 eb subs x9, x0, x1
17994: ea 27 9f 1a cset w10, lo
17998: 20 01 08 eb subs x0, x9, x8
1799c: e8 27 9f 1a cset w8, lo
179a0: 48 01 08 2a orr w8, w10, w8
179a4: e8 07 00 f9 str x8, [sp, #8]
; *pborrow = b & 1;
179a8: e8 07 40 f9 ldr x8, [sp, #8]
179ac: 08 01 00 12 and w8, w8, #0x1
179b0: 48 00 00 39 strb w8, [x2]
; return x;
179b4: ff 43 00 91 add sp, sp, #16
179b8: c0 03 5f d6 ret
Re: QEMU hangs on M2 MacBook running Ventura
Those two functions do exactly the same thing. There's no difference.
Are you sure the failing version hasn't been inlined anywhere?
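One way to convince yourself that the two listings are equivalent is to model each instruction in C and compare the results. The sketch below does that under the assumption that the disassembly above is complete; the function names are made up, and each statement is annotated with the instruction it stands for. On AArch64, "cset ..., lo" after "subs a, b" yields 1 exactly when a is unsigned-lower than b.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of the original (failing) listing. */
static inline uint64_t model_original(uint64_t x0, uint64_t x1, uint8_t *p)
{
    uint64_t x8 = *p;         /* ldrb w8, [x2] */
    uint64_t x9 = x0 - x1;    /* subs x9, x0, x1 */
    uint32_t w10 = x0 < x1;   /* cset w10, lo */
    uint64_t r = x9 - x8;     /* subs x0, x9, x8 */
    uint32_t w8 = x9 < x8;    /* cset w8, lo */
    w8 = w10 | w8;            /* orr w8, w10, w8 */
    *p = (uint8_t)w8;         /* strb w8, [x2] */
    return r;                 /* ret */
}

/* Model of the modified (working) listing: same arithmetic, plus a
 * round trip through a stack slot and a final "and" with 1. */
static inline uint64_t model_patched(uint64_t x0, uint64_t x1, uint8_t *p)
{
    volatile uint64_t slot = *p;     /* ldrb w8, [x2]; str x8, [sp, #8] */
    uint64_t x8 = slot;              /* ldr x8, [sp, #8] */
    uint64_t x9 = x0 - x1;           /* subs x9, x0, x1 */
    uint32_t w10 = x0 < x1;          /* cset w10, lo */
    uint64_t r = x9 - x8;            /* subs x0, x9, x8 */
    uint32_t w8 = x9 < x8;           /* cset w8, lo */
    slot = (uint64_t)(w10 | w8);     /* orr w8, w10, w8; str x8, [sp, #8] */
    *p = (uint8_t)(slot & 1u);       /* ldr x8, [sp, #8]; and; strb */
    return r;                        /* ret */
}

/* Hypothetical helper: true when both models return the same result
 * and the same borrow-out for the given inputs. */
static inline bool models_agree(uint64_t x, uint64_t y, uint8_t borrow_in)
{
    uint8_t b1 = borrow_in, b2 = borrow_in;
    uint64_t r1 = model_original(x, y, &b1);
    uint64_t r2 = model_patched(x, y, &b2);
    return r1 == r2 && b1 == b2;
}

/* Usage example with the inputs reported earlier in the thread. */
static inline void models_check(void)
{
    assert(models_agree(7205759403792793600ULL, 7205759402719051776ULL, 0));
}
```

Since the OR of two cset results is already 0 or 1, the extra "and" changes nothing, which is consistent with the suspicion that the misbehaving code is an inlined copy rather than this standalone function.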