QEMU hangs on M2 MacBook running Ventura

prajwal
Member
Posts: 154
Joined: Sat Oct 23, 2004 11:00 pm

Post by prajwal »

Hi,

I installed QEMU via Homebrew on an M2 MacBook running macOS Ventura 13.4.
The QEMU version is 8.0.2.

My OS is a 32-bit protected-mode OS for x86. It is cross-compiled using i686-elf-gcc, and I run it with qemu-system-x86_64.

The same setup works fine on WSL (Windows), but on macOS I traced that QEMU hangs when it reaches either of two FPU instructions: "FILDLL" or "FDIVRP".

If I remove that floating-point code, my OS boots fine. I confirmed that the assembly generated on WSL contains the same FILDLL and FDIVRP instructions, and it works fine there.

When I looked at QEMU's 'out_asm' debug logs on macOS, the last few lines before the hang were a bunch of .quad lines (around 8, I think).

I have verified from the debug logs that there are no guest errors, interrupts, etc.

Does anyone know what could be wrong here?

My FPU initialisation looks correct. I verified that the CPU has a built-in FPU, EM is cleared, and the MP and NE bits are set in CR0. SSE is also enabled.
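For reference, the CR0/CR4 setup described above can be expressed as a pair of pure functions. This is a sketch under assumptions: the helper names cr0_for_x87 and cr4_for_sse are hypothetical, and in a real kernel the values would be read from and written back to CR0/CR4 with inline assembly. It also clears CR0.TS, because a set TS makes x87 and SSE instructions fault with #NM.

```c
#include <stdint.h>

/* CR0 bits relevant to the x87 FPU (Intel SDM Vol. 3, "Control Registers"). */
#define CR0_MP (1u << 1)  /* Monitor coprocessor: WAIT/FWAIT honours TS */
#define CR0_EM (1u << 2)  /* Emulation: if set, x87 instructions raise #NM */
#define CR0_TS (1u << 3)  /* Task switched: if set, x87/SSE instructions raise #NM */
#define CR0_NE (1u << 5)  /* Numeric error: report x87 errors via #MF, not IRQ 13 */

/* CR4 bits the OS sets before using SSE instructions. */
#define CR4_OSFXSR     (1u << 9)   /* OS supports FXSAVE/FXRSTOR */
#define CR4_OSXMMEXCPT (1u << 10)  /* OS handles #XM (SIMD FP exceptions) */

/* Hypothetical helper: given the current CR0, return the value to write back. */
static uint32_t cr0_for_x87(uint32_t cr0)
{
    cr0 &= ~(CR0_EM | CR0_TS);  /* no emulation, no pending lazy-FPU trap */
    cr0 |= CR0_MP | CR0_NE;     /* native x87 error reporting */
    return cr0;
}

/* Hypothetical helper: given the current CR4, return the value to write back. */
static uint32_t cr4_for_sse(uint32_t cr4)
{
    return cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT;
}
```

In the kernel, the result of cr0_for_x87 would be written back to CR0 and followed by FNINIT before any other x87 instruction runs.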

This is my entire command line:
qemu-system-x86_64 \
-serial file:serial_debug.log \
-pflash ./OVMF.fd \
-m 512 \
-smp 1 \
-usb \
-d guest_errors,int,out_asm \
-drive if=none,id=usbbootdrive,file=$BOOT_DRIVE \
-drive if=none,id=usbdrive1,file=$UPANIX_HOME/USBImage/300MUSB_ehci.img \
-device usb-ehci,id=ehci \
-device usb-storage,bus=ehci.0,drive=usbdrive1 \
-device nec-usb-xhci,id=xhci \
-device usb-storage,bus=xhci.0,port=1,drive=usbbootdrive \
-device usb-hub,bus=xhci.0,port=3

Regards
Prajwal
complexity is the core of simplicity
Octocontrabass
Member
Posts: 5560
Joined: Mon Mar 25, 2013 7:01 pm

Re: QEMU hangs on M2 MacBook running Ventura

Post by Octocontrabass »

prajwal wrote: The same setup on WSL (windows) works fine but on MacOS, I traced that QEMU hangs when it encounters FPU instruction "FILDLL" or "FDIVRP" - either one of them
Did you run any other FPU instructions before these?
prajwal wrote: I verified that CPU has builtin FPU, EM is cleared, MP and NE bits are set in CR0.
How about CR0.TS?

Re: QEMU hangs on M2 MacBook running Ventura

Post by prajwal »

Yes, I ran other FPU instructions before these. I also modified the code to do the same math using function-local variables instead of the values passed in as function parameters. In that case, the compiler-generated code did not contain FILDLL and FDIVRP, but it still included FSTPL and FLDL. (FYI: I am not optimising the code; I pass -O0.)

Re: QEMU hangs on M2 MacBook running Ventura

Post by Octocontrabass »

Does the behavior change according to the operands to the FILD or FDIVRP instructions? Is there enough room in the x87 stack for FILD?
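One way to answer the stack-room question from inside the guest is to store the FPU environment (e.g. with FNSTENV, whose 32-bit protected-mode layout puts the status word at offset 4 and the tag word at offset 8) and inspect it. The sketch below assumes the full 2-bit-per-register tag word from FNSTENV, not the abridged 1-bit tag that FXSAVE stores; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* TOP of the x87 register stack lives in FSW bits 11..13. */
static unsigned x87_top(uint16_t fsw)
{
    return (fsw >> 11) & 7u;
}

/* Tag for physical register `phys` in the full (FNSTENV-format) tag word:
 * 00 = valid, 01 = zero, 10 = special, 11 = empty. */
static unsigned x87_tag(uint16_t ftw, unsigned phys)
{
    return (ftw >> (2 * phys)) & 3u;
}

/* A push (FLD/FILD) decrements TOP and writes physical register TOP-1.
 * If that register is not tagged empty, the push is a stack overflow
 * (invalid-operation exception with the SF flag set). */
static bool x87_push_would_overflow(uint16_t fsw, uint16_t ftw)
{
    unsigned dest = (x87_top(fsw) + 7u) & 7u;  /* (TOP - 1) mod 8 */
    return x87_tag(ftw, dest) != 3u;
}

/* Number of stack registers currently in use (tag != empty). */
static unsigned x87_depth(uint16_t ftw)
{
    unsigned n = 0;
    for (unsigned i = 0; i < 8; i++)
        if (x87_tag(ftw, i) != 3u)
            n++;
    return n;
}
```

Right after FNINIT, FSW is 0 and the tag word is 0xFFFF (all empty), so a push has room; if the depth keeps growing across calls, some code path is pushing without popping.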

Can you turn on one-insn-per-tb and log out_asm where QEMU hangs?

Can you run QEMU itself under a debugger to see why it hangs?

Re: QEMU hangs on M2 MacBook running Ventura

Post by prajwal »

I tried using LLDB, but I couldn't single-step beyond the first breakpoint at the main function.

In any case, I tried a couple of other things. I downloaded and built qemu-7.2.3 on my M2 Mac (Ventura 13.4), then used qemu-system-x86_64 from that build. It froze at the same point.

one-insn-per-tb is not supported in qemu-8.0.2, so I used the -singlestep option with -d out_asm,in_asm. This is the output before QEMU froze:

Code: Select all

IN:
0x0019cadf:  de f9                    fdivrp   %st(1)

OUT: [size=72]
  -- guest addr 0x0000000000000adf + tb prologue
0x10ca72200:  b85f0274  ldur     w20, [x19, #-0x10]
0x10ca72204:  7100029f  cmp      w20, #0
0x10ca72208:  540001cb  b.lt     #0x10ca72240
0x10ca7220c:  aa1303e0  mov      x0, x19
0x10ca72210:  52800021  movz     w1, #0x1
0x10ca72214:  9602645a  bl       #0x104b0b37c
0x10ca72218:  aa1303e0  mov      x0, x19
0x10ca7221c:  96026166  bl       #0x104b0a7b4
0x10ca72220:  b940d274  ldr      w20, [x19, #0xd0]
0x10ca72224:  7905e674  strh     w20, [x19, #0x2f2]
0x10ca72228:  f9404274  ldr      x20, [x19, #0x80]
0x10ca7222c:  f9017e74  str      x20, [x19, #0x2f8]
0x10ca72230:  91000a94  add      x20, x20, #2
0x10ca72234:  2a1403f4  mov      w20, w20
0x10ca72238:  f9004274  str      x20, [x19, #0x80]
0x10ca7223c:  16a3a77b  b        #0x10735c028
0x10ca72240:  70fff600  adr      x0, #0x10ca72103
0x10ca72244:  16a3a77a  b        #0x10735c02c
Any clue from the above code as to what could be going wrong?

Re: QEMU hangs on M2 MacBook running Ventura

Post by prajwal »

In case this information helps: my OS also boots (from USB) and runs successfully on my real laptop, which has an x86_64 processor.

Re: QEMU hangs on M2 MacBook running Ventura

Post by Octocontrabass »

prajwal wrote: I tried using LLDB but couldn't succeed in single stepping beyond the first breakpoint at main function.
Instead of setting a breakpoint, hang QEMU and send SIGINT as your breakpoint. That should let you see what QEMU is doing when it hangs.

Code: Select all

0x0019cadf:  de f9                    fdivrp   %st(1)
In Intel syntax this is FDIVP, not FDIVRP.

Code: Select all

0x10ca72200:  b85f0274  ldur     w20, [x19, #-0x10]
0x10ca72204:  7100029f  cmp      w20, #0
0x10ca72208:  540001cb  b.lt     #0x10ca72240
This is checking for some kind of exception. I'm not sure what, though.

Code: Select all

0x10ca7220c:  aa1303e0  mov      x0, x19
0x10ca72210:  52800021  movz     w1, #0x1
0x10ca72214:  9602645a  bl       #0x104b0b37c
This is a call to helper_fdiv_STN_ST0().

Code: Select all

0x10ca72218:  aa1303e0  mov      x0, x19
0x10ca7221c:  96026166  bl       #0x104b0a7b4
This is a call to helper_fpop().
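To make the two helper calls concrete, here is a toy model of what they compute for DE F9. It is a sketch, not QEMU's code: the x87_model type and function names are hypothetical, and it uses double instead of 80-bit extended precision. The `movz w1, #0x1` before the first call passes st_index 1, so the division stores ST(1) / ST(0) into ST(1), and the pop then discards the old ST(0).

```c
#include <stdint.h>

/* Toy x87 stack: 8 physical registers, TOP index, per-register empty flag.
 * Uses double, so results can differ from real 80-bit hardware. */
typedef struct {
    double   reg[8];
    unsigned top;       /* physical index of ST(0) */
    uint8_t  empty[8];  /* 1 = register tagged empty */
} x87_model;

/* Address of ST(i), i.e. physical register (TOP + i) mod 8. */
static double *st(x87_model *f, unsigned i)
{
    return &f->reg[(f->top + i) & 7u];
}

/* FLD-style push: decrement TOP, store into the new ST(0). */
static void x87_push(x87_model *f, double v)
{
    f->top = (f->top + 7u) & 7u;
    f->reg[f->top] = v;
    f->empty[f->top] = 0;
}

/* DE F8+i: Intel "FDIVP ST(i), ST(0)", printed as fdivrp in AT&T syntax.
 * Step 1 models helper_fdiv_STN_ST0: ST(i) = ST(i) / ST(0).
 * Step 2 models helper_fpop: tag ST(0) empty and increment TOP. */
static void x87_fdivp_sti_st0(x87_model *f, unsigned i)
{
    *st(f, i) = *st(f, i) / *st(f, 0);
    f->empty[f->top] = 1;
    f->top = (f->top + 1u) & 7u;
}
```

For example, pushing 10.0 and then 4.0 and executing the DE F9 model leaves 2.5 in the new ST(0).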

Code: Select all

0x10ca72220:  b940d274  ldr      w20, [x19, #0xd0]
0x10ca72224:  7905e674  strh     w20, [x19, #0x2f2]
0x10ca72228:  f9404274  ldr      x20, [x19, #0x80]
0x10ca7222c:  f9017e74  str      x20, [x19, #0x2f8]
0x10ca72230:  91000a94  add      x20, x20, #2
0x10ca72234:  2a1403f4  mov      w20, w20
0x10ca72238:  f9004274  str      x20, [x19, #0x80]
This is updating the FPU instruction pointer and adding 2 to EIP.
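A C rendering of that bookkeeping may help when reading similar blocks. This is a sketch with a hypothetical struct (guest_state is not QEMU's CPUX86State) and it does not model the first 16-bit load/store pair. The point it illustrates is the `mov w20, w20`: after adding the 2-byte length of DE F9, the new EIP is zero-extended from 32 bits, so it wraps the way a 32-bit protected-mode EIP should.

```c
#include <stdint.h>

/* Hypothetical subset of guest CPU state; field names are illustrative. */
typedef struct {
    uint64_t eip;   /* guest instruction pointer (stored in a 64-bit slot) */
    uint64_t fpip;  /* last x87 instruction pointer */
} guest_state;

/* Record the x87 instruction's address as the FPU instruction pointer,
 * then advance EIP by the instruction length, truncated to 32 bits
 * (the equivalent of the `mov w20, w20` zero-extension). */
static void retire_x87_insn(guest_state *s, unsigned insn_len)
{
    s->fpip = s->eip;
    s->eip = (uint32_t)(s->eip + insn_len);
}
```

With EIP at 0x0019cadf (the fdivrp in the IN: listing), the model records that address as the FPU instruction pointer and leaves EIP at 0x0019cae1.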

Code: Select all

0x10ca7223c:  16a3a77b  b        #0x10735c028
0x10ca72240:  70fff600  adr      x0, #0x10ca72103
0x10ca72244:  16a3a77a  b        #0x10735c02c
This is exiting the translation block. There's a normal exit and an exception exit.
prajwal wrote: Any clue from above code on what could be going wrong?
I don't see anything wrong in the generated code. I don't think the helper functions are doing anything crazy enough to cause a problem either, especially since they seem to work fine on other CPUs.

At this point you might have better luck coming up with the smallest program that replicates the problem and submitting a bug report to the QEMU developers.
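As a starting point for such a reproducer, the guest could call a single function that converts 64-bit integer parameters to double and divides them. This is a sketch under assumptions: fpu_repro_ratio is a hypothetical name, and while i686 compilers using x87 math commonly emit fildll for the long long to double conversion and an fdiv variant for the division, the exact instructions depend on the compiler and flags, so it is worth checking the disassembly. Taking the values as parameters follows the observation earlier in the thread that FILDLL appeared only when the operands came in as function parameters.

```c
#include <stdint.h>

/* Hypothetical reproducer routine. noinline keeps it a separate function
 * even at higher optimisation levels, so the conversion and division
 * happen at run time on values the compiler cannot fold. */
__attribute__((noinline))
static double fpu_repro_ratio(long long num, long long den)
{
    return (double)num / (double)den;
}
```

In the kernel, the result could be written to the serial port; if QEMU hangs inside this one call on the Mac but not elsewhere, the binary makes a compact attachment for a bug report.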

Re: QEMU hangs on M2 MacBook running Ventura

Post by prajwal »

Thank you. I looked at the QEMU crash report in /Library/Logs/DiagnosticReports and found the thread stack dump below.

The line in QEMU where it was hanging was host-utils.h:576, which is the call to __builtin_addcll().

I then modified include/qemu/compiler.h to force (redefine) "#define __has_builtin(x) 0" and recompiled qemu-7.2.3. It threw a bunch of warnings about overriding the definition of __has_builtin(x), but the compilation went through fine.

This time everything worked: the OS ran successfully! So at this point, the issue points to __builtin_addcll().

I'd appreciate it if the details shared so far help find the root cause, so that I don't have to rely on this hack/workaround.

Code: Select all

static inline uint64_t uadd64_carry(uint64_t x, uint64_t y, bool *pcarry)
{
#if __has_builtin(__builtin_addcll)
    unsigned long long c = *pcarry;
    x = __builtin_addcll(x, y, c, &c); // This is the line with problem
    *pcarry = c & 1;
    return x;
#else
    bool c = *pcarry;
    /* This is clang's internal expansion of __builtin_addc. */
    c = uadd64_overflow(x, c, &x);
    c |= uadd64_overflow(x, y, &x);
    *pcarry = c;
    return x;
#endif
}
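The thread stack dump in this post shows frac128_div repeatedly calling add192 (softfloat-macros.h:459-460), which in turn calls uadd64_carry. The following is an illustrative reimplementation of that chain using only the portable #else path, not QEMU's actual code: the names uadd64_overflow_portable, uadd64_carry_portable and add192_sketch are hypothetical, and the most-significant-word-first argument order is an assumption modeled on softfloat's usual convention.

```c
#include <stdbool.h>
#include <stdint.h>

/* Portable overflow-checking add, standing in for QEMU's uadd64_overflow(). */
static inline bool uadd64_overflow_portable(uint64_t x, uint64_t y, uint64_t *ret)
{
    *ret = x + y;
    return *ret < x;  /* unsigned wraparound means a carry out */
}

/* Same logic as the #else branch of uadd64_carry() above. */
static inline uint64_t uadd64_carry_portable(uint64_t x, uint64_t y, bool *pcarry)
{
    bool c = *pcarry;
    c = uadd64_overflow_portable(x, c, &x);   /* add carry-in */
    c |= uadd64_overflow_portable(x, y, &x);  /* add second operand */
    *pcarry = c;
    return x;
}

/* 192-bit add a0:a1:a2 + b0:b1:b2 (a0/b0 most significant).
 * The carry ripples from the low word up to the high word. */
static void add192_sketch(uint64_t a0, uint64_t a1, uint64_t a2,
                          uint64_t b0, uint64_t b1, uint64_t b2,
                          uint64_t *z0, uint64_t *z1, uint64_t *z2)
{
    bool c = false;
    *z2 = uadd64_carry_portable(a2, b2, &c);
    *z1 = uadd64_carry_portable(a1, b1, &c);
    *z0 = uadd64_carry_portable(a0, b0, &c);
}
```

Because each word's carry feeds the next, a carry computed wrongly by a single uadd64_carry call would corrupt every higher word of the sum.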

Code: Select all

  Thread 0xecd2a    48 samples (1-48)    priority 31 (base 31)    cpu time 4.698s (16.4G cycles, 43.1G instructions, 0.38c/i)
  <process frontmost, thread QoS default (requested default), process unclamped, process received importance donation from WindowServer [350], IO tier 0>
  48  thread_start + 8 (libsystem_pthread.dylib + 7584) [0x18c196da0] 1-48
    48  _pthread_start + 148 (libsystem_pthread.dylib + 28584) [0x18c19bfa8] 1-48
      48  qemu_thread_start + 128 (qemu-thread-posix.c:505,9 in qemu-system-x86_64 + 5037844) [0x104d81f14] 1-48
        48  rr_cpu_thread_fn + 480 (tcg-accel-ops-rr.c:223,21 in qemu-system-x86_64 + 3606504) [0x104c247e8] 1-48
          48  tcg_cpus_exec + 44 (tcg-accel-ops.c:69,11 in qemu-system-x86_64 + 3603396) [0x104c23bc4] 1-48
            48  cpu_exec + 1764 (cpu-exec.c:1032,13 in qemu-system-x86_64 + 3469648) [0x104c03150] 1-48
              48  cpu_loop_exec_tb + 32 (cpu-exec.c:868,10 in qemu-system-x86_64 + 3469648) [0x104c03150] 1-48
                48  cpu_tb_exec + 148 (cpu-exec.c:438,11 in qemu-system-x86_64 + 3467428) [0x104c028a4] 1-48
                  48  ??? [0x10ca72218] 1-48
                    48  helper_fdiv_STN_ST0 + 76 (fpu_helper.c:578,10 in qemu-system-x86_64 + 2454472) [0x104b0b3c8] 1-48
                      48  helper_fdiv + 16 (fpu_helper.c:159,20 in qemu-system-x86_64 + 2454472) [0x104b0b3c8] 1-48
                        1   floatx80_div + 496 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389588) [0x104bef894] 1
                          1   parts128_div + 288 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389588) [0x104bef894] 1
                            1   frac128_div + 240 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389588) [0x104bef894] 1
                              1   add192 + 0 (softfloat-macros.h:459,14 in qemu-system-x86_64 + 3389588) [0x104bef894] 1
                                1   uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389588) [0x104bef894] (running) 1
                        1   floatx80_div + 512 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389604) [0x104bef8a4] 2
                          1   parts128_div + 304 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389604) [0x104bef8a4] 2
                            1   frac128_div + 256 (softfloat.c:1052,5 in qemu-system-x86_64 + 3389604) [0x104bef8a4] (running) 2
                        1   floatx80_div + 496 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389588) [0x104bef894] 3
                          1   parts128_div + 288 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389588) [0x104bef894] 3
                            1   frac128_div + 240 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389588) [0x104bef894] 3
                              1   add192 + 0 (softfloat-macros.h:459,14 in qemu-system-x86_64 + 3389588) [0x104bef894] 3
                                1   uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389588) [0x104bef894] (running) 3
                        1   floatx80_div + 512 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389604) [0x104bef8a4] 4
                          1   parts128_div + 304 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389604) [0x104bef8a4] 4
                            1   frac128_div + 256 (softfloat.c:1052,5 in qemu-system-x86_64 + 3389604) [0x104bef8a4] (running) 4
                        3   floatx80_div + 496 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389588) [0x104bef894] 5-7
                          3   parts128_div + 288 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389588) [0x104bef894] 5-7
                            3   frac128_div + 240 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389588) [0x104bef894] 5-7
                              3   add192 + 0 (softfloat-macros.h:459,14 in qemu-system-x86_64 + 3389588) [0x104bef894] 5-7
                                3   uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389588) [0x104bef894] (running) 5-7
                        1   floatx80_div + 512 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389604) [0x104bef8a4] 8
                          1   parts128_div + 304 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389604) [0x104bef8a4] 8
                            1   frac128_div + 256 (softfloat.c:1052,5 in qemu-system-x86_64 + 3389604) [0x104bef8a4] (running) 8
                        3   floatx80_div + 496 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389588) [0x104bef894] 9-11
                          3   parts128_div + 288 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389588) [0x104bef894] 9-11
                            3   frac128_div + 240 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389588) [0x104bef894] 9-11
                              3   add192 + 0 (softfloat-macros.h:459,14 in qemu-system-x86_64 + 3389588) [0x104bef894] 9-11
                                3   uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389588) [0x104bef894] (running) 9-11
                        1   floatx80_div + 500 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389592) [0x104bef898] 12
                          1   parts128_div + 292 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389592) [0x104bef898] 12
                            1   frac128_div + 244 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389592) [0x104bef898] 12
                              1   add192 + 4 (softfloat-macros.h:460,14 in qemu-system-x86_64 + 3389592) [0x104bef898] 12
                                1   uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389592) [0x104bef898] (running) 12
                        3   floatx80_div + 496 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389588) [0x104bef894] 13-15
                          3   parts128_div + 288 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389588) [0x104bef894] 13-15
                            3   frac128_div + 240 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389588) [0x104bef894] 13-15
                              3   add192 + 0 (softfloat-macros.h:459,14 in qemu-system-x86_64 + 3389588) [0x104bef894] 13-15
                                3   uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389588) [0x104bef894] (running) 13-15
                        1   floatx80_div + 500 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389592) [0x104bef898] 16
                          1   parts128_div + 292 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389592) [0x104bef898] 16
                            1   frac128_div + 244 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389592) [0x104bef898] 16
                              1   add192 + 4 (softfloat-macros.h:460,14 in qemu-system-x86_64 + 3389592) [0x104bef898] 16
                                1   uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389592) [0x104bef898] (running) 16
                        1   floatx80_div + 512 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389604) [0x104bef8a4] 17
                          1   parts128_div + 304 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389604) [0x104bef8a4] 17
                            1   frac128_div + 256 (softfloat.c:1052,5 in qemu-system-x86_64 + 3389604) [0x104bef8a4] (running) 17
                        11  floatx80_div + 496 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389588) [0x104bef894] 18-28
                          11  parts128_div + 288 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389588) [0x104bef894] 18-28
                            11  frac128_div + 240 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389588) [0x104bef894] 18-28
                              11  add192 + 0 (softfloat-macros.h:459,14 in qemu-system-x86_64 + 3389588) [0x104bef894] 18-28
                                11  uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389588) [0x104bef894] (running) 18-28
                        1   floatx80_div + 508 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389600) [0x104bef8a0] 29
                          1   parts128_div + 300 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389600) [0x104bef8a0] 29
                            1   frac128_div + 252 (softfloat.c:1053,11 in qemu-system-x86_64 + 3389600) [0x104bef8a0] (running) 29
                        2   floatx80_div + 496 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389588) [0x104bef894] 30-31
                          2   parts128_div + 288 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389588) [0x104bef894] 30-31
                            2   frac128_div + 240 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389588) [0x104bef894] 30-31
                              2   add192 + 0 (softfloat-macros.h:459,14 in qemu-system-x86_64 + 3389588) [0x104bef894] 30-31
                                2   uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389588) [0x104bef894] (running) 30-31
                        1   floatx80_div + 508 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389600) [0x104bef8a0] 32
                          1   parts128_div + 300 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389600) [0x104bef8a0] 32
                            1   frac128_div + 252 (softfloat.c:1053,11 in qemu-system-x86_64 + 3389600) [0x104bef8a0] (running) 32
                        1   floatx80_div + 500 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389592) [0x104bef898] 33
                          1   parts128_div + 292 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389592) [0x104bef898] 33
                            1   frac128_div + 244 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389592) [0x104bef898] 33
                              1   add192 + 4 (softfloat-macros.h:460,14 in qemu-system-x86_64 + 3389592) [0x104bef898] 33
                                1   uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389592) [0x104bef898] (running) 33
                        11  floatx80_div + 496 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389588) [0x104bef894] 34-44
                          11  parts128_div + 288 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389588) [0x104bef894] 34-44
                            11  frac128_div + 240 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389588) [0x104bef894] 34-44
                              11  add192 + 0 (softfloat-macros.h:459,14 in qemu-system-x86_64 + 3389588) [0x104bef894] 34-44
                                11  uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389588) [0x104bef894] (running) 34-44
                        1   floatx80_div + 508 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389600) [0x104bef8a0] 45
                          1   parts128_div + 300 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389600) [0x104bef8a0] 45
                            1   frac128_div + 252 (softfloat.c:1053,11 in qemu-system-x86_64 + 3389600) [0x104bef8a0] (running) 45
                        3   floatx80_div + 496 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389588) [0x104bef894] 46-48
                          3   parts128_div + 288 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389588) [0x104bef894] 46-48
                            3   frac128_div + 240 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389588) [0x104bef894] 46-48
                              3   add192 + 0 (softfloat-macros.h:459,14 in qemu-system-x86_64 + 3389588) [0x104bef894] 46-48
                                3   uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389588) [0x104bef894] (running) 46-48
                            1   frac128_div + 244 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389592) [0x104bef898] 12
                              1   add192 + 4 (softfloat-macros.h:460,14 in qemu-system-x86_64 + 3389592) [0x104bef898] 12
                                1   uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389592) [0x104bef898] (running) 12
                        3   floatx80_div + 496 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389588) [0x104bef894] 13-15
                          3   parts128_div + 288 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389588) [0x104bef894] 13-15
                            3   frac128_div + 240 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389588) [0x104bef894] 13-15
                              3   add192 + 0 (softfloat-macros.h:459,14 in qemu-system-x86_64 + 3389588) [0x104bef894] 13-15
                                3   uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389588) [0x104bef894] (running) 13-15
                        1   floatx80_div + 500 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389592) [0x104bef898] 16
                          1   parts128_div + 292 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389592) [0x104bef898] 16
                            1   frac128_div + 244 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389592) [0x104bef898] 16
                              1   add192 + 4 (softfloat-macros.h:460,14 in qemu-system-x86_64 + 3389592) [0x104bef898] 16
                                1   uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389592) [0x104bef898] (running) 16
                        1   floatx80_div + 512 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389604) [0x104bef8a4] 17
                          1   parts128_div + 304 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389604) [0x104bef8a4] 17
                            1   frac128_div + 256 (softfloat.c:1052,5 in qemu-system-x86_64 + 3389604) [0x104bef8a4] (running) 17
                        11  floatx80_div + 496 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389588) [0x104bef894] 18-28
                          11  parts128_div + 288 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389588) [0x104bef894] 18-28
                            11  frac128_div + 240 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389588) [0x104bef894] 18-28
                              11  add192 + 0 (softfloat-macros.h:459,14 in qemu-system-x86_64 + 3389588) [0x104bef894] 18-28
                                11  uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389588) [0x104bef894] (running) 18-28
                        1   floatx80_div + 508 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389600) [0x104bef8a0] 29
                          1   parts128_div + 300 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389600) [0x104bef8a0] 29
                            1   frac128_div + 252 (softfloat.c:1053,11 in qemu-system-x86_64 + 3389600) [0x104bef8a0] (running) 29
                        2   floatx80_div + 496 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389588) [0x104bef894] 30-31
                          2   parts128_div + 288 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389588) [0x104bef894] 30-31
                            2   frac128_div + 240 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389588) [0x104bef894] 30-31
                              2   add192 + 0 (softfloat-macros.h:459,14 in qemu-system-x86_64 + 3389588) [0x104bef894] 30-31
                                2   uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389588) [0x104bef894] (running) 30-31
                        1   floatx80_div + 508 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389600) [0x104bef8a0] 32
                          1   parts128_div + 300 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389600) [0x104bef8a0] 32
                            1   frac128_div + 252 (softfloat.c:1053,11 in qemu-system-x86_64 + 3389600) [0x104bef8a0] (running) 32
                        1   floatx80_div + 500 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389592) [0x104bef898] 33
                          1   parts128_div + 292 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389592) [0x104bef898] 33
                            1   frac128_div + 244 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389592) [0x104bef898] 33
                              1   add192 + 4 (softfloat-macros.h:460,14 in qemu-system-x86_64 + 3389592) [0x104bef898] 33
                                1   uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389592) [0x104bef898] (running) 33
                        11  floatx80_div + 496 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389588) [0x104bef894] 34-44
                          11  parts128_div + 288 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389588) [0x104bef894] 34-44
                            11  frac128_div + 240 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389588) [0x104bef894] 34-44
                              11  add192 + 0 (softfloat-macros.h:459,14 in qemu-system-x86_64 + 3389588) [0x104bef894] 34-44
                                11  uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389588) [0x104bef894] (running) 34-44
                        1   floatx80_div + 508 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389600) [0x104bef8a0] 45
                          1   parts128_div + 300 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389600) [0x104bef8a0] 45
                            1   frac128_div + 252 (softfloat.c:1053,11 in qemu-system-x86_64 + 3389600) [0x104bef8a0] (running) 45
                        3   floatx80_div + 496 (softfloat.c:2560,10 in qemu-system-x86_64 + 3389588) [0x104bef894] 46-48
                          3   parts128_div + 288 (softfloat-parts.c.inc:605,28 in qemu-system-x86_64 + 3389588) [0x104bef894] 46-48
                            3   frac128_div + 240 (softfloat.c:1054,9 in qemu-system-x86_64 + 3389588) [0x104bef894] 46-48
                              3   add192 + 0 (softfloat-macros.h:459,14 in qemu-system-x86_64 + 3389588) [0x104bef894] 46-48
                                3   uadd64_carry + 0 (host-utils.h:576,9 in qemu-system-x86_64 + 3389588) [0x104bef894] (running) 46-48
Octocontrabass
Member
Posts: 5560
Joined: Mon Mar 25, 2013 7:01 pm

Re: QEMU hangs on M2 MacBook running Ventura

Post by Octocontrabass »

prajwal wrote: The line in QEMU where it was hanging was "host-utils.h:576", which is a call to the function "__builtin_addcll".
What happens if you replace line 574 with "#if 0"? That should remove the call to __builtin_addcll without changing anything else.

If that fixes it, it narrows down the possibilities a bit: it could be QEMU's build scripts enabling unsupported instructions, it could be a bug in the compiler (Clang?), or it could be a bug in the CPU.
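The "#if 0" experiment swaps the builtin for plain C. The carry can be computed without __builtin_addcll by using only unsigned comparisons. Below is a minimal sketch of such a portable add-with-carry. The name uadd64_carry_portable is hypothetical, and this is not a copy of QEMU's own fallback branch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical portable add-with-carry (not QEMU's code).
 * Computes x + y + *pcarry and stores the carry-out back in *pcarry.
 */
static inline uint64_t uadd64_carry_portable(uint64_t x, uint64_t y, bool *pcarry)
{
    uint64_t t = x + (uint64_t)*pcarry; /* add the incoming carry first */
    bool c1 = t < x;                    /* wraps only if x == UINT64_MAX and carry == 1 */
    uint64_t r = t + y;                 /* then add the second operand */
    bool c2 = r < t;                    /* wraps if this addition overflowed */
    *pcarry = c1 || c2;                 /* at most one of the two can be set */
    return r;
}
```

If c1 is set, t is 0 and the second addition cannot overflow. That is why OR-ing the two flags gives the correct single-bit carry.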
prajwal
Member
Posts: 154
Joined: Sat Oct 23, 2004 11:00 pm

Re: QEMU hangs on M2 MacBook running Ventura

Post by prajwal »

I narrowed the problem down to usub64_borrow().

More precisely, the problem seems to be the __builtin_subcll() call.

After adding some debug printfs, I found that in my case, when I start QEMU:

The first call to usub64_borrow() is for x = 0, y = 0, carryin = 0, and the output is result = 0, carryout = 0.
The next call to usub64_borrow() is for x = 7205759403792793600, y = 7205759402719051776, carryin = 0, and the output is result = 1073741824, carryout = 1.

The carryout should be 0 here, since x > y, but it comes back as 1. If I change the #if to #if 0 so that the alternative implementation runs instead, it works fine.

Because of the incorrect carryout of 1, the calculation goes into an infinite loop, which is why QEMU hangs.
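A single wrong borrow can hang the emulator instead of just producing a wrong answer. The simplified model below shows how, assuming the division code corrects its remainder with a restoring-style loop. The samples in the stack trace above keep landing in frac128_div and add192, which is consistent with that assumption. This is not QEMU's frac128_div. A 128-bit remainder is kept as two 64-bit words, the divisor is subtracted, and the divisor is added back while the remainder looks negative. The function name and its parameters are hypothetical. max_iters exists only so the demonstration terminates.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model (hypothetical, NOT QEMU's frac128_div).
 * Subtracts d from the 128-bit value hi:lo, then adds d back while the
 * result looks negative. Returns how many add-back iterations ran.
 * force_bad_borrow simulates the wrong carryout reported in the thread.
 */
static uint64_t correction_iterations(uint64_t hi, uint64_t lo, uint64_t d,
                                      bool force_bad_borrow, uint64_t max_iters)
{
    bool borrow = lo < d;          /* correct borrow out of the low word */
    if (force_bad_borrow) {
        borrow = true;             /* the miscomputed carryout */
    }
    lo -= d;
    hi -= borrow;                  /* propagate the borrow into the high word */

    uint64_t iters = 0;
    /* Sign bit of hi set means the 128-bit remainder looks negative. */
    while ((int64_t)hi < 0 && iters < max_iters) {
        uint64_t old = lo;
        lo += d;
        hi += (lo < old);          /* carry from the low word */
        iters++;
    }
    return iters;
}
```

Take hi = 0, lo = 100 and d = 1. With the correct borrow, no correction is needed. With the forced bad borrow, hi becomes UINT64_MAX, and the loop would need roughly 2^64 iterations to recover, which is effectively a hang.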

I tried the same calculation in a standalone C program and it works fine there. The problem only occurs when running inside QEMU.
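A standalone check along those lines might look like the sketch below. It computes the same subtraction twice: once through __builtin_subcll, when the compiler provides it, and once with a 128-bit reference. The borrow can then be compared directly. The helper names are hypothetical. unsigned __int128 is a GCC/Clang extension available on 64-bit targets such as arm64 and x86-64.

```c
#include <stdbool.h>
#include <stdint.h>

/* Compilers without __has_builtin: treat every builtin as absent. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* Reference: subtract in 128 bits; any wrap sets the upper 64 bits. */
static uint64_t sub_ref(uint64_t x, uint64_t y, bool bin, bool *bout)
{
    unsigned __int128 wide = (unsigned __int128)x - y - bin;
    *bout = (wide >> 64) != 0;
    return (uint64_t)wide;
}

/* Same operation via the builtin, mirroring usub64_borrow() from the thread. */
static uint64_t sub_builtin(uint64_t x, uint64_t y, bool bin, bool *bout)
{
#if __has_builtin(__builtin_subcll)
    unsigned long long b = bin;
    uint64_t r = __builtin_subcll(x, y, b, &b);
    *bout = b & 1;
    return r;
#else
    return sub_ref(x, y, bin, bout); /* no builtin: fall back to the reference */
#endif
}
```

Use x = 7205759403792793600 and y = 7205759402719051776 from the post above. Both functions should return 1073741824 with a borrow of 0, since x > y.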

PS: The odd thing I observed is that if I pass carryin to __builtin_subcll() as a hardcoded 0 instead of passing *pborrow, it works as expected, i.e. the carryout comes back as 0.
Octocontrabass
Member
Posts: 5560
Joined: Mon Mar 25, 2013 7:01 pm

Re: QEMU hangs on M2 MacBook running Ventura

Post by Octocontrabass »

I think you'll have to either disassemble usub64_borrow() or step through it at the instruction level to see what's going on. That might be difficult if it's been inlined, but there are ways to track it down.
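One common way to get a function you can find by name is to wrap the inline helper in an out-of-line function marked noinline. This assumes GCC or Clang. You can then disassemble that symbol or set a debugger breakpoint on it. The sketch below uses a portable stand-in for the helper so it compiles anywhere. In QEMU you would call the real usub64_borrow() from include/qemu/host-utils.h instead. usub64_borrow_probe is a hypothetical name. The probe shows the helper compiled on its own, so if the miscompilation only happens when the helper is inlined into a particular caller, the probe can look correct while the caller is still wrong.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the static inline helper (portable version, hypothetical). */
static inline uint64_t usub64_borrow(uint64_t x, uint64_t y, bool *pborrow)
{
    uint64_t bin = *pborrow;
    bool b1 = x < y;             /* borrow out of x - y */
    uint64_t t = x - y;
    *pborrow = b1 || (t < bin);  /* or borrow out of subtracting the borrow-in */
    return t - bin;
}

/*
 * Hypothetical probe: noinline keeps a real symbol in the binary,
 * and used asks the compiler to keep it even if nothing calls it.
 */
__attribute__((noinline, used))
uint64_t usub64_borrow_probe(uint64_t x, uint64_t y, bool *pborrow)
{
    return usub64_borrow(x, y, pborrow);
}
```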
prajwal
Member
Posts: 154
Joined: Sat Oct 23, 2004 11:00 pm

Re: QEMU hangs on M2 MacBook running Ventura

Post by prajwal »

I suspect some compiler optimisation is happening around the bool to uint64_t conversions, so I made the patch below, which makes it work properly. The patch is only needed for usub64_borrow(); I changed uadd64_carry() as well for consistency.

diff --git a/include/qemu/host-utils.h b/include/qemu/host-utils.h
index 3ce62bf4a5..bc9955a3ad 100644
--- a/include/qemu/host-utils.h
+++ b/include/qemu/host-utils.h
@@ -571,8 +571,8 @@ static inline bool mulu128(uint64_t *plow, uint64_t *phigh, uint64_t factor)
 static inline uint64_t uadd64_carry(uint64_t x, uint64_t y, bool *pcarry)
 {
 #if __has_builtin(__builtin_addcll)
-    unsigned long long c = *pcarry;
-    x = __builtin_addcll(x, y, c, &c);
+    volatile uint64_t c = *pcarry;
+    x = __builtin_addcll(x, y, c, (uint64_t*)&c);
     *pcarry = c & 1;
     return x;
 #else
@@ -596,8 +596,8 @@ static inline uint64_t uadd64_carry(uint64_t x, uint64_t y, bool *pcarry)
 static inline uint64_t usub64_borrow(uint64_t x, uint64_t y, bool *pborrow)
 {
 #if __has_builtin(__builtin_subcll)
-    unsigned long long b = *pborrow;
-    x = __builtin_subcll(x, y, b, &b);
+    volatile uint64_t b = *pborrow;
+    x = __builtin_subcll(x, y, b, (uint64_t*)&b);
     *pborrow = b & 1;
     return x;
 #else
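
The suspicion concerns the bool to uint64_t conversion. In C, converting a bool to any integer type always yields exactly 0 or 1. So as long as *pborrow holds a valid bool, the original and patched helpers load the same value. The sketch below shows the two loads side by side; the function names are hypothetical. Separately, the patch passes (uint64_t*)&c, which casts away volatile. Accessing a volatile object through a non-volatile lvalue is undefined behaviour in C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Load as in the original helper: bool widened to unsigned long long. */
static uint64_t load_borrow_plain(const bool *pborrow)
{
    unsigned long long b = *pborrow;  /* exactly 0 or 1 */
    return b;
}

/* Load as in the patched helper: same conversion via a volatile temporary. */
static uint64_t load_borrow_volatile(const bool *pborrow)
{
    volatile uint64_t b = *pborrow;   /* also exactly 0 or 1 */
    return b;
}
```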
Octocontrabass
Member
Member
Posts: 5560
Joined: Mon Mar 25, 2013 7:01 pm

Re: QEMU hangs on M2 MacBook running Ventura

Post by Octocontrabass »

prajwal wrote: I suspect there is some compiler optimisation happening with bool to uint64_t conversions
Have you examined the code at the instruction level to see for sure?
prajwal wrote: I have below patch that makes it work properly.
That just hides the problem, it doesn't fix it.
prajwal
Member
Posts: 154
Joined: Sat Oct 23, 2004 11:00 pm

Re: QEMU hangs on M2 MacBook running Ventura

Post by prajwal »

I moved usub64_borrow() out of host-utils.h into fpu/softfloat.c as a non-static, non-inline function, and then reproduced the problem with and without my workaround.

Here's the disassembly output of the original (failing) code:

0000000000016f94 <_usub64_borrow>:
;     unsigned long long b = *pborrow;
   16f94: 48 00 40 39  	ldrb	w8, [x2]
;     x = __builtin_subcll(x, y, b, &b);
   16f98: 09 00 01 eb  	subs	x9, x0, x1
   16f9c: ea 27 9f 1a  	cset	w10, lo
   16fa0: 20 01 08 eb  	subs	x0, x9, x8
   16fa4: e8 27 9f 1a  	cset	w8, lo
   16fa8: 48 01 08 2a  	orr	w8, w10, w8
;     *pborrow = b & 1;
   16fac: 48 00 00 39  	strb	w8, [x2]
;     return x;
   16fb0: c0 03 5f d6  	ret

Here's the disassembly output of the modified (working) code:

; {
   17980: ff 43 00 d1  	sub	sp, sp, #16
;     volatile unsigned long long b = *pborrow;
   17984: 48 00 40 39  	ldrb	w8, [x2]
   17988: e8 07 00 f9  	str	x8, [sp, #8]
;     x = __builtin_subcll(x, y, b, (uint64_t*)&b);
   1798c: e8 07 40 f9  	ldr	x8, [sp, #8]
   17990: 09 00 01 eb  	subs	x9, x0, x1
   17994: ea 27 9f 1a  	cset	w10, lo
   17998: 20 01 08 eb  	subs	x0, x9, x8
   1799c: e8 27 9f 1a  	cset	w8, lo
   179a0: 48 01 08 2a  	orr	w8, w10, w8
   179a4: e8 07 00 f9  	str	x8, [sp, #8]
;     *pborrow = b & 1;
   179a8: e8 07 40 f9  	ldr	x8, [sp, #8]
   179ac: 08 01 00 12  	and	w8, w8, #0x1
   179b0: 48 00 00 39  	strb	w8, [x2]
;     return x;
   179b4: ff 43 00 91  	add	sp, sp, #16
   179b8: c0 03 5f d6  	ret
Octocontrabass
Member
Posts: 5560
Joined: Mon Mar 25, 2013 7:01 pm

Re: QEMU hangs on M2 MacBook running Ventura

Post by Octocontrabass »

Those two functions do exactly the same thing. There's no difference.

Are you sure the failing version hasn't been inlined anywhere?
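The failing arm64 listing can be transcribed instruction by instruction into C as a cross-check that it is correct on its own. On AArch64, SUBS clears the C flag when the subtraction borrows. So "cset wN, lo" sets wN to 1 exactly when the first operand is unsigned-less-than the second. The function name below is hypothetical. With the inputs from earlier in the thread, it returns 1073741824 with a borrow of 0. In other words, this code in isolation does not produce the wrong carryout.

```c
#include <stdbool.h>
#include <stdint.h>

/* C rendering of the failing AArch64 listing (hypothetical name). */
static uint64_t usub64_borrow_from_asm(uint64_t x0, uint64_t x1, bool *x2)
{
    uint64_t x8 = *x2;       /* ldrb w8, [x2]     */
    uint64_t x9 = x0 - x1;   /* subs x9, x0, x1   */
    uint32_t w10 = x0 < x1;  /* cset w10, lo      */
    uint64_t r = x9 - x8;    /* subs x0, x9, x8   */
    uint32_t w8 = x9 < x8;   /* cset w8, lo       */
    w8 = w10 | w8;           /* orr  w8, w10, w8  */
    *x2 = w8;                /* strb w8, [x2]     */
    return r;                /* ret (result in x0) */
}
```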