[SOLVED] Weird x86-64 QEMU behavior

pvc
Member
Posts: 201
Joined: Mon Jan 15, 2018 2:27 pm

Post by pvc »

I either have an epic brainfart of gargantuan proportions, or there is something weird going on here. My OS started to have some weird crashes in QEMU, so I started to investigate. I've come to the conclusion that I have some memory corruption somewhere, causing a memory access with a non-canonical address, which leads to a GPF. I've managed to set up a breakpoint before the offending instruction and found something weird: the RAX value seems to be completely bogus. I would expect RAX to become 0xffff9000000037bb and not 0x00ff90000000bd4e. Below are some screenshots showing the machine state before and after the problematic instruction.

It doesn't seem to be a debugger artifact, because the QEMU monitor says exactly the same.

Code:

QEMU 5.0.0 monitor - type 'help' for more information
(qemu) x /1g $ebp-8
ffff900000003438: 0xffff9000000037bb
(qemu) x /2i 0xffff80000011dba6
0xffff80000011dba6:  48 8b 45 f8              movq     -8(%rbp), %rax
0xffff80000011dbaa:  88 10                    movb     %dl, (%rax)
(qemu) info registers
RAX=00ff90000000bd4e RBX=0000000000000000 RCX=ffff800000138de0 RDX=0000000000000000
RSI=ffff800000138de0 RDI=ffff9000000037b0 RBP=ffff900000003440 RSP=ffff900000003418
R8 =0000000000000002 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
RIP=ffff80000011dbaa RFL=00000086 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0020 0000000000000000 ffffffff 00af9300 DPL=0 DS   [-WA]
CS =0018 0000000000000000 ffffffff 00af9a00 DPL=0 CS64 [-R-]
SS =0020 0000000000000000 ffffffff 00af9300 DPL=0 DS   [-WA]
DS =0020 0000000000000000 ffffffff 00af9300 DPL=0 DS   [-WA]
FS =0030 0000000000000000 ffffffff 00aff300 DPL=3 DS   [-WA]
GS =0030 0000000000000000 ffffffff 00aff300 DPL=3 DS   [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =003b ffff80000014bc30 00000067 00408900 DPL=0 TSS64-avl
GDT=     ffff80000012b000 00001037
IDT=     ffff800000131000 00000fff
. . .
Interrupts are disabled, and QEMU's -d int doesn't show any interrupts between the instructions either.
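
For reference, here is a minimal C sketch of the canonical-address check that the faulting access fails. It assumes 48-bit virtual addresses (4-level paging), and the function name is made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48-bit virtual addresses, so bits 63..47 must all be equal.
 * is_canonical48 is a hypothetical helper name, not part of any library. */
bool is_canonical48(uint64_t addr)
{
    /* Bits 63..47 are 17 bits wide: either all zeros or all ones. */
    uint64_t upper = addr >> 47;
    return upper == 0 || upper == 0x1FFFF;
}
```

With the values from this post, 0xffff9000000037bb passes the check and 0x00ff90000000bd4e does not, which is consistent with the GPF on the store through RAX.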
Attachments: "before" and "after" screenshots
Last edited by pvc on Wed Jul 08, 2020 4:34 pm, edited 1 time in total.
Octocontrabass
Member
Posts: 5885
Joined: Mon Mar 25, 2013 7:01 pm

Re: Weird x86-64 QEMU behavior

Post by Octocontrabass »

RAX has a bogus value in your "before" picture as well.

My first thought is that you should check "info mem" to make sure your page tables are pointing to actual memory and not something else.
pvc
Member
Posts: 201
Joined: Mon Jan 15, 2018 2:27 pm

Re: Weird x86-64 QEMU behavior

Post by pvc »

True, I didn't see that. It should be 0xffff800000138deb after the <+ 145> instruction. I wonder what went wrong.
It sure looks like it's pulling data from some random place, but the monitor's 'x ...' command shows proper data. The 'info mem' and 'info tlb' commands do not show anything out of the ordinary either.
Octocontrabass
Member
Posts: 5885
Joined: Mon Mar 25, 2013 7:01 pm

Re: Weird x86-64 QEMU behavior

Post by Octocontrabass »

Would QEMU show anything out of the ordinary if you were looking at MMIO? Reading from MMIO can change the state of the emulated hardware, so it's possible the QEMU monitor will just return the last value written.

(For comparison, MAME - an emulator I am more familiar with - has separate MMIO read handlers for the emulated CPU and for the debugger, so the debugger can read MMIO without changing the emulated hardware state.)
pvc
Member
Posts: 201
Joined: Mon Jan 15, 2018 2:27 pm

Re: Weird x86-64 QEMU behavior

Post by pvc »

I get what you're trying to say, but I am sure that the page pointed to by RBP is mapped to regular memory. 'info tlb' says:

Code:

...
ffff900000003000: 0000000000276000 ---DA---W
...
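
As a rough illustration of what that 'info tlb' line implies, here is a small C sketch that applies a single 4 KiB mapping to a virtual address. The helper name and signature are hypothetical, and it ignores the flags column and larger page sizes:

```c
#include <stdint.h>

#define PAGE_OFFSET_MASK_4K 0xfffULL

/* Translate vaddr through one 4 KiB mapping (page_vaddr -> page_paddr).
 * Hypothetical helper: returns UINT64_MAX when vaddr is outside that page. */
uint64_t translate_4k(uint64_t vaddr, uint64_t page_vaddr, uint64_t page_paddr)
{
    if ((vaddr & ~PAGE_OFFSET_MASK_4K) != page_vaddr)
        return UINT64_MAX;
    /* Keep the low 12 bits as the offset inside the physical frame. */
    return page_paddr | (vaddr & PAGE_OFFSET_MASK_4K);
}
```

With the mapping shown above, RBP-8 = 0xffff900000003438 would land at physical address 0x276438.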
bzt
Member
Posts: 1584
Joined: Thu Oct 13, 2016 4:55 pm

Re: Weird x86-64 QEMU behavior

Post by bzt »

Just a guess, but it looks like what you are seeing in the window (titled "Qt creator") and what the VM actually reads from memory are two different things. This could happen if you set up the paging tables incorrectly or forgot to issue an INVLPG instruction after changing the mapping. Then the mov rax, [rbp-8] will read from the cache, not from the RAM, and this could cause such problems. The problem with this is that it's very hard to catch, as you'll see the RAM in the debugger (along with the new TLB), not the cache (nor the old TLB).

I'm not saying this is what is happening here; all I'm saying is that this could be one reason. First try to disable the cache entirely and see if the problem goes away.

Cheers,
bzt
nexos
Member
Posts: 1081
Joined: Tue Feb 18, 2020 3:29 pm
Libera.chat IRC: nexos

Re: Weird x86-64 QEMU behavior

Post by nexos »

Does it work in another emulator like Bochs?
"How did you do this?"
"It's very simple — you read the protocol and write the code." - Bill Joy
Projects: NexNix | libnex | nnpkg
pvc
Member
Posts: 201
Joined: Mon Jan 15, 2018 2:27 pm

Re: Weird x86-64 QEMU behavior

Post by pvc »

Disabling the cache with CR0.CD didn't help. I've also inserted a few extra CR3 reloads, but it didn't change anything either.
I am stumped.
nexos wrote: Does it work in another emulator like Bochs?
It seems to work fine in Bochs. In QEMU, it's intermittent when not debugging and happens 100% of the time when breakpoints are set in that problematic area (even with KVM enabled).
pvc
Member
Posts: 201
Joined: Mon Jan 15, 2018 2:27 pm

Re: Weird x86-64 QEMU behavior

Post by pvc »

I found the problem and it was absolute BS.

It was the debugger changing the RAX register value every time the breakpoint was hit. I had conditional breakpoints set up here and there, but my condition expressions were in the form of '$rax = …' instead of '$rax == …'. I would have never guessed that you can modify program state with condition expressions. Well, it seems like it takes about half a day to figure that out…
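
The debugger evaluates a breakpoint condition as an ordinary expression, so '=' is an assignment with a side effect, just as in C. Here is a minimal C sketch of the difference, using a made-up struct as a stand-in for the register file (none of these names come from GDB or QEMU):

```c
#include <stdint.h>

/* Hypothetical stand-in for a register the debugger can read and write. */
struct fake_regs {
    uint64_t rax;
};

/* Condition written with '==': only compares, rax is left untouched. */
int cond_compare(struct fake_regs *r, uint64_t v)
{
    return r->rax == v;
}

/* Condition written with '=': stores v into rax first, then the stored
 * value decides whether the condition counts as true. */
int cond_assign(struct fake_regs *r, uint64_t v)
{
    return (r->rax = v) != 0;
}
```

The first form only tests the value. The second overwrites the register every time the condition is evaluated, which matches the symptom in this thread: RAX was clobbered whenever the breakpoint was hit.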