I either have an epic brainfart of gargantuan proportions, or there is something weird going on here. My OS started to have some weird crashes in QEMU, so I started to investigate. I've come to the conclusion that I have memory corruption somewhere, causing a memory access with a non-canonical address, which leads to a GPF. I've managed to set up a breakpoint before the offending instruction and found something weird. The RAX value seems to be completely bogus. I would expect RAX to become 0xffff9000000037bb and not 0x00ff90000000bd4e. Below are some screenshots showing the machine state before and after the problematic instruction.
It doesn't seem to be a debugger artifact, because the QEMU monitor says exactly the same.
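For reference, here is a minimal sketch of the canonical-address rule that makes the second value fault. It assumes 48-bit virtual addresses (4-level paging); the function name `is_canonical48` is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48-bit virtual addresses (4-level paging).
 * An address is canonical when bits 63..47 are all zeros or all ones,
 * i.e. bits 63..48 are a sign extension of bit 47. */
static bool is_canonical48(uint64_t addr)
{
    uint64_t upper = addr >> 47;   /* the 17 bits from bit 47 up to bit 63 */
    return upper == 0 || upper == 0x1FFFF;
}
```

With this check, 0xffff9000000037bb passes, while 0x00ff90000000bd4e fails because bits 63..56 are zero but bit 47 is set.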
True, I didn't see that. It should be 0xffff800000138deb after the <+145> instruction. I wonder what went wrong.
It sure looks like it's pulling data from some random place, but the monitor 'x ...' command shows the proper data. The 'info mem' and 'info tlb' commands do not show anything out of the ordinary either.
Would QEMU show anything out of the ordinary if you were looking at MMIO? Reading from MMIO can change the state of the emulated hardware, so it's possible the QEMU monitor will just return the last value written.
(For comparison, MAME - an emulator I am more familiar with - has separate MMIO read handlers for the emulated CPU and for the debugger, so the debugger can read MMIO without changing the emulated hardware state.)
Just a guess, but it looks like what you are seeing in the window (titled "Qt creator") and what the VM actually reads from memory are two different things. This could happen if you set up the paging tables incorrectly or forgot to issue an INVLPG instruction after changing the mapping. Then the mov rax, [rbp-8] would read from the cache, not from RAM, which could cause such problems. The trouble is that this is very hard to catch, because the debugger shows you RAM (along with the new TLB), not the cache (nor the old TLB).
I'm not saying this is what is happening here; all I'm saying is that this could be one reason. First try to disable the cache entirely and see if the problem goes away.
Disabling the cache with CR0.CD didn't help. I've also inserted a few extra CR3 reloads, but that didn't change anything either.
I am stumped.
nexos wrote: Does it work in another emulator like Bochs?
It seems to work fine in Bochs. And in QEMU it's intermittent when not debugging and happens 100% of the time when breakpoints are set in that problematic area (even with kvm enabled).
It was the debugger changing RAX register value every time the breakpoint was hit. I have had conditional breakpoints set up here and there. But my condition expressions were in the form of '$rax = …' instead of '$rax == …'. I would have never guessed that you can modify program state with condition expressions. Well, it seems like it takes about half a day to figure that out…
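To illustrate the difference, here is a small C sketch. As far as I understand, GDB evaluates a breakpoint condition as an ordinary expression in the current language, so with C syntax '=' is an assignment with a side effect. The variable `fake_rax` and both helper functions are hypothetical stand-ins, not GDB APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the RAX register seen by the debugger. */
static uint64_t fake_rax;

/* Condition written with '=': stores `value` into the register first,
 * then the condition is true whenever the stored value is non-zero. */
static bool cond_assign(uint64_t value)
{
    return (fake_rax = value) != 0;
}

/* Condition written with '==': only compares, the register is untouched. */
static bool cond_compare(uint64_t value)
{
    return fake_rax == value;
}
```

Evaluating `cond_assign` overwrites `fake_rax` every time it runs, which mirrors RAX changing on each breakpoint hit; `cond_compare` leaves it alone.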