Qemu-Kvm : NMI delivered to the guest L1 instead of VMExit

Posted: Fri Apr 26, 2019 9:25 am
by parfait
I use Qemu-kvm to run a hypervisor which runs a Linux guest OS.
So a guest OS (Linux in L2) is running on my hypervisor (in L1), which is running on a Linux host (L0) via Qemu-Kvm.

According to Intel SDM
24.6.1 Pin-Based VM-Execution Controls
If the NMI exiting bit (3 of Pin-Based VM-Execution Controls) is 1, non-maskable interrupts (NMIs) cause VM exits. Otherwise, they are
delivered normally using descriptor 2 of the IDT.
So let's say the NMI exiting bit is set.
If an NMI occurred during L1's execution, it should be delivered through L1's IDT, but if an NMI occurred while L2 is executing, KVM should simulate a VM exit from L2 to L1.
But I noticed that when an NMI occurs while the nested guest (L2) is running, KVM delivers it directly through the hypervisor's (L1) IDT instead of triggering a VM exit from the Linux guest. This happens regardless of which guest level (L2 or L1) is running.
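The expected routing described above can be sketched in C. This is only an illustration of the rule quoted from the SDM, not KVM's code: the enum and function names are hypothetical, and only the bit position (bit 3 of the pin-based controls) comes from the manual.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 3 of the pin-based VM-execution controls (Intel SDM 24.6.1). */
#define PIN_BASED_NMI_EXITING (1u << 3)

/* Hypothetical names for illustration; these are not KVM identifiers. */
enum guest_level { LEVEL_L1, LEVEL_L2 };
enum nmi_action { NMI_VIA_L1_IDT, NMI_VMEXIT_TO_L1, NMI_VIA_L2_IDT };

/* Decide where an NMI should go, given which level is running and the
 * pin-based controls that L1 programmed for L2. */
enum nmi_action route_nmi(enum guest_level running, uint32_t l1_pin_controls)
{
    assert(running == LEVEL_L1 || running == LEVEL_L2);

    if (running == LEVEL_L1)
        return NMI_VIA_L1_IDT;      /* L1 is running: use L1's IDT */
    if (l1_pin_controls & PIN_BASED_NMI_EXITING)
        return NMI_VMEXIT_TO_L1;    /* L2 is running, NMI exiting set */
    return NMI_VIA_L2_IDT;          /* L2 is running, NMI exiting clear */
}
```

With the NMI exiting bit set, the behavior I expected corresponds to `route_nmi(LEVEL_L2, ...)` returning `NMI_VMEXIT_TO_L1`, whereas what I observe looks like `NMI_VIA_L1_IDT` in both cases.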
Has anyone noticed the same?
Is this a desired behavior or a bug?
NB: host CPU: Intel Core™ i5-4300U

Re: Qemu-Kvm : NMI delivered to the guest L1 instead of VMEx

Posted: Wed May 29, 2019 2:47 pm
by feryno
The question is whether the NMI really occurred while your CPU was in L2. What was the source of the NMI? Was it an interprocessor interrupt? Imagine this situation:
cpu0 is in L2 and you send an NMI to cpu0 from cpu1
there are some CPU cycles until the NMI is delivered to cpu0, during which cpu0 could take a VM exit for another reason (e.g. an external interrupt)
when the NMI reaches cpu0, it is already in L1, so the NMI is delivered via L1's IDT
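The race above can be modelled with a toy timeline in C. This is only a sketch of the reasoning, with made-up cycle counts and hypothetical names; real delivery latency is not something you can read from software like this.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: cpu0 runs L2 until `vmexit_cycle`, then runs L1. */
enum cpu_level { CPU_IN_L1, CPU_IN_L2 };

enum cpu_level level_at(uint64_t cycle, uint64_t vmexit_cycle)
{
    return cycle < vmexit_cycle ? CPU_IN_L2 : CPU_IN_L1;
}

/* The level that receives the NMI is the level cpu0 is in when the NMI
 * arrives, not the level it was in when cpu1 sent it. */
enum cpu_level level_when_nmi_arrives(uint64_t send_cycle, uint64_t latency,
                                      uint64_t vmexit_cycle)
{
    assert(send_cycle <= UINT64_MAX - latency); /* no overflow in the sum */
    return level_at(send_cycle + latency, vmexit_cycle);
}
```

For example, an NMI sent at cycle 100 while cpu0 is in L2 arrives in L1 if the latency is 80 cycles and an unrelated VM exit happens at cycle 150.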

I had to solve a similar situation in my Intel hypervisor while developing support for nesting Hyper-V (my hypervisor as parent in L0, loaded via UEFI; Hyper-V as child in L1; the OS in L2). Sometimes an NMI came while the CPU was running my parent hypervisor (L0), although my hypervisor never sends any NMI (Hyper-V sends these NMIs quite frequently as a way of interprocessor communication). So my hypervisor captured the NMIs using its IDT in L0, and then I had to reinject them back into the child (Hyper-V in L1), which was not trivial (NMI window exiting + another trick).
On Intel, you can't block NMIs easily. On AMD you can (GIF=0), so on AMD NMIs are blocked on every #VMEXIT because every #VMEXIT sets GIF=0. On Intel, NMIs are not blocked on a VM exit from L2, so the NMI can be delivered via the parent's IDT.
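A rough C sketch of the reinjection decision mentioned above, assuming the usual approach: inject the NMI on the next VM entry if the guest can take it, otherwise turn on NMI-window exiting and retry at that exit. The bit positions follow the Intel SDM field layouts; the struct and function names are hypothetical, the sketch assumes the "virtual NMIs" control is enabled (NMI-window exiting requires it), and it does not cover the "other trick".

```c
#include <assert.h>
#include <stdint.h>

/* Primary processor-based VM-execution controls. */
#define CPU_BASED_NMI_WINDOW_EXITING (1u << 22)

/* VM-entry interruption-information field. */
#define INTR_INFO_VALID   (1u << 31)
#define INTR_TYPE_NMI     (2u << 8)   /* interruption type 2 = NMI */
#define NMI_VECTOR        2u

/* Guest interruptibility-state bits. */
#define GUEST_BLOCKED_BY_STI    (1u << 0)
#define GUEST_BLOCKED_BY_MOV_SS (1u << 1)
#define GUEST_BLOCKED_BY_NMI    (1u << 3)

/* Hypothetical result type: values the caller would write to the VMCS. */
struct nmi_reinject_plan {
    uint32_t entry_intr_info; /* 0 means "inject nothing on this entry" */
    uint32_t cpu_controls;    /* updated processor-based controls */
};

struct nmi_reinject_plan plan_nmi_reinjection(uint32_t cpu_controls,
                                              uint32_t interruptibility)
{
    struct nmi_reinject_plan plan = {
        0, cpu_controls & ~CPU_BASED_NMI_WINDOW_EXITING
    };
    uint32_t blocked = interruptibility & (GUEST_BLOCKED_BY_STI |
                                           GUEST_BLOCKED_BY_MOV_SS |
                                           GUEST_BLOCKED_BY_NMI);
    if (blocked)
        plan.cpu_controls |= CPU_BASED_NMI_WINDOW_EXITING; /* retry later */
    else
        plan.entry_intr_info = INTR_INFO_VALID | INTR_TYPE_NMI | NMI_VECTOR;

    /* Exactly one of "inject now" and "wait for the NMI window" is chosen. */
    assert((plan.entry_intr_info != 0) !=
           ((plan.cpu_controls & CPU_BASED_NMI_WINDOW_EXITING) != 0));
    return plan;
}
```

The caller would write `entry_intr_info` to the VM-entry interruption-information field and `cpu_controls` to the processor-based controls before resuming the child, and call the function again on the NMI-window exit.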

Another idea:
Suppose the CPU took a VM exit because an NMI really did occur in L2 with the NMI exiting bit set, so the CPU is now in L1. Another NMI may then arrive, which is delivered via L1's IDT and handled immediately (final instruction IRETQ) before the VM exit handler finishes handling the first NMI (final instruction VMRESUME).

NMIs on Intel are very nasty: sometimes they come when you do not expect them and when you don't want them.