Octocontrabass wrote: ↑Fri May 23, 2025 6:38 pm
It couldn't be anything else, since that error code is coming directly from the CPU.
Ok, so my guess about what is really going on "under the hood" is as follows:
The qemu/kvm VM was launched with the -overcommit mem-lock=on qemu option. The host Linux OS allocates a virtual address range inside the qemu process which the guest sees as its "physical" address space (note that this range covers the VM's configured RAM plus memory-mapped I/O as well). As a result of that option, the host Linux OS allocates and locks RAM-resident page frames for the guest's "physical" memory, including the "normal" host paging structures needed to map it.
Here you can see the qemu process's VA range of 16779264 KiB (16386 MiB), starting at 0x000072582bc00000, fully resident (RSS) as the VM/guest "physical" address space, as required by the mem-lock=on qemu option.
Code: Select all
root@eve-ng62:~# pmap -x 869394 | egrep "(qemu-6.0.0)|(000072582bc00000)" -A 1
869394: /opt/qemu-6.0.0/bin/qemu-system-x86_64 -nographic -device e1000,addr=3.0,multifunction=on,netdev=net0,mac=50:00:00:09:00:00 -netdev tap,id=net0,ifname=vunl0_9_0,script=no -device e1000,addr=3.1,multifunction=on,netdev=net1,mac=50:00:00:09:00:01 -netdev tap,id=net1,ifname=vunl0_9_1,script=no -device e1000,addr=3.2,multifunction=on,netdev=net2,mac=50:00:00:09:00:02 -netdev tap,id=net2,ifname=vunl0_9_2,script=no -smp 4 -m 16386 -enable-kvm
Address Kbytes RSS Dirty Mode Mapping
--
000072582bc00000 16779264 16779264 16779264 rw--- [ anon ]
0000725c2be00000 4 0 0 ----- [ anon ]
root@eve-ng62:~#
As long as the guest keeps accessing new memory locations, the Dirty column increases up to the maximum. Therefore a qemu access to any address in that VA range will never result in a host page fault. So far so good.
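To illustrate the host-side behaviour I'm assuming here (the guest-RAM mapping is pre-faulted and pinned, so later accesses never fault in the host), this is a minimal sketch of roughly what mem-lock=on amounts to at the syscall level. It is not QEMU code: as far as I know QEMU uses mlockall(MCL_CURRENT | MCL_FUTURE) on the whole process, while the sketch locks a single anonymous mapping, and the size and names are just placeholders.
Code: Select all
/* Sketch only: an anonymous, locked mapping behaving like guest RAM
 * under mem-lock=on.  Size and names are placeholders. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t guest_ram = 64UL * 1024 * 1024;   /* stand-in for -m <size> */

    /* Anonymous private mapping: the kind of "[ anon ]" range pmap shows. */
    void *ram = mmap(NULL, guest_ram, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ram == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* mlock() pre-faults every page and pins it in RAM, so RSS jumps to
     * the full size and later stores never take a host page fault. */
    if (mlock(ram, guest_ram) != 0) {
        perror("mlock (may need CAP_IPC_LOCK / ulimit -l)");
        return 1;
    }

    /* Touching the memory now only dirties already-resident pages. */
    memset(ram, 0xAA, guest_ram);

    printf("locked %zu MiB at %p; inspect with pmap -x <pid>\n",
           guest_ram >> 20, ram);
    getchar();                /* keep the process alive for inspection */

    munlock(ram, guest_ram);
    munmap(ram, guest_ram);
    return 0;
}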
When it comes to second-level address translation (Intel EPT) things are quite different. qemu/kvm builds the EPT paging structures that map the VM's GPAs on demand and does not lock RAM-resident memory for them. So when the guest tries to access a GVA, an EPT-violation VM-exit can occur for two reasons (see the decoding sketch after this list):
- the exit occurs on an access to the guest-physical address that is the final GVA -> GPA translation (bit 8 of the EPT-violation exit qualification is set, as in the 0x181 and 0x182 error codes)
- the exit occurs during the guest page walk that translates the GVA into a GPA, i.e. on an access to a guest paging-structure entry (bit 8 of the exit qualification is clear, as in the 0x82 error code)
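As a sanity check of that bit interpretation, here is a small sketch (my own, not KVM code) that decodes the three exit-qualification values using the bit layout from the Intel SDM's "Exit Qualification for EPT Violations"; the macro names are made up:
Code: Select all
/* Sketch only: decode the EPT-violation exit qualifications seen above. */
#include <stdint.h>
#include <stdio.h>

#define EPT_VIOL_READ        (1ULL << 0)  /* data read                        */
#define EPT_VIOL_WRITE       (1ULL << 1)  /* data write                       */
#define EPT_VIOL_INSN_FETCH  (1ULL << 2)  /* instruction fetch                */
#define EPT_VIOL_GLA_VALID   (1ULL << 7)  /* guest linear-address field valid */
#define EPT_VIOL_GLA_XLAT    (1ULL << 8)  /* access was to the translation of
                                             a linear address; clear = access
                                             to a guest paging-structure entry */

static void decode(uint64_t qual)
{
    printf("0x%llx: %s%s%s", (unsigned long long)qual,
           (qual & EPT_VIOL_READ)       ? "read "   : "",
           (qual & EPT_VIOL_WRITE)      ? "write "  : "",
           (qual & EPT_VIOL_INSN_FETCH) ? "ifetch " : "");

    if (qual & EPT_VIOL_GLA_VALID) {
        if (qual & EPT_VIOL_GLA_XLAT)
            printf("to the GPA that is the final GVA->GPA translation\n");
        else
            printf("to a guest paging-structure entry (during the GVA walk)\n");
    } else {
        printf("with no associated guest linear address\n");
    }
}

int main(void)
{
    decode(0x181);  /* read,  bit 8 set   -> fault on the translated address  */
    decode(0x182);  /* write, bit 8 set   -> fault on the translated address  */
    decode(0x82);   /* write, bit 8 clear -> fault while walking guest tables */
    return 0;
}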
What do you think, is the above plausible? Thanks.