KVM page fault exceptions
Hi,
A basic question about the KVM implementation (I'm using KVM-based VMs to implement a virtual network lab running on top of an Intel-based server).
The Intel server supports EPT nested paging, so I'd like to better understand how page fault exceptions raised by the CPU while the guest OS is running are handled in this scenario.
As far as I understand, with EPT the kvm module can disable the #PF VM exit (by configuring the related bit in the VMCS for the guest VM). In that case, page fault exceptions (#PF) caused by the guest OS's own paging translations are handled by the guest OS itself, without a VM exit handing control back to KVM.
Furthermore, allocating the host physical pages that back "guest physical pages", including the related EPT mapping (basically filling in the EPT entries for them), is handled by the KVM handler upon an EPT VIOLATION or EPT MISCONFIGURATION VM exit (an EPT-induced VM exit).
Did I get it right? Thanks
Re: KVM page fault exceptions
Help please
Re: KVM page fault exceptions
What is the exact problem? Even if you find the manual too large, you can experiment with page tables.
I should read old sources to be sure but AFAIR it works like this:
1. EPT entries have a "present" indication (in EPT an entry is present when any of its read/write/execute bits is set) that allows you to allocate physical memory lazily, i.e. when the guest accesses a GVA (guest virtual address) that is mapped to a GPA (guest physical address) by the guest's page tables but has no HPA (host physical address) behind it yet.
2. When a page fault occurs during the GVA -> GPA translation (before the GPA -> HPA step), it can be handled by the guest without any VM exit if you configure VMX properly.
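The two points above can be illustrated with a toy model. This is only a sketch, not KVM code: the page tables are plain dicts, the names (`translate`, `access`, `GuestPageFault`, `EptViolation`) are made up for illustration, and it ignores the fact that on real hardware the guest page-table walk itself also goes through EPT.

```python
PAGE = 4096


class GuestPageFault(Exception):
    """#PF raised during GVA -> GPA; delivered to the guest, no VM exit."""


class EptViolation(Exception):
    """Fault raised during GPA -> HPA; causes a VM exit to the hypervisor."""


def translate(gva, guest_pt, ept):
    """Translate GVA -> GPA -> HPA using two dict-based page tables."""
    vfn, offset = divmod(gva, PAGE)
    gfn = guest_pt.get(vfn)            # stage 1: guest page tables
    if gfn is None:
        raise GuestPageFault(hex(gva))
    hfn = ept.get(gfn)                 # stage 2: EPT
    if hfn is None:
        raise EptViolation(hex(gfn * PAGE + offset))
    return hfn * PAGE + offset


def access(gva, guest_pt, ept, guest_gfn, host_hfn):
    """Retry one access until it succeeds, recording who handled each fault.

    guest_gfn / host_hfn are hypothetical frame numbers that the guest OS
    and the hypervisor hand out when they service their respective faults.
    """
    events = []
    while True:
        try:
            return translate(gva, guest_pt, ept), events
        except GuestPageFault:
            # Guest OS maps the page in its own page tables.
            guest_pt[gva // PAGE] = guest_gfn
            events.append("guest #PF (no VM exit)")
        except EptViolation:
            # Hypervisor lazily backs the guest physical page.
            ept[guest_pt[gva // PAGE]] = host_hfn
            events.append("EPT violation (VM exit)")
```

Calling `access(0x1234, {}, {}, 7, 42)` on empty tables first takes the guest fault, then the EPT fault, and a second access to the same address takes neither.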
Re: KVM page fault exceptions
Thanks for reply
Nable wrote:
> What is the exact problem? Even if you find the manual too large, you can experiment with page tables.
> I should read old sources...
Which manual are you referring to? Furthermore, could you tell me which KVM source file (.c) implements this?
Nable wrote:
> 1. EPT entries has "present" bit that allow you to allocate physical memory lazily, i.e. when guest accesses GVA (guest virtual address) that have mapped GPA (guest physical address) according to guest's page tables but don't have present HPA (host physical address).
If I understand correctly, this condition (present bit missing in the EPT entry for the HPA page) would trigger an EPT-induced VM exit (EPT VIOLATION or EPT MISCONFIGURATION VM exit) handled by a specific KVM handling routine, right?
Re: KVM page fault exceptions
> Which manual you are referring to ?
I'm talking about the IASDM (Intel 64 and IA-32 Architectures Software Developer's Manual), of course. I don't remember the exact volume number, though; I'll look it up this evening.
> Furthermore could you tell me which is the kvm source code module (.c) for it ?
http://lxr.free-electrons.com/source/arch/x86/kvm/vmx.c
> If I understand correctly, this condition (present bit missing in the EPT entry for the HPA page) would trigger an EPT-induced vm exit (EPT VIOLATION or MISCONFIGURATION vm exit) handled by a specific kvm handling routine, right ?
Yes, you are right. In KVM this exit code is defined as EXIT_REASON_EPT_VIOLATION and it is handled by the "handle_ept_violation" function. Here it is: http://lxr.free-electrons.com/source/ar ... mx.c#L6111
Btw, I should note that the Intel manuals are quite difficult to read. Not just because of the complex subject, but because they have a convoluted style of explanation where you have to constantly jump from chapter to chapter to get the whole picture (the AMD manuals as a whole, and the SVM description specifically, are much easier to understand). The KVM description is even worse; it's extremely convoluted. The Linux source code has a similar problem, although an IDE can help you navigate through those thousands of lines. While I was learning about hardware-assisted virtualization, the Bochs and Palacios source code helped me a lot (yes, Palacios is mostly dead and its code style isn't very good, but at least it's much easier to read).
Re: KVM page fault exceptions
Nable wrote:
> What is the exact problem?
I was wondering about the following (the same topic has been asked on other forums without getting a clear answer). Just to recap the scenario: I'm working on a virtual networking lab where each network node is implemented as a qemu/kvm-based VM running on top of a bare-metal Ubuntu Linux system equipped with a large amount of RAM (128 GB).
Memory is not an issue, as shown below:
Code:
root@unl01:~# free -g
             total       used       free     shared    buffers     cached
Mem:           125         19        106          0          0          1
-/+ buffers/cache:         17        107
Swap:          127          0        127
root@unl01:~#
Code:
root@unl01:~# ps -p 47716 -o min_flt,maj_flt,cmd
MINFL MAJFL CMD
222142654 69 /opt/qemu/bin/qemu-system-x86_64 -device e1000,netdev=net0,mac=50:01:00:1a:00:00 -netdev tap,id=ne
root@unl01:~#
root@unl01:~#
root@unl01:~# ps -p 47716 -o min_flt,maj_flt,cmd
MINFL MAJFL CMD
222148030 69 /opt/qemu/bin/qemu-system-x86_64 -device e1000,netdev=net0,mac=50:01:00:1a:00:00 -netdev tap,id=ne
root@unl01:~#
As the two samples above show, the minor fault counter (MINFL) of the qemu process keeps growing while the major fault counter (MAJFL) stays at 69. A large amount of RAM is free, so why does the kernel memory manager keep trying to shrink the working set of the process running the qemu/kvm VM instance (resulting in minor page faults when guest code accesses those memory pages again)?
Googling for it, I found http://kneuro.net/linux-mm/index.php?fi ... cache.html. Reading it, as far as I can understand, the Linux memory management subsystem always tries to move pages from the active_list to the inactive_list, unmapping them from the process's virtual memory (basically clearing the process's PTEs for them). When the process then accesses one of those pages, the page fault handler finds it still in memory and simply restores the PTE pointing to it (basically this is a minor page fault event as tracked by the Linux kernel).
What do you think? Could this be a valid reason for it?
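The counters shown by `ps -o min_flt,maj_flt` can also be sampled directly from `/proc/<pid>/stat` to get a fault rate rather than a running total. A minimal sketch, assuming the field layout documented in proc(5) (minflt is field 10, majflt is field 12); the function names are made up for illustration:

```python
import time


def parse_faults(stat_line):
    """Return (minflt, majflt) from the contents of /proc/<pid>/stat."""
    # Field 2 (comm) may contain spaces or ')' so split after the last ')'.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 10 is rest[7] and field 12 is rest[9].
    return int(rest[7]), int(rest[9])


def read_faults(pid):
    """Read the current fault counters of a process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_faults(f.read())


def fault_rate(pid, interval=1.0):
    """Return (minor, major) faults per second over one sampling interval."""
    min0, maj0 = read_faults(pid)
    time.sleep(interval)
    min1, maj1 = read_faults(pid)
    return (min1 - min0) / interval, (maj1 - maj0) / interval
```

For example, `fault_rate(47716, 10)` would report how fast the qemu process from the listing above accumulates minor faults over ten seconds.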
Re: KVM page fault exceptions
1. Why do you have swap? And why do you have so much swap? This sounds like a bad idea at first. If you don't need hibernation - you most likely don't need swap at all.
2. How much memory do you allocate for the VMs? I don't see the "-m" switch in the command line. Note that the kernel allocates memory lazily for all processes, including QEMU instances.
3. Did you enable KSM? This is an in-kernel service that periodically scans for pages with identical contents and merges them into a single page with CoW (copy-on-write) semantics. KSM helps a lot with VMs, but of course it contributes to the page fault counters, although you don't lose much performance because of it.
4. Why did you choose VMs when containers (LXC) are more than enough for this task?
Re: KVM page fault exceptions
Nable wrote:
> 1. Why do you have swap? And why do you have so much swap? This sounds like a bad idea at first. If you don't need hibernation - you most likely don't need swap at all.
I disabled swap (swapoff -a); anyway, I never saw any swap activity while the VMs were running...
Code:
root@unl01:~# free -k
             total       used       free     shared    buffers     cached
Mem:     131762936   25273540  106489396      10988     194212    1818944
-/+ buffers/cache:   23260384  108502552
Swap:            0          0          0
root@unl01:~#
Nable wrote:
> 2. How much memory do you allocate for VMs? I don't see "-m" switch in the command line. Note that kernel allocates memory lazily for all processes, including QEmu instances.
Each VM has 2 GB of RAM assigned.
Nable wrote:
> 3. Did you enable KSM? This is an in-kernel service that is periodically looking for pages with same contents and merges them into the one page with CoW (copy-on-write) logic. KSM helps with VMs a lot but it would contribute to page faults counters of course. Although you don't lose much performance due to it.
KSM is disabled, as shown below:
Code:
root@unl01:~# cat /sys/kernel/mm/ksm/run
0
root@unl01:~#
Nable wrote:
> 4. Why did you choose VMs when containers (LXC) are more than enough for this task?
My virtual networking lab is based on this project, where each network node (router) is implemented via a dedicated VM.
However, even with swap turned off, I still see the qemu processes' minor page fault counters incrementing. Considering also that the lab VMs have been running for at least a week, I'm not sure whether this behaviour should be considered expected or not (basically I haven't found a clear explanation for it)...
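For completeness, the KSM state checked above with `cat /sys/kernel/mm/ksm/run` can be read together with the sharing counters in one go. A small sketch, assuming the standard KSM sysfs files `run`, `pages_shared` and `pages_sharing`; the helper name `ksm_status` is made up, and files that are missing (for example on a kernel built without KSM) are simply skipped:

```python
from pathlib import Path


def ksm_status(base="/sys/kernel/mm/ksm"):
    """Return the integer values of a few KSM sysfs files as a dict."""
    status = {}
    for name in ("run", "pages_shared", "pages_sharing"):
        path = Path(base) / name
        if path.is_file():
            status[name] = int(path.read_text().strip())
    return status


def ksm_is_merging(status):
    """True when KSM is switched on (run == 1) according to ksm_status()."""
    return status.get("run") == 1
```

With `run` at 0, as in the output above, `ksm_is_merging(ksm_status())` is false, so KSM merging cannot be what drives the fault counters here.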
Re: KVM page fault exceptions
If you don't see significant performance loss, why should you care about counters? And it makes sense to enable KSM, after all.
Btw, I've thought of another possible source of minor page faults: on each context switch the kernel has to remove/replace the mappings of the current process so that the next one can't read or write anything there, and then it has to enable the mappings for the new process. Maybe this is also done in some lazy way. I don't know enough about Linux virtual memory management to confirm or refute this idea.
Re: KVM page fault exceptions
Nable wrote:
> another possible source of minor page faults: on each context switch kernel has to remove/replace mappings of current process so that next one won't be able to read/write anything from there. And then kernel has to enable mappings for the new process.
Sure, on x86 the Linux kernel has to switch the CR3 register on a context switch so that it points to the address space of the context being loaded. Nevertheless, on a system with plenty of RAM I see no reason to unmap the previous context's memory pages (basically invalidating the associated page table entries), which would then incur minor page faults when that process's address space is loaded again.
From the Linux kernel's point of view, the host memory pages mapped to guest physical pages via EPT belong to the qemu process's (user) address space. As for the EPT hierarchy itself (PML4T, PDPT, PDT and PT): is its memory actually allocated by kvm in the context of the qemu process?
Re: KVM page fault exceptions
cianfa72 wrote:
> Sure on x86, upon context switching, linux kernel has to switch CR3 register to point to the address space of the context being loaded.
There are also different forms of TLB invalidation for this (INVLPG, INVPCID and, under VMX, INVEPT/INVVPID).
cianfa72 wrote:
> Neverthless, considering a system with a plenty of RAM, I see no reason to unmap previous context' memory pages (basically invalidating the associated page table entries) - incurring in minor page faults then when the process address space will be loaded again.
Processes shouldn't be able to access each other's data, so the kernel has to change the address space when it switches to a different process. Btw, it's also a good idea to pin QEMU processes to specific CPU cores, because the scheduler likes to constantly move unpinned processes from core to core and (consequently) from one NUMA node to another.
cianfa72 wrote:
> From the point of view of linux kernel - host memory pages mapped to guest physical memory pages via EPT - belong to qemu process (user) address space. Regarding EPT hierarchy (PML4T, PDPT, PDT and PT) is the memory for it actually allocated by kvm in the context of qemu process?
As far as I remember, that's true: QEMU needs access to guest memory, so the whole(?) guest RAM is mapped into the QEMU process.
Btw, I've remembered one more source of periodic EPT faults: accesses to memory-mapped virtual devices are intercepted by leaving the corresponding guest "physical" pages unmapped. Switching to VirtIO paravirtual devices may improve the situation (they use buffers in memory areas shared between host and guest).
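The pinning suggestion above could be scripted roughly as follows. This is a sketch under assumptions: it is Linux-only, the helper name `pin_all_threads` is made up, and it walks `/proc/<pid>/task` because `os.sched_setaffinity` acts on a single thread ID, while a QEMU process runs several threads (for example one per vCPU).

```python
import os


def pin_all_threads(pid, cpus):
    """Restrict every thread of `pid` to the given CPU ids.

    Returns the list of thread IDs that were pinned. Threads that exit
    between the directory listing and the affinity call are skipped.
    """
    cpus = set(cpus)
    pinned = []
    for entry in os.listdir(f"/proc/{pid}/task"):
        tid = int(entry)
        try:
            os.sched_setaffinity(tid, cpus)
        except ProcessLookupError:
            continue  # thread vanished in the meantime
        pinned.append(tid)
    return pinned
```

For instance, `pin_all_threads(47716, {2, 3})` would keep the qemu process from the earlier listing on cores 2 and 3 (the core numbers are only an example); choosing cores from a single NUMA node is what avoids the cross-node migration mentioned above.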
Re: KVM page fault exceptions
Nable wrote:
> Processes shouldn't be able to access each other's data, so kernel have to change address space when it switches to different process.
Of course, but I guess this should not involve minor page faults when the process address space (i.e. the process context) is loaded again on the CPU core...
Code:
root@unl01:~# free -k
             total       used       free     shared    buffers     cached
Mem:     131762936   70593604   61169332      10996     177000    1868560
-/+ buffers/cache:   68548044   63214892
Swap:    133943292          0  133943292
root@unl01:~# vmstat -a
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd     free  inact   active   si   so    bi    bo   in   cs us sy id wa st
 4  0      0 61169680 347324 68430712    0    0     0     0    4    7  3  1 96  0  0
root@unl01:~#
Btw, tracing kvm I've noticed that the QEMU process's minor page fault counter increments whenever a kvm_page_fault event occurs, e.g.
Code:
root@unl02:~# trace-cmd record -e kvm:kvm_exit -f 'exit_reason == 48' -e kvm:kvm_page_fault
<snip....>
<...>-42086 [002] 194509.194110: kvm_exit: reason EPT_VIOLATION rip 0x8284597 info 181 0
<...>-42086 [002] 194509.194110: kvm_page_fault: address a2896f28 error_code 181
Thanks
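For reference, the error_code in the trace above can be decoded bit by bit. A hedged sketch: it assumes the trace prints the value in hexadecimal (so "181" is 0x181) and that it carries the EPT-violation exit qualification with the bit layout described in the Intel SDM as I recall it; the table and function names are made up for illustration.

```python
# Assumed meaning of the low exit-qualification bits for an EPT violation.
EPT_QUALIFICATION_BITS = {
    0: "data read",
    1: "data write",
    2: "instruction fetch",
    3: "EPT entry readable",
    4: "EPT entry writable",
    5: "EPT entry executable",
    7: "guest linear address valid",
    8: "access to translated linear address",
}


def decode_ept_qualification(value):
    """Return the names of the bits that are set in `value`."""
    return [name for bit, name in EPT_QUALIFICATION_BITS.items()
            if (value >> bit) & 1]


def ept_entry_not_present(value):
    """True when bits 3-5 are all clear, i.e. no R/W/X permission at all."""
    return value & 0b111000 == 0
```

Under these assumptions 0x181 decodes to a data read of a translated linear address, with no read/write/execute permission in the EPT entry, which is what a not-yet-populated EPT mapping would look like.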
Re: KVM page fault exceptions
cianfa72 wrote:
> Therefore - considering EPT is enabled- which are actually the occurrences kvm_page_fault handler will be executed?
I've posted the link to the KVM source code above; that link brings you to the place where this function is executed. You can study the code further if you want to find deeper details. Oh, and did you think about my suggestion of using VirtIO devices instead of the default emulated RTL8139/Intel ones?