QEMU architectural view

Posted: Fri Oct 08, 2021 1:15 am
by cianfa72
Hi,

I have a question about the QEMU architectural view as depicted in the following (and similar) pictures, e.g. https://www.redhat.com/en/blog/introduc ... -vhost-net.
[Image: QEMU architectural view]
To me this view makes sense for 'plain QEMU' only, since it has no hw acceleration (i.e. no KVM). In fact, without any hw acceleration, QEMU employs dynamic translation even if the emulated CPU (vCPU) is the same as the host physical CPU.

On the other hand, QEMU/KVM leverages CPU virtualization support (e.g. Intel VT-x), which provides root and non-root (guest) modes (e.g. VMX root vs VMX non-root mode). In that case the CPU guest mode (VMX non-root) has its own User and Kernel modes (Ring 3 vs Ring 0), so I believe that picture does not actually make sense, since the Guest is not inside the Host User mode.

What do you think?

Re: QEMU architectural view

Posted: Fri Oct 08, 2021 2:22 am
by klange
This is kinda splitting hairs about how KVM provides userspace applications with access to hardware virtualization functionality, but essentially the entire guest CPU is treated similarly to normal user code in the context of the (host) process that is operating as the hypervisor. It's not granted any special privileges on the host side, the privileged instructions it runs are (at various levels) emulated rather than actually executed, and it still needs to be scheduled by the host kernel. So, for practical purposes, the guest kernel space and userspace are still "within" the QEMU process, in a similar vein to how they would be with TCG. And, sure enough, if you burn CPU in the virtualized guest you'll see that reported as user CPU time in the host QEMU process (not system (kernel) time, and not mysteriously lost somewhere else), further solidifying that concept.
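The CPU-time observation can be checked on a Linux host by reading /proc/<pid>/stat for the QEMU process. The following is a minimal sketch, assuming the field layout documented in proc(5) (field 14 utime, field 15 stime, field 43 guest_time); the helper names `parse_proc_stat` and `read_cpu_times` are made up for illustration. According to proc(5), guest_time is also counted inside utime, which matches what is described above.

```python
import os


def parse_proc_stat(stat_line, ticks_per_second):
    """Extract user, system and guest CPU time (in seconds) from a stat line.

    Field numbers follow proc(5): 14 = utime, 15 = stime, 43 = guest_time.
    """
    # Field 2 (comm) is wrapped in parentheses and may contain spaces or ')',
    # so cut at the LAST ')' instead of splitting the whole line on whitespace.
    rest = stat_line[stat_line.rindex(")") + 2:].split()
    # rest[0] is field 3 (state), so field N is found at rest[N - 3].
    return {
        "utime_s": int(rest[14 - 3]) / ticks_per_second,
        "stime_s": int(rest[15 - 3]) / ticks_per_second,
        "guest_time_s": int(rest[43 - 3]) / ticks_per_second,
    }


def read_cpu_times(pid):
    """Read the CPU times of a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_proc_stat(f.read(), os.sysconf("SC_CLK_TCK"))
```

Calling `read_cpu_times(qemu_pid)` (with `qemu_pid` being whatever PID your QEMU process has) before and after a CPU-heavy guest workload should show utime_s and guest_time_s growing together while stime_s stays comparatively small.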

Re: QEMU architectural view

Posted: Mon Oct 11, 2021 1:47 am
by cianfa72
I'd like to ask about another point, from this interesting "Deep dive in Virtio-networking" article, which contains the following passage:

Qemu allocates one eventfd and registers it to both vhost and KVM in order to achieve the notification bypass. The vhost-$pid kernel thread polls it, and KVM writes to it when the guest writes in a specific address. This mechanism is named ioeventfd. This way, a simple read/write operation to a specific guest memory address does not need to go through the expensive QEMU process wakeup and can be routed to the vhost worker thread directly. This also has the advantage of being asynchronous, no need for the vCPU to stop (so no need to do an immediate context switch).

In the last sentence he says the vCPU does not stop and there is no immediate context switch. AFAIK, for the KVM code to write to the ioeventfd (shared between the KVM driver and vhost, both running inside the host kernel), the physical CPU has to do a vmexit from VMX non-root/guest mode, since the KVM driver code runs inside the host kernel. So, IMO, there is actually a mode switch (VMX non-root mode -> VMX root mode) even if there is no switch from host kernel to host user mode (no QEMU process wakeup anymore). To do that, the guest vCPU has to stop running guest code and enter the KVM code loaded in the host kernel.

Do you think this makes sense? Thank you.

Re: QEMU architectural view

Posted: Mon Oct 11, 2021 10:28 am
by Korona
It has to do a vmexit but it doesn't exit the Linux kernel to re-enter the qemu userspace process.
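The difference between the two paths can be written down as a toy model. This is only a sketch of the sequence discussed in this thread, with made-up function names (`kick_path`, `count_user_kernel_crossings`); it does not measure anything and it simplifies what KVM really does.

```python
def kick_path(ioeventfd):
    """Return the steps taken when the guest writes to the notify address.

    Each step is (description, crosses_host_user_kernel_boundary).
    """
    path = [
        ("guest writes to the notify address", False),
        ("vmexit: VMX non-root -> VMX root, KVM runs in host kernel", False),
    ]
    if ioeventfd:
        # KVM signals the eventfd itself; the vhost worker thread is woken
        # up and processes the queue on its own, inside the host kernel.
        path.append(("KVM writes to the eventfd", False))
    else:
        # Without ioeventfd the exit is handed to the QEMU process.
        path.append(("KVM_RUN ioctl returns to QEMU userspace", True))
        path.append(("QEMU handles the access", False))
        path.append(("QEMU calls the KVM_RUN ioctl again", True))
    path.append(("vmentry: VMX root -> VMX non-root, guest resumes", False))
    return path


def count_user_kernel_crossings(path):
    """Count host user/kernel boundary crossings in a path."""
    return sum(1 for _, crosses in path if crosses)
```

Both paths contain the vmexit and the vmentry; only the path without ioeventfd contains the round trip through the QEMU userspace process.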

Re: QEMU architectural view

Posted: Fri Oct 22, 2021 1:33 am
by cianfa72
Korona wrote: It has to do a vmexit but it doesn't exit the Linux kernel to re-enter the qemu userspace process.
So, there is actually no 'tax' due to a 'mode switch' from kernel mode to user mode (and back) in order to process packets. Then, when packet processing inside the host Linux kernel is completed, VMX non-root/guest mode for that VM is re-entered (vmenter) to resume the vCPU.

Technically, I believe it is always the same QEMU thread (i.e. the vCPU from the guest's point of view) that, on vmexit, runs the KVM code inside the (host) Linux kernel and writes to the ioeventfd file descriptor to notify the vhost-$pid kernel thread.

Is the above correct? Thanks.

Re: QEMU architectural view

Posted: Fri Oct 22, 2021 2:21 am
by Korona
Yes.

(I saw that you deleted and re-posted your comment to bump the thread. That is generally considered bad style.)