When I first encountered virtualization, I was quite confused about the exact roles of KVM and QEMU. These terms often appeared side-by-side, leading me to initially believe they were distinct solutions to the same problem.
While there are some minor overlaps in their functionality, they don’t serve the same fundamental purpose. In fact, they complement each other to deliver optimal performance for virtual machines.
In this article, we’ll build an intuitive understanding of virtualization in Linux, starting from the most basic approach and progressively moving towards the sophisticated QEMU/KVM combination.
Table of contents
Emulate or not
Implementing the guest VM (CPU)
QEMU
KVM
QEMU and KVM combo
Conclusion
Emulate or not
A fundamental question when creating a VM is whether its CPU architecture matches that of the host system.
While we’ll look into the specifics shortly, it’s intuitively clear that a VM sharing the host’s architecture requires less overhead.
Although terminology on the internet can be inconsistent, with slight variations, the general principle holds: if the VM’s architecture differs from the host’s, it must be emulated.
Implementing the guest VM (CPU)
The CPU is the core of any conventional computing system, and it naturally forms the heart of a virtual machine’s operation. For this discussion, we’ll disregard multi-core CPU emulation and conceptually focus on a single core. Emulating surrounding components like the bus, peripherals, and I/O is beyond the scope of this discussion.
Consider a naive code implementation of CPU virtualization within a hypothetical VM hypervisor:
while (true) {
    InstructionBytes next_instruction_bytes = get_next_instruction(register_pc); // pc is the program counter, i.e. the pointer to the next instruction in memory
    Instruction next_instruction = decode(next_instruction_bytes);
    register_pc += next_instruction.size; // advance to the following instruction
    if (next_instruction.opcode == ADD) {
        registers[next_instruction.register_destination] = registers[next_instruction.register_source_1] + registers[next_instruction.register_source_2];
        continue;
    }
    if (next_instruction.opcode == SUB) {
        registers[next_instruction.register_destination] = registers[next_instruction.register_source_1] - registers[next_instruction.register_source_2];
        continue;
    }
    ...
}
This approach is viable for small VMs with minimal workloads where simplicity is prioritized over performance. In fact, this is how my custom Mrav CPU can run as a software simulation, both natively on the host machine and even within a browser!
The primary drawback of this approach is performance. This is the key insight: each VM CPU instruction, which would typically execute in a few clock cycles on real hardware, necessitates numerous expressions in our VM implementation language. These expressions, in turn, translate into many host CPU instructions. Consequently, the VM operates at only a fraction of the CPU frequency it would achieve on physical hardware equivalent to the host.
JIT emulation
For a VM with a different architecture (e.g., an ARM VM on an x86_64 host), achieving practical performance necessitates a more sophisticated implementation.
This is where Just In Time (JIT) emulation techniques become crucial. JIT emulation software analyzes the guest’s code execution flow and dynamically compiles blocks of guest VM instructions into the host’s native code. For instance, it might map guest register R_VM_1 to host register R_HOST_1 (and similarly for R_VM_2 to R_HOST_2), directly translating a guest addition instruction to a host addition:
ADD R_HOST_1, R_HOST_2
Many JIT emulators claim near-native performance, and the reason is clear: the more efficiently the emulator translates blocks of guest instructions into native host code, the less overhead each translated block carries, and the closer execution comes to running the guest code directly on hardware.
In essence, JIT emulation achieves the same logical outcome as the naive approach but with significantly enhanced performance.
Hardware acceleration
Let’s now consider the simpler scenario of creating a VM with the same architecture as the host, for instance, an x86_64 VM on an x86_64 system.
Despite the advancements of JIT, the aforementioned approach still incurs overhead. Recognizing this, and driven by other considerations like security, CPU vendors such as Intel began introducing specialized CPU instructions for virtualization. These instructions allow VM implementation code to bypass numerous general-purpose instructions, instead executing CPU instructions specifically designed to assist in creating and managing VMs. This effectively provides hardware acceleration for virtualization. VMs leveraging this hardware-assisted setup can achieve near-native performance. While some context switching is inherent when entering and exiting VM execution, efficient VM software minimizes this overhead.
In rarer instances, a CPU might offer acceleration for hosting a VM of a different architecture. The increased cost is evident: when the VM shares the host’s architecture, the same CPU logic blocks can execute instructions for both. However, hosting a VM with a disparate architecture introduces significant complexity.
Nevertheless, similar to the JIT solution, this approach remains functionally identical to the initial naive code block but delivers substantially improved performance.
QEMU
Given these various virtualization techniques, what then is QEMU’s role?
By default, QEMU typically employs JIT emulation for CPU execution and also handles peripheral simulation. While I’m not deeply familiar with its internal implementation, this serves as a useful mental model for understanding its user-facing behavior.
Irrespective of the VM’s architecture, QEMU executes userspace code to simulate the guest’s in-memory environment. As previously noted, it also emulates peripherals such as display and storage devices.
Thus, QEMU functions as a comprehensive system emulator. We’ll pause our discussion of QEMU here and revisit it after exploring KVM.
KVM
Kernel-based Virtual Machine (KVM) is a module for the Linux kernel that enables it to leverage the CPU’s hardware virtualization extensions for hosting VMs. As userspace applications cannot directly access these privileged instructions, KVM acts as a crucial bridge, allowing your application to utilize the CPU’s hardware acceleration for optimal VM performance.
QEMU and KVM combo
The synergy between QEMU and KVM should now be clearer. QEMU leverages the KVM API in Linux to achieve maximum performance when hosting a guest CPU. Conceptually, QEMU continues to perform the tasks outlined in our naive code block, but its implementation with KVM is significantly more complex and, crucially, far more performant.
A recent search of my shell history for running a QEMU VM revealed the following command:
qemu-system-x86_64 -kernel /tmp/linux/kernel/linux-6.17.2/arch/x86/boot/bzImage -drive file=/tmp/disk.ext4,format=raw,if=virtio -nographic --append "console=ttyS0 init=/init root=/dev/vda ro" --enable-kvm -smp 16
This command now makes complete sense. KVM can be understood as the acceleration mechanism that enables QEMU to achieve native performance within the VM. The --enable-kvm flag instructs QEMU to delegate guest CPU emulation to the kernel, thereby utilizing hardware acceleration.
Conclusion
This article aimed to build an intuitive understanding of VM implementation, progressing from the most basic and least performant methods to the native-level performance achieved by the QEMU + KVM combination. The key takeaway is that KVM serves as the crucial layer enabling the utilization of CPU hardware acceleration for VM software development.
While VM software like QEMU can perform virtualization independently (addressing the minor overlap mentioned earlier), performance will be suboptimal without the kernel’s hardware acceleration layer.
Bonus: For Mac users running QEMU, macOS offers a KVM equivalent called Hypervisor Framework (HVF). The QEMU flag to enable it is -accel hvf, though I haven’t personally tested it.
I hope this exploration has been insightful.
Please consider following me on Twitter/X and LinkedIn for further updates.