Mini VMM in Rust 1 - Basic

This article also has a Chinese version.

This series of blog posts mainly records my process of trying to implement a Hypervisor in Rust.

Why am I writing this series? A few months ago, while exploring KVM in my spare time, I ran into some difficulties. Many articles on the Internet don't explain things clearly, and I couldn't find a single one that builds a VMM from scratch while clearly explaining the meaning of, and reason for, every magic number. I hope this sharing helps beginners avoid some detours. Of course, my explanations may contain misunderstandings, and corrections are welcome.

Table of Contents:

  1. Mini VMM in Rust - Basic
  2. Mini VMM in Rust - Mode Switch
  3. Mini VMM in Rust - Run Real Linux Kernel
  4. Mini VMM in Rust - Implement Virtio Devices

This article is the first in the series, which mainly covers some introductory knowledge and runs some actual code.

In recent years there seem to be more and more micro-VMs implemented in Rust, from crosvm to firecracker, followed by Huawei's stratovirt and Intel's cloud-hypervisor. The main reason is that infrastructure like this has extremely high requirements for performance and security, and with Rust we can confine unsafe code to a small scope and avoid memory-safety issues as much as possible.

Is creating such a Hypervisor complex? A minimal KVM-based Hypervisor is actually very simple; the complexity lies mostly in emulating devices. This series of articles will implement a micro VMM in Rust step by step, which offers better safety guarantees than doing it directly in C, since the unsafe operations are encapsulated in very small pieces of code inside the libraries we use.

Chap 0: Basic Knowledge

First, we need to know what KVM is: Kernel-based Virtual Machine (KVM) is an open-source virtualization technology built into Linux®. Specifically, KVM turns Linux into a hypervisor that allows the host machine to run multiple isolated virtual environments, called guests or virtual machines (VMs). [src]

How do we use KVM then? Normally you might think: since it's a capability provided by the kernel, it should be a syscall, right? Heh, wrong guess: it's exposed through the /dev/kvm device; on supported machines you can see it with ls /dev/kvm. Exposing it as a device file makes permission management easier than a raw syscall would.

After opening the device, you can operate it through the ioctl syscall. There are three levels here:

  1. System: affects the entire KVM subsystem, such as creating VMs.
  2. VM: affects a single VM, such as creating a vCPU for the VM.
  3. vCPU: queries or controls the properties of a single vCPU.

A typical example of using KVM: open /dev/kvm to get kvmfd, use ioctl KVM_CREATE_VM to get vmfd, then ioctl KVM_CREATE_VCPU to get cpufd. After setting up memory (ioctl KVM_SET_USER_MEMORY_REGION) and devices and initializing registers, you can execute the vCPU on a thread (ioctl KVM_RUN).

When an event requiring host intervention occurs, a VM exit is triggered. If the event is one KVM itself cannot handle, ioctl KVM_RUN returns to user space for the user to process; afterwards, user space can loop back into ioctl KVM_RUN, or leave the loop (for example, on poweroff).
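To make that flow concrete, here is a minimal sketch of the lifecycle using the kvm-ioctls crate we will adopt below (no memory or register setup yet, so this illustrates the call sequence rather than a runnable guest; lifecycle_sketch is just an illustrative name):

use kvm_ioctls::{Kvm, VcpuExit};

fn lifecycle_sketch() {
    // System level: open /dev/kvm (Kvm::new) and create a VM (KVM_CREATE_VM).
    let kvm = Kvm::new().expect("open /dev/kvm failed");
    let vm = kvm.create_vm().expect("create vm failed");

    // VM level: create a vCPU (KVM_CREATE_VCPU). Guest memory and devices
    // would also be configured at this level before running.
    let vcpu = vm.create_vcpu(0).expect("create vcpu failed");

    // vCPU level: enter the guest (KVM_RUN) until an exit we must handle.
    loop {
        match vcpu.run().expect("KVM_RUN failed") {
            VcpuExit::Hlt => break, // the guest halted; leave the loop
            other => println!("userspace handles {:?}, then re-enters", other),
        }
    }
}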

In addition, all details related to Intel processors can be found in the SDM (Intel's Software Developer's Manual), and many of KVM's data structures map directly onto it.

Chap 1: Get Hands Dirty

First, let's run the simplest hello world: no devices, a fixed amount of memory, a single vCPU, and support only for Intel x86-64 CPUs.

Using Existing Crates

Since it's inconvenient and unnecessary to hand-write the device access, the syscalls, and a pile of different flags, let's directly use the ready-made crates from rust-vmm:

  • kvm-bindings: As the name implies, it includes a bunch of bindings, containing a lot of kernel structure definitions and constant definitions.
  • kvm-ioctls: A safe abstraction of the KVM API; we can safely use Kvm, VmFd, VcpuFd, etc.
  • vm-memory: Memory management, such as translating a guest physical address (GPA) to a host virtual address (HVA); it also provides conveniences such as automagically mmapping a block of memory and exposing it as GuestMemory.
  • vmm-sys-util: Assorted utilities; we'll discuss them as needed.

Creating VM and Memory

// create vm
let kvm = Kvm::new().expect("open kvm device failed");
let vm = kvm.create_vm().expect("create vm failed");

// create memory
let guest_addr = GuestAddress(0x0);
let guest_mem = GuestMemoryMmap::<()>::from_ranges(&[(guest_addr, MEMORY_SIZE)]).unwrap();
let host_addr = guest_mem.get_host_address(guest_addr).unwrap();
let mem_region = kvm_userspace_memory_region {
    slot: 0,
    guest_phys_addr: 0,
    memory_size: MEMORY_SIZE as u64,
    userspace_addr: host_addr as u64,
    flags: KVM_MEM_LOG_DIRTY_PAGES,
};
unsafe {
    vm.set_user_memory_region(mem_region)
        .expect("set user memory region failed")
};

An extra note: requesting memory here is actually a private anonymous mmap, and anonymous mappings are zero-filled (as if backed by /dev/zero). Sometimes we also madvise the region with MADV_MERGEABLE to enable page sharing via KSM, which can save memory when booting multiple VMs from the same kernel (though the Linux kernel patches its own code at boot, which may defeat some of the sharing).
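If you want to experiment with that, marking the region mergeable is a single madvise call; here is a minimal sketch with the libc crate, reusing host_addr and MEMORY_SIZE from the snippet above:

// Optional: let KSM deduplicate identical pages across VMs.
// Safety: host_addr/MEMORY_SIZE describe the mmap we created above.
let ret = unsafe {
    libc::madvise(
        host_addr as *mut libc::c_void,
        MEMORY_SIZE,
        libc::MADV_MERGEABLE,
    )
};
assert_eq!(ret, 0, "madvise(MADV_MERGEABLE) failed");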

Creating vCPU and Initializing Registers

Here the registers default to zero and our entry point is also 0; rflags is set to 2 because, per the SDM, its bit 1 is reserved and always reads as 1. In real mode the next instruction is fetched from cs.base + rip (cs.base is conventionally selector × 16), so with everything zeroed, execution starts at guest physical address 0x0, exactly where we will copy our code. We also don't need to worry about complex initialization such as page tables or a GDT.

// create vcpu and set cpuid
let vcpu = vm.create_vcpu(0).expect("create vcpu failed");
let kvm_cpuid = kvm.get_supported_cpuid(KVM_MAX_CPUID_ENTRIES).unwrap();
vcpu.set_cpuid2(&kvm_cpuid).unwrap();

// set regs
let mut regs = vcpu.get_regs().unwrap();
regs.rip = 0;
regs.rflags = 2;
vcpu.set_regs(&regs).unwrap();

// set sregs
let mut sregs = vcpu.get_sregs().unwrap();
sregs.cs.selector = 0;
sregs.cs.base = 0;
vcpu.set_sregs(&sregs).unwrap();

Copying and Executing Code

We need to generate a small piece of code that can run in 16-bit real mode.

First, write the following file demo.asm, then run nasm demo.asm to assemble it; by default the output binary is named demo.

bits 16
mov ax, 0x42
mov [ds:0x1000], ax
hlt

We can see the actual result with ndisasm -b16 demo:

00000000  B84200            mov ax,0x42
00000003  3EA30010          mov [ds:0x1000],ax
00000007  F4                hlt

Yes, the result is as expected. Of course, you could also directly check the manual and handwrite this assembly.

Since this code segment is quite simple, merely setting a register, writing to memory, then halting, for simplicity we directly hard-code the instruction bytes into our program.

// copy code
// B84200   mov ax,0x42
// 3EA30010 mov [ds:0x1000],ax
// F4       hlt
let code = [0xb8, 0x42, 0x00, 0x3e, 0xa3, 0x00, 0x10, 0xf4];
guest_mem.write_slice(&code, GuestAddress(0x0)).unwrap();
let reason = vcpu.run().unwrap();
let regs = vcpu.get_regs().unwrap();
println!("exit reason: {:?}", reason);
println!("rax: {:x}, rip: {:X?}", regs.rax, regs.rip);
println!(
    "memory at 0x1000: 0x{:X}",
    guest_mem.read_obj::<u16>(GuestAddress(0x1000)).unwrap()
);
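If you would rather not hard-code the bytes, the same thing can be done by loading the assembled flat binary from disk; a small sketch, assuming the nasm output demo sits in the working directory:

// Alternative to the hard-coded array: read nasm's flat binary output.
let code = std::fs::read("demo").expect("read demo binary failed");
guest_mem.write_slice(&code, GuestAddress(0x0)).unwrap();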

By now, our most basic hello world is finished. Run it and we can get:

exit reason: Hlt
rax: 42, rip: 8
memory at 0x1000: 0x42

As you can see, our virtual machine has started up and got the expected result; we can correctly handle computation and memory access.
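In a real VMM the single vcpu.run() call grows into a loop over exit reasons; here is a minimal sketch of what that might look like (Hlt and IoOut are actual kvm-ioctls VcpuExit variants, the handling policy is our own choice):

// Keep re-entering the guest until a terminal exit (here: hlt).
loop {
    match vcpu.run().expect("KVM_RUN failed") {
        VcpuExit::Hlt => break,
        VcpuExit::IoOut(port, data) => {
            // A device model would consume this port write; we just log it.
            println!("io out: port 0x{:x}, data {:?}", port, data);
        }
        other => panic!("unhandled exit: {:?}", other),
    }
}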

Full code:

use kvm_bindings::{kvm_userspace_memory_region, KVM_MAX_CPUID_ENTRIES, KVM_MEM_LOG_DIRTY_PAGES};
use kvm_ioctls::Kvm;
use vm_memory::{Bytes, GuestAddress, GuestMemory, GuestMemoryMmap};

const MEMORY_SIZE: usize = 0x30000;

fn main() {
    // create vm
    let kvm = Kvm::new().expect("open kvm device failed");
    let vm = kvm.create_vm().expect("create vm failed");

    // create memory
    let guest_addr = GuestAddress(0x0);
    let guest_mem = GuestMemoryMmap::<()>::from_ranges(&[(guest_addr, MEMORY_SIZE)]).unwrap();
    let host_addr = guest_mem.get_host_address(guest_addr).unwrap();
    let mem_region = kvm_userspace_memory_region {
        slot: 0,
        guest_phys_addr: 0,
        memory_size: MEMORY_SIZE as u64,
        userspace_addr: host_addr as u64,
        flags: KVM_MEM_LOG_DIRTY_PAGES,
    };
    unsafe {
        vm.set_user_memory_region(mem_region)
            .expect("set user memory region failed")
    };

    // create vcpu and set cpuid
    let vcpu = vm.create_vcpu(0).expect("create vcpu failed");
    let kvm_cpuid = kvm.get_supported_cpuid(KVM_MAX_CPUID_ENTRIES).unwrap();
    vcpu.set_cpuid2(&kvm_cpuid).unwrap();

    // set regs
    let mut regs = vcpu.get_regs().unwrap();
    regs.rip = 0;
    regs.rflags = 2;
    vcpu.set_regs(&regs).unwrap();

    // set sregs
    let mut sregs = vcpu.get_sregs().unwrap();
    sregs.cs.selector = 0;
    sregs.cs.base = 0;
    vcpu.set_sregs(&sregs).unwrap();

    // copy code
    // B84200   mov ax,0x42
    // 3EA30010 mov [ds:0x1000],ax
    // F4       hlt
    let code = [0xb8, 0x42, 0x00, 0x3e, 0xa3, 0x00, 0x10, 0xf4];
    guest_mem.write_slice(&code, GuestAddress(0x0)).unwrap();
    let reason = vcpu.run().unwrap();
    let regs = vcpu.get_regs().unwrap();
    println!("exit reason: {:?}", reason);
    println!("rax: {:x}, rip: {:X?}", regs.rax, regs.rip);
    println!(
        "memory at 0x1000: 0x{:X}",
        guest_mem.read_obj::<u16>(GuestAddress(0x1000)).unwrap()
    );
}
