Mini VMM in Rust 3 - Run Real Linux Kernel

This article also has a Chinese version.

This series of articles mainly records my attempt to implement a Hypervisor using Rust. The table of contents:

  1. Mini VMM in Rust - Basic
  2. Mini VMM in Rust - Mode Switch
  3. Mini VMM in Rust - Run Real Linux Kernel
  4. Mini VMM in Rust - Implement Virtio Devices

This article is the third in the series, where we’ll do some preparatory work and actually get a real Linux system running.

In the previous chapters, we have managed to run arbitrary code in 64-bit mode. The goal of this chapter is to get a real Linux Kernel up and running.

Some might wonder: Linux can also be started in real mode, so why go through all this trouble? Under normal circumstances, Linux relies on a bootloader to perform the mode switch and load the kernel code, whereas our VMM can handle these steps more efficiently. After switching modes, we only need to make sure the kernel and initrd are loaded into guest memory that the page tables map, and then we can start the vCPU directly at the kernel entry point.

Environment Preparation

Since we only aim to start Linux and not a full-blown Linux with a hard disk, we just need to prepare:

  1. Kernel file: You can compile it yourself, or download a pre-compiled and optimized file from the address provided by firecracker.
  2. initrd image: (This is not actually a disk image, it just carries the old name) You can package it yourself, or use a script to create one.

Here is a simple guide I wrote while trying out Rust for Linux, which includes kernel compilation, manual initrd building, and booting. It can serve as a reference. However, building the Kernel and initrd is not our current focus. To ensure we are not sidetracked by these issues, we’ll just use ready-made ones.

vmlinux.bin: https://s3.amazonaws.com/spec.ccfc.min/img/quickstart_guide/x86_64/kernels/vmlinux.bin

initrd.img: Build according to https://github.com/marcov/firecracker-initrd.git (Note: this is somewhat outdated, it may prompt that the root user’s password is too simple, just modify it manually)

We place both vmlinux.bin and initrd.img files under /tmp/mini-kvm.

IRQ and PIT Creation

PIC and APIC

In addition to the CPU and memory, another important part of a computer is I/O devices. There are two ways to determine whether a device has data: either the CPU polls, which is costly when done frequently and incurs latency when done too infrequently, or the device notifies the CPU when the data is ready, through what is known as an interrupt. At the end of each instruction cycle, the CPU checks whether an interrupt is pending; if one is and the interrupt flag IF is set (interrupts are enabled), it jumps to the corresponding interrupt handler.

External devices come in all shapes and sizes, so it is not feasible for the CPU to have dedicated pins to receive interrupts from each type of device. Therefore, a hardware dispatcher is needed to assist. Intel designed the 8259A interrupt controller (used in IBM PC compatibles) with 8 signal lines; it is programmable, allowing dynamic registration of pins and priorities, interrupt masking, and more. To support more peripherals, multiple 8259As are often cascaded to work together. This type of programmable interrupt controller is known as a PIC (Programmable Interrupt Controller).

In the era of multiple CPUs, Intel proposed APIC (Advanced Programmable Interrupt Controller) technology. APIC consists of two parts: one is LAPIC (Local APIC), which exists in each CPU (now there’s one in each logical core); the other is IOAPIC, which might be singular or plural, connecting to external devices. Both are interconnected via the APIC Bus. External devices broadcast interrupts to LAPIC via IOAPIC, and LAPIC decides whether to handle them.

IRQ Virtualization

KVM already provides a virtualized IRQ chip for us; we only need to create it:

vm.create_irq_chip().unwrap();

To be able to trigger a particular interrupt, we only need to register an EventFd together with the corresponding IRQ number:

vm.register_irqfd(&evtfd, 0).unwrap();

Clock Signal Virtualization

In computer systems, there are two types of time-related devices: clocks and timers. We can obtain the current time information through a clock, such as the TSC (Time Stamp Counter) device; with a timer, we can trigger interrupts at a specific time or at regular intervals to make the CPU aware of the passage of time while executing userspace code, such as a PIT (Programmable Interval Timer).

The PIT has relatively low precision and is only used during system startup; after startup, the LAPIC Timer is used, which operates within the CPU with higher precision.

To create a virtual PIT device, we just need to use KVM’s capabilities:

let pit_config = kvm_pit_config {
    flags: KVM_PIT_SPEAKER_DUMMY,
    ..Default::default()
};
vm.create_pit2(pit_config).unwrap();

CPUID Handling

Information about the CPU is obtained through the CPUID instruction, and we need to control the CPUID values seen inside the VM. We tell KVM up front which CPUID information the guest should see, so that when the guest later executes CPUID and triggers a VM exit, KVM can handle it by itself without involving the userspace VMM.

You can refer to the specific standards at https://en.wikipedia.org/wiki/CPUID and http://www.flounder.com/cpuid_explorer2.htm.

Simple Example

As an example, let’s look at the register data corresponding to function = 0:

let mut kvm_cpuid = kvm.get_supported_cpuid(KVM_MAX_CPUID_ENTRIES).unwrap();
let entries = kvm_cpuid.as_mut_slice();
for entry in entries.iter_mut() {
    match entry.function {
        0 => {
            println!("EBX: {:x}", entry.ebx);
            println!("EDX: {:x}", entry.edx);
            println!("ECX: {:x}", entry.ecx);
        }
        _ => (),
    }
}

// EBX: 756e6547
// EDX: 49656e69
// ECX: 6c65746e

// "\\x47\\x65\\x6e\\x75\\x69\\x6e\\x65\\x49\\x6e\\x74\\x65\\x6c" =>
// "GenuineIntel"
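The decoding in the comments above can be reproduced with a small standalone sketch. The function name `cpuid_vendor` is made up for illustration; it only assumes that each register holds four ASCII bytes in little-endian order and that the registers are concatenated in the order EBX, EDX, ECX.

```rust
/// Rebuild the 12-byte vendor string returned by CPUID leaf 0.
/// Each register contributes four ASCII bytes, least significant byte first,
/// and the registers are read in the order EBX, EDX, ECX.
pub fn cpuid_vendor(ebx: u32, edx: u32, ecx: u32) -> String {
    let mut bytes = Vec::with_capacity(12);
    for reg in [ebx, edx, ecx] {
        bytes.extend_from_slice(&reg.to_le_bytes());
    }
    // The vendor string is plain ASCII; fall back to lossy decoding otherwise.
    String::from_utf8_lossy(&bytes).into_owned()
}
```

Feeding it the three values printed above (`0x756e6547`, `0x49656e69`, `0x6c65746e`) gives back `GenuineIntel`.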

Another example is 0x40000000, which stores the string KVMKVMKVM in the registers. You can see the specifics here: https://01.org/linuxgraphics/gfx-docs/drm/virt/kvm/cpuid.html

Structure Definition

The structure definition for kvm_cpuid_entry2 is as follows:

#[repr(C)]
#[derive(Debug, Default, Copy, Clone, PartialEq, Versionize)]
pub struct kvm_cpuid_entry2 {
    pub function: __u32,
    pub index: __u32,
    pub flags: __u32,
    pub eax: __u32,
    pub ebx: __u32,
    pub ecx: __u32,
    pub edx: __u32,
    pub padding: [__u32; 3usize],
}

You might be curious, isn’t executing the CPUID instruction just a matter of setting EAX/ECX? Where do this function and index come from? Referring here https://elixir.bootlin.com/linux/latest/source/arch/x86/kvm/cpuid.c#L1392 we can see that the function is obtained from *EAX, and the index is obtained from *ECX. So when we cross-reference the previous specifications, we can map function and index to *EAX, *ECX, respectively.
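To make the mapping concrete, here is a minimal sketch of how a table of entries could be searched for the values a guest passes in EAX and ECX. `CpuidEntry` and `lookup` are hypothetical names, a trimmed-down mirror of `kvm_cpuid_entry2` rather than KVM's actual code. The sketch assumes that the index only matters for entries whose flags have bit 0 set, which is the value of the kernel's `KVM_CPUID_FLAG_SIGNIFCANT_INDEX` constant (spelled that way in the kernel headers).

```rust
/// Hypothetical, trimmed-down mirror of `kvm_cpuid_entry2`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct CpuidEntry {
    pub function: u32, // input EAX
    pub index: u32,    // input ECX (sub-leaf)
    pub flags: u32,
    pub eax: u32,
    pub ebx: u32,
    pub ecx: u32,
    pub edx: u32,
}

/// Assumed to match the kernel's KVM_CPUID_FLAG_SIGNIFCANT_INDEX (bit 0).
pub const FLAG_SIGNIFICANT_INDEX: u32 = 1;

/// Find the entry answering a CPUID call with the given input EAX/ECX.
/// The sub-leaf index is compared only when the entry marks it as significant.
pub fn lookup(entries: &[CpuidEntry], eax_in: u32, ecx_in: u32) -> Option<&CpuidEntry> {
    entries.iter().find(|e| {
        e.function == eax_in
            && ((e.flags & FLAG_SIGNIFICANT_INDEX) == 0 || e.index == ecx_in)
    })
}
```

With this shape, leaf 0 is found regardless of ECX, while a leaf such as 4 needs one entry per sub-leaf.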

Setting CPUID

This part mainly references firecracker code, where some configurations may be necessary and some may not.

Referring to the wiki mentioned earlier, we can find when EAX=1 (since this is an input, it corresponds to our function=1):

  1. Bit 31 of ECX should be set to 1 to indicate a hypervisor.
  2. Bits 31:24 of EBX should be set to the Local APIC ID, numbered from 0 when there are multiple vCPUs.
  3. Bits 15:8 of EBX should be set to the CLFLUSH line size. On x86 the cache line is usually 64 bytes; according to the wiki, the value we set is multiplied by 8 to get the actual size, so we should set it to 8.
  4. Bit 19 of EDX should be set to 1 to enable CLFLUSH, this setting only takes effect if the CLFLUSH line size is set. TODO: Why haven’t the reference projects set this one?
  5. Bits 23:16 of EBX should be set to the number of logical processors in a single physical package, usually set to the power of two greater or equal to the number of vCPUs (although it’s okay not to set it).
  6. Bit 28 of EDX should be set to 1 to enable hyper-threading, which only takes effect if the previous logical processor number is configured. Usually set when vCPU > 1.
  7. Bit 24 of ECX should be set to 1 to enable tsc-deadline.
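
The EAX=1 adjustments listed above can be sketched as plain bit manipulation. This is an illustration under assumptions, not firecracker's code: `LeafRegs` and `patch_leaf1` are hypothetical names, and `LeafRegs` merely stands in for the `eax`/`ebx`/`ecx`/`edx` fields of `kvm_cpuid_entry2`.

```rust
/// Output registers of one CPUID leaf (hypothetical stand-in for the
/// eax/ebx/ecx/edx fields of `kvm_cpuid_entry2`).
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct LeafRegs {
    pub eax: u32,
    pub ebx: u32,
    pub ecx: u32,
    pub edx: u32,
}

const ECX_TSC_DEADLINE_BIT: u32 = 24;
const ECX_HYPERVISOR_BIT: u32 = 31;
const EDX_CLFSH_BIT: u32 = 19;
const EDX_HTT_BIT: u32 = 28;

/// Apply the leaf-1 settings described in the list above.
pub fn patch_leaf1(regs: &mut LeafRegs, apic_id: u8, vcpu_count: u8) {
    // EBX[31:24]: local APIC ID of this vCPU.
    regs.ebx = (regs.ebx & !(0xff << 24)) | ((apic_id as u32) << 24);
    // EBX[15:8]: CLFLUSH line size in units of 8 bytes (8 * 8 = 64 bytes).
    regs.ebx = (regs.ebx & !(0xff << 8)) | (8 << 8);
    // EDX[19]: CLFLUSH is available.
    regs.edx |= 1 << EDX_CLFSH_BIT;
    // EBX[23:16]: logical processors per package, rounded up to a power of
    // two and clamped so it fits into the 8-bit field.
    let logical = (vcpu_count as u32).max(1).next_power_of_two().min(0xff);
    regs.ebx = (regs.ebx & !(0xff << 16)) | (logical << 16);
    // EDX[28]: hyper-threading flag, only meaningful with more than one vCPU.
    if vcpu_count > 1 {
        regs.edx |= 1 << EDX_HTT_BIT;
    }
    // ECX[24]: TSC-deadline timer, ECX[31]: running under a hypervisor.
    regs.ecx |= (1 << ECX_TSC_DEADLINE_BIT) | (1 << ECX_HYPERVISOR_BIT);
}
```

In a real VMM the same masks would be applied to the entry whose `function` is 1 before calling `set_cpuid2`, once per vCPU with that vCPU's APIC ID.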

EAX=4 mainly relates to cache and cores, such as how many cores are on one socket:

  1. Omitted for brevity.

EAX=6 Fan and power management:

  1. Set bit 3 of ECX to 0, which disables Performance-Energy Bias capability.
  2. Set bit 1 of EAX to 0, which disables Intel Turbo Boost Technology capability.

EAX=10 Performance monitoring:

  1. Set everything to 0 to turn it off.

EAX=11 Extended Topology Entry:

  1. Omitted for brevity.

EAX=0x80000002..=0x80000004 CPU model information:

  1. You can make it up yourself.

Simple Handling

In fact, you can just pass the cpuid out directly without handling it:

let kvm_cpuid = kvm.get_supported_cpuid(KVM_MAX_CPUID_ENTRIES).unwrap();
vcpu.set_cpuid2(&kvm_cpuid).unwrap();

Here we’ll apply a workaround for the time being and come back to address this part later.

Setting TSS

TODO: Educate on TSS & explain why KVM needs to do this

const KVM_TSS_ADDRESS: usize = 0xfffb_d000;
vm.set_tss_address(KVM_TSS_ADDRESS as usize).expect("set tss failed");

Loading the Kernel and initrd

We have to do the work of the bootloader ourselves: load the kernel, the initrd, and the boot command line into guest memory, and place the information the kernel needs in memory so that it can be passed to the kernel.

// load linux kernel
let mut kernel_file = File::open(KERNEL_PATH).expect("open kernel file failed");
let kernel_entry = Elf::load(
    &guest_mem,
    None,
    &mut kernel_file,
    Some(GuestAddress(HIMEM_START)),
)
.unwrap()
.kernel_load;

// load initrd
let initrd_content = std::fs::read(INITRD_PATH).expect("read initrd file failed");
let first_region = guest_mem.find_region(GuestAddress::new(0)).unwrap();
assert!(
    initrd_content.len() <= first_region.size(),
    "too big initrd"
);
let initrd_addr =
    GuestAddress((first_region.size() - initrd_content.len()) as u64 & !(4096 - 1));
guest_mem
    .read_from(
        initrd_addr,
        &mut Cursor::new(&initrd_content),
        initrd_content.len(),
    )
    .unwrap();

// load boot command
let mut boot_cmdline = Cmdline::new(0x10000);
boot_cmdline.insert_str(BOOT_CMD).unwrap();
load_cmdline(&guest_mem, GuestAddress(BOOT_CMD_START), &boot_cmdline).unwrap();
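
The initrd placement above is the only non-obvious arithmetic in this step: the image is put as high as possible in the first memory region, and the start address is rounded down to a page boundary. A standalone sketch of that calculation, with the hypothetical helper name `initrd_load_addr` and a 4 KiB page size assumed:

```rust
const PAGE_SIZE: u64 = 4096;

/// Pick the load address for an initrd of `initrd_len` bytes inside a memory
/// region of `region_size` bytes that starts at guest address 0.
/// Returns `None` when the image does not fit into the region.
pub fn initrd_load_addr(region_size: u64, initrd_len: u64) -> Option<u64> {
    // Highest start address at which the image still ends inside the region.
    let top = region_size.checked_sub(initrd_len)?;
    // Round down to a page boundary by clearing the low 12 bits.
    Some(top & !(PAGE_SIZE - 1))
}
```

For a 128 MiB region and a 5000-byte image this yields `0x7ffe000`, the last page-aligned address that leaves room for the whole image.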

Create the boot parameters and write them into memory:

// create and write boot_params
let mut params = boot_params::default();
// <https://www.kernel.org/doc/html/latest/x86/boot.html>
const KERNEL_TYPE_OF_LOADER: u8 = 0xff;
const KERNEL_BOOT_FLAG_MAGIC_NUMBER: u16 = 0xaa55;
const KERNEL_HDR_MAGIC_NUMBER: u32 = 0x5372_6448;
const KERNEL_MIN_ALIGNMENT_BYTES: u32 = 0x0100_0000;

params.hdr.type_of_loader = KERNEL_TYPE_OF_LOADER;
params.hdr.boot_flag = KERNEL_BOOT_FLAG_MAGIC_NUMBER;
params.hdr.header = KERNEL_HDR_MAGIC_NUMBER;
params.hdr.cmd_line_ptr = BOOT_CMD_START as u32;
params.hdr.cmdline_size = 1 + BOOT_CMD.len() as u32;
params.hdr.kernel_alignment = KERNEL_MIN_ALIGNMENT_BYTES;
params.hdr.ramdisk_image = initrd_addr.raw_value() as u32;
params.hdr.ramdisk_size = initrd_content.len() as u32;

// Value taken from <https://elixir.bootlin.com/linux/v5.10.68/source/arch/x86/include/uapi/asm/e820.h#L31>
const E820_RAM: u32 = 1;
const EBDA_START: u64 = 0x9fc00;
const FIRST_ADDR_PAST_32BITS: u64 = 1 << 32;
const MEM_32BIT_GAP_SIZE: u64 = 768 << 20;
const MMIO_MEM_START: u64 = FIRST_ADDR_PAST_32BITS - MEM_32BIT_GAP_SIZE;

add_e820_entry(&mut params, 0, EBDA_START, E820_RAM);
let last_addr = guest_mem.last_addr();
let first_addr_past_32bits = GuestAddress(FIRST_ADDR_PAST_32BITS);
let end_32bit_gap_start = GuestAddress(MMIO_MEM_START);
let himem_start = GuestAddress(HIMEM_START);
if last_addr < end_32bit_gap_start {
    add_e820_entry(
        &mut params,
        himem_start.raw_value() as u64,
        // it's safe to use unchecked_offset_from because
        // mem_end > himem_start
        last_addr.unchecked_offset_from(himem_start) as u64 + 1,
        E820_RAM,
    );
} else {
    add_e820_entry(
        &mut params,
        himem_start.raw_value(),
        // it's safe to use unchecked_offset_from because
        // end_32bit_gap_start > himem_start
        end_32bit_gap_start.unchecked_offset_from(himem_start),
        E820_RAM,
    );

    if last_addr > first_addr_past_32bits {
        add_e820_entry(
            &mut params,
            first_addr_past_32bits.raw_value(),
            // it's safe to use unchecked_offset_from because
            // mem_end > first_addr_past_32bits
            last_addr.unchecked_offset_from(first_addr_past_32bits) + 1,
            E820_RAM,
        );
    }
}
LinuxBootConfigurator::write_bootparams(
    &BootParams::new(&params, GuestAddress(ZERO_PAGE_START)),
    &guest_mem,
)
.unwrap();

fn add_e820_entry(params: &mut boot_params, addr: u64, size: u64, mem_type: u32) {
    if params.e820_entries >= params.e820_table.len() as u8 {
        panic!();
    }
    params.e820_table[params.e820_entries as usize].addr = addr;
    params.e820_table[params.e820_entries as usize].size = size;
    params.e820_table[params.e820_entries as usize].type_ = mem_type;
    params.e820_entries += 1;
}

This normally involves the bootloader obtaining available memory information through a BIOS interrupt (interrupt number 0x15, AX=0xE820, hence the derived structure name e820 entry). Here, we manually represent the available memory as multiple e820 entries and pass them to the kernel.
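
To see which ranges the code above reports for a given amount of memory, the branching can be isolated into a pure function. This is a sketch only: `e820_ram_ranges` is a hypothetical helper that reuses the constants from the snippet, takes the last valid guest address (inclusive), and returns `(start, size)` pairs instead of filling `boot_params`.

```rust
const EBDA_START: u64 = 0x9fc00;
const HIMEM_START: u64 = 0x10_0000;
const FIRST_ADDR_PAST_32BITS: u64 = 1 << 32;
const MEM_32BIT_GAP_SIZE: u64 = 768 << 20;
const MMIO_MEM_START: u64 = FIRST_ADDR_PAST_32BITS - MEM_32BIT_GAP_SIZE;

/// Compute the usable RAM ranges as (start, size) pairs.
/// `last_addr` is the last valid guest physical address (inclusive) and is
/// assumed to lie at or above HIMEM_START.
pub fn e820_ram_ranges(last_addr: u64) -> Vec<(u64, u64)> {
    assert!(last_addr >= HIMEM_START, "guest memory too small");
    // Low memory below the EBDA is always reported.
    let mut ranges = vec![(0, EBDA_START)];
    if last_addr < MMIO_MEM_START {
        // Everything from 1 MiB up to the end of memory.
        ranges.push((HIMEM_START, last_addr - HIMEM_START + 1));
    } else {
        // Memory below the 32-bit MMIO gap...
        ranges.push((HIMEM_START, MMIO_MEM_START - HIMEM_START));
        // ...and whatever lies above 4 GiB.
        if last_addr > FIRST_ADDR_PAST_32BITS {
            ranges.push((
                FIRST_ADDR_PAST_32BITS,
                last_addr - FIRST_ADDR_PAST_32BITS + 1,
            ));
        }
    }
    ranges
}
```

With the 128 MiB used in this article, the result is `[0, 0x9fc00)` and `[0x100000, 0x8000000)`, which matches the two `BIOS-e820` lines in the kernel log at the end of the article.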

TODO: Memory layout

Creating Input/Output Devices

There are generally two ways to communicate with input/output devices: port I/O (PortIO) and memory-mapped I/O (MMIO). Here, we will focus solely on PortIO communication.

PortIO has a 64K Port address space, with typical addresses including (reference link):

  • COM1: I/O port 0x3F8, IRQ 4
  • COM2: I/O port 0x2F8, IRQ 3
  • COM3: I/O port 0x3E8, IRQ 4
  • COM4: I/O port 0x2E8, IRQ 3

In Linux, /dev/ttyS{0/1…} corresponds to COM{1/2…}. Therefore, to get Linux console input and output via PortIO, one simply has to handle COM1 (0x3F8, IRQ 4) and specify console=ttyS0 in the boot arguments.
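
As a small illustration of the port layout above, the following sketch maps a port number to the serial port it belongs to. The name `decode_serial_port` is hypothetical, and the sketch assumes each UART occupies 8 consecutive ports starting at its base address, the same window that the `addr - COM1 < 8` check in the vCPU loop relies on.

```rust
/// (name, base I/O port, IRQ) for the four conventional PC serial ports.
const SERIAL_PORTS: [(&str, u16, u8); 4] = [
    ("COM1", 0x3f8, 4),
    ("COM2", 0x2f8, 3),
    ("COM3", 0x3e8, 4),
    ("COM4", 0x2e8, 3),
];

/// Number of consecutive I/O ports assumed per UART.
const REGS_PER_UART: u16 = 8;

/// Map an I/O port to (serial port name, register offset, IRQ), or `None`
/// if the port does not fall into any of the four UART windows.
pub fn decode_serial_port(port: u16) -> Option<(&'static str, u8, u8)> {
    SERIAL_PORTS.iter().find_map(|&(name, base, irq)| {
        // Ports below the base cannot belong to this UART.
        let offset = port.checked_sub(base)?;
        (offset < REGS_PER_UART).then(|| (name, offset as u8, irq))
    })
}
```

For example, port `0x3fd` decodes to COM1 at register offset 5, while `0x400` belongs to none of the four ports.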

Here, we create an EventFd and register it with IRQ 4. When the emulated COM1 needs to raise an interrupt (for example, because input is ready for the guest), it writes to this EventFd and KVM injects IRQ 4 into the guest.

In practice, we use the vm_superio crate that provides an emulated serial port. We use this EventFd as its trigger and standard output as its output.

// initialize devices
let com_evt_1 = EventWrapper::new();
vm.register_irqfd(&com_evt_1.0, 4).unwrap();
let stdio_serial = Arc::new(Mutex::new(Serial::with_events(
    com_evt_1.try_clone().unwrap(),
    DummySerialEvent,
    std::io::stdout(),
)));

To adapt to its interface, we need to make two additional structures: EventWrapper and DummySerialEvent. The main purpose is to implement Trigger and SerialEvents. This part of the code is not important; it is merely to satisfy the interface constraints.

struct EventWrapper(EventFd);

impl EventWrapper {
    pub fn new() -> Self {
        Self(EventFd::new(EFD_NONBLOCK).unwrap())
    }

    pub fn try_clone(&self) -> std::io::Result<Self> {
        self.0.try_clone().map(Self)
    }
}

impl std::ops::Deref for EventWrapper {
    type Target = EventFd;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

impl Trigger for EventWrapper {
    type E = std::io::Error;

    fn trigger(&self) -> std::io::Result<()> {
        self.0.write(1)
    }
}

struct DummySerialEvent;

impl SerialEvents for DummySerialEvent {
    fn buffer_read(&self) {}
    fn out_byte(&self) {}
    fn tx_lost_byte(&self) {}
    fn in_buffer_empty(&self) {}
}

When encountering VcpuExit::IoIn and VcpuExit::IoOut, we can obtain the corresponding PortIO address and data. At this point, after making the necessary checks, we can hand it over to stdio_serial for processing. For output, stdio_serial writes directly to stdout; for input, we need to handle it ourselves.

Vcpu Run

As previously mentioned, we need to forward the IoIn and IoOut events to the Serial for processing.

// run vcpu in another thread
let exit_evt = EventWrapper::new();
let vcpu_exit_evt = exit_evt.try_clone().unwrap();
let stdio_serial_read = stdio_serial.clone();

std::thread::spawn(move || {
    loop {
        match vcpu.run() {
            Ok(run) => match run {
                VcpuExit::IoIn(addr, data) => {
                    if addr >= COM1 && addr - COM1 < 8 {
                        data[0] = stdio_serial_read.lock().unwrap().read((addr - COM1) as u8);
                    }
                }
                VcpuExit::IoOut(addr, data) => {
                    if addr >= COM1 && addr - COM1 < 8 {
                        let _ = stdio_serial_read
                            .lock()
                            .unwrap()
                            .write((addr - COM1) as u8, data[0]);
                    }
                }
                VcpuExit::MmioRead(_, _) => {}
                VcpuExit::MmioWrite(_, _) => {}
                VcpuExit::Hlt => {
                    println!("KVM_EXIT_HLT");
                    break;
                }
                VcpuExit::Shutdown => {
                    println!("KVM_EXIT_SHUTDOWN");
                    break;
                }
                r => {
                    println!("KVM_EXIT: {:?}", r);
                }
            },
            Err(e) => {
                println!("KVM Run error: {:?}", e);
                break;
            }
        }
    }
    vcpu_exit_evt.trigger().unwrap();
});

Upon the completion of the thread’s execution, we can be notified via exit_evt, which allows our main thread to wait for stdin input while also waiting for the vcpu exit event.

Stdin Handling

The Serial device requires us to handle the input data ourselves, and while waiting for user-side stdin, we also need to wait for the vcpu exit so that the main thread can exit when the vm stops. As you may have guessed, we can use epoll as the multiplexing mechanism here since KVM is already Linux-only, eliminating the need to consider cross-platform issues.

Here, we use PollContext encapsulated by vmm_sys_util.

For stdin handling, we need to use raw mode, as we need to forward keystrokes such as CTRL+C.

// process events
let stdin = std::io::stdin().lock();
stdin.set_raw_mode().expect("set terminal raw mode failed");

let poll: PollContext<u8> = PollContext::new().unwrap();
poll.add(&exit_evt.0, 0).unwrap();
poll.add(&stdin, 1).unwrap();
'l: loop {
    let events: PollEvents<u8> = poll.wait().unwrap();
    for event in events.iter_readable() {
        match event.token() {
            0 => {
                println!("vcpu stopped, main loop exit");
                break 'l;
            }
            1 => {
                let mut out = [0u8; 64];
                match stdin.read_raw(&mut out[..]) {
                    Ok(0) => {}
                    Ok(count) => {
                        stdio_serial
                            .lock()
                            .unwrap()
                            .enqueue_raw_bytes(&out[..count])
                            .expect("enqueue bytes failed");
                    }
                    Err(e) => {
                        println!("error while reading stdin: {:?}", e);
                    }
                }
            }
            _ => unreachable!(),
        }
    }
}

Complete Code

use std::{
    fs::File,
    io::Cursor,
    sync::{Arc, Mutex},
};

use kvm_bindings::{
    kvm_pit_config, kvm_segment, kvm_userspace_memory_region, KVM_MAX_CPUID_ENTRIES,
    KVM_MEM_LOG_DIRTY_PAGES, KVM_PIT_SPEAKER_DUMMY,
};
use kvm_ioctls::{Kvm, VcpuExit};
use linux_loader::{
    bootparam::boot_params,
    configurator::{linux::LinuxBootConfigurator, BootConfigurator, BootParams},
    loader::{elf::Elf, load_cmdline, Cmdline, KernelLoader},
};
use vm_memory::{Address, Bytes, GuestAddress, GuestMemory, GuestMemoryMmap};
use vm_superio::{serial::SerialEvents, Serial, Trigger};
use vmm_sys_util::{
    eventfd::{EventFd, EFD_NONBLOCK},
    poll::{PollContext, PollEvents},
    terminal::Terminal,
};

const MEMORY_SIZE: usize = 128 << 20;

const KVM_TSS_ADDRESS: usize = 0xfffb_d000;
const X86_CR0_PE: u64 = 0x1;
const X86_CR4_PAE: u64 = 0x20;
const X86_CR0_PG: u64 = 0x80000000;
const BOOT_GDT_OFFSET: u64 = 0x500;
const EFER_LME: u64 = 0x100;
const EFER_LMA: u64 = 0x400;

const HIMEM_START: u64 = 0x100000;
const BOOT_CMD_START: u64 = 0x20000;
const BOOT_STACK_POINTER: u64 = 0x8ff0;
const ZERO_PAGE_START: u64 = 0x7000;

const KERNEL_PATH: &str = "/tmp/mini-kvm/vmlinux.bin";
const INITRD_PATH: &str = "/tmp/mini-kvm/initrd.img";
const BOOT_CMD: &str = "console=ttyS0 noapic noacpi reboot=k panic=1 pci=off nomodule";

fn main() {
    // create vm
    let kvm = Kvm::new().expect("open kvm device failed");
    let vm = kvm.create_vm().expect("create vm failed");

    // initialize irq chip and pit
    vm.create_irq_chip().unwrap();
    let pit_config = kvm_pit_config {
        flags: KVM_PIT_SPEAKER_DUMMY,
        ..Default::default()
    };
    vm.create_pit2(pit_config).unwrap();

    // create memory
    let guest_addr = GuestAddress(0x0);
    let guest_mem = GuestMemoryMmap::<()>::from_ranges(&[(guest_addr, MEMORY_SIZE)]).unwrap();
    let host_addr = guest_mem.get_host_address(guest_addr).unwrap();
    let mem_region = kvm_userspace_memory_region {
        slot: 0,
        guest_phys_addr: 0,
        memory_size: MEMORY_SIZE as u64,
        userspace_addr: host_addr as u64,
        flags: KVM_MEM_LOG_DIRTY_PAGES,
    };
    unsafe {
        vm.set_user_memory_region(mem_region)
            .expect("set user memory region failed")
    };
    vm.set_tss_address(KVM_TSS_ADDRESS as usize)
        .expect("set tss failed");

    // create vcpu and set cpuid
    let vcpu = vm.create_vcpu(0).expect("create vcpu failed");
    let kvm_cpuid = kvm.get_supported_cpuid(KVM_MAX_CPUID_ENTRIES).unwrap();
    vcpu.set_cpuid2(&kvm_cpuid).unwrap();

    // load linux kernel
    let mut kernel_file = File::open(KERNEL_PATH).expect("open kernel file failed");
    let kernel_entry = Elf::load(
        &guest_mem,
        None,
        &mut kernel_file,
        Some(GuestAddress(HIMEM_START)),
    )
    .unwrap()
    .kernel_load;

    // load initrd
    let initrd_content = std::fs::read(INITRD_PATH).expect("read initrd file failed");
    let first_region = guest_mem.find_region(GuestAddress::new(0)).unwrap();
    assert!(
        initrd_content.len() <= first_region.size(),
        "too big initrd"
    );
    let initrd_addr =
        GuestAddress((first_region.size() - initrd_content.len()) as u64 & !(4096 - 1));
    guest_mem
        .read_from(
            initrd_addr,
            &mut Cursor::new(&initrd_content),
            initrd_content.len(),
        )
        .unwrap();

    // load boot command
    let mut boot_cmdline = Cmdline::new(0x10000);
    boot_cmdline.insert_str(BOOT_CMD).unwrap();
    load_cmdline(&guest_mem, GuestAddress(BOOT_CMD_START), &boot_cmdline).unwrap();

    // set regs
    let mut regs = vcpu.get_regs().unwrap();
    regs.rip = kernel_entry.raw_value();
    regs.rsp = BOOT_STACK_POINTER;
    regs.rbp = BOOT_STACK_POINTER;
    regs.rsi = ZERO_PAGE_START;
    regs.rflags = 2;
    vcpu.set_regs(&regs).unwrap();

    // set sregs
    let mut sregs = vcpu.get_sregs().unwrap();
    const CODE_SEG: kvm_segment = seg_with_st(1, 0b1011);
    const DATA_SEG: kvm_segment = seg_with_st(2, 0b0011);

    // construct kvm_segment and set to segment registers
    sregs.cs = CODE_SEG;
    sregs.ds = DATA_SEG;
    sregs.es = DATA_SEG;
    sregs.fs = DATA_SEG;
    sregs.gs = DATA_SEG;
    sregs.ss = DATA_SEG;

    // construct gdt table, write to memory and set it to register
    let gdt_table: [u64; 3] = [
        0,                       // NULL
        to_gdt_entry(&CODE_SEG), // CODE
        to_gdt_entry(&DATA_SEG), // DATA
    ];
    let boot_gdt_addr = GuestAddress(BOOT_GDT_OFFSET);
    for (index, entry) in gdt_table.iter().enumerate() {
        let addr = guest_mem
            .checked_offset(boot_gdt_addr, index * std::mem::size_of::<u64>())
            .unwrap();
        guest_mem.write_obj(*entry, addr).unwrap();
    }
    sregs.gdt.base = BOOT_GDT_OFFSET;
    sregs.gdt.limit = std::mem::size_of_val(&gdt_table) as u16 - 1;

    // enable protected mode
    sregs.cr0 |= X86_CR0_PE;

    // set page table
    let boot_pml4_addr = GuestAddress(0xa000);
    let boot_pdpte_addr = GuestAddress(0xb000);
    let boot_pde_addr = GuestAddress(0xc000);

    guest_mem
        .write_slice(
            &(boot_pdpte_addr.raw_value() as u64 | 0b11).to_le_bytes(),
            boot_pml4_addr,
        )
        .unwrap();
    guest_mem
        .write_slice(
            &(boot_pde_addr.raw_value() as u64 | 0b11).to_le_bytes(),
            boot_pdpte_addr,
        )
        .unwrap();

    for i in 0..512 {
        guest_mem
            .write_slice(
                &((i << 21) | 0b10000011u64).to_le_bytes(),
                boot_pde_addr.unchecked_add(i * 8),
            )
            .unwrap();
    }
    sregs.cr3 = boot_pml4_addr.raw_value() as u64;
    sregs.cr4 |= X86_CR4_PAE;
    sregs.cr0 |= X86_CR0_PG;
    sregs.efer |= EFER_LMA | EFER_LME;
    vcpu.set_sregs(&sregs).unwrap();

    // create and write boot_params
    let mut params = boot_params::default();
    // <https://www.kernel.org/doc/html/latest/x86/boot.html>
    const KERNEL_TYPE_OF_LOADER: u8 = 0xff;
    const KERNEL_BOOT_FLAG_MAGIC_NUMBER: u16 = 0xaa55;
    const KERNEL_HDR_MAGIC_NUMBER: u32 = 0x5372_6448;
    const KERNEL_MIN_ALIGNMENT_BYTES: u32 = 0x0100_0000;

    params.hdr.type_of_loader = KERNEL_TYPE_OF_LOADER;
    params.hdr.boot_flag = KERNEL_BOOT_FLAG_MAGIC_NUMBER;
    params.hdr.header = KERNEL_HDR_MAGIC_NUMBER;
    params.hdr.cmd_line_ptr = BOOT_CMD_START as u32;
    params.hdr.cmdline_size = 1 + BOOT_CMD.len() as u32;
    params.hdr.kernel_alignment = KERNEL_MIN_ALIGNMENT_BYTES;
    params.hdr.ramdisk_image = initrd_addr.raw_value() as u32;
    params.hdr.ramdisk_size = initrd_content.len() as u32;

    // Value taken from <https://elixir.bootlin.com/linux/v5.10.68/source/arch/x86/include/uapi/asm/e820.h#L31>
    const E820_RAM: u32 = 1;
    const EBDA_START: u64 = 0x9fc00;
    const FIRST_ADDR_PAST_32BITS: u64 = 1 << 32;
    const MEM_32BIT_GAP_SIZE: u64 = 768 << 20;
    const MMIO_MEM_START: u64 = FIRST_ADDR_PAST_32BITS - MEM_32BIT_GAP_SIZE;

    add_e820_entry(&mut params, 0, EBDA_START, E820_RAM);
    let last_addr = guest_mem.last_addr();
    let first_addr_past_32bits = GuestAddress(FIRST_ADDR_PAST_32BITS);
    let end_32bit_gap_start = GuestAddress(MMIO_MEM_START);
    let himem_start = GuestAddress(HIMEM_START);
    if last_addr < end_32bit_gap_start {
        add_e820_entry(
            &mut params,
            himem_start.raw_value() as u64,
            // it's safe to use unchecked_offset_from because
            // mem_end > himem_start
            last_addr.unchecked_offset_from(himem_start) as u64 + 1,
            E820_RAM,
        );
    } else {
        add_e820_entry(
            &mut params,
            himem_start.raw_value(),
            // it's safe to use unchecked_offset_from because
            // end_32bit_gap_start > himem_start
            end_32bit_gap_start.unchecked_offset_from(himem_start),
            E820_RAM,
        );

        if last_addr > first_addr_past_32bits {
            add_e820_entry(
                &mut params,
                first_addr_past_32bits.raw_value(),
                // it's safe to use unchecked_offset_from because
                // mem_end > first_addr_past_32bits
                last_addr.unchecked_offset_from(first_addr_past_32bits) + 1,
                E820_RAM,
            );
        }
    }
    LinuxBootConfigurator::write_bootparams(
        &BootParams::new(&params, GuestAddress(ZERO_PAGE_START)),
        &guest_mem,
    )
    .unwrap();

    // initialize devices
    const COM1: u16 = 0x3f8;
    let com_evt_1 = EventWrapper::new();
    vm.register_irqfd(&com_evt_1.0, 4).unwrap();
    let stdio_serial = Arc::new(Mutex::new(Serial::with_events(
        com_evt_1.try_clone().unwrap(),
        DummySerialEvent,
        std::io::stdout(),
    )));

    // run vcpu in another thread
    let exit_evt = EventWrapper::new();
    let vcpu_exit_evt = exit_evt.try_clone().unwrap();
    let stdio_serial_read = stdio_serial.clone();
    std::thread::spawn(move || {
        loop {
            match vcpu.run() {
                Ok(run) => match run {
                    VcpuExit::IoIn(addr, data) => {
                        if addr >= COM1 && addr - COM1 < 8 {
                            data[0] = stdio_serial_read.lock().unwrap().read((addr - COM1) as u8);
                        }
                    }
                    VcpuExit::IoOut(addr, data) => {
                        if addr >= COM1 && addr - COM1 < 8 {
                            let _ = stdio_serial_read
                                .lock()
                                .unwrap()
                                .write((addr - COM1) as u8, data[0]);
                        }
                    }
                    VcpuExit::MmioRead(_, _) => {}
                    VcpuExit::MmioWrite(_, _) => {}
                    VcpuExit::Hlt => {
                        println!("KVM_EXIT_HLT");
                        break;
                    }
                    VcpuExit::Shutdown => {
                        println!("KVM_EXIT_SHUTDOWN");
                        break;
                    }
                    r => {
                        println!("KVM_EXIT: {:?}", r);
                    }
                },
                Err(e) => {
                    println!("KVM Run error: {:?}", e);
                    break;
                }
            }
        }
        vcpu_exit_evt.trigger().unwrap();
    });

    // process events
    let stdin = std::io::stdin().lock();
    stdin.set_raw_mode().expect("set terminal raw mode failed");

    let poll: PollContext<u8> = PollContext::new().unwrap();
    poll.add(&exit_evt.0, 0).unwrap();
    poll.add(&stdin, 1).unwrap();
    'l: loop {
        let events: PollEvents<u8> = poll.wait().unwrap();
        for event in events.iter_readable() {
            match event.token() {
                0 => {
                    println!("vcpu stopped, main loop exit");
                    break 'l;
                }
                1 => {
                    let mut out = [0u8; 64];
                    match stdin.read_raw(&mut out[..]) {
                        Ok(0) => {}
                        Ok(count) => {
                            stdio_serial
                                .lock()
                                .unwrap()
                                .enqueue_raw_bytes(&out[..count])
                                .expect("enqueue bytes failed");
                        }
                        Err(e) => {
                            println!("error while reading stdin: {:?}", e);
                        }
                    }
                }
                _ => unreachable!(),
            }
        }
    }
}

const fn seg_with_st(selector_index: u16, type_: u8) -> kvm_segment {
    kvm_segment {
        base: 0,
        limit: 0x000fffff,
        selector: selector_index << 3,
        // 0b1011: Code, Executed/Read, accessed
        // 0b0011: Data, Read/Write, accessed
        type_,
        present: 1,
        dpl: 0,
        // If L-bit is set, then D-bit must be cleared.
        db: 0,
        s: 1,
        l: 1,
        g: 1,
        avl: 0,
        unusable: 0,
        padding: 0,
    }
}

// Ref: <https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html> 3-10 Vol. 3A
const fn to_gdt_entry(seg: &kvm_segment) -> u64 {
    let base = seg.base;
    let limit = seg.limit as u64;
    // flags: G, DB, L, AVL
    let flags = (seg.g as u64 & 0x1) << 3
        | (seg.db as u64 & 0x1) << 2
        | (seg.l as u64 & 0x1) << 1
        | (seg.avl as u64 & 0x1);
    // access: P, DPL, S, Type
    // DPL is a 2-bit field and Type a 4-bit field, hence the binary masks.
    let access = (seg.present as u64 & 0x1) << 7
        | (seg.dpl as u64 & 0b11) << 5
        | (seg.s as u64 & 0x1) << 4
        | (seg.type_ as u64 & 0b1111);
    ((base & 0xff00_0000u64) << 32)
        | ((base & 0x00ff_ffffu64) << 16)
        | (limit & 0x0000_ffffu64)
        | ((limit & 0x000f_0000u64) << 32)
        | (flags << 52)
        | (access << 40)
}

fn add_e820_entry(params: &mut boot_params, addr: u64, size: u64, mem_type: u32) {
    if params.e820_entries >= params.e820_table.len() as u8 {
        panic!();
    }
    params.e820_table[params.e820_entries as usize].addr = addr;
    params.e820_table[params.e820_entries as usize].size = size;
    params.e820_table[params.e820_entries as usize].type_ = mem_type;
    params.e820_entries += 1;
}

struct EventWrapper(EventFd);

impl EventWrapper {
    pub fn new() -> Self {
        Self(EventFd::new(EFD_NONBLOCK).unwrap())
    }

    pub fn try_clone(&self) -> std::io::Result<Self> {
        self.0.try_clone().map(Self)
    }
}

impl std::ops::Deref for EventWrapper {
    type Target = EventFd;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

impl Trigger for EventWrapper {
    type E = std::io::Error;

    fn trigger(&self) -> std::io::Result<()> {
        self.0.write(1)
    }
}

struct DummySerialEvent;

impl SerialEvents for DummySerialEvent {
    fn buffer_read(&self) {}
    fn out_byte(&self) {}
    fn tx_lost_byte(&self) {}
    fn in_buffer_empty(&self) {}
}
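
The descriptor packing done by to_gdt_entry is easy to get subtly wrong, so it can help to check it in isolation against well-known values. The sketch below does not depend on `kvm_bindings`: `Segment`, `flat_segment` and `pack_gdt_entry` are hypothetical stand-ins that mirror the relevant `kvm_segment` fields, with the 2-bit DPL and 4-bit type fields masked using `0b11` and `0b1111`.

```rust
/// Hypothetical stand-in for the `kvm_segment` fields that end up in a
/// GDT entry.
#[derive(Debug, Clone, Copy)]
pub struct Segment {
    pub base: u64,
    pub limit: u32,
    pub type_: u8,
    pub s: u8,
    pub dpl: u8,
    pub present: u8,
    pub avl: u8,
    pub l: u8,
    pub db: u8,
    pub g: u8,
}

/// Flat 64-bit segment: base 0, limit 0xfffff, page granularity, L bit set.
pub const fn flat_segment(type_: u8) -> Segment {
    Segment {
        base: 0,
        limit: 0x000f_ffff,
        type_,
        s: 1,
        dpl: 0,
        present: 1,
        avl: 0,
        l: 1,
        db: 0,
        g: 1,
    }
}

/// Pack a segment into the 8-byte descriptor layout used by the GDT.
pub const fn pack_gdt_entry(seg: &Segment) -> u64 {
    let limit = seg.limit as u64;
    // flags nibble: G, D/B, L, AVL
    let flags = ((seg.g as u64 & 0b1) << 3)
        | ((seg.db as u64 & 0b1) << 2)
        | ((seg.l as u64 & 0b1) << 1)
        | (seg.avl as u64 & 0b1);
    // access byte: P (1 bit), DPL (2 bits), S (1 bit), Type (4 bits)
    let access = ((seg.present as u64 & 0b1) << 7)
        | ((seg.dpl as u64 & 0b11) << 5)
        | ((seg.s as u64 & 0b1) << 4)
        | (seg.type_ as u64 & 0b1111);
    ((seg.base & 0xff00_0000) << 32)
        | ((seg.base & 0x00ff_ffff) << 16)
        | (limit & 0x0000_ffff)
        | ((limit & 0x000f_0000) << 32)
        | (flags << 52)
        | (access << 40)
}
```

A flat 64-bit code segment (type `0b1011`) packs to `0x00af9b000000ffff` and the matching data segment (type `0b0011`) to `0x00af93000000ffff`, which are the descriptor values commonly seen in long-mode boot code.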

Run it:

[    0.000000] Linux version 4.14.174 (@57edebb99db7) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #2 SMP Wed Jul 14 11:47:24 UTC 2021
[ 0.000000] Command line: console=ttyS0 noapic noacpi reboot=k panic=1 pci=off nomodule
[ 0.000000] Disabled fast string operations
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x200: 'Protection Keys User registers'
[ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.000000] x86/fpu: xstate_offset[3]: 832, xstate_sizes[3]: 64
[ 0.000000] x86/fpu: xstate_offset[4]: 896, xstate_sizes[4]: 64
[ 0.000000] x86/fpu: xstate_offset[5]: 960, xstate_sizes[5]: 64
[ 0.000000] x86/fpu: xstate_offset[6]: 1024, xstate_sizes[6]: 512
[ 0.000000] x86/fpu: xstate_offset[7]: 1536, xstate_sizes[7]: 1024
[ 0.000000] x86/fpu: xstate_offset[9]: 2560, xstate_sizes[9]: 8
[ 0.000000] x86/fpu: Enabled xstate features 0x2ff, context size is 2568 bytes, using 'compacted' format.
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000007ffffff] usable
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] DMI not present or invalid.
[ 0.000000] tsc: Unable to calibrate against PIT
[ 0.000000] tsc: No reference (HPET/PMTIMER) available
[ 0.000000] e820: last_pfn = 0x8000 max_arch_pfn = 0x400000000
[ 0.000000] MTRR: Disabled
[ 0.000000] x86/PAT: MTRRs disabled, skipping PAT initialization too.
[ 0.000000] CPU MTRRs all blank - virtualized system.
[ 0.000000] x86/PAT: Configuration [0-7]: WB WT UC- UC WB WT UC- UC
[ 0.000000] Scanning 1 areas for low memory corruption
[ 0.000000] Using GB pages for direct mapping
[ 0.000000] RAMDISK: [mem 0x06525000-0x07ffffff]
[ 0.000000] No NUMA configuration found
[ 0.000000] Faking a node at [mem 0x0000000000000000-0x0000000007ffffff]
[ 0.000000] NODE_DATA(0) allocated [mem 0x06503000-0x06524fff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000000001000-0x0000000000ffffff]
[ 0.000000] DMA32 [mem 0x0000000001000000-0x0000000007ffffff]
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000001000-0x000000000009efff]
[ 0.000000] node 0: [mem 0x0000000000100000-0x0000000007ffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x0000000007ffffff]
[ 0.000000] smpboot: Boot CPU (id 0) not listed by BIOS
[ 0.000000] smpboot: Allowing 1 CPUs, 0 hotplug CPUs
[ 0.000000] PM: Registered nosave memory: [mem 0x00000000-0x00000fff]
[ 0.000000] PM: Registered nosave memory: [mem 0x0009f000-0x000fffff]
[ 0.000000] e820: [mem 0x08000000-0xffffffff] available for PCI devices
[ 0.000000] Booting paravirtualized kernel on bare hardware
[ 0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[ 0.000000] random: get_random_bytes called from start_kernel+0x94/0x486 with crng_init=0
[ 0.000000] setup_percpu: NR_CPUS:128 nr_cpumask_bits:128 nr_cpu_ids:1 nr_node_ids:1
[ 0.000000] percpu: Embedded 41 pages/cpu s128600 r8192 d31144 u2097152
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 32137
[ 0.000000] Policy zone: DMA32
[ 0.000000] Kernel command line: console=ttyS0 noapic noacpi reboot=k panic=1 pci=off nomodule
[ 0.000000] PID hash table entries: 512 (order: 0, 4096 bytes)
[ 0.000000] Memory: 83524K/130680K available (8204K kernel code, 645K rwdata, 1480K rodata, 1324K init, 2792K bss, 47156K reserved, 0K cma-reserved)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000] Kernel/User page tables isolation: enabled
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=1.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[ 0.000000] NR_IRQS: 4352, nr_irqs: 24, preallocated irqs: 16
[ 0.000000] Console: colour dummy device 80x25
[ 0.000000] console [ttyS0] enabled
[ 0.024000] tsc: Unable to calibrate against PIT
[ 0.028000] tsc: No reference (HPET/PMTIMER) available
[ 0.032000] tsc: Marking TSC unstable due to could not calculate TSC khz
[ 0.040000] Calibrating delay loop... 5951.48 BogoMIPS (lpj=11902976)
[ 0.088000] pid_max: default: 32768 minimum: 301
[ 0.092000] Security Framework initialized
[ 0.096000] SELinux: Initializing.
[ 0.100000] Dentry cache hash table entries: 16384 (order: 5, 131072 bytes)
[ 0.108000] Inode-cache hash table entries: 8192 (order: 4, 65536 bytes)
[ 0.112000] Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
[ 0.120000] Mountpoint-cache hash table entries: 512 (order: 0, 4096 bytes)
[ 0.132000] Disabled fast string operations
[ 0.140000] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
[ 0.148000] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
[ 0.156000] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
[ 0.164000] Spectre V2 : Mitigation: Full generic retpoline
[ 0.168000] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
[ 0.176000] Spectre V2 : Enabling Restricted Speculation for firmware calls
[ 0.184000] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
[ 0.188000] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl and seccomp
[ 0.196000] MDS: Vulnerable: Clear CPU buffers attempted, no microcode
[ 0.248000] Freeing SMP alternatives memory: 28K
[ 0.268000] smpboot: Max logical packages: 1
[ 0.272000] smpboot: SMP motherboard not detected
[ 0.276000] smpboot: SMP disabled
[ 0.276000] Not enabling interrupt remapping due to skipped IO-APIC setup
[ 0.500000] Performance Events: Skylake events, Intel PMU driver.
[ 0.504000] ... version: 2
[ 0.508000] ... bit width: 48
[ 0.512000] ... generic registers: 4
[ 0.516000] ... value mask: 0000ffffffffffff
[ 0.520000] ... max period: 000000007fffffff
[ 0.524000] ... fixed-purpose events: 3
[ 0.528000] ... event mask: 000000070000000f
[ 0.536000] Hierarchical SRCU implementation.
[ 0.544000] smp: Bringing up secondary CPUs ...
[ 0.548000] smp: Brought up 1 node, 1 CPU
[ 0.552000] smpboot: Total of 1 processors activated (5951.48 BogoMIPS)
[ 0.560000] devtmpfs: initialized
[ 0.564000] x86/mm: Memory block size: 128MB
[ 0.572000] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[ 0.576000] futex hash table entries: 256 (order: 2, 16384 bytes)
[ 0.588000] NET: Registered protocol family 16
[ 0.596000] cpuidle: using governor ladder
[ 0.596000] cpuidle: using governor menu
[ 0.640000] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
[ 0.644000] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[ 0.652000] SCSI subsystem initialized
[ 0.656000] pps_core: LinuxPPS API ver. 1 registered
[ 0.660000] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[ 0.664000] PTP clock support registered
[ 0.664000] dmi: Firmware registration failed.
[ 0.672000] NetLabel: Initializing
[ 0.672000] NetLabel: domain hash size = 128
[ 0.676000] NetLabel: protocols = UNLABELED CIPSOv4 CALIPSO
[ 0.680000] NetLabel: unlabeled traffic allowed by default
[ 0.684000] clocksource: Switched to clocksource refined-jiffies
[ 0.688000] VFS: Disk quotas dquot_6.6.0
[ 0.692000] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 0.708001] NET: Registered protocol family 2
[ 0.712001] TCP established hash table entries: 1024 (order: 1, 8192 bytes)
[ 0.716002] TCP bind hash table entries: 1024 (order: 2, 16384 bytes)
[ 0.720002] TCP: Hash tables configured (established 1024 bind 1024)
[ 0.724002] UDP hash table entries: 256 (order: 1, 8192 bytes)
[ 0.728002] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes)
[ 0.732003] NET: Registered protocol family 1
[ 0.736003] Unpacking initramfs...
[ 1.228034] Freeing initrd memory: 27500K
[ 1.232034] platform rtc_cmos: registered platform RTC device (no PNP device found)
[ 1.236034] Scanning for low memory corruption every 60 seconds
[ 1.240034] audit: initializing netlink subsys (disabled)
[ 1.244035] Initialise system trusted keyrings
[ 1.248035] Key type blacklist registered
[ 1.252035] audit: type=2000 audit(943920001.244:1): state=initialized audit_enabled=0 res=1
[ 1.256035] workingset: timestamp_bits=36 max_order=15 bucket_order=0
[ 1.264036] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[ 1.272036] Key type asymmetric registered
[ 1.276037] Asymmetric key parser 'x509' registered
[ 1.280037] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
[ 1.288037] io scheduler noop registered (default)
[ 1.292038] io scheduler cfq registered
[ 1.296038] Serial: 8250/16550 driver, 1 ports, IRQ sharing disabled
[ 1.304038] serial8250: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a U6_16550A
[ 1.312039] loop: module loaded
[ 1.316039] Loading iSCSI transport class v2.0-870.
[ 1.320039] iscsi: registered transport (tcp)
[ 1.324040] tun: Universal TUN/TAP device driver, 1.6
[ 1.336040] i8042: Can't read CTR while initializing i8042
[ 1.340041] i8042: probe of i8042 failed with error -5
[ 1.344041] hidraw: raw HID events driver (C) Jiri Kosina
[ 1.348041] nf_conntrack version 0.5.0 (1024 buckets, 4096 max)
[ 1.356042] ip_tables: (C) 2000-2006 Netfilter Core Team
[ 1.360042] Initializing XFRM netlink socket
[ 1.364042] NET: Registered protocol family 10
[ 1.372043] Segment Routing with IPv6
[ 1.376043] NET: Registered protocol family 17
[ 1.380043] Bridge firewalling registered
[ 1.384043] NET: Registered protocol family 40
[ 1.388044] registered taskstats version 1
[ 1.392044] Loading compiled-in X.509 certificates
[ 1.396044] Loaded X.509 cert 'Build time autogenerated kernel key: e98e9d271da5d0a322cc4d7bfaa8c2c4c3e46010'
[ 1.404045] Key type encrypted registered
[ 1.416045] Freeing unused kernel memory: 1324K
[ 1.424046] Write protecting the kernel read-only data: 12288k
[ 1.440047] Freeing unused kernel memory: 2016K
[ 1.452048] Freeing unused kernel memory: 568K

OpenRC 0.44.10 is starting up Linux 4.14.174 (x86_64)

* Mounting /proc ... [ ok ]
* Mounting /run ... * /run/openrc: creating directory
* /run/lock: creating directory
* /run/lock: correcting owner
* Caching service dependencies ... [ ok ]
* Clock skew detected with `(null)'
* Adjusting mtime of `/run/openrc/deptree' to Fri Sep 23 07:15:15 2022

* WARNING: clock skew detected!
* WARNING: clock skew detected!
* Mounting devtmpfs on /dev ... [ ok ]
* Mounting /dev/mqueue ... [ ok ]
* Mounting /dev/pts ... [ ok ]
* Mounting /dev/shm ... [ ok ]
* Loading modules ...modprobe: can't change directory to '/lib/modules': No such file or directory
modprobe: can't change directory to '/lib/modules': No such file or directory
[ ok ]
* Mounting misc binary format filesystem ... [ ok ]
* Mounting /sys ... [ ok ]
* Mounting security filesystem ... [ ok ]
* Mounting debug filesystem ... [ ok ]
* Mounting SELinux filesystem ... [ ok ]
* Mounting persistent storage (pstore) filesystem ... [ ok ]
* WARNING: clock skew detected!
* Starting fcnet ... [ ok ]
* Checking local filesystems ... [ ok ]
* Remounting filesystems ... [ ok ]
* Mounting local filesystems ... [ ok ]
* Setting hostname ... [ ok ]
* Starting networking ... * eth0 ...Cannot find device "eth0"
Device "eth0" does not exist.
[ ok ]
* Starting networking ... * lo ... [ ok ]
* eth0 ... [ ok ]

Welcome to Alpine Linux 3.16
Kernel 4.14.174 on an x86_64 (ttyS0)

[ 2.744128] random: fast init done
localhost login: root
Password:
Welcome to Alpine!

The Alpine Wiki contains a large amount of how-to guides and general
information about administrating Alpine systems.
See <http://wiki.alpinelinux.org/>.

You can setup the system with the command: setup-alpine

You may change this message by editing /etc/motd.

login[1080]: root login on 'ttyS0'
localhost:~# pwd
/root
localhost:~# reboot -f
[ 15.780943] reboot: Restarting system
[ 15.780943] reboot: machine restart
KVM_EXIT_SHUTDOWN
vcpu stopped, main loop exit
