
Support for PVH boot protocol in Firecracker #5048

Merged
merged 12 commits into from
Mar 6, 2025
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -28,6 +28,11 @@ and this project adheres to
kernels. For older kernels physical counter will still be passed to the guest
unmodified. See more info
[here](https://github.com/firecracker-microvm/firecracker/blob/main/docs/prod-host-setup.md#arm-only-vm-physical-counter-behaviour)
- [#5048](https://github.com/firecracker-microvm/firecracker/pull/5048): Added
support for [PVH boot mode](docs/pvh.md). This is used when an x86 kernel
provides the appropriate ELF Note to indicate that PVH boot mode is supported.
Linux kernels newer than 5.0 compiled with `CONFIG_PVH=y` set this ELF Note,
as do FreeBSD kernels.

### Changed

15 changes: 15 additions & 0 deletions docs/pvh.md
@@ -0,0 +1,15 @@
# PVH boot mode

Firecracker supports booting x86 kernels in "PVH direct boot" mode
[as specified by the Xen project](https://github.com/xen-project/xen/blob/master/docs/misc/pvh.pandoc).
If the provided kernel contains the `XEN_ELFNOTE_PHYS32_ENTRY` ELF Note,
this boot mode is used. This boot mode was designed for virtualized
environments which load the kernel directly, and is simpler than the "Linux
boot" mode which is designed to be launched from a legacy boot loader.
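
The ELF Note check described above can be sketched roughly as follows. This is a minimal illustration, not Firecracker's loader code: `find_pvh_entry`, `read_u32`, and `align4` are hypothetical helpers, and the sketch assumes the caller has already extracted the raw bytes of a `PT_NOTE` segment, that the notes are little-endian, and that note fields are padded to 4 bytes. The note name (`Xen`) and type (18) follow the Xen ELF note definitions.

```rust
/// Note name and type of the PVH entry-point note (Xen ELF note definitions).
const XEN_NOTE_NAME: &[u8] = b"Xen\0";
const XEN_ELFNOTE_PHYS32_ENTRY: u32 = 18;

/// Round `n` up to a multiple of 4 (assumed note padding), or None on overflow.
fn align4(n: usize) -> Option<usize> {
    n.checked_add(3).map(|v| v & !3)
}

/// Read a little-endian u32 at `off`, or None if it is out of bounds.
fn read_u32(buf: &[u8], off: usize) -> Option<u32> {
    let b = buf.get(off..off.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Hypothetical helper: scan the raw bytes of a PT_NOTE segment and return
/// the 32-bit PVH entry address if the Xen PHYS32_ENTRY note is present.
pub fn find_pvh_entry(notes: &[u8]) -> Option<u32> {
    let mut off = 0usize;
    while off < notes.len() {
        // Note header: namesz, descsz, type (three u32 fields).
        let namesz = read_u32(notes, off)? as usize;
        let descsz = read_u32(notes, off + 4)? as usize;
        let n_type = read_u32(notes, off + 8)?;

        let name_off = off + 12;
        let desc_off = name_off.checked_add(align4(namesz)?)?;
        let name = notes.get(name_off..name_off.checked_add(namesz)?)?;

        if n_type == XEN_ELFNOTE_PHYS32_ENTRY && name == XEN_NOTE_NAME {
            // The descriptor must be able to hold a 32-bit address.
            if descsz < 4 {
                return None;
            }
            return read_u32(notes, desc_off);
        }
        // Skip to the next note.
        off = desc_off.checked_add(align4(descsz)?)?;
    }
    None
}
```

A `None` result here would correspond to falling back to the "Linux boot" mode.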

PVH boot mode can be enabled for Linux by setting `CONFIG_PVH=y` in the kernel
configuration. (This is not the default setting.)

PVH boot mode is enabled by default in FreeBSD, which has support for
Firecracker starting with FreeBSD 14.0. Instructions on building a FreeBSD
kernel and root filesystem are available [here](rootfs-and-kernel-setup.md).
45 changes: 43 additions & 2 deletions docs/rootfs-and-kernel-setup.md
@@ -1,6 +1,6 @@
# Creating Custom rootfs and kernel Images

-## Creating a kernel Image
+## Creating a Linux kernel Image

### Manual compilation

@@ -79,7 +79,7 @@ but without ACPI support) and `6.1`.
After the command finishes, the kernels along with the corresponding KConfig
used will be stored under `resources/$(uname -m)`.

-## Creating a rootfs Image
+## Creating a Linux rootfs Image

A rootfs image is just a file system image that hosts at least an init system.
For instance, our getting started guide uses an ext4 filesystem image. Note
@@ -185,3 +185,44 @@ adjust the script(s) to suit your use case.

You should now have a rootfs image (`ubuntu-22.04.ext4`), that you can boot with
Firecracker.

## Creating FreeBSD rootfs and kernel Images

Here's a quick step-by-step guide to building a FreeBSD rootfs and kernel that
Firecracker can boot:

1. Boot a FreeBSD system. In EC2, the
[FreeBSD 13 Marketplace image](https://aws.amazon.com/marketplace/pp/prodview-ukzmy5dzc6nbq)
is a good option; you can also use weekly snapshot AMIs published by the
FreeBSD project. (Firecracker support is in FreeBSD 14 and later, and the
build host must run FreeBSD 13 or later to build it.)

The build will require about 50 GB of disk space, so size the disk
appropriately.

1. Log in to the FreeBSD system and become root. If using EC2, you'll want to
ssh in as `ec2-user` with your chosen SSH key and then `su` to become root.

1. Install git and check out the FreeBSD src tree:

```sh
pkg install -y git
git clone https://git.freebsd.org/src.git /usr/src
```

Firecracker support has been available since FreeBSD 14.0 (released November 2023).

1. Build FreeBSD:

```sh
make -C /usr/src buildworld buildkernel KERNCONF=FIRECRACKER
make -C /usr/src/release firecracker DESTDIR=`pwd`
```

You should now have a rootfs `freebsd-rootfs.bin` and a kernel
`freebsd-kern.bin` in the current directory (or elsewhere if you change the
`DESTDIR` value) that you can boot with Firecracker. Note that the FreeBSD
rootfs generated in this manner is somewhat minimized compared to "stock"
FreeBSD; it omits utilities which are only relevant on physical systems (e.g.,
utilities related to floppy disks, USB devices, and some network interfaces) and
also debug files and the system compiler.
32 changes: 32 additions & 0 deletions src/vmm/src/arch/mod.rs
@@ -6,6 +6,7 @@

use log::warn;
use serde::{Deserialize, Serialize};
use vm_memory::GuestAddress;

/// Module for aarch64 related functionality.
#[cfg(target_arch = "aarch64")]
@@ -77,3 +78,34 @@
write!(f, "{:?}", self)
}
}

/// Supported boot protocols
#[derive(Debug, Copy, Clone, PartialEq)]
pub enum BootProtocol {
/// Linux 64-bit boot protocol
LinuxBoot,
#[cfg(target_arch = "x86_64")]
/// PVH boot protocol (x86/HVM direct boot ABI)
PvhBoot,
}

impl fmt::Display for BootProtocol {
fn fmt(&self, f: &mut ::std::fmt::Formatter) -> ::std::fmt::Result {
match self {
BootProtocol::LinuxBoot => write!(f, "Linux 64-bit boot protocol"),
#[cfg(target_arch = "x86_64")]
BootProtocol::PvhBoot => write!(f, "PVH boot protocol"),
}
}

}

#[derive(Debug, Copy, Clone)]
/// Specifies the entry point address where the guest must start
/// executing code, as well as which boot protocol is to be used
/// to configure the guest initial state.
pub struct EntryPoint {
/// Address in guest memory where the guest must start execution
pub entry_addr: GuestAddress,
/// Specifies which boot protocol to use
pub protocol: BootProtocol,
}
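
As a rough illustration of how these two types might be used together, the sketch below picks an `EntryPoint` depending on whether a PVH entry address was found in the kernel image. It is a self-contained approximation under stated assumptions: `GuestAddress` here is a stand-in for `vm_memory::GuestAddress`, the `cfg(target_arch = "x86_64")` gating is dropped for brevity, and `select_entry_point` is a hypothetical helper, not a function from this PR.

```rust
/// Stand-in for `vm_memory::GuestAddress` so the sketch compiles on its own.
#[derive(Debug, Copy, Clone, PartialEq)]
pub struct GuestAddress(pub u64);

/// Simplified copy of the enum above (arch gating omitted).
#[derive(Debug, Copy, Clone, PartialEq)]
pub enum BootProtocol {
    LinuxBoot,
    PvhBoot,
}

/// Simplified copy of the struct above.
#[derive(Debug, Copy, Clone)]
pub struct EntryPoint {
    pub entry_addr: GuestAddress,
    pub protocol: BootProtocol,
}

/// Hypothetical helper: use the PVH entry address when the kernel image
/// advertised one, otherwise fall back to the regular kernel entry address.
pub fn select_entry_point(
    kernel_entry: GuestAddress,
    pvh_entry: Option<GuestAddress>,
) -> EntryPoint {
    match pvh_entry {
        Some(addr) => EntryPoint {
            entry_addr: addr,
            protocol: BootProtocol::PvhBoot,
        },
        None => EntryPoint {
            entry_addr: kernel_entry,
            protocol: BootProtocol::LinuxBoot,
        },
    }
}
```

Carrying the protocol alongside the address lets later setup code (registers, boot parameters) branch on `protocol` without re-inspecting the kernel image.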
36 changes: 34 additions & 2 deletions src/vmm/src/arch/x86_64/gdt.rs
@@ -1,3 +1,5 @@
// Copyright © 2020, Oracle and/or its affiliates.
//
// Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0
//
@@ -24,8 +26,38 @@ fn get_base(entry: u64) -> u64 {
| (((entry) & 0x0000_0000_FFFF_0000) >> 16)
}

// Extract the segment limit from the GDT segment descriptor.
//
// In a segment descriptor, the limit field is 20 bits, so it can directly describe
// a range from 0 to 0xFFFFF (1 MB). When G flag is set (4-KByte page granularity) it
// scales the value in the limit field by a factor of 2^12 (4 Kbytes), making the effective
// limit range from 0xFFF (4 KBytes) to 0xFFFF_FFFF (4 GBytes).
//
// However, the limit field in the VMCS definition is a 32 bit field, and the limit value is not
// automatically scaled using the G flag. This means that for a desired range of 4GB for a
// given segment, its limit must be specified as 0xFFFF_FFFF. Therefore the method of obtaining
// the limit from the GDT entry is not sufficient, since it only provides 20 bits when 32 bits
// are necessary. Fortunately, we can check if the G flag is set when extracting the limit since
// the full GDT entry is passed as an argument, and perform the scaling of the limit value to
// return the full 32 bit value.
//
// The scaling mentioned above is required when using PVH boot, since the guest boots in protected
// (32-bit) mode and must be able to access the entire 32-bit address space. It does not cause
// issues for the case of direct boot to 64-bit (long) mode, since in 64-bit mode the processor does
// not perform runtime limit checking on code or data segments.
//
// (For more information concerning the formats of segment descriptors, VMCS fields, et cetera,
// please consult the Intel Software Developer Manual.)
fn get_limit(entry: u64) -> u32 {
-    ((((entry) & 0x000F_0000_0000_0000) >> 32) as u32) | (((entry) & 0x0000_0000_0000_FFFF) as u32)
+    #[allow(clippy::cast_possible_truncation)] // clearly, truncation is not possible
+    let limit: u32 =
+        ((((entry) & 0x000F_0000_0000_0000) >> 32) | ((entry) & 0x0000_0000_0000_FFFF)) as u32;
+
+    // Perform manual limit scaling if G flag is set
+    match get_g(entry) {
+        0 => limit,
+        _ => (limit << 12) | 0xFFF, // G flag is either 0 or 1
+    }
}

fn get_g(entry: u64) -> u8 {
@@ -109,7 +141,7 @@ mod tests {
assert_eq!(0xB, seg.type_);
// base and limit
assert_eq!(0x10_0000, seg.base);
-assert_eq!(0xfffff, seg.limit);
+assert_eq!(0xffff_ffff, seg.limit);
assert_eq!(0x0, seg.unusable);
}
}
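
To make the limit scaling concrete, here is a standalone sketch of the new `get_limit` logic applied to hand-built descriptors. The body of `get_g` is not shown in this hunk, so the version below is an assumption based on the segment descriptor format (G is bit 55 of the 64-bit descriptor); the descriptor values in the usage note are illustrative constants, not taken from the PR.

```rust
/// Assumed implementation: the granularity (G) flag is bit 55 of a descriptor.
fn get_g(entry: u64) -> u8 {
    ((entry & 0x0080_0000_0000_0000) >> 55) as u8
}

/// Extract the 20-bit limit and scale it to a full 32-bit value when G is set.
fn get_limit(entry: u64) -> u32 {
    // Limit bits 19:16 live in descriptor bits 51:48, bits 15:0 in bits 15:0.
    let limit: u32 =
        (((entry & 0x000F_0000_0000_0000) >> 32) | (entry & 0x0000_0000_0000_FFFF)) as u32;

    match get_g(entry) {
        // Byte granularity: the 20-bit value is already the limit.
        0 => limit,
        // 4 KiB granularity: shift by 12 and fill the low 12 bits with ones.
        _ => (limit << 12) | 0xFFF,
    }
}
```

For a flat segment with limit field `0xFFFFF` and G set (`0x008F_0000_0000_FFFF`), this yields `(0xFFFFF << 12) | 0xFFF = 0xFFFF_FFFF`, which matches the updated test expectation. With G clear, the same limit field yields `0xFFFFF`.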
11 changes: 11 additions & 0 deletions src/vmm/src/arch/x86_64/layout.rs
@@ -27,6 +27,17 @@ pub const IRQ_MAX: u32 = 23;
/// Address for the TSS setup.
pub const KVM_TSS_ADDRESS: u64 = 0xfffb_d000;

/// Address of the hvm_start_info struct used in PVH boot
pub const PVH_INFO_START: u64 = 0x6000;

/// Starting address of array of modules of hvm_modlist_entry type.
/// Used to enable initrd support using the PVH boot ABI.
pub const MODLIST_START: u64 = 0x6040;

/// Address of memory map table used in PVH boot. Can overlap
/// with the zero page address since they are mutually exclusive.
pub const MEMMAP_START: u64 = 0x7000;

/// The 'zero page', a.k.a linux kernel bootparams.
pub const ZERO_PAGE_START: u64 = 0x7000;
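
The spacing between these addresses can be sanity-checked with a small sketch. The struct layouts below are assumptions based on the `hvm_start_info` and `hvm_modlist_entry` definitions in the Xen public headers (version 1 of the start info), written out here only so that the example is self-contained; `fits_before` and `max_modlist_entries` are hypothetical helpers.

```rust
use std::mem::size_of;

pub const PVH_INFO_START: u64 = 0x6000;
pub const MODLIST_START: u64 = 0x6040;
pub const MEMMAP_START: u64 = 0x7000;

/// Assumed layout of Xen's `hvm_start_info` (version 1).
#[repr(C)]
pub struct HvmStartInfo {
    pub magic: u32,
    pub version: u32,
    pub flags: u32,
    pub nr_modules: u32,
    pub modlist_paddr: u64,
    pub cmdline_paddr: u64,
    pub rsdp_paddr: u64,
    pub memmap_paddr: u64,
    pub memmap_entries: u32,
    pub reserved: u32,
}

/// Assumed layout of Xen's `hvm_modlist_entry`.
#[repr(C)]
pub struct HvmModlistEntry {
    pub paddr: u64,
    pub size: u64,
    pub cmdline_paddr: u64,
    pub reserved: u64,
}

/// True if the region `[start, start + len)` ends at or before `next`.
pub fn fits_before(start: u64, len: usize, next: u64) -> bool {
    start
        .checked_add(len as u64)
        .map_or(false, |end| end <= next)
}

/// Number of module entries that fit between the modlist and the memory map.
pub fn max_modlist_entries() -> u64 {
    (MEMMAP_START - MODLIST_START) / size_of::<HvmModlistEntry>() as u64
}
```

Under these assumed layouts, the start info struct occupies 56 bytes, so it ends at `0x6038`, below `MODLIST_START`, and the gap up to `MEMMAP_START` leaves room for many more module entries than the single initrd module mentioned in the comment above.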
