
Commit bca9b64

mcmilk authored and behlendorf committed
ZTS: Use QEMU for tests on Linux and FreeBSD
This commit adds functional tests for these systems:
- AlmaLinux 8, AlmaLinux 9, ArchLinux
- CentOS Stream 9, Fedora 39, Fedora 40
- Debian 11, Debian 12
- FreeBSD 13, FreeBSD 14, FreeBSD 15
- Ubuntu 20.04, Ubuntu 22.04, Ubuntu 24.04

Enabled by default:
- AlmaLinux 8, AlmaLinux 9
- Debian 11, Debian 12
- Fedora 39, Fedora 40
- FreeBSD 13, FreeBSD 14

Workflow for each operating system:
- install qemu on the github runner
- download the current cloud image of the operating system
- start and init that image via cloud-init
- install dependencies and power off the system
- start the system, build openzfs, then power off again
- clone the build system and start 2 instances of it
- run the functional tests, completing in around 3h
- when the tests are done, prepare the log files
- show detailed results for each system
- in the end, generate the job summary

Real-world benefits from this PR:

1. The github runner scripts are in the zfs repo itself. That means you
   can just open a PR against zfs, like "Add Fedora 41 tester", and see
   the results directly in the PR. ZFS admins no longer need to manually
   log in to the buildbot server to update the buildbot config with a new
   version of Fedora/AlmaLinux.

2. Github runners allow you to run the entire test suite against your
   private branch before submitting a formal PR to openzfs. Just open a
   PR against your private zfs repo, and the exact same
   Fedora/Alma/FreeBSD runners will fire up and run ZTS. This can be
   useful if you want to iterate on a ZTS change before submitting a
   formal PR.

3. buildbot is incredibly cumbersome. Our buildbot config files alone are
   ~1500 lines (not including any build/setup scripts)! It's a huge pain
   to set up.

4. We're running the super ancient buildbot 0.8.12. It's so ancient it
   requires python2. We actually have to build python2 from source for
   almalinux9 just to get it to run. Upgrading to a more modern buildbot
   is a huge undertaking, and the UI on the newer versions is worse.

5. Buildbot uses EC2 instances. EC2 is a pain because:
   * It costs money.
   * They throttle IOPS and CPU usage, leading to mysterious,
     hard-to-diagnose failures and timeouts in ZTS.
   * EC2 is high maintenance. We have to set up security groups, SSH
     keys, networking, users, etc., in AWS, and it's a pain. We also have
     to periodically go in and kill zombie EC2 instances that buildbot is
     unable to kill off.

6. Buildbot doesn't always handle failures well. One of the things we saw
   in the past was that the FreeBSD builders would often die, and each
   builder death would take up a "slot" in buildbot. So we would
   periodically have to restart buildbot via a cron job to get the slots
   back.

7. This PR divides the ZTS test list into two parts, launches two VMs,
   and runs half the test suite on each VM. The test results are then
   merged and shown in the summary page. So we're basically parallelizing
   ZTS on the same github runner. This leads to lower overall ZTS
   runtimes (2.5-3 hours vs 4+ hours on buildbot), and one unified set of
   results per runner, which is nice.

8. Since the tests are running on a VM, we have much more control over
   what happens. We can capture the serial console output even if a test
   completely brings down the VM. In the future, we could also restart
   the tests on the VM where they left off, so that if a single test
   panics the VM, we can just restart it and run the remaining ZTS tests
   (this functionality is not yet implemented though, just an idea).

9. Using the runners, users can manually kill or restart a test run via
   the github UI. That really isn't possible with buildbot unless you're
   an admin.

10. Anecdotally, the tests seem to be more stable and consistent under
    the QEMU runners.

Reviewed by: Brian Behlendorf <[email protected]>
Signed-off-by: Tino Reichardt <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #16537
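The split-and-merge scheme from point 7 can be sketched roughly as follows. This is a hypothetical illustration, not the actual runner code: the file names and the one-test-per-line list format are assumptions.

```shell
# Split a toy ZTS test list into two halves, one per VM (illustrative).
printf '%s\n' test_a test_b test_c test_d test_e > /tmp/all_tests.txt
total=$(wc -l < /tmp/all_tests.txt)
half=$(( (total + 1) / 2 ))

# First half runs on VM 1, the rest on VM 2.
head -n "$half" /tmp/all_tests.txt > /tmp/vm1_tests.txt
tail -n +"$((half + 1))" /tmp/all_tests.txt > /tmp/vm2_tests.txt

echo "VM1: $(wc -l < /tmp/vm1_tests.txt) tests, VM2: $(wc -l < /tmp/vm2_tests.txt) tests"
```

After both VMs finish, their per-half summaries are combined into one result set for the runner's summary page.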
1 parent c4d1a19 commit bca9b64

File tree

14 files changed: +1490 −3 lines changed

.github/workflows/scripts/README.md

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
Workflow for each operating system:
- install qemu on the github runner
- download current cloud image of operating system
- start and init that image via cloud-init
- install dependencies and poweroff system
- start system and build openzfs and then poweroff again
- clone build system and start 2 instances of it
- run functional testings and complete in around 3h
- when tests are done, do some logfile preparing
- show detailed results for each system
- in the end, generate the job summary

/TR 14.09.2024
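As an illustration of the "init that image via cloud-init" step above, a minimal NoCloud seed looks like this. The user name and paths are assumptions for the sketch, not the runner's actual configuration; the commented commands require cloud-image-utils and QEMU.

```shell
# Minimal cloud-init user-data/meta-data pair (illustrative only).
cat > /tmp/user-data <<'EOF'
#cloud-config
users:
  - name: zfs
    sudo: ALL=(ALL) NOPASSWD:ALL
    shell: /bin/bash
EOF
printf 'instance-id: zts-vm\nlocal-hostname: zts-vm\n' > /tmp/meta-data

# Build the seed image and boot the cloud image against it:
# cloud-localds /tmp/seed.img /tmp/user-data /tmp/meta-data
# qemu-system-x86_64 -m 4G -drive file=os.qcow2 \
#     -drive file=/tmp/seed.img,format=raw ...
echo "seed files written"
```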
.github/workflows/scripts/merge_summary.awk

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
#!/bin/awk -f
#
# Merge multiple ZTS tests results summaries into a single summary. This is
# needed when you're running different parts of ZTS on different tests
# runners or VMs.
#
# Usage:
#
# ./merge_summary.awk summary1.txt [summary2.txt] [summary3.txt] ...
#
# or:
#
# cat summary*.txt | ./merge_summary.awk
#
BEGIN {
	i=-1
	pass=0
	fail=0
	skip=0
	state=""
	cl=0
	el=0
	upl=0
	ul=0

	# Total seconds of tests runtime
	total=0;
}

# Skip empty lines
/^\s*$/{next}

# Skip Configuration and Test lines
/^Test:/{state=""; next}
/Configuration/{state="";next}

# When we see "test-runner.py" stop saving config lines, and
# save test runner lines
/test-runner.py/{state="testrunner"; runner=runner$0"\n"; next}

# We need to differentiate the PASS counts from test result lines that start
# with PASS, like:
#
# PASS mv_files/setup
#
# Use state="pass_count" to differentiate
#
/Results Summary/{state="pass_count"; next}
/PASS/{ if (state=="pass_count") {pass += $2}}
/FAIL/{ if (state=="pass_count") {fail += $2}}
/SKIP/{ if (state=="pass_count") {skip += $2}}
/Running Time/{
	state="";
	running[i]=$3;
	split($3, arr, ":")
	total += arr[1] * 60 * 60;
	total += arr[2] * 60;
	total += arr[3]
	next;
}

/Tests with results other than PASS that are expected/{state="expected_lines"; next}
/Tests with result of PASS that are unexpected/{state="unexpected_pass_lines"; next}
/Tests with results other than PASS that are unexpected/{state="unexpected_lines"; next}
{
	if (state == "expected_lines") {
		expected_lines[el] = $0
		el++
	}

	if (state == "unexpected_pass_lines") {
		unexpected_pass_lines[upl] = $0
		upl++
	}
	if (state == "unexpected_lines") {
		unexpected_lines[ul] = $0
		ul++
	}
}

# Reproduce summary
END {
	print runner;
	print "\nResults Summary"
	print "PASS\t"pass
	print "FAIL\t"fail
	print "SKIP\t"skip
	print ""
	print "Running Time:\t"strftime("%T", total, 1)
	if (pass+fail+skip > 0) {
		percent_passed=(pass/(pass+fail+skip) * 100)
	}
	printf "Percent passed:\t%3.2f%%", percent_passed

	print "\n\nTests with results other than PASS that are expected:"
	asort(expected_lines, sorted)
	for (j in sorted)
		print sorted[j]

	print "\n\nTests with result of PASS that are unexpected:"
	asort(unexpected_pass_lines, sorted)
	for (j in sorted)
		print sorted[j]

	print "\n\nTests with results other than PASS that are unexpected:"
	asort(unexpected_lines, sorted)
	for (j in sorted)
		print sorted[j]
}
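For example, merging the "Results Summary" counters from two partial summaries behaves like this stripped-down inline version of the pass_count logic above (the toy input files and their counts are made up for the demo):

```shell
# Two toy partial summaries, one per VM half (contents invented).
printf 'Results Summary\nPASS 10\nFAIL 1\nSKIP 2\n' > /tmp/sum1.txt
printf 'Results Summary\nPASS 20\nFAIL 0\nSKIP 3\n' > /tmp/sum2.txt

# Sum the counters across both files, as the full script does.
cat /tmp/sum1.txt /tmp/sum2.txt | awk '
	/Results Summary/ { state = "pass_count"; next }
	/PASS/ { if (state == "pass_count") pass += $2 }
	/FAIL/ { if (state == "pass_count") fail += $2 }
	/SKIP/ { if (state == "pass_count") skip += $2 }
	END { printf "PASS %d\nFAIL %d\nSKIP %d\n", pass, fail, skip }
' > /tmp/merged.txt
cat /tmp/merged.txt
# PASS 30
# FAIL 1
# SKIP 5
```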
.github/workflows/scripts/qemu-1-setup.sh

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
#!/usr/bin/env bash

######################################################################
# 1) setup qemu instance on action runner
######################################################################

set -eu

# install needed packages
export DEBIAN_FRONTEND="noninteractive"
sudo apt-get -y update
sudo apt-get install -y axel cloud-image-utils daemonize guestfs-tools \
  ksmtuned virt-manager linux-modules-extra-$(uname -r) zfsutils-linux

# generate ssh keys
rm -f ~/.ssh/id_ed25519
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -q -N ""

# we expect RAM shortage
cat << EOF | sudo tee /etc/ksmtuned.conf > /dev/null
# https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/virtualization_tuning_and_optimization_guide/chap-ksm
KSM_MONITOR_INTERVAL=60

# Millisecond sleep between ksm scans for 16Gb server.
# Smaller servers sleep more, bigger sleep less.
KSM_SLEEP_MSEC=10
KSM_NPAGES_BOOST=300
KSM_NPAGES_DECAY=-50
KSM_NPAGES_MIN=64
KSM_NPAGES_MAX=2048

KSM_THRES_COEF=25
KSM_THRES_CONST=2048

LOGFILE=/var/log/ksmtuned.log
DEBUG=1
EOF
sudo systemctl restart ksm
sudo systemctl restart ksmtuned

# not needed
sudo systemctl stop docker.socket
sudo systemctl stop multipathd.socket

# remove default swapfile and /mnt
sudo swapoff -a
sudo umount -l /mnt
DISK="/dev/disk/cloud/azure_resource-part1"
sudo sed -e "s|^$DISK.*||g" -i /etc/fstab
sudo wipefs -aq $DISK
sudo systemctl daemon-reload

sudo modprobe loop
sudo modprobe zfs

# partition the disk as needed
DISK="/dev/disk/cloud/azure_resource"
sudo sgdisk --zap-all $DISK
sudo sgdisk -p \
  -n 1:0:+16G -c 1:"swap" \
  -n 2:0:0 -c 2:"tests" \
  $DISK
sync
sleep 1

# swap with same size as RAM
sudo mkswap $DISK-part1
sudo swapon $DISK-part1

# 60GB data disk
SSD1="$DISK-part2"

# 10GB data disk on ext4
sudo fallocate -l 10G /test.ssd1
SSD2=$(sudo losetup -b 4096 -f /test.ssd1 --show)

# adjust zfs module parameter and create pool
exec 1>/dev/null
ARC_MIN=$((1024*1024*256))
ARC_MAX=$((1024*1024*512))
echo $ARC_MIN | sudo tee /sys/module/zfs/parameters/zfs_arc_min
echo $ARC_MAX | sudo tee /sys/module/zfs/parameters/zfs_arc_max
echo 1 | sudo tee /sys/module/zfs/parameters/zvol_use_blk_mq
sudo zpool create -f -o ashift=12 zpool $SSD1 $SSD2 \
  -O relatime=off -O atime=off -O xattr=sa -O compression=lz4 \
  -O mountpoint=/mnt/tests

# no need for some scheduler
for i in /sys/block/s*/queue/scheduler; do
  echo "none" | sudo tee $i > /dev/null
done
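The ARC bounds written above are plain byte arithmetic; this standalone snippet shows the values the script computes for zfs_arc_min and zfs_arc_max (no root or zfs module required to check them):

```shell
# 256 MiB and 512 MiB expressed in bytes, as written to the
# zfs_arc_min / zfs_arc_max module parameters by the setup script.
ARC_MIN=$((1024*1024*256))
ARC_MAX=$((1024*1024*512))
echo "zfs_arc_min=$ARC_MIN"   # zfs_arc_min=268435456
echo "zfs_arc_max=$ARC_MAX"   # zfs_arc_max=536870912
```

Keeping the ARC this small leaves most of the runner's RAM for the two test VMs, which is also why KSM is tuned aggressively earlier in the script.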
