Rollback before zfs root is mounted #15025

Merged
merged 1 commit into from
Jul 20, 2023

Conversation

@outofforest outofforest commented Jun 30, 2023

On my machines I observed random failures caused by rollback happening after zfs root
is mounted. I've observed two types of failures:

  • zfs-rollback-bootfs.service fails saying that rollback must be done just before mounting the dataset
  • boot process fails and rescue console is entered.

After making this modification and testing it for a couple of days, none of those problems
have been observed anymore. It looks like dracut-mount.service does not enforce the correct
sequence and sysroot.mount does a better job here.

I don't know if dracut-mount.service is still needed in the After directive.
Maybe someone else is able to address this?
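
One way to confirm that an installed copy of the unit carries the new ordering is to look for sysroot.mount on its Before= line. This is a minimal sketch, assuming a POSIX shell; the helper name `unit_orders_before` is hypothetical and not part of OpenZFS or dracut.

```shell
# Hypothetical helper (not part of OpenZFS or dracut): succeed if the unit
# file at $1 lists unit $2 in a Before= directive.
unit_orders_before() {
    unit_file=$1
    wanted=$2
    # Print the value of every Before= line, split the values one per line,
    # then look for an exact, whole-line match.
    sed -n 's/^Before=//p' "$unit_file" | tr -s ' ' '\n' | grep -qxF -- "$wanted"
}
```

For example, `unit_orders_before /usr/lib/dracut/modules.d/90zfs/zfs-rollback-bootfs.service sysroot.mount && echo ok` would report whether the copy under the dracut module directory has the fix. That path is the one used in the review comment in this thread and may differ on other distributions. The initramfs still has to be regenerated before the change takes effect at boot.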

Motivation and Context

Description

How Has This Been Tested?

Installed the fixed service on my machines and observed them for a couple of days.

These are machines where the root filesystem is reverted to the original snapshot on every boot.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

Signed-off-by: Wojciech Małota-Wójcik [email protected]

On my machines I observe random failures caused by rollback happening after zfs root is mounted. I've observed two types of failures:
- zfs-rollback-bootfs.service fails saying that rollback must be done just before mounting the dataset
- boot process fails and rescue console is entered.

After making this modification and testing it for a couple of days, none of those problems have been observed anymore.

I don't know if `dracut-mount.service` is still needed in the `After` directive. Maybe someone else is able to address this?

Signed-off-by: Wojciech Małota-Wójcik <[email protected]>
@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Jul 14, 2023
@behlendorf

@gregory-lee-bartholomew would you mind taking a look at this PR?

gregory-lee-bartholomew commented Jul 15, 2023

I just did a quick test of this patch on my PC and it looks fine.

$ cat /usr/lib/dracut/modules.d/90zfs/zfs-rollback-bootfs.service 
[Unit]
Description=Rollback bootfs just before it is mounted
Requisite=zfs-import.target
After=zfs-import.target dracut-pre-mount.service zfs-snapshot-bootfs.service
Before=dracut-mount.service sysroot.mount
DefaultDependencies=no
ConditionKernelCommandLine=bootfs.rollback
ConditionEnvironment=BOOTFS

[Service]
Type=oneshot
ExecStart=/bin/sh -c '. /lib/dracut-lib.sh; SNAPNAME="$(getarg bootfs.rollback)"; exec /sbin/zfs rollback -Rf "$BOOTFS@${SNAPNAME:-%v}"'
RemainAfterExit=yes

$ sudo dracut -f /boot/$(</etc/machine-id)/$(uname -r)/initrd
$ sudo shutdown -r now
...
$ journalctl -b | grep rollback
...
Jul 14 19:35:27 hal9000 systemd[1]: Starting zfs-rollback-bootfs.service - Rollback bootfs just before it is mounted...
Jul 14 19:35:28 hal9000 systemd[1]: Finished zfs-rollback-bootfs.service - Rollback bootfs just before it is mounted.
Jul 14 19:35:28 hal9000 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=zfs-rollback-bootfs comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jul 14 19:35:28 hal9000 kernel: audit: type=1130 audit(1689381328.670:24): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=zfs-rollback-bootfs comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jul 14 19:35:29 hal9000 systemd[1]: zfs-rollback-bootfs.service: Deactivated successfully.
Jul 14 19:35:29 hal9000 systemd[1]: Stopped zfs-rollback-bootfs.service - Rollback bootfs just before it is mounted.
Jul 14 19:35:29 hal9000 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=zfs-rollback-bootfs comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'

Also, just from glancing at man dracut.bootup, it doesn't look like this change would cause any problems.

The only service I see getting skipped by moving zfs-rollback-bootfs.service before sysroot.mount is systemd-repart.service.

$ systemctl show -p After initrd-root-fs.target
After=dracut-pre-mount.service systemd-repart.service

But that shouldn't cause any problems for this service. I don't remember for sure, but I suspect that dracut-mount.service was chosen as the upper limit for when the service would run simply because that is what zfs-import.target uses.

$ systemctl cat zfs-import.target
# /usr/lib/systemd/system/zfs-import.target
[Unit]
Description=ZFS pool import target
Before=dracut-mount.service

[Install]
WantedBy=zfs.target

> I don't know if dracut-mount.service is still needed in the After directive.
> Maybe someone else is able to address this?

It would no longer be needed. But it isn't harming anything either.

This PR looks OK to me.

Edit: Note, however, that since this service is ordered after zfs-import.target, this change would effectively force zfs-import.target's Before= directive to also be "sysroot.mount". That may have implications that I'm not aware of. I don't know why dracut-mount.service was chosen for zfs-import.target's Before= directive.
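
The transitive effect described in that edit can be illustrated with `tsort`, which prints one valid linear order for a set of "X before Y" pairs. This is only a sketch: the pairs are taken from the After= and Before= lines quoted in this comment, the function name `rollback_ordering` is hypothetical, and systemd's real ordering graph in the initramfs involves many more units.

```shell
# Hypothetical illustration: print one valid start order for the units
# named in this thread. Each input pair "X Y" means X starts before Y.
rollback_ordering() {
    tsort <<'EOF'
zfs-import.target zfs-rollback-bootfs.service
dracut-pre-mount.service zfs-rollback-bootfs.service
zfs-snapshot-bootfs.service zfs-rollback-bootfs.service
zfs-rollback-bootfs.service dracut-mount.service
zfs-rollback-bootfs.service sysroot.mount
zfs-import.target dracut-mount.service
EOF
}
```

Whatever order `tsort` picks, zfs-import.target comes out ahead of sysroot.mount, even though zfs-import.target itself only declares Before=dracut-mount.service. The constraint is inherited through zfs-rollback-bootfs.service.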

ahesford commented Jul 17, 2023

The fact that this service exists is terrifying. Nothing so destructive ought to be automated, especially not as a systemd unit with zero error checking.

@gregory-lee-bartholomew

It's not exactly automatic. It requires the user to enter "bootfs.rollback" on the kernel command line. It is just slightly more convenient than dropping to a Dracut rescue shell to enter the commands, especially if the user normally has "bootfs.snapshot" set on their kernel command line, so that they get an automatic snapshot whenever a system update includes a kernel update.
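
To make the snapshot selection concrete: the ExecStart= line quoted earlier in this thread rolls back to "$BOOTFS@${SNAPNAME:-%v}", where %v is the systemd specifier for the kernel release. Below is a minimal sketch of that fallback in plain shell. The function name `rollback_target` and the dataset and snapshot names in the usage note are hypothetical, and the function only builds the name; it never calls zfs.

```shell
# Hypothetical helper: build the snapshot name the rollback would target.
#   $1 = boot dataset (BOOTFS in the unit)
#   $2 = snapshot name taken from bootfs.rollback, possibly empty
#   $3 = kernel release (what systemd expands %v to)
rollback_target() {
    bootfs=$1
    snapname=$2
    kernel_release=$3
    # Same ${var:-default} fallback the unit uses: an empty or unset
    # snapshot name falls back to the kernel release.
    printf '%s@%s\n' "$bootfs" "${snapname:-$kernel_release}"
}
```

With an empty snapshot name, `rollback_target rpool/ROOT/fedora "" "$(uname -r)"` prints the dataset followed by `@` and the running kernel release. With an explicit name, `rollback_target rpool/ROOT/fedora pre-upgrade "$(uname -r)"` prints `rpool/ROOT/fedora@pre-upgrade`.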

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Jul 20, 2023
@behlendorf behlendorf merged commit e6ea31d into openzfs:master Jul 20, 2023
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Jul 21, 2023
On my machines I observe random failures caused by rollback happening 
after zfs root is mounted. I've observed two types of failures:

- zfs-rollback-bootfs.service fails saying that rollback must be
  done just before mounting the dataset
- boot process fails and rescue console is entered.

After making this modification and testing it for couple of days 
none of those problems have been observed anymore.

I don't know if `dracut-mount.service` is still needed in the 
`After` directive. Maybe someone else is able to address this?

Reviewed-by: Gregory Bartholomew <[email protected]>
Signed-off-by: Wojciech Małota-Wójcik <[email protected]>
Closes openzfs#15025
behlendorf pushed a commit that referenced this pull request Jul 21, 2023
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Dec 12, 2023