Skip to content

zfs-2.0.5 patchset #12182

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 92 commits into from

Conversation

tonyhutter
Copy link
Contributor

Motivation and Context

Proposed patchset for zfs-2.0.5. It builds on the 5.12 kernel.

Description

This is the zfs-2.0.5-staging branch with a few minor fixes to the commit messages.

How Has This Been Tested?

Buildbot will test

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

Ryan Moeller and others added 30 commits May 19, 2021 20:00
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes openzfs#11636
Calling vdev_free() only requires the we acquire the spa config
SCL_STATE_ALL locks, not the SCL_ALL locks.  In particular, we need
need to avoid taking the SCL_CONFIG lock (included in SCL_ALL) as a
writer since this can lead to a deadlock.  The txg_sync_thread() may
block in spa_txg_history_init_io() when taking the SCL_CONFIG lock
as a reading when it detects there's a pending writer.

Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: Mark Maybee <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#11585
Several of the TRIM tests were based of the initialize tests and
then adapted for TRIM.  The zpool_trim_start_and_cancel_pos.ksh
test was intended to be one such test but it was overlooked and
actually never adapted.  Update it accordingly.

Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#11649
When a device which is actively trimming or initializing becomes
FAULTED, and therefore no longer writable, cancel the active
TRIM or initialization.  When the device is merely taken offline
with `zpool offline` then stop the operation but do not cancel it.
When the device is brought back online the operation will be
resumed if possible.

Reviewed-by: Brian Behlendorf <[email protected]>
Co-authored-by: Brian Behlendorf <[email protected]>
Co-authored-by: Vipin Kumar Verma <[email protected]>
Signed-off-by: Srikanth N S <[email protected]>
Closes openzfs#11588
* Restore original kern.corefile value after the test.
* Don't leave behind a frozen pool.
* Clean up leftover vdev files.
* Make zpool_002_pos and zpool_003_pos consistent in their handling of
core files while here.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes openzfs#11694
zil_replaying(zil, tx) has the side-effect of informing the ZIL that an
entry has been replayed in the (still open) tx.  The ZIL uses that
information to record the replay progress in the ZIL header when that
tx's txg syncs.

ZPL log entries are not idempotent and logically dependent and thus
calling zil_replaying() is necessary for correctness.

For ZVOLs the question of correctness is more nuanced: ZVOL logs only
TX_WRITE and TX_TRUNCATE, both of which are idempotent. Logical
dependencies between two records exist only if the write or discard
request had sync semantics or if the ranges affected by the records
overlap.

Thus, at a first glance, it would be correct to restart replay from
the beginning if we crash before replay completes. But this does not
address the following scenario:
Assume one log record per LWB.
The chain on disk is

    HDR -> 1:W(1, "A") -> 2:W(1, "B") -> 3:W(2, "X") -> 4:W(3, "Z")

where N:W(O, C) represents log entry number N which is a TX_WRITE of C
to offset A.
We replay 1, 2 and 3 in one txg, sync that txg, then crash.
Bit flips corrupt 2, 3, and 4.
We come up again and restart replay from the beginning because
we did not call zil_replaying() during replay.
We replay 1 again, then interpret 2's invalid checksum as the end
of the ZIL chain and call replay done.
The replayed zvol content is "AX".

If we had called zil_replaying() the HDR would have pointed to 3
and our resumed replay would not have replayed anything because
3 was corrupted, resulting in zvol content "BX".

If 3 logically depends on 2 then the replay corrupted the ZVOL_OBJ's
contents.

This patch adds the zil_replaying() calls to the replay functions.
Since the callbacks in the replay function need the zilog_t* pointer
so that they can call zil_replaying() we open the ZIL while
replaying in zvol_create_minor(). We also verify that replay has
been done when on-demand-opening the ZIL on the first modifying
bio.

Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Christian Schwarz <[email protected]>
Closes openzfs#11667
events_002 exercises the ZED, ensuring that it neither misses events,
nor reporting events twice.

On slow test hardware, some of the timeouts are insufficient to allow
the ZED to properly settle.  Conversely, on fast hardware these same
timeouts are too long, unnecessarily slowing the test run.

Instead of using a fixed timeout, wait for the expected final event
before returning.  Additionally, wait with a timeout for unexpected
events to avoid missing them if they show up late.

Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Antonio Russo <[email protected]>
Closes openzfs#11703
Our checkstyle doesn't work well on Ubuntu 20.04,
temporary pin it to 18.04.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: George Melikov <[email protected]>
Closes openzfs#11713
Some .h files that were added were missed in this Makefile. Since 
they are .h files, their being missing only resulted in them 
disappeared from the dist archive.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Allan Jude <[email protected]>
Closes openzfs#11705
Add branch hints and constify the intermediate evaluations of 
left/right params in VERIFY3*().

Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Adam Moss <[email protected]>
Closes openzfs#11708
A few deadman tunables ended up in the wrong sysctl node.

Move them to vfs.zfs.deadman.*

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes openzfs#11715
The manual page change in ecc277c has introduced whitespace on
line ends.

Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Martin Matuska <[email protected]>
Closes openzfs#11722
Importing a pool using the cachefile is ideal to reduce the time
required to import a pool. However, if the devices associated with
a pool in the cachefile have changed, then the import would fail.
This can easily be corrected by doing a normal import which would
then read the pool configuration from the labels.

The goal of this change is make importing using a cachefile more
resilient and auto-correcting. This is accomplished by having
the cachefile import logic automatically fallback to reading the
labels of the devices similar to a normal import. The main difference
between the fallback logic and a normal import is that the cachefile
import logic will only look at the device directories that were
originally used when the cachefile was populated. Additionally,
the fallback logic will always import by guid to ensure that only
the pools in the cachefile would be imported.

External-issue: DLPX-71980
Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: George Wilson <[email protected]>
Closes openzfs#11716
They are expected to fail only in corner cases.

Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Macy <[email protected]>
Signed-off-by: Mateusz Guzik <[email protected]>
Closes openzfs#11153
1. even up ifdefs
2. drop the arguably useless teardown lock asserts -- nothing else
   checks for it

Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Macy <[email protected]>
Signed-off-by: Mateusz Guzik <[email protected]>
Closes openzfs#11153
They are not very useful and hard to implement in the rms routine
the code is about to start using.

Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Macy <[email protected]>
Signed-off-by: Mateusz Guzik <[email protected]>
Closes openzfs#11153
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Macy <[email protected]>
Signed-off-by: Mateusz Guzik <[email protected]>
Closes openzfs#11153
This will allow platforms to implement it as they see fit, in particular
in a different manner than rrm locks.

Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Macy <[email protected]>
Signed-off-by: Mateusz Guzik <[email protected]>
Closes openzfs#11153
This deserializes otherwise non-contending operations.

The previous scheme of using 17 locks hashed by curthread runs into
conflicts very quickly. Check the pull request for sample results.

Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Macy <[email protected]>
Signed-off-by: Mateusz Guzik <[email protected]>
Closes openzfs#11153
Resolve some oddities in zfsdev_close() which could result in a
panic and were not present in the equivalent function for Linux.

- Remove unused definition ZFS_MIN_MINOR
- FreeBSD: Simplify zfsdev state destruction
- Assert zs_minor is valid in zfsdev_close
- Make locking around zfsdev state match Linux

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes openzfs#11720
Add parsing of the rewind options.

When I was upstreaming the change [1], I omitted the part where we
detect that the pool should be rewind. When the FreeBSD repo has
synced with the OpenZFS, this part of the code was removed.

[1] FreeBSD repo: 277f38abffc6a8160b5044128b5b2c620fbb970c
[2] OpenZFS repo: f2c027b

External-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254152
Originally reviewed by: tsoome, allanjude
Originally reviewed by: kevans (ok from high-level overview)
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Mariusz Zaborski <[email protected]>
Closes openzfs#11730
The current user_run often does not work as expected.  Commands are run
in a different shell, with a different environment, and all output is
discarded.

Simplify user_run to retain the current environment, eliminate eval,
and feed the command string into ksh.  Enhance the logging for
user_run so we can see out and err.

Reviewed-by: John Kennedy <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes openzfs#11185
You can't use user_run to eval ksh functions defined in libtest unless
you include libtest in the user shell.

Fix xattr_003_neg by:
* include libtest in the user shell
* *then* run get_xattr
* assert this fails
* use variables for filenames so they don't change in the user's shell
* don't log the contents of /etc/passwd
* cleanup all byproducts

Reviewed-by: John Kennedy <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes openzfs#11185
Create a new section of tests to run with acltype=off.

For now the only test we have is for the DOS mode READONLY attribute on
FreeBSD.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes openzfs#11734
The man page was missing these two permissions.
Add the missing permissions to the man page. 

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Jeremy Faulkner <[email protected]>
Closes openzfs#11727
Don't handle (incorrectly) kmem_zalloc() failure.  With KM_SLEEP,
will never return NULL.

Free the data allocated for non-virtual kstats when deleting the object.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes openzfs#11767
Avoids tripping on asserts when doing pool recovery.

Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Mateusz Guzik <[email protected]>
Closes openzfs#11739
The FreeBSD boot loader relies on the bootfs property and is capable
of booting from removed (indirect) vdevs.

Reviewed-by Eric van Gyzen
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Martin Matuska <[email protected]>
Closes openzfs#11763
This change adds a new test that covers a bug fix in the binary search
in the redacted send resume logic that causes a kernel panic.
The bug was fixed in openzfs#11297.

Reviewed-by: Brian Behlendorf <[email protected]>
Co-authored-by: John Kennedy <[email protected]>
Signed-off-by: Palash Gandhi <[email protected]>
Closes openzfs#11764
Commit 235a856 introduced a regression in evaluation of POSIX modes
that require group DENY entries in the internal ZFS ACL. An example
of such a POSX mode is 007. When write_implies_delete_child is set,
then ACE_WRITE_DATA is added to `wanted_dirperms` in prior to calling
zfs_zaccess_common(). This occurs is zfs_zaccess_delete().

Unfortunately, when zfs_zaccess_aces_check hits this particular DENY
ACE, zfs_groupmember() is checked to determine whether access should be
denied, and since zfs_groupmember() always returns B_TRUE on Linux and
so this check is failed, resulting ultimately in EPERM being returned.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Andrew Walker <[email protected]>
Closes openzfs#11760
@aerusso
Copy link
Contributor

aerusso commented Jun 10, 2021

I think the ZTS failure on Ubuntu 20.04 should be fixed by b8e6401 .

@behlendorf
Copy link
Contributor

And the kernel.org build failure can fixed with 272b178.

gmelikov and others added 2 commits June 10, 2021 12:25
If there is no scsi_debug module, then this test
must be skipped, in this case cleanup routine should
be prepared for absent pool.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: George Melikov <[email protected]>
Closes openzfs#11534
Renamed _fini too for symmetry.

Suggested-by: @ensch
Reviewed-by: Tony Nguyen <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rich Ercolani <[email protected]>
Closes openzfs#12059
Closes: openzfs#11987
Closes: openzfs#12056
@tonyhutter
Copy link
Contributor Author

@aerusso @behlendorf thanks, I just pulled those in

@tonyhutter
Copy link
Contributor Author

I'm currently looking into the make lint errors in checkstyle...

@ghost
Copy link

ghost commented Jun 11, 2021

Looks like FreeBSD main and stable/13 also need 93f81eb

@tonyhutter
Copy link
Contributor Author

@freqlabs thanks, I included it in my latest push.

@IvanVolosyuk
Copy link
Contributor

Can we also add the panic fix from to the next stable release:
#12210
afa7b34

@ghost
Copy link

ghost commented Jun 13, 2021

And 8dddb25. 🤦

@tonyhutter
Copy link
Contributor Author

@IvanVolosyuk @freqlabs I included those commits in my latest push

behlendorf and others added 3 commits June 16, 2021 12:26
In order for cppcheck to perform a proper analysis it needs to be
aware of how the sources are compiled (source files, include
paths/files, extra defines, etc).  All the needed information is
available from the Makefiles and can be leveraged with a generic
cppcheck Makefile target.  So let's add one.

Additional minor changes:

* Removing the cppcheck-suppressions.txt file.  With cppcheck 2.3
  and these changes it appears to no longer be needed.  Some inline
  suppressions were also removed since they appear not to be
  needed.  We can add them back if it turns out they're needed
  for older versions of cppcheck.

* Added the ax_count_cpus m4 macro to detect at configure time how
  many processors are available in order to run multiple cppcheck
  jobs.  This value is also now used as a replacement for nproc
  when executing the kernel interface checks.

* "PHONY =" line moved in to the Rules.am file which is included
  at the top of all Makefile.am's.  This is just convenient becase
  it allows us to use the += syntax to add phony targets.

* One upside of this integration worth mentioning is it now allows
  `make cppcheck` to be run in any directory to check that subtree.

* For the moment, cppcheck is not run against the FreeBSD specific
  kernel sources.  The cppcheck-FreeBSD target will need to be
  implemented and testing on FreeBSD to support this.

Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#11508
Fixes issues when zfs is used along with other filesystems.

External-issue: https://cgit.freebsd.org/src/commit/?id=e9272225e6bed840b00eef1c817b188c172338ee
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Mateusz Guzik <[email protected]>
Closes openzfs#11881
VFS_QUOTACTL(9) has been updated to allow each filesystem to indicate
whether it has changed the busy state of the mount.  The filesystem
may still assume that its .vfs_quotactl entrypoint is always called
with the mount busied, but only needs to unbusy the mount (and clear
*mp_busy) if it does something that actually requires the mount to be
unbusied.  It no longer needs to blindly copy-paste the UFS protocol
for calling vfs_unbusy(9) for the Q_QUOTAOFF and Q_QUOTAON commands.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Jason Harmening <[email protected]>
Closes openzfs#12052
@kernelOfTruth
Copy link
Contributor

9ffcaa3 Avoid deadlock when removing L2ARC devices under I/O

as well, please 🙏

PaulZ-98 and others added 3 commits June 17, 2021 12:44
In zfs_znode_alloc we always hash inodes.  If the
znode is unlinked, we do not need to hash it.  This
fixes the problem where zfs_suspend_fs is doing zrele
(iput) in an async fashion, and zfs_resume_fs unlinked
drain processing will try to hash an inode that could
still be hashed, resulting in a panic.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alan Somers <[email protected]>
Signed-off-by: Paul Zuchowski <[email protected]>
Closes openzfs#9741
Closes openzfs#11223
Closes openzfs#11648
Closes openzfs#12210
In case we have I/O and try to remove an L2ARC device a deadlock might
occur. arc_read()->zio_read()->zfs_blkptr_verify() waits for SCL_VDEV
to be dropped while holding the hash_lock. However, spa_l2cache_load()
holds SCL_ALL and waits for the hash_lock in l2arc_evict().

Fix this by moving zfs_blkptr_verify() to the top top arc_read() before
the hash_lock is taken. Verify the block pointer and return a checksum
error if damaged rather than halting the system, by using
BLK_VERIFY_LOG instead of BLK_VERIFY_HALT.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Mark Maybee <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Closes openzfs#12054
META file and changelog updated.

TEST_ZIMPORT_SKIP="yes"

Signed-off-by: Tony Hutter <[email protected]>
@awehrfritz
Copy link
Contributor

@tonyhutter any update on when we can expect the 2.0.5 release to happen? The lack of a stable ZFS release with support for Linux 5.12 forces all Fedora, Arch, Gentoo and OpenSUSE Tumbleweed users to run an unsupported/end-of-life kernel for weeks now.

@tonyhutter
Copy link
Contributor Author

@awehrfritz hopefully this week. I think we have all the patches we want in the branch, but there are some test failures I need to look into.

@awehrfritz
Copy link
Contributor

@awehrfritz hopefully this week.

Great! Looking forward to it.

@tonyhutter
Copy link
Contributor Author

zfs-2.0.5 released yesterday:

https://github.com/openzfs/zfs/releases/tag/zfs-2.0.5

@tonyhutter tonyhutter closed this Jun 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.