Skip to content

Linux 6.9: Call add_disk() from workqueue to fix zfs_allow_010_pos #16282

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 28, 2024

Conversation

tonyhutter
Copy link
Contributor

@tonyhutter tonyhutter commented Jun 18, 2024

Motivation and Context

Fixes: #16089

Description

The 6.9 kernel behaves differently in how it releases block devices. In the common case it will async release the device only after the return to userspace. This is different from the 6.8 and older kernels which release the block devices synchronously. To get around this, call add_disk() from a workqueue so that the kernel uses a different codepath to release our zvols in the way we expect. This stops zfs_allow_010_pos from hanging.

How Has This Been Tested?

Ran zfs_allow_010_pos with this PR and it passed.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@robn
Copy link
Member

robn commented Jun 18, 2024

Oh wow, great work hunting that down. I don't feel so bad now! 😆

I tried compiling on older kernels. Builds ok on 6.1, failed on 5.10: __flush_workqueue doesn't exist there. But, I wonder if this is better gated to the 6.9 APIs anyway, since the old code works great there and deciphering workqueue behaviour back into medieval times is not very much fun.

Regardless, I will be delighted to stick a +1 on this. Cheers :)

@tonyhutter
Copy link
Contributor Author

tonyhutter commented Jun 18, 2024

Builds ok on 6.1, failed on 5.10: __flush_workqueue doesn't exist there. But, I wonder if this is better gated to the 6.9 APIs anyway,

Yea, I was hoping to use a single codepath, but I had a hunch that might be the case. I'll break it out so that HAVE_BDEV_FILE_OPEN_BY_PATH (6.9 kernel) codepaths call the workqueue stuff, and the older kernels call __zvol_os_add_disk() directly.

The 6.9 kernel behaves differently in how it releases block devices.  In
the common case it will async release the device only after the return
to userspace.  This is different from the 6.8 and older kernels which
release the block devices synchronously.  To get around this, call
add_disk() from a workqueue so that the kernel uses a different
codepath to release our zvols in the way we expect.  This stops
zfs_allow_010_pos from hanging.

Fixes: openzfs#16089
Signed-off-by: Tony Hutter <[email protected]>
@robn
Copy link
Member

robn commented Jun 24, 2024

Compile checked against:

  • 6.9.1
  • 6.6.31
  • 6.1.94
  • 5.10.217
  • 4.9.337

Seems good to me, nice work!

@tonyhutter tonyhutter merged commit 49f3ce3 into openzfs:master Jun 28, 2024
21 of 25 checks passed
@robn
Copy link
Member

robn commented Jun 30, 2024

@tonyhutter again, great work on this one!

calccrypto pushed a commit to hpc/zfs that referenced this pull request Jul 3, 2024
…penzfs#16282)

The 6.9 kernel behaves differently in how it releases block devices.  In
the common case it will async release the device only after the return
to userspace.  This is different from the 6.8 and older kernels which
release the block devices synchronously.  To get around this, call
add_disk() from a workqueue so that the kernel uses a different
codepath to release our zvols in the way we expect.  This stops
zfs_allow_010_pos from hanging.

Fixes: openzfs#16089

Signed-off-by: Tony Hutter <[email protected]>
Reviewed-by: Tino Reichardt <[email protected]>
Reviewed-by: Rob Norris <[email protected]>
robn pushed a commit to robn/zfs that referenced this pull request Jul 17, 2024
…penzfs#16282)

The 6.9 kernel behaves differently in how it releases block devices.  In
the common case it will async release the device only after the return
to userspace.  This is different from the 6.8 and older kernels which
release the block devices synchronously.  To get around this, call
add_disk() from a workqueue so that the kernel uses a different
codepath to release our zvols in the way we expect.  This stops
zfs_allow_010_pos from hanging.

Fixes: openzfs#16089

Signed-off-by: Tony Hutter <[email protected]>
Reviewed-by: Tino Reichardt <[email protected]>
Reviewed-by: Rob Norris <[email protected]>
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
…penzfs#16282)

The 6.9 kernel behaves differently in how it releases block devices.  In
the common case it will async release the device only after the return
to userspace.  This is different from the 6.8 and older kernels which
release the block devices synchronously.  To get around this, call
add_disk() from a workqueue so that the kernel uses a different
codepath to release our zvols in the way we expect.  This stops
zfs_allow_010_pos from hanging.

Fixes: openzfs#16089

Signed-off-by: Tony Hutter <[email protected]>
Reviewed-by: Tino Reichardt <[email protected]>
Reviewed-by: Rob Norris <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Linux 6.9: delegate/zfs_allow_010_pos hangs in zfs create -V
3 participants