umount, snapshot ZFS processes stuck in kernel forever causing high load #13327

Open
@c0xc

Description

I'm observing a situation where ZFS processes are stuck, causing the load average to climb into five digits. They are stuck in the kernel and therefore not killable. Why could this happen, and can it be fixed without rebooting the server?

root      279580  279503  0 Feb04 ?        00:00:00 bash /root/bin/zfs-snapshot z-bod/DUMP hourly 72
root      279599  279580  0 Feb04 ?        00:00:00 /sbin/zfs destroy -r z-bod/[email protected]
root     3117486 3115126  0 Feb04 ?        00:00:00 umount -t zfs -n /z-main/Share/.zfs/snapshot/weekly.4
root     3115126       2  0 Feb04 ?        00:00:00 [kworker/u113:4+events_unbound]
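For processes stuck like the ones above, a quick way to confirm they are blocked inside the kernel is to list tasks in uninterruptible sleep (state `D`) along with the kernel wait channel (`wchan`) they are blocked in. This is a diagnostic sketch only; the column widths and filter are my own choices, not from the report:

```shell
# Show PID, process state, the kernel function the task is waiting in
# (wchan), and the command name. Tasks in D state are in uninterruptible
# sleep inside the kernel and cannot be killed, even with SIGKILL.
ps -eo pid,stat,wchan:32,comm | awk 'NR == 1 || $2 ~ /^D/'
```

On a system in the state described here, the stuck zfs, umount, and kworker tasks would be expected to show up in this list.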

zfs-snapshot is a snapshot rotation script. There are tens of thousands of zfs processes like this, but only 55 "umount" processes. Other processes, such as crond, are also accumulating (around 10k).

Could this be an issue with ZFS? Assuming some of those ZFS processes are causing the others to get stuck, how can they be terminated?
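To answer the "how can they be terminated" part for any single PID: a task in uninterruptible sleep cannot be terminated from user space at all, but you can confirm the state and (with root) see exactly which kernel function it is blocked in. The PID below is just the stuck umount from the ps output above; substitute any other stuck one:

```shell
# Example PID taken from the ps listing in this report.
pid=3117486

# "State: D (disk sleep)" means uninterruptible sleep in the kernel;
# such a task ignores all signals, including SIGKILL, until the
# blocking kernel call returns.
grep '^State:' "/proc/$pid/status"

# With root, the kernel stack shows where the task is blocked
# (requires /proc/<pid>/stack support in the kernel config):
cat "/proc/$pid/stack"
```

If the stack traces all end in the same ZFS or VFS function, that function is the likely choke point behind the pile-up.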

This is ZFS 2.1.0-1, currently running on Fedora 32 with kernel 5.11.2.

At first glance, issue #10100 appears to be similar, but in this case there are no soft lockup errors. The problem seems to be somehow related to cifs and/or nfs exports (there are smbd processes stuck from the same day). By now, running ls, lsof, or even bash auto-completion on (some older) snapshots gets stuck as well.
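Since any direct access to an affected snapshot directory can itself hang, a safer way to map out which snapshots are still responsive is to probe each one under a timeout, so the probe never joins the pile of stuck processes. The path is the one from this report; adjust it to your pool layout:

```shell
# Probe each snapshot directory with a 5-second timeout. A listing that
# times out is a strong hint the snapshot's automount is wedged in the
# kernel, like the stuck umount in the ps output above.
for snap in /z-main/Share/.zfs/snapshot/*; do
    if timeout 5 ls "$snap" > /dev/null 2>&1; then
        echo "ok    $snap"
    else
        echo "HUNG? $snap"
    fi
done
```

Note that each timed-out probe still leaves one more D-state ls behind; this only maps the damage, it does not undo it.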
