umount, snapshot ZFS processes stuck in kernel forever causing high load #13327

Open
@c0xc

Description

I'm observing a situation where ZFS processes are stuck, causing the load average to climb into five digits. They are stuck in the kernel and therefore not killable. Why could this happen, and can it be fixed without rebooting the server?

root      279580  279503  0 Feb04 ?        00:00:00 bash /root/bin/zfs-snapshot z-bod/DUMP hourly 72
root      279599  279580  0 Feb04 ?        00:00:00 /sbin/zfs destroy -r z-bod/[email protected]
root     3117486 3115126  0 Feb04 ?        00:00:00 umount -t zfs -n /z-main/Share/.zfs/snapshot/weekly.4
root     3115126       2  0 Feb04 ?        00:00:00 [kworker/u113:4+events_unbound]
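For processes stuck like the ones above, a quick way to confirm they are blocked inside the kernel is to list tasks in uninterruptible sleep (state `D`) along with the kernel wait channel (`wchan`) they are blocked in. This is a diagnostic sketch only; the column widths and filter are my own choices, not from the report:

```shell
# Show PID, process state, the kernel function the task is waiting in
# (wchan), and the command name. Tasks in D state are in uninterruptible
# sleep inside the kernel and cannot be killed, even with SIGKILL.
ps -eo pid,stat,wchan:32,comm | awk 'NR == 1 || $2 ~ /^D/'
```

On a system in the state described here, the stuck zfs, umount, and kworker tasks would be expected to show up in this list.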

zfs-snapshot is a snapshot rotation script. There are tens of thousands of zfs processes like this, but only 55 "umount" processes. Other processes, such as crond, are also accumulating (around 10k).

Could this be an issue with ZFS? Assuming some of those ZFS processes are causing the others to get stuck, how can they be terminated?
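To answer the "how can they be terminated" part for any single PID: a task in uninterruptible sleep cannot be terminated from user space at all, but you can confirm the state and (with root) see exactly which kernel function it is blocked in. The PID below is just the stuck umount from the ps output above; substitute any other stuck one:

```shell
# Example PID taken from the ps listing in this report.
pid=3117486

# "State: D (disk sleep)" means uninterruptible sleep in the kernel;
# such a task ignores all signals, including SIGKILL, until the
# blocking kernel call returns.
grep '^State:' "/proc/$pid/status"

# With root, the kernel stack shows where the task is blocked
# (requires /proc/<pid>/stack support in the kernel config):
cat "/proc/$pid/stack"
```

If the stack traces all end in the same ZFS or VFS function, that function is the likely choke point behind the pile-up.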

This is ZFS 2.1.0-1, currently running on Fedora 32 with kernel 5.11.2.

At first glance, issue #10100 appears to be similar, but in this case there are no soft lockup errors. The problem seems to be somehow related to cifs and/or nfs exports (there are smbd processes stuck from the same day). By now, running ls, lsof, or even bash auto-completion on (some older) snapshots gets stuck as well.
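Since any direct access to an affected snapshot directory can itself hang, a safer way to map out which snapshots are still responsive is to probe each one under a timeout, so the probe never joins the pile of stuck processes. The path is the one from this report; adjust it to your pool layout:

```shell
# Probe each snapshot directory with a 5-second timeout. A listing that
# times out is a strong hint the snapshot's automount is wedged in the
# kernel, like the stuck umount in the ps output above.
for snap in /z-main/Share/.zfs/snapshot/*; do
    if timeout 5 ls "$snap" > /dev/null 2>&1; then
        echo "ok    $snap"
    else
        echo "HUNG? $snap"
    fi
done
```

Note that each timed-out probe still leaves one more D-state ls behind; this only maps the damage, it does not undo it.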
