Description
System information
Type | Version/Name |
---|---|
Distribution Name | CentOS |
Distribution Version | 7.7.1908 |
Linux Kernel | 3.10.0-1062.9.1.el7.x86_64 |
Architecture | x86_64 |
ZFS Version | 0.8.3-1 |
SPL Version | 0.8.3-1 |
Describe the problem you're observing
The problem occurs on our file server every few days while Amanda backups are running. Amanda is configured to use ZFS snapshots as a read-only filesystem for tar. Backups run every night on that server and cover about 30 ZFS filesystems. The umount hangs occur at intervals of 3 or more days. Once one umount hangs, many other processes on the system hang as well. After a number of hours the system reaches a point where a reboot is necessary, but the hung processes prevent a clean shutdown, so a hardware reset is required. To be clear, the umounts in question are the automatic ones that ZFS itself kicks off.
I will be happy to provide more information to see if we can track down and correct this issue.
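Since the hang involves the automatic unmount of snapshot mounts, it may help to record which snapshots are mounted at a given moment, for example from a cron job during the backup window. The following is only a minimal sketch, assuming that mounted snapshots appear in `/proc/mounts` with filesystem type `zfs` and a source name containing `@`. The function name `snapshot_mounts` is hypothetical and not part of ZFS or Amanda.

```python
def snapshot_mounts(mounts_text):
    """Return (dataset, mountpoint) pairs for mounted ZFS snapshots.

    Assumption: a snapshot mount has fstype "zfs" and an "@" in its
    source field, e.g. "pool/fs@snapname".
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        source, mountpoint, fstype = fields[0], fields[1], fields[2]
        if fstype == "zfs" and "@" in source:
            found.append((source, mountpoint))
    return found


if __name__ == "__main__":
    # Print the snapshot mounts present right now.
    with open("/proc/mounts") as f:
        for dataset, mountpoint in snapshot_mounts(f.read()):
            print(dataset, mountpoint)
```

Logging this output with a timestamp before and after each backup run would show which snapshot mounts were still present when a umount hung.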
Describe how to reproduce the problem
Unfortunately, I have not been able to trigger this problem on demand.
Include any warning/errors/backtraces from the system logs
Here is a backtrace from one such event:
backtrace:
:NMI watchdog: BUG: soft lockup - CPU#35 stuck for 22s! [umount:70970]
:Modules linked in: nfsv4 dns_resolver nfs fscache ib_core ipmi_si mpt2sas raid_class scsi_transport_sas mptctl mptbase dell_rbu uas usb_storage rpcsec_gss_krb5 8021q garp mrp stp llc ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_conntrack nf_conntrack iptable_filter dm_mirror dm_region_hash dm_log osst pcc_cpufreq dell_smbios iTCO_wdt iTCO_vendor_support dell_wmi_descriptor dcdbas vfat fat skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel zfs(POE) kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr zunicode(POE) zlua(POE) ipmi_ssif zcommon(POE) znvpair(POE) zavl(POE) icp(POE) spl(OE) dm_round_robin st ch joydev sg lpc_ich i2c_i801 mei_me mei wmi ipmi_devintf ipmi_msghandler
:acpi_power_meter acpi_pad dm_multipath dm_mod nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic qla2xxx(T) mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ixgbe ahci crct10dif_pclmul igb crct10dif_common drm crc32c_intel nvme_fc libahci megaraid_sas nvme_fabrics nvme_core libata mdio scsi_transport_fc ptp pps_core scsi_tgt dca i2c_algo_bit drm_panel_orientation_quirks nfit libnvdimm [last unloaded: ipmi_si]
:CPU: 35 PID: 70970 Comm: umount Kdump: loaded Tainted: P OE ------------ T 3.10.0-1062.9.1.el7.x86_64 #1
:Hardware name: Dell Inc. PowerEdge R740, BIOS 2.2.11 06/13/2019
:task: ffff9381e1d85230 ti: ffff936d89bf0000 task.ti: ffff936d89bf0000
:RIP: 0010:[<ffffffff81983025>] [<ffffffff81983025>] _raw_spin_unlock_irqrestore+0x15/0x20
:RSP: 0018:ffff936d89bf3d50 EFLAGS: 00000246
:RAX: 0000000000000246 RBX: 00000000cf4dd401 RCX: 0000000100400030
:RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246
:RBP: ffff936d89bf3d50 R08: ffff936ecf4dd5c0 R09: 0000000100400030
:R10: 00000000cf4dd401 R11: ffff936ecf4dd5c0 R12: ffff935199121100
:R13: ffff936d89bf3d50 R14: ffff935199121100 R15: ffffffffc1029aaf
:FS: 00007f1d2ea93880(0000) GS:ffff9383fd840000(0000) knlGS:0000000000000000
:CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
:CR2: 00007f1d2e651120 CR3: 0000009522a62000 CR4: 00000000007607e0
:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
:DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
:PKRU: 55555554
:Call Trace:
:[<ffffffffc0b6a1cd>] taskq_wait_outstanding+0x4d/0xf0 [spl]
:[<ffffffffc101c8c9>] zfsvfs_teardown+0x59/0x2d0 [zfs]
:[<ffffffffc101cb79>] zfs_umount+0x39/0x120 [zfs]
:[<ffffffffc1047b9c>] zpl_put_super+0x2c/0x40 [zfs]
:[<ffffffff8144d29d>] generic_shutdown_super+0x6d/0x100
:[<ffffffff8144d6a2>] kill_anon_super+0x12/0x20
:[<ffffffffc1047cfa>] zpl_kill_sb+0x1a/0x20 [zfs]
:[<ffffffff8144da7e>] deactivate_locked_super+0x4e/0x70
:[<ffffffff8144e206>] deactivate_super+0x46/0x60
:[<ffffffff8146cd7f>] cleanup_mnt+0x3f/0x80
:[<ffffffff8146ce12>] __cleanup_mnt+0x12/0x20
:[<ffffffff812c2d2b>] task_work_run+0xbb/0xe0
:[<ffffffff8122cc65>] do_notify_resume+0xa5/0xc0
:[<ffffffff8198e23b>] int_signal+0x12/0x17
:Code: 07 00 0f 1f 40 00 5d c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 c6 07 00 0f 1f 40 00 48 89 f7 57 9d <0f> 1f 44 00 00 5d c3 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 48
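To compare traces across several occurrences, the call-trace frames can be reduced to function and module names. This is a rough sketch, assuming the frame format shown above (`[<address>] function+0xoffset/0xsize [module]`, optionally prefixed with `:`). The helper name `parse_frames` is hypothetical, and frames without a module tag are labeled `kernel` here as a convention of this sketch.

```python
import re

# One call-trace frame: optional leading ":", address, symbol+offset/size,
# and an optional "[module]" suffix for frames inside loadable modules.
FRAME = re.compile(
    r"^:?\[<([0-9a-f]+)>\]\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(\w+)\])?"
)


def parse_frames(log_text):
    """Return (function, module) pairs for each call-trace frame line."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME.match(line.strip())
        if m:
            frames.append((m.group(2), m.group(3) or "kernel"))
    return frames


if __name__ == "__main__":
    import sys

    # Usage: pipe a saved backtrace into the script on stdin.
    for func, module in parse_frames(sys.stdin.read()):
        print(f"{module:8s} {func}")
```

Applied to the trace above, the innermost frame it reports is `taskq_wait_outstanding` in `spl`, called from `zfsvfs_teardown` in `zfs`.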