Description
The following two code paths appear to deadlock when executed concurrently:
- the first results from a zfs rename of a zvol snapshot and runs in txg_sync_thread():
#0 [ffff8802a6e7ba20] schedule at ffffffff815299d0
#1 [ffff8802a6e7baf8] __mutex_lock_slowpath at ffffffff8152b3a6
#2 [ffff8802a6e7bb68] mutex_lock at ffffffff8152aecb
#3 [ffff8802a6e7bb88] zvol_rename_minors at ffffffffa02fe634 [zfs]
#4 [ffff8802a6e7bbd8] dsl_dir_rename_sync at ffffffffa027c04d [zfs]
#5 [ffff8802a6e7bc48] dsl_sync_task_sync at ffffffffa02850f2 [zfs]
#6 [ffff8802a6e7bc78] dsl_pool_sync at ffffffffa027d31b [zfs]
#7 [ffff8802a6e7bcf8] spa_sync at ffffffffa0293587 [zfs]
#8 [ffff8802a6e7bdc8] txg_sync_thread at ffffffffa02a890b [zfs]
#9 [ffff8802a6e7beb8] thread_generic_wrapper at ffffffffa0169978 [spl]
#10 [ffff8802a6e7bee8] kthread at ffffffff8109e78e
#11 [ffff8802a6e7bf48] kernel_thread at ffffffff8100c28a
- the second results from opening a zvol for I/O:
#0 [ffff88020e53d9b8] schedule at ffffffff815299d0
#1 [ffff88020e53da90] cv_wait_common at ffffffffa0171e65 [spl]
#2 [ffff88020e53db20] __cv_wait at ffffffffa0171f75 [spl]
#3 [ffff88020e53db30] rrw_enter_read at ffffffffa028df7b [zfs]
#4 [ffff88020e53db60] rrw_enter at ffffffffa028e0d0 [zfs]
#5 [ffff88020e53db70] dsl_pool_config_enter at ffffffffa027c24d [zfs]
#6 [ffff88020e53db80] dsl_pool_hold at ffffffffa027c2da [zfs]
#7 [ffff88020e53dbc0] dmu_objset_own at ffffffffa025e1a0 [zfs]
#8 [ffff88020e53dc20] zvol_open at ffffffffa02fd4fb [zfs]
#9 [ffff88020e53dc90] __blkdev_get at ffffffff811cbc4e
#10 [ffff88020e53dcf0] blkdev_get at ffffffff811cbf70
#11 [ffff88020e53dd00] blkdev_open at ffffffff811cbff1
#12 [ffff88020e53dd30] __dentry_open at ffffffff8118b1ba
#13 [ffff88020e53dd90] nameidata_to_filp at ffffffff8118b524
#14 [ffff88020e53ddb0] do_filp_open at ffffffff811a12d0
#15 [ffff88020e53df20] do_sys_open at ffffffff8118af77
#16 [ffff88020e53df70] sys_open at ffffffff8118b080
#17 [ffff88020e53df80] system_call_fastpath at ffffffff8100b0f2
The first code path takes the dsl pool config lock and then tries to take zvol_state_lock, while the second takes zvol_state_lock and then tries to take the dsl pool config lock. Each thread therefore holds the lock the other one is waiting for.
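To make the inversion explicit, here is a minimal sketch that models each code path as the ordered list of locks it acquires and checks whether two paths take any pair of locks in opposite orders. This is not ZFS code: the `lock_id` enum, `lock_order_inverted()` and the two path arrays are hypothetical names made up for illustration, and the acquisition orders are simply the ones read off the two stack traces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical identifiers for the two locks named in this report. */
enum lock_id { DSL_POOL_CONFIG_LOCK, ZVOL_STATE_LOCK };

/* Position of `lock` in an acquisition sequence, or -1 if it is never taken. */
static int lock_position(const enum lock_id *path, size_t len, enum lock_id lock)
{
    for (size_t i = 0; i < len; i++) {
        if (path[i] == lock)
            return (int)i;
    }
    return -1;
}

/*
 * Returns true if some pair of locks is acquired in one order by path `a`
 * and in the opposite order by path `b`.
 */
bool lock_order_inverted(const enum lock_id *a, size_t a_len,
                         const enum lock_id *b, size_t b_len)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < a_len; i++) {
        for (size_t j = i + 1; j < a_len; j++) {
            /* a[i] is taken before a[j] in path `a`; look both up in `b`. */
            int first = lock_position(b, b_len, a[i]);
            int second = lock_position(b, b_len, a[j]);
            if (first >= 0 && second >= 0 && first > second)
                return true;
        }
    }
    return false;
}

/* Acquisition orders as described above (assumed from the stack traces). */
const enum lock_id rename_path[] = { DSL_POOL_CONFIG_LOCK, ZVOL_STATE_LOCK };
const enum lock_id open_path[]   = { ZVOL_STATE_LOCK, DSL_POOL_CONFIG_LOCK };
```

Calling `lock_order_inverted(rename_path, 2, open_path, 2)` reports an inversion, which is the situation the two traces seem to show.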
It is interesting that zvol_first_open(), called from zvol_open(), takes care to try to obtain spa_namespace_lock and fails if it cannot, in order to avoid a lock order inversion between spa_namespace_lock and zvol_state_lock.
Yet txg_sync_thread does not seem to hold spa_namespace_lock when it calls dsl_pool_sync(), or anywhere further down the call chain before it tries to take zvol_state_lock. That is why I believe this defensive check is not effective in this particular case.
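For reference, the "try the lock and fail instead of blocking" pattern mentioned above can be sketched roughly as follows. This is a user-space toy, not the actual zvol code: `toy_lock_t`, `first_open_sketch()` and the `-EBUSY` return value are all assumptions chosen for illustration. The point is that such a check only protects against an inversion involving the lock it is applied to; any other lock taken with a blocking acquire afterwards is not covered by it.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy non-blocking lock; a stand-in for a kernel mutex, not the real thing. */
typedef struct { atomic_bool held; } toy_lock_t;

/* Take the lock only if it is currently free; never blocks. */
static bool toy_tryenter(toy_lock_t *l)
{
    bool expected = false;
    return atomic_compare_exchange_strong(&l->held, &expected, true);
}

static void toy_exit(toy_lock_t *l) { atomic_store(&l->held, false); }
static bool toy_held(toy_lock_t *l) { return atomic_load(&l->held); }

/* Hypothetical counterparts of spa_namespace_lock and zvol_state_lock. */
toy_lock_t namespace_lock = { false };
toy_lock_t state_lock = { false };

/*
 * Called with state_lock held. Instead of blocking on namespace_lock
 * (which could deadlock against a thread taking the two locks in the
 * other order), back off with an error and let the caller retry.
 */
int first_open_sketch(void)
{
    assert(toy_held(&state_lock));

    if (!toy_tryenter(&namespace_lock))
        return -EBUSY; /* error code chosen arbitrarily for this sketch */

    /* ... work that needs both locks would go here ... */

    toy_exit(&namespace_lock);
    return 0;
}
```

With `state_lock` held and `namespace_lock` free, `first_open_sketch()` succeeds; if something else already holds `namespace_lock`, it returns an error rather than waiting.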