Commit 65d5608

Fix zpool on zvol lock inversion deadlock
In all but one case the spa_namespace_lock is taken before the bdev->bd_mutex lock. But Linux __blkdev_get() function calls fops->open() with the bdev->bd_mutex lock held and we must somehow still safely acquire the spa_namespace_lock.

To avoid a potential lock inversion deadlock we preemptively try to take the spa_namespace_lock(). Normally it will not be contended and this is safe because spa_open_common() handles the case where the caller already holds the spa_namespace_lock.

When it is contended we risk a lock inversion if we were to block waiting for the lock. Luckily, the __blkdev_get() function allows us to return -ERESTARTSYS which will result in bdev->bd_mutex being dropped, reacquired, and fops->open() being called again. This process can be repeated safely until both locks are acquired.

Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Jorgen Lundman <[email protected]>
Closes #612
1 parent d5446cf commit 65d5608
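
The retry protocol described in the commit message can be illustrated in user space. What follows is a minimal sketch under stated assumptions, not the kernel code: it uses POSIX threads, and the names outer_lock, namespace_lock, sketch_open(), sketch_get() and SKETCH_ERESTARTSYS are hypothetical stand-ins for bdev->bd_mutex, spa_namespace_lock, fops->open(), __blkdev_get() and the kernel's ERESTARTSYS. The attempt limit is an addition that keeps the example bounded; the commit message describes repeating until both locks are acquired.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for bdev->bd_mutex and spa_namespace_lock. */
static pthread_mutex_t outer_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t namespace_lock = PTHREAD_MUTEX_INITIALIZER;

/* Illustrative restart code; 512 mirrors the kernel's ERESTARTSYS value. */
#define SKETCH_ERESTARTSYS 512

/*
 * Plays the role of fops->open(): entered with outer_lock already held.
 * Blocking on namespace_lock here would invert the usual lock order, so
 * only try the lock and ask the caller to restart on contention.
 */
static int sketch_open(void)
{
	if (pthread_mutex_trylock(&namespace_lock) != 0)
		return -SKETCH_ERESTARTSYS;

	/* ... work that needs both locks would go here ... */

	pthread_mutex_unlock(&namespace_lock);
	return 0;
}

/*
 * Plays the role of the caller: take outer_lock, call open, drop
 * outer_lock, and repeat while open asks for a restart.
 */
static int sketch_get(int max_attempts, int *attempts)
{
	int rc = -SKETCH_ERESTARTSYS;

	assert(max_attempts > 0);
	*attempts = 0;

	while (rc == -SKETCH_ERESTARTSYS && *attempts < max_attempts) {
		pthread_mutex_lock(&outer_lock);
		rc = sketch_open();
		pthread_mutex_unlock(&outer_lock);
		(*attempts)++;
	}
	return rc;
}
```

Because outer_lock is released between attempts, a thread that holds namespace_lock and wants outer_lock can make progress, which is what breaks the inversion.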

1 file changed: +28 -0 lines changed

module/zfs/zvol.c

Lines changed: 28 additions & 0 deletions
@@ -891,11 +891,39 @@ zvol_first_open(zvol_state_t *zv)
 {
 	objset_t *os;
 	uint64_t volsize;
+	int locked = 0;
 	int error;
 	uint64_t ro;
 
+	/*
+	 * In all other cases the spa_namespace_lock is taken before the
+	 * bdev->bd_mutex lock.  But in this case the Linux __blkdev_get()
+	 * function calls fops->open() with the bdev->bd_mutex lock held.
+	 *
+	 * To avoid a potential lock inversion deadlock we preemptively
+	 * try to take the spa_namespace_lock().  Normally it will not
+	 * be contended and this is safe because spa_open_common() handles
+	 * the case where the caller already holds the spa_namespace_lock.
+	 *
+	 * When it is contended we risk a lock inversion if we were to
+	 * block waiting for the lock.  Luckily, the __blkdev_get()
+	 * function allows us to return -ERESTARTSYS which will result in
+	 * bdev->bd_mutex being dropped, reacquired, and fops->open() being
+	 * called again.  This process can be repeated safely until both
+	 * locks are acquired.
+	 */
+	if (!mutex_owned(&spa_namespace_lock)) {
+		locked = mutex_tryenter(&spa_namespace_lock);
+		if (!locked)
+			return (-ERESTARTSYS);
+	}
+
 	/* lie and say we're read-only */
 	error = dmu_objset_own(zv->zv_name, DMU_OST_ZVOL, 1, zvol_tag, &os);
+
+	if (locked)
+		mutex_exit(&spa_namespace_lock);
+
 	if (error)
 		return (-error);
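
The hunk above only releases the lock when this call was the one that acquired it, leaving it held for a caller that already owned it. A minimal user-space sketch of that bookkeeping follows. It is an illustration under assumptions, not the SPL implementation: the ns_* helpers, first_open_sketch() and SKETCH_ERESTARTSYS are hypothetical names, and ownership is tracked with a C11 thread-local flag rather than the owner field that mutex_owned() consults in the kernel.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Illustrative restart code; 512 mirrors the kernel's ERESTARTSYS value. */
#define SKETCH_ERESTARTSYS 512

/* Hypothetical stand-in for spa_namespace_lock. */
static pthread_mutex_t ns_lock = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread flag recording whether the current thread holds ns_lock. */
static _Thread_local bool ns_lock_mine = false;

static void ns_enter(void)
{
	pthread_mutex_lock(&ns_lock);
	ns_lock_mine = true;
}

static bool ns_tryenter(void)
{
	if (pthread_mutex_trylock(&ns_lock) != 0)
		return false;
	ns_lock_mine = true;
	return true;
}

static void ns_exit(void)
{
	assert(ns_lock_mine);	/* only the holder may release */
	ns_lock_mine = false;
	pthread_mutex_unlock(&ns_lock);
}

static bool ns_owned(void)
{
	return ns_lock_mine;
}

/* Mirrors the control flow of the hunk: try, work, release if taken here. */
static int first_open_sketch(void)
{
	bool locked = false;

	if (!ns_owned()) {
		locked = ns_tryenter();
		if (!locked)
			return -SKETCH_ERESTARTSYS;
	}

	/* ... work that requires ns_lock would go here ... */

	if (locked)
		ns_exit();
	return 0;
}
```

Calling first_open_sketch() with the lock already held by the calling thread returns 0 and leaves the lock held, so the caller's own critical section is not cut short.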
