
Commit ab5036d

kusumi authored and behlendorf committed
Fix race in parallel mount's thread dispatching algorithm
Strategy of parallel mount is as follows.

1) Initial thread dispatching selects sets of mount points that don't
   have dependencies on other sets, hence threads can/should run
   lock-less and shouldn't race with other threads for other sets. Each
   thread dispatched corresponds to a top level directory which may or
   may not have datasets to be mounted on sub directories.

2) Subsequent recursive thread dispatching for each thread from 1)
   mounts datasets for each set of mount points. The mount points
   within each set have dependencies (i.e. child directories), so child
   directories are processed only after the parent directory completes.

The problem is that the initial thread dispatching in
zfs_foreach_mountpoint() can be multi-threaded when it needs to be
single-threaded, and this puts threads under race condition. This race
appeared as mount/unmount issues on ZoL, since ZoL has different timing
regarding mount(2) execution due to fork(2)/exec(2) of mount(8).
`zfs unmount -a`, which expects proper mount order, can't unmount if the
mounts were reordered by the race condition.

There are currently two known patterns of input list `handles` in
`zfs_foreach_mountpoint(..,handles,..)` which cause the race condition.

1) #8833 case where input is `/a /a /a/b` after sorting.
   The problem is that libzfs_path_contains() can't correctly handle an
   input list with two identical top level directories.
   There is a race between two POSIX threads A and B,
    * ThreadA for "/a" for test1 and "/a/b"
    * ThreadB for "/a" for test0/a
   and in case of #8833, ThreadA won the race. Two threads were created
   because "/a" wasn't considered as `"/a" contains "/a"`.

2) #8450 case where input is `/ /var/data /var/data/test` after sorting.
   The problem is that libzfs_path_contains() can't correctly handle an
   input list containing "/".
   There is a race between two POSIX threads A and B,
    * ThreadA for "/" and "/var/data/test"
    * ThreadB for "/var/data"
   and in case of #8450, ThreadA won the race. Two threads were created
   because "/var/data" wasn't considered as `"/" contains "/var/data"`.
   In other words, if there is (at least one) "/" in the input list, the
   initial thread dispatching must be single-threaded since every
   directory is a child of "/", meaning they all directly or indirectly
   depend on "/".

In both cases, the first non_descendant_idx() call fails to correctly
determine "path1-contains-path2", and as a result the initial thread
dispatching creates another thread when it needs to be single-threaded.
Fix a conditional in libzfs_path_contains() to consider the above two
cases.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed by: Sebastien Roy <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #8450
Closes #8833
Closes #8878
1 parent a54f92b commit ab5036d

File tree

4 files changed: +123 −3 lines


lib/libzfs/libzfs_mount.c

Lines changed: 4 additions & 2 deletions
@@ -1336,12 +1336,14 @@ mountpoint_cmp(const void *arga, const void *argb)
 }
 
 /*
- * Return true if path2 is a child of path1.
+ * Return true if path2 is a child of path1 or path2 equals path1 or
+ * path1 is "/" (path2 is always a child of "/").
  */
 static boolean_t
 libzfs_path_contains(const char *path1, const char *path2)
 {
-	return (strstr(path2, path1) == path2 && path2[strlen(path1)] == '/');
+	return (strcmp(path1, path2) == 0 || strcmp(path1, "/") == 0 ||
+	    (strstr(path2, path1) == path2 && path2[strlen(path1)] == '/'));
 }
 
 /*

tests/runfiles/linux.run

Lines changed: 2 additions & 1 deletion
@@ -182,7 +182,8 @@ tests = ['zfs_mount_001_pos', 'zfs_mount_002_pos', 'zfs_mount_003_pos',
     'zfs_mount_007_pos', 'zfs_mount_008_pos', 'zfs_mount_009_neg',
     'zfs_mount_010_neg', 'zfs_mount_011_neg', 'zfs_mount_012_neg',
     'zfs_mount_all_001_pos', 'zfs_mount_encrypted', 'zfs_mount_remount',
-    'zfs_multi_mount', 'zfs_mount_all_fail', 'zfs_mount_all_mountpoints']
+    'zfs_multi_mount', 'zfs_mount_all_fail', 'zfs_mount_all_mountpoints',
+    'zfs_mount_test_race']
 tags = ['functional', 'cli_root', 'zfs_mount']
 
 [tests/functional/cli_root/zfs_program]

tests/zfs-tests/tests/functional/cli_root/zfs_mount/Makefile.am

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@ dist_pkgdata_SCRIPTS = \
 	zfs_mount_all_mountpoints.ksh \
 	zfs_mount_encrypted.ksh \
 	zfs_mount_remount.ksh \
+	zfs_mount_test_race.sh \
 	zfs_multi_mount.ksh
 
 dist_pkgdata_DATA = \
tests/zfs-tests/tests/functional/cli_root/zfs_mount/zfs_mount_test_race.sh

Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,116 @@ (new file)

#!/bin/ksh

#
# This file and its contents are supplied under the terms of the
# Common Development and Distribution License ("CDDL"), version 1.0.
# You may only use this file in accordance with the terms of version
# 1.0 of the CDDL.
#
# A full copy of the text of the CDDL should have accompanied this
# source. A copy of the CDDL is also available via the Internet at
# http://www.illumos.org/license/CDDL.
#

#
# Copyright (c) 2019 by Tomohiro Kusumi. All rights reserved.
#

. $STF_SUITE/include/libtest.shlib
. $STF_SUITE/tests/functional/cli_root/zfs_mount/zfs_mount.cfg

#
# DESCRIPTION:
# Verify parallel mount ordering is consistent.
#
# There was a bug in initial thread dispatching algorithm which put threads
# under race condition which resulted in undefined mount order. The purpose
# of this test is to verify `zfs unmount -a` succeeds (not `zfs mount -a`
# succeeds, it always does) after `zfs mount -a`, which could fail if threads
# race. See github.com/zfsonlinux/zfs/issues/{8450,8833,8878} for details.
#
# STRATEGY:
# 1. Create pools and filesystems.
# 2. Set same mount point for >1 datasets.
# 3. Unmount all datasets.
# 4. Mount all datasets.
# 5. Unmount all datasets (verify this succeeds).
#

verify_runnable "both"

TMPDIR=${TMPDIR:-$TEST_BASE_DIR}
MNTPT=$TMPDIR/zfs_mount_test_race_mntpt
DISK1="$TMPDIR/zfs_mount_test_race_disk1"
DISK2="$TMPDIR/zfs_mount_test_race_disk2"

TESTPOOL1=zfs_mount_test_race_tp1
TESTPOOL2=zfs_mount_test_race_tp2

export __ZFS_POOL_RESTRICT="$TESTPOOL1 $TESTPOOL2"
log_must zfs $unmountall
unset __ZFS_POOL_RESTRICT

function cleanup
{
	zpool destroy $TESTPOOL1
	zpool destroy $TESTPOOL2
	rm -rf $MNTPT
	rm -rf /$TESTPOOL1
	rm -rf /$TESTPOOL2
	rm -f $DISK1
	rm -f $DISK2
	export __ZFS_POOL_RESTRICT="$TESTPOOL1 $TESTPOOL2"
	log_must zfs $mountall
	unset __ZFS_POOL_RESTRICT
}
log_onexit cleanup

log_note "Verify parallel mount ordering is consistent"

log_must truncate -s $MINVDEVSIZE $DISK1
log_must truncate -s $MINVDEVSIZE $DISK2

log_must zpool create -f $TESTPOOL1 $DISK1
log_must zpool create -f $TESTPOOL2 $DISK2

log_must zfs create $TESTPOOL1/$TESTFS1
log_must zfs create $TESTPOOL2/$TESTFS2

log_must zfs set mountpoint=none $TESTPOOL1
log_must zfs set mountpoint=$MNTPT $TESTPOOL1/$TESTFS1

# Note that unmount can fail (due to race condition on `zfs mount -a`) with or
# without `canmount=off`. The race has nothing to do with canmount property,
# but turn it off for convenience of mount layout used in this test case.
log_must zfs set canmount=off $TESTPOOL2
log_must zfs set mountpoint=$MNTPT $TESTPOOL2

# At this point, layout of datasets in two pools will look like below.
# Previously, on next `zfs mount -a`, pthreads assigned to TESTFS1 and TESTFS2
# could race, and TESTFS2 usually (actually always) won in ZoL. Note that the
# problem is how two or more threads could initially be assigned to the same
# top level directory, not this specific layout. This layout is just an example
# that can reproduce race, and is also the layout reported in #8833.
#
# NAME                  MOUNTED  MOUNTPOINT
# ----------------------------------------------
# /$TESTPOOL1           no       none
# /$TESTPOOL1/$TESTFS1  yes      $MNTPT
# /$TESTPOOL2           no       $MNTPT
# /$TESTPOOL2/$TESTFS2  yes      $MNTPT/$TESTFS2

# Apparently two datasets must be mounted.
log_must ismounted $TESTPOOL1/$TESTFS1
log_must ismounted $TESTPOOL2/$TESTFS2
# This unmount always succeeds, because potential race hasn't happened yet.
log_must zfs unmount -a
# This mount always succeeds, whether threads are under race condition or not.
log_must zfs mount -a

# Verify datasets are mounted (TESTFS2 fails if the race broke mount order).
log_must ismounted $TESTPOOL1/$TESTFS1
log_must ismounted $TESTPOOL2/$TESTFS2
# Verify unmount succeeds (fails if the race broke mount order).
log_must zfs unmount -a

log_pass "Verify parallel mount ordering is consistent passed"
