Skip to content

Commit 93c8e91

Browse files
authored
Fix dRAID self-healing short columns
When dRAID performs a normal read operation only the data columns in the raid map are read from disk. This is enough information to calculate the checksum, verify it, and return the needed data to the application. It's only in the event of a checksum failure that the additional parity and any empty columns must be read since they are required for parity reconstruction. Reading these additional columns is handled by vdev_raidz_read_all() which calls vdev_draid_map_alloc_empty() to expand the raid_map_t and submit IOs for the missing columns. This all works correctly, but it fails to account for any "short" columns. These are data columns which are padded with a empty skip sector at the end. Since that empty sector is not needed for a normal read it's not read when columns is first read from disk. However, like the parity and empty columns the skip sector is needed to perform reconstruction. The fix is to mark any "short" columns as never being read by clearing the rc_tried flag when expanding the raid_map_t. This will cause the entire column to re-read from disk in the event of a checksum failure allowing the self-healing functionality to repair the block. Note that this only effects the self-healing feature because when scrubbing a pool the parity, data, and empty columns are all read initially to verify their contents. Furthermore, only blocks which contain "short" columns would be effected, and only when the memory backing the skip sector wasn't already zeroed out. This change extends the existing redundancy_raidz.ksh test case to verify self-healing (as well as resilver and scrub). Then applies the same test case to dRAID with a slightly modified version of the test script called redundancy_draid.ksh. The unused variable combrec was also removed from both test cases. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Mark Maybee <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #12010
1 parent a27ab6d commit 93c8e91

File tree

5 files changed

+309
-4
lines changed

5 files changed

+309
-4
lines changed

module/zfs/vdev_draid.c

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -812,7 +812,12 @@ vdev_draid_map_alloc_empty(zio_t *zio, raidz_row_t *rr)
812812
/* this is a "big column", nothing to add */
813813
ASSERT3P(rc->rc_abd, !=, NULL);
814814
} else {
815-
/* short data column, add a skip sector */
815+
/*
816+
* short data column, add a skip sector and clear
817+
* rc_tried to force the entire column to be re-read
818+
* thereby including the missing skip sector data
819+
* which is needed for reconstruction.
820+
*/
816821
ASSERT3U(rc->rc_size + skip_size, ==, parity_size);
817822
ASSERT3U(rr->rr_nempty, !=, 0);
818823
ASSERT3P(rc->rc_abd, !=, NULL);
@@ -823,6 +828,7 @@ vdev_draid_map_alloc_empty(zio_t *zio, raidz_row_t *rr)
823828
abd_gang_add(rc->rc_abd, abd_get_offset_size(
824829
rr->rr_abd_empty, skip_off, skip_size), B_TRUE);
825830
skip_off += skip_size;
831+
rc->rc_tried = 0;
826832
}
827833

828834
/*

tests/runfiles/common.run

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -741,8 +741,8 @@ tests = ['raidz_001_neg', 'raidz_002_pos', 'raidz_003_pos', 'raidz_004_pos']
741741
tags = ['functional', 'raidz']
742742

743743
[tests/functional/redundancy]
744-
tests = ['redundancy_draid1', 'redundancy_draid2', 'redundancy_draid3',
745-
'redundancy_draid_spare1', 'redundancy_draid_spare2',
744+
tests = ['redundancy_draid', 'redundancy_draid1', 'redundancy_draid2',
745+
'redundancy_draid3', 'redundancy_draid_spare1', 'redundancy_draid_spare2',
746746
'redundancy_draid_spare3', 'redundancy_mirror', 'redundancy_raidz',
747747
'redundancy_raidz1', 'redundancy_raidz2', 'redundancy_raidz3',
748748
'redundancy_stripe']

tests/zfs-tests/tests/functional/redundancy/Makefile.am

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,7 @@ pkgdatadir = $(datadir)/@PACKAGE@/zfs-tests/tests/functional/redundancy
22
dist_pkgdata_SCRIPTS = \
33
setup.ksh \
44
cleanup.ksh \
5+
redundancy_draid.ksh \
56
redundancy_draid1.ksh \
67
redundancy_draid2.ksh \
78
redundancy_draid3.ksh \
Lines changed: 248 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,248 @@
1+
#!/bin/ksh -p
2+
#
3+
# CDDL HEADER START
4+
#
5+
# The contents of this file are subject to the terms of the
6+
# Common Development and Distribution License (the "License").
7+
# You may not use this file except in compliance with the License.
8+
#
9+
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
10+
# or http://www.opensolaris.org/os/licensing.
11+
# See the License for the specific language governing permissions
12+
# and limitations under the License.
13+
#
14+
# When distributing Covered Code, include this CDDL HEADER in each
15+
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
16+
# If applicable, add the following below this CDDL HEADER, with the
17+
# fields enclosed by brackets "[]" replaced with your own identifying
18+
# information: Portions Copyright [yyyy] [name of copyright owner]
19+
#
20+
# CDDL HEADER END
21+
#
22+
23+
#
24+
# Copyright (c) 2020 by vStack. All rights reserved.
25+
# Copyright (c) 2021 by Delphix. All rights reserved.
26+
# Copyright (c) 2021 by Lawrence Livermore National Security, LLC.
27+
#
28+
29+
. $STF_SUITE/include/libtest.shlib
30+
. $STF_SUITE/tests/functional/redundancy/redundancy.kshlib
31+
32+
#
33+
# DESCRIPTION:
34+
# dRAID should provide redundancy
35+
#
36+
# STRATEGY:
37+
# 1. Create block device files for the test draid pool
38+
# 2. For each parity value [1..3]
39+
# - create draid pool
40+
# - fill it with some directories/files
41+
# - verify self-healing by overwriting devices
42+
# - verify resilver by replacing devices
43+
# - verify scrub by zeroing devices
44+
# - destroy the draid pool
45+
46+
typeset -r devs=6
47+
typeset -r dev_size_mb=512
48+
49+
typeset -a disks
50+
51+
prefetch_disable=$(get_tunable PREFETCH_DISABLE)
52+
53+
function cleanup
54+
{
55+
poolexists "$TESTPOOL" && destroy_pool "$TESTPOOL"
56+
57+
for i in {0..$devs}; do
58+
rm -f "$TEST_BASE_DIR/dev-$i"
59+
done
60+
61+
set_tunable32 PREFETCH_DISABLE $prefetch_disable
62+
}
63+
64+
function test_selfheal # <pool> <parity> <dir>
65+
{
66+
typeset pool=$1
67+
typeset nparity=$2
68+
typeset dir=$3
69+
70+
log_must zpool export $pool
71+
72+
for (( i=0; i<$nparity; i=i+1 )); do
73+
log_must dd conv=notrunc if=/dev/zero of=$dir/dev-$i \
74+
bs=1M seek=4 count=$(($dev_size_mb-4))
75+
done
76+
77+
log_must zpool import -o cachefile=none -d $dir $pool
78+
79+
typeset mntpnt=$(get_prop mountpoint $pool/fs)
80+
log_must find $mntpnt -type f -exec cksum {} + >> /dev/null 2>&1
81+
log_must check_pool_status $pool "errors" "No known data errors"
82+
83+
#
84+
# Scrub the pool because the find command will only self-heal blocks
85+
# from the files which were read. Before overwriting additional
86+
# devices we need to repair all of the blocks in the pool.
87+
#
88+
log_must zpool scrub -w $pool
89+
log_must check_pool_status $pool "errors" "No known data errors"
90+
91+
log_must zpool clear $pool
92+
93+
log_must zpool export $pool
94+
95+
for (( i=$nparity; i<$nparity*2; i=i+1 )); do
96+
log_must dd conv=notrunc if=/dev/zero of=$dir/dev-$i \
97+
bs=1M seek=4 count=$(($dev_size_mb-4))
98+
done
99+
100+
log_must zpool import -o cachefile=none -d $dir $pool
101+
102+
typeset mntpnt=$(get_prop mountpoint $pool/fs)
103+
log_must find $mntpnt -type f -exec cksum {} + >> /dev/null 2>&1
104+
log_must check_pool_status $pool "errors" "No known data errors"
105+
106+
log_must zpool scrub -w $pool
107+
log_must check_pool_status $pool "errors" "No known data errors"
108+
109+
log_must zpool clear $pool
110+
}
111+
112+
function test_resilver # <pool> <parity> <dir>
113+
{
114+
typeset pool=$1
115+
typeset nparity=$2
116+
typeset dir=$3
117+
118+
for (( i=0; i<$nparity; i=i+1 )); do
119+
log_must zpool offline $pool $dir/dev-$i
120+
done
121+
122+
log_must zpool export $pool
123+
124+
for (( i=0; i<$nparity; i=i+1 )); do
125+
log_must zpool labelclear -f $dir/dev-$i
126+
done
127+
128+
log_must zpool import -o cachefile=none -d $dir $pool
129+
130+
for (( i=0; i<$nparity; i=i+1 )); do
131+
log_must zpool replace -fw $pool $dir/dev-$i
132+
done
133+
134+
log_must check_pool_status $pool "errors" "No known data errors"
135+
resilver_cksum=$(cksum_pool $pool)
136+
if [[ $resilver_cksum != 0 ]]; then
137+
log_must zpool status -v $pool
138+
log_fail "resilver cksum errors: $resilver_cksum"
139+
fi
140+
141+
log_must zpool clear $pool
142+
143+
for (( i=$nparity; i<$nparity*2; i=i+1 )); do
144+
log_must zpool offline $pool $dir/dev-$i
145+
done
146+
147+
log_must zpool export $pool
148+
149+
for (( i=$nparity; i<$nparity*2; i=i+1 )); do
150+
log_must zpool labelclear -f $dir/dev-$i
151+
done
152+
153+
log_must zpool import -o cachefile=none -d $dir $pool
154+
155+
for (( i=$nparity; i<$nparity*2; i=i+1 )); do
156+
log_must zpool replace -fw $pool $dir/dev-$i
157+
done
158+
159+
log_must check_pool_status $pool "errors" "No known data errors"
160+
resilver_cksum=$(cksum_pool $pool)
161+
if [[ $resilver_cksum != 0 ]]; then
162+
log_must zpool status -v $pool
163+
log_fail "resilver cksum errors: $resilver_cksum"
164+
fi
165+
166+
log_must zpool clear $pool
167+
}
168+
169+
function test_scrub # <pool> <parity> <dir>
170+
{
171+
typeset pool=$1
172+
typeset nparity=$2
173+
typeset dir=$3
174+
175+
log_must zpool export $pool
176+
177+
for (( i=0; i<$nparity; i=i+1 )); do
178+
dd conv=notrunc if=/dev/zero of=$dir/dev-$i \
179+
bs=1M seek=4 count=$(($dev_size_mb-4))
180+
done
181+
182+
log_must zpool import -o cachefile=none -d $dir $pool
183+
184+
log_must zpool scrub -w $pool
185+
log_must check_pool_status $pool "errors" "No known data errors"
186+
187+
log_must zpool clear $pool
188+
189+
log_must zpool export $pool
190+
191+
for (( i=$nparity; i<$nparity*2; i=i+1 )); do
192+
dd conv=notrunc if=/dev/zero of=$dir/dev-$i \
193+
bs=1M seek=4 count=$(($dev_size_mb-4))
194+
done
195+
196+
log_must zpool import -o cachefile=none -d $dir $pool
197+
198+
log_must zpool scrub -w $pool
199+
log_must check_pool_status $pool "errors" "No known data errors"
200+
201+
log_must zpool clear $pool
202+
}
203+
204+
log_onexit cleanup
205+
206+
log_must set_tunable32 PREFETCH_DISABLE 1
207+
208+
# Disk files which will be used by pool
209+
for i in {0..$(($devs - 1))}; do
210+
device=$TEST_BASE_DIR/dev-$i
211+
log_must truncate -s ${dev_size_mb}M $device
212+
disks[${#disks[*]}+1]=$device
213+
done
214+
215+
# Disk file which will be attached
216+
log_must truncate -s 512M $TEST_BASE_DIR/dev-$devs
217+
218+
for nparity in 1 2 3; do
219+
raid=draid$nparity
220+
dir=$TEST_BASE_DIR
221+
222+
log_must zpool create -f -o cachefile=none $TESTPOOL $raid ${disks[@]}
223+
log_must zfs set primarycache=metadata $TESTPOOL
224+
225+
log_must zfs create $TESTPOOL/fs
226+
log_must fill_fs /$TESTPOOL/fs 1 512 100 1024 R
227+
228+
log_must zfs create -o compress=on $TESTPOOL/fs2
229+
log_must fill_fs /$TESTPOOL/fs2 1 512 100 1024 R
230+
231+
log_must zfs create -o compress=on -o recordsize=8k $TESTPOOL/fs3
232+
log_must fill_fs /$TESTPOOL/fs3 1 512 100 1024 R
233+
234+
typeset pool_size=$(get_pool_prop size $TESTPOOL)
235+
236+
log_must zpool export $TESTPOOL
237+
log_must zpool import -o cachefile=none -d $dir $TESTPOOL
238+
239+
log_must check_pool_status $TESTPOOL "errors" "No known data errors"
240+
241+
test_selfheal $TESTPOOL $nparity $dir
242+
test_resilver $TESTPOOL $nparity $dir
243+
test_scrub $TESTPOOL $nparity $dir
244+
245+
log_must zpool destroy "$TESTPOOL"
246+
done
247+
248+
log_pass "draid redundancy test succeeded."

tests/zfs-tests/tests/functional/redundancy/redundancy_raidz.ksh

Lines changed: 51 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -23,6 +23,7 @@
2323
#
2424
# Copyright (c) 2020 by vStack. All rights reserved.
2525
# Copyright (c) 2021 by Delphix. All rights reserved.
26+
# Copyright (c) 2021 by Lawrence Livermore National Security, LLC.
2627
#
2728

2829
. $STF_SUITE/include/libtest.shlib
@@ -37,6 +38,7 @@
3738
# 2. For each parity value [1..3]
3839
# - create raidz pool
3940
# - fill it with some directories/files
41+
# - verify self-healing by overwriting devices
4042
# - verify resilver by replacing devices
4143
# - verify scrub by zeroing devices
4244
# - destroy the raidz pool
@@ -59,6 +61,54 @@ function cleanup
5961
set_tunable32 PREFETCH_DISABLE $prefetch_disable
6062
}
6163

64+
function test_selfheal # <pool> <parity> <dir>
65+
{
66+
typeset pool=$1
67+
typeset nparity=$2
68+
typeset dir=$3
69+
70+
log_must zpool export $pool
71+
72+
for (( i=0; i<$nparity; i=i+1 )); do
73+
log_must dd conv=notrunc if=/dev/zero of=$dir/dev-$i \
74+
bs=1M seek=4 count=$(($dev_size_mb-4))
75+
done
76+
77+
log_must zpool import -o cachefile=none -d $dir $pool
78+
79+
typeset mntpnt=$(get_prop mountpoint $pool/fs)
80+
log_must find $mntpnt -type f -exec cksum {} + >> /dev/null 2>&1
81+
log_must check_pool_status $pool "errors" "No known data errors"
82+
83+
#
84+
# Scrub the pool because the find command will only self-heal blocks
85+
# from the files which were read. Before overwriting additional
86+
# devices we need to repair all of the blocks in the pool.
87+
#
88+
log_must zpool scrub -w $pool
89+
log_must check_pool_status $pool "errors" "No known data errors"
90+
91+
log_must zpool clear $pool
92+
93+
log_must zpool export $pool
94+
95+
for (( i=$nparity; i<$nparity*2; i=i+1 )); do
96+
log_must dd conv=notrunc if=/dev/zero of=$dir/dev-$i \
97+
bs=1M seek=4 count=$(($dev_size_mb-4))
98+
done
99+
100+
log_must zpool import -o cachefile=none -d $dir $pool
101+
102+
typeset mntpnt=$(get_prop mountpoint $pool/fs)
103+
log_must find $mntpnt -type f -exec cksum {} + >> /dev/null 2>&1
104+
log_must check_pool_status $pool "errors" "No known data errors"
105+
106+
log_must zpool scrub -w $pool
107+
log_must check_pool_status $pool "errors" "No known data errors"
108+
109+
log_must zpool clear $pool
110+
}
111+
62112
function test_resilver # <pool> <parity> <dir>
63113
{
64114
typeset pool=$1
@@ -121,7 +171,6 @@ function test_scrub # <pool> <parity> <dir>
121171
typeset pool=$1
122172
typeset nparity=$2
123173
typeset dir=$3
124-
typeset combrec=$4
125174

126175
log_must zpool export $pool
127176

@@ -189,6 +238,7 @@ for nparity in 1 2 3; do
189238

190239
log_must check_pool_status $TESTPOOL "errors" "No known data errors"
191240

241+
test_selfheal $TESTPOOL $nparity $dir
192242
test_resilver $TESTPOOL $nparity $dir
193243
test_scrub $TESTPOOL $nparity $dir
194244

0 commit comments

Comments
 (0)