Commit 5ed202c

zed: Add zedlet to power off slot when drive is faulted
If ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT is enabled in zed.rc, then power off the drive's slot in the enclosure if it becomes FAULTED. This can help silence misbehaving drives. This assumes your drive enclosure fully supports slot power control via sysfs.

Signed-off-by: Tony Hutter <[email protected]>
1 parent cae502c commit 5ed202c

2 files changed: +66 -0 lines changed

cmd/zed/zed.d/statechange-slot_off.sh

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+#!/bin/sh
+#
+# Turn off disk's enclosure slot if it becomes FAULTED.
+#
+# Bad SCSI disks can often "disappear and reappear" causing all sorts of chaos
+# as they flip between FAULTED and ONLINE. If
+# ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT is set in zed.rc, and the disk gets
+# FAULTED, then power down the slot via sysfs:
+#
+# /sys/class/enclosure/<enclosure>/<slot>/power_status
+#
+# We assume the user will be responsible for turning the slot back on again.
+#
+# Note that this script requires that your enclosure be supported by the
+# Linux SCSI Enclosure Services (SES) driver. The script will do nothing
+# if you have no enclosure, or if your enclosure isn't supported.
+#
+# Exit codes:
+# 0: slot successfully powered off
+# 1: enclosure not available
+# 2: ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT disabled
+# 3: vdev was not FAULTED
+# 4: The enclosure sysfs path passed from ZFS does not exist
+# 5: Enclosure slot didn't actually turn off after we told it to
26+
[ -f "${ZED_ZEDLET_DIR}/zed.rc" ] && . "${ZED_ZEDLET_DIR}/zed.rc"
27+
. "${ZED_ZEDLET_DIR}/zed-functions.sh"
28+
29+
if [ ! -d /sys/class/enclosure ] ; then
30+
# No JBOD enclosure or NVMe slots
31+
exit 1
32+
fi
33+
34+
if [ "${ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT}" != "1" ] ; then
35+
exit 2
36+
fi
37+
38+
if [ "$ZEVENT_VDEV_STATE_STR" != "FAULTED" ] ; then
39+
exit 3
40+
fi
41+
42+
if [ ! -f "$ZEVENT_VDEV_ENC_SYSFS_PATH/power_status" ] ; then
43+
exit 4
44+
fi
+
+echo "off" | tee "$ZEVENT_VDEV_ENC_SYSFS_PATH/power_status"
+
+# Wait for sysfs to report that the slot is off. It can take ~400ms on some
+# enclosures.
+for i in $(seq 1 20) ; do
+	if [ "$(cat "$ZEVENT_VDEV_ENC_SYSFS_PATH/power_status")" = "off" ] ; then
+		break
+	fi
+	sleep 0.1
+done
+
+if [ "$(cat "$ZEVENT_VDEV_ENC_SYSFS_PATH/power_status")" != "off" ] ; then
+	exit 5
+fi
+
+zed_log_msg "powered down slot $ZEVENT_VDEV_ENC_SYSFS_PATH for $ZEVENT_VDEV_PATH"
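
The header comment leaves turning the slot back on to the user. A minimal sketch of that manual step follows, assuming the enclosure's `power_status` attribute accepts `on` in the same way it accepts `off`. The function name `slot_power_on` and its return codes (4 and 5, chosen to mirror the zedlet's exit codes) are hypothetical and not part of this commit.

```shell
#!/bin/sh
# Hypothetical helper, not part of the commit: power an enclosure slot back on.
# $1 is the slot directory, e.g. /sys/class/enclosure/<enclosure>/<slot>.
slot_power_on() {
    slot=$1
    if [ ! -f "$slot/power_status" ] ; then
        echo "no power_status under $slot" >&2
        return 4
    fi
    # Assumption: the attribute accepts "on" just as it accepts "off".
    echo "on" > "$slot/power_status" || return 1
    # Poll for up to ~2 seconds, mirroring the zedlet's 20 x 0.1s wait.
    n=0
    while [ "$n" -lt 20 ] ; do
        [ "$(cat "$slot/power_status")" = "on" ] && return 0
        n=$((n + 1))
        # Fractional sleep is not POSIX, but GNU coreutils and BusyBox accept it.
        sleep 0.1
    done
    return 5
}
```

Writing to sysfs normally requires root. Once the slot reports `on`, the drive still has to be brought back into the pool by the administrator; that step is outside the scope of this sketch.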

cmd/zed/zed.d/zed.rc

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -142,3 +142,8 @@ ZED_SYSLOG_SUBCLASS_EXCLUDE="history_event"
 # Disabled by default, 1 to enable and 0 to disable.
 #ZED_SYSLOG_DISPLAY_GUIDS=1
 
+##
+# Power off the drive's slot in the enclosure if it becomes FAULTED. This can
+# help silence misbehaving drives. This assumes your drive enclosure fully
+# supports slot power control via sysfs.
+#ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT=1
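
Since the option assumes slot power control is exposed via sysfs, it may be worth checking for `power_status` files before uncommenting it. The sketch below is one way to do that; the function name `list_power_slots` is hypothetical, and the directory layout is taken from the path quoted in the zedlet's header comment.

```shell
#!/bin/sh
# Hypothetical helper, not part of the commit: print every slot directory
# under an enclosure sysfs root that exposes a power_status file.
# $1 defaults to /sys/class/enclosure, the path the zedlet checks.
list_power_slots() {
    root=${1:-/sys/class/enclosure}
    for f in "$root"/*/*/power_status ; do
        # An unmatched glob stays literal, so confirm the file exists.
        [ -f "$f" ] || continue
        dirname "$f"
    done
}
```

Calling `list_power_slots` with no argument prints nothing on a system without a supported enclosure, which matches the case where the zedlet exits with code 1 or 4.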
