Skip to content

Tighten up 'zpool remove' space check #11409

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 10 additions & 9 deletions module/zfs/vdev_removal.c
Original file line number Diff line number Diff line change
Expand Up @@ -2003,17 +2003,18 @@ spa_vdev_remove_top_check(vdev_t *vd)
if (available < vd->vdev_stat.vs_alloc)
return (SET_ERROR(ENOSPC));
} else {
/* available space in the pool's normal class */
/*
* This is a normal device. There has to be enough free
* space in the pool's normal class to remove the device
* and leave 1.6% of the space available. The available
* space accounts for both reserved and "slop" space.
*/
uint64_t available = dsl_dir_space_available(
spa->spa_dsl_pool->dp_root_dir, NULL, 0, B_TRUE);
if (available <
vd->vdev_stat.vs_dspace + spa_get_slop_space(spa)) {
/*
* This is a normal device. There has to be enough free
* space to remove the device and leave double the
* "slop" space (i.e. we must leave at least 3% of the
* pool free, in addition to the normal slop space).
*/

if (available < vd->vdev_stat.vs_dspace ||
available - vd->vdev_stat.vs_dspace <
spa_get_dspace(spa) >> 6) {
Copy link
Member

@ahrens ahrens Jan 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this fixes the problem that removing a disproportionately-huge device can get ENOSPC when we want it to succeed. e.g. imagine a pool with a 1TB device and a 100TB device, with 0.5TB used. If we try to remove the 100TB device, the check would be 100.5 - 100 < 101*0.016, or 0.5 < 1.6, which is true. I think want to require that after removal, there is 1.6% space AVAIL.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you're right, @ahrens, and I'm on an old 2.0.4 version and currently unwilling to upgrade, if nothing has changed in this regards since 2.0.4 then a major tighten up would be massively beneficial, and/or a way to override this with -f when the user does indeed know better. I will tag you both where I record my story from the last 2 days.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return (SET_ERROR(ENOSPC));
}
}
Expand Down
3 changes: 2 additions & 1 deletion tests/runfiles/common.run
Original file line number Diff line number Diff line change
Expand Up @@ -784,7 +784,8 @@ tests = ['removal_all_vdev', 'removal_cancel', 'removal_check_space',
'removal_with_send_recv', 'removal_with_snapshot',
'removal_with_write', 'removal_with_zdb', 'remove_expanded',
'remove_mirror', 'remove_mirror_sanity', 'remove_raidz',
'remove_indirect', 'remove_attach_mirror']
'remove_indirect', 'remove_attach_mirror', 'remove_minimum',
'remove_unbalanced']
tags = ['functional', 'removal']

[tests/functional/rename_dirs]
Expand Down
2 changes: 1 addition & 1 deletion tests/zfs-tests/tests/functional/removal/Makefile.am
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@ dist_pkgdata_SCRIPTS = \
removal_with_snapshot.ksh removal_with_write.ksh \
removal_with_zdb.ksh remove_mirror.ksh remove_mirror_sanity.ksh \
remove_raidz.ksh remove_expanded.ksh remove_indirect.ksh \
remove_attach_mirror.ksh
remove_attach_mirror.ksh remove_minimum.ksh remove_unbalanced.ksh

dist_pkgdata_DATA = \
removal.kshlib
Expand Down
56 changes: 56 additions & 0 deletions tests/zfs-tests/tests/functional/removal/remove_minimum.ksh
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
#!/bin/ksh -p
#
# CDDL HEADER START
#
# This file and its contents are supplied under the terms of the
# Common Development and Distribution License ("CDDL"), version 1.0.
# You may only use this file in accordance with the terms of version
# 1.0 of the CDDL.
#
# A full copy of the text of the CDDL should have accompanied this
# source. A copy of the CDDL is also available via the Internet at
# http://www.illumos.org/license/CDDL.
#
# CDDL HEADER END
#

#
# Copyright (c) 2020 by Lawrence Livermore National Security, LLC.
#

. $STF_SUITE/include/libtest.shlib
. $STF_SUITE/tests/functional/removal/removal.kshlib

#
# DESCRIPTION:
# Device removal is possible for minimum sized vdevs.
#
# STRATEGY:
# 1. Create a pool with minimum sized removable devices
# 2. Remove a top-level device
#

verify_runnable "global"

function cleanup
{
default_cleanup_noexit
log_must rm -f $TEST_BASE_DIR/device-{1,2}
}

log_assert "Device removal is possible for minimum sized vdevs."
log_onexit cleanup

# 1. Create a pool with minimum sized removable devices
log_must truncate -s $MINVDEVSIZE $TEST_BASE_DIR/device-{1,2}
log_must default_setup_noexit "$TEST_BASE_DIR/device-1 $TEST_BASE_DIR/device-2"

log_must dd if=/dev/urandom of=$TESTDIR/$TESTFILE0 bs=1M count=64

# 2. Remove a top-level device
log_must zpool remove $TESTPOOL $TEST_BASE_DIR/device-1
log_must wait_for_removal $TESTPOOL

log_note "Capacity $(get_pool_prop capacity $TESTPOOL)"

log_pass "Device removal is possible for minimum sized vdevs"
57 changes: 57 additions & 0 deletions tests/zfs-tests/tests/functional/removal/remove_unbalanced.ksh
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
#!/bin/ksh -p
#
# CDDL HEADER START
#
# This file and its contents are supplied under the terms of the
# Common Development and Distribution License ("CDDL"), version 1.0.
# You may only use this file in accordance with the terms of version
# 1.0 of the CDDL.
#
# A full copy of the text of the CDDL should have accompanied this
# source. A copy of the CDDL is also available via the Internet at
# http://www.illumos.org/license/CDDL.
#
# CDDL HEADER END
#

#
# Copyright (c) 2020 by Lawrence Livermore National Security, LLC.
#

. $STF_SUITE/include/libtest.shlib
. $STF_SUITE/tests/functional/removal/removal.kshlib

#
# DESCRIPTION:
# Device removal is possible for highly unbalanced vdevs.
#
# STRATEGY:
# 1. Create a pool with unbalanced removable devices
# 2. Remove the large top-level device
#

verify_runnable "global"

function cleanup
{
default_cleanup_noexit
log_must rm -f $TEST_BASE_DIR/device-{1,2}
}

log_assert "Device removal is possible for highly unbalanced vdevs."
log_onexit cleanup

# 1. Create a pool with unbalanced removable devices
log_must truncate -s 10G $TEST_BASE_DIR/device-1
log_must truncate -s 1T $TEST_BASE_DIR/device-2
log_must default_setup_noexit "$TEST_BASE_DIR/device-1 $TEST_BASE_DIR/device-2"

log_must dd if=/dev/urandom of=$TESTDIR/$TESTFILE0 bs=1M count=32

# 2. Remove the large top-level device
log_must zpool remove $TESTPOOL $TEST_BASE_DIR/device-1
log_must wait_for_removal $TESTPOOL

log_note "Capacity $(get_pool_prop capacity $TESTPOOL)"

log_pass "Device removal is possible for highly unbalanced vdevs"