Tighten up 'zpool remove' space check #11409
Conversation
Force-pushed from becab0a to 6952aad
We can see issues on DilOS with this use case:
It is not an issue with this commit, it is an issue with
@ikozhukhov your exact test case still fails, but it's due to the very small size of the vdevs (256M) and the comparatively large minimum slop space requirement.
Force-pushed from 6952aad to e2b5b90
@ikozhukhov I've updated the PR to also address your original test case with very small vdevs. This behavior was confusing for users, particularly those who wanted to just test out device removal and would immediately hit this issue.
It looks like the new check ensures that at the time the removal starts, there is enough free space on the remaining devices to hold all the space allocated on the device to remove. However, the free space on the device to be removed may also be necessary to satisfy [ref]reservations. Therefore I think that after the removal you could end up with more space "in use" (including reservations) than there is in the pool. The existing check (based on the DMU's view of space) doesn't have this problem.
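To make this concern concrete, here is a minimal, hypothetical numeric sketch (the values and variable names are illustrative, not taken from the PR): a check that only confirms the remaining vdevs can hold the removed vdev's allocated space can still leave outstanding reservations larger than the post-removal pool.

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	/* Hypothetical pool: two 10 TB vdevs (all values in GB). */
	uint64_t remaining_free = 10000;    /* free space on the vdev that stays */
	uint64_t removing_alloc = 2000;     /* space allocated on the vdev to remove */
	uint64_t reservations = 12000;      /* outstanding [ref]reservations */
	uint64_t post_removal_size = 10000; /* pool capacity after the removal */

	/* A check based only on allocated space would let the removal proceed. */
	if (remaining_free >= removing_alloc)
		printf("allocation-based check passes\n");

	/* Yet the reservations alone exceed the post-removal pool capacity. */
	if (reservations > post_removal_size)
		printf("over-committed by %llu GB after removal\n",
		    (unsigned long long)(reservations - post_removal_size));

	return (0);
}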
Sounds good.
I don't think that's the case. The current code makes the space in the removing device unavailable for use by the DMU, so concurrent write system calls can't cause the removal to fail. Instead, they will (correctly) fail with ENOSPC. See
Where does that happen? I think that you'd hit this assertion in
As mentioned above, this shouldn't happen because the DMU does not allow write() system calls to use space on the removing device (and the slop space is large enough for any MOS writes).
Force-pushed from 13cda05 to 22cd16d
That's a good point I hadn't considered. I've restored the use of
Makes sense. I've removed that comment from the commit message. Thanks for pointing me to the relevant code.
Force-pushed from 12ceac9 to 5d01f56
module/zfs/vdev_removal.c (Outdated)
 * pool free, in addition to the normal slop space).
 */

if (available - (available >> 5) < vd->vdev_stat.vs_dspace)
I don't think this is calculating what the comment says. "leave 3% of the space available" to me means that after the removal, AVAIL (dsl_dir_space_available(root_dir)) should be at least 3% (1/2^5) of the size of the pool after the removal.
For example, imagine we have a 100TB pool, are removing a 1TB device, and have 1.04TB AVAIL. The code here would allow the removal to proceed (1.04-1.04>>5=1.0075TB). After the removal there would be 0.0075TB AVAIL, but 3% of 99TB=~3TB.
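A minimal standalone sketch of the arithmetic in this example, assuming illustrative values in GB (the variable names mirror the snippet above, but this is not code from the patch):

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	uint64_t available = 1040;    /* 1.04 TB AVAIL */
	uint64_t vs_dspace = 1000;    /* 1 TB device being removed */
	uint64_t pool_size = 100000;  /* 100 TB pool */

	/* The check as written holds back ~3% of AVAIL, not of the pool. */
	if (available - (available >> 5) < vs_dspace)
		printf("removal rejected\n");
	else
		printf("removal allowed\n");  /* 1008 < 1000 is false */

	/*
	 * After the removal at most ~40 GB would remain AVAIL (less once
	 * slop is held back), far below 3% of the remaining ~99 TB pool.
	 */
	printf("3%% of post-removal pool = %llu GB\n",
	    (unsigned long long)((pool_size - vs_dspace) >> 5));

	return (0);
}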
Indeed. I've updated the calculation accordingly. It now checks that the available space after the removal isn't less than spa_get_dspace() / (1 << 6), i.e. we always leave at least 1.6% of the pool's normal class available. For your example: 1.04TB AVAIL - 1.00TB = 0.04TB after the removal, which is less than 100TB / (1 << 6) = 1.6TB.
Force-pushed from 5d01f56 to 8e408b0
if (available < vd->vdev_stat.vs_dspace ||
    available - vd->vdev_stat.vs_dspace <
    spa_get_dspace(spa) >> 6) {
I don't think this fixes the problem that removing a disproportionately-huge device can get ENOSPC when we want it to succeed. e.g. imagine a pool with a 1TB device and a 100TB device, with 0.5TB used. If we try to remove the 100TB device, the check would be 100.5 - 100 < 101*0.016, or 0.5 < 1.6, which is true. I think we want to require that after removal, there is 1.6% space AVAIL.
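The same kind of standalone sketch for this counter-example, again with illustrative values in GB rather than code from the patch:

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	uint64_t available = 100500;   /* ~100.5 TB AVAIL */
	uint64_t vs_dspace = 100000;   /* the 100 TB vdev being removed */
	uint64_t pool_dspace = 101000; /* spa_get_dspace(): ~101 TB */

	/* The check under review: require 1/64 of the current pool size. */
	if (available < vs_dspace ||
	    available - vs_dspace < (pool_dspace >> 6))
		printf("removal rejected\n");  /* 500 < 1578, so ENOSPC */
	else
		printf("removal allowed\n");

	/*
	 * Requiring 1/64 of the post-removal pool instead would allow it:
	 * 500 >= (101000 - 100000) >> 6 == 15.
	 */
	return (0);
}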
While you're right, @ahrens, I'm on an old 2.0.4 version and currently unwilling to upgrade; if nothing has changed in this regard since 2.0.4, then a major tightening up would be massively beneficial, and/or a way to override this with -f when the user does indeed know better. I will tag you both where I record my story from the last 2 days.
The available space check in spa_vdev_remove_top_check() is intended to verify there is enough available space on the other devices before starting the removal process. This is obviously a good idea, but the current check can significantly overestimate the space requirements for small pools, preventing removal when it clearly should be possible.

This change reworks the check to drop the additional slop space from the calculation. This solves two problems:

1) If a large device is accidentally added to a small pool, the slop requirement is dramatically increased. This in turn can prevent removal of that larger device even if it has never been used. This was particularly unfortunate since a main motivation for this feature was to allow such mistakes to be corrected.

2) For very small pools with 256M vdevs, the minimum slop size of 128M represents a huge fraction of the total pool capacity. Requiring twice this amount of space to be available effectively prevents device removal from ever working with these tiny pools.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#11356
Force-pushed from 8e408b0 to 2f8c9bd
Motivation and Context
Issue #11356. Handle additional cases where device removal should
be possible but is prevented by an overly aggressive free space check.
Description
The available space check in spa_vdev_remove_top_check() is intended
to verify there is enough available space on the other devices before
starting the removal process. This is obviously a good idea but the
current check can significantly overestimate the space requirements
and often prevent removal when it clearly should be possible.
This change reworks the check to use the per-vdev vs_stats. This
is sufficiently accurate and is convenient because it allows a direct
comparison to the allocated space. If using dsl_dir_space_available()
then the available space of the device being removed is also included
in the return value and must somehow be accounted for.
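A hedged sketch of the general shape described here, modeled as a standalone helper taking plain integers rather than real SPA/vdev structures; the function name and parameters are illustrative, and the 1/64 slop fraction follows the hunk shown earlier in the review.

#include <stdint.h>

/*
 * Illustrative model only: 'available' stands for the summed free space
 * of the vdevs that remain, 'removing' for the space that must be
 * relocated off the vdev being removed, and 'pool_dspace' approximates
 * spa_get_dspace(). The removal is allowed only if the remaining vdevs
 * can absorb the relocated space and still keep ~1/64 of the pool free.
 */
static int
removal_space_ok(uint64_t available, uint64_t removing, uint64_t pool_dspace)
{
	if (available < removing ||
	    available - removing < (pool_dspace >> 6))
		return (0);  /* the real check would return ENOSPC here */
	return (1);
}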
Additionally, we reduce the slop space requirement to be in line with
the capacity of the pool after the device has been removed.
This way if a large device is accidentally added to a small pool the
vastly increased slop requirement of the larger pool won't prevent
removal. For example, it was previously possible that even newly
added empty vdevs couldn't be removed due to the increased slop
requirement. This was particularly unfortunate since one of the
main motivations for this feature was to allow such mistakes to be
corrected.
Lastly, it's worth mentioning that by allowing the removal to start
with close to the minimum required free space it is more likely that
an active pool close to capacity may fail the removal process. This
failure case has always been possible since we can't know what new
data will be written during the removal process. It is correctly
handled and there's no danger to the pool.
How Has This Been Tested?
Added a new remove_unbalanced ZTS test which verifies that when adding
a new large vdev (1TB) to an existing small pool (1GB), it may still
be removed as long as it hasn't been used. This would previously fail.
Additionally, locally ran the "removal" ZTS test group.
Types of changes
Checklist:
Signed-off-by.