Commit 46c4f2c

dsl_dataset: put IO-inducing frees on the pool deadlist
dsl_free() calls zio_free() to free the block. For most blocks, this simply calls metaslab_free() without doing any IO or putting anything on the IO pipeline.

Some blocks, however, require additional IO to free. This at least includes gang, dedup and cloned blocks. For those, zio_free() will issue a ZIO_TYPE_FREE IO and return.

If a huge number of blocks are being freed all at once, it's possible for dsl_dataset_block_kill() to be called millions of times in a single transaction (e.g. a 2T object of 128K blocks is 16M blocks). If those are all IO-inducing frees, that then becomes 16M FREE IOs placed on the pipeline. At time of writing, a zio_t is 1280 bytes, so for just one 2T object that requires a 20G allocation of resident memory from the zio_cache. If that can't be satisfied by the kernel, an out-of-memory condition is raised.

This would be better handled by improving the cases that the dmu_tx_assign() throttle will handle, or by reducing the overheads required by the IO pipeline, or with a better central facility for freeing blocks.

For now, we simply check for the cases that would cause zio_free() to create a FREE IO, and instead put the block on the pool's freelist. This is the same place that blocks from destroyed datasets go, and the async destroy machinery will automatically see them and trickle them out as normal.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes #6783
Closes #16708
Closes #16722
Closes #16697
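
The memory estimate in the commit message can be checked with a few lines of C. This is only a sketch of the arithmetic: the 1280-byte zio_t size, the 128K block size and the 2T object size are the figures quoted in the message, hard-coded here as assumptions and not read from any ZFS header, and the `example_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Figures quoted in the commit message; assumptions, not ZFS constants. */
#define	EXAMPLE_ZIO_SIZE	1280ULL		/* bytes per zio_t */
#define	EXAMPLE_BLOCK_SIZE	(128ULL << 10)	/* 128K blocks */
#define	EXAMPLE_OBJECT_SIZE	(2ULL << 40)	/* 2T object */

/* Number of blocks needed to hold object_bytes, rounding up. */
static uint64_t
example_block_count(uint64_t object_bytes, uint64_t block_bytes)
{
	assert(block_bytes > 0);
	return ((object_bytes + block_bytes - 1) / block_bytes);
}

/*
 * Memory consumed by zio_t structures if every block free in the
 * transaction turns into its own FREE IO on the pipeline.
 */
static uint64_t
example_free_zio_bytes(uint64_t nblocks, uint64_t zio_bytes)
{
	return (nblocks * zio_bytes);
}
```

Calling `example_block_count(EXAMPLE_OBJECT_SIZE, EXAMPLE_BLOCK_SIZE)` yields 16M blocks, and passing that count with `EXAMPLE_ZIO_SIZE` to `example_free_zio_bytes()` yields 20G, matching the numbers in the message.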
1 parent a60ed38 commit 46c4f2c

1 file changed: +26 -2 lines changed

module/zfs/dsl_dataset.c

Lines changed: 26 additions & 2 deletions
@@ -68,6 +68,7 @@
 #include <sys/zio_compress.h>
 #include <zfs_fletcher.h>
 #include <sys/zio_checksum.h>
+#include <sys/brt.h>
 
 /*
  * The SPA supports block sizes up to 16MB. However, very large blocks
@@ -289,18 +290,41 @@ dsl_dataset_block_kill(dsl_dataset_t *ds, const blkptr_t *bp, dmu_tx_t *tx,
 	if (BP_GET_LOGICAL_BIRTH(bp) > dsl_dataset_phys(ds)->ds_prev_snap_txg) {
 		int64_t delta;
 
-		dprintf_bp(bp, "freeing ds=%llu", (u_longlong_t)ds->ds_object);
-		dsl_free(tx->tx_pool, tx->tx_txg, bp);
+		/*
+		 * Put blocks that would create IO on the pool's deadlist for
+		 * dsl_process_async_destroys() to find. This is to prevent
+		 * zio_free() from creating a ZIO_TYPE_FREE IO for them, which
+		 * are very heavy and can lead to out-of-memory conditions if
+		 * something tries to free millions of blocks on the same txg.
+		 */
+		boolean_t defer = spa_version(spa) >= SPA_VERSION_DEADLISTS &&
+		    (BP_IS_GANG(bp) || BP_GET_DEDUP(bp) ||
+		    brt_maybe_exists(spa, bp));
+
+		if (defer) {
+			dprintf_bp(bp, "putting on free list: %s", "");
+			bpobj_enqueue(&ds->ds_dir->dd_pool->dp_free_bpobj,
+			    bp, B_FALSE, tx);
+		} else {
+			dprintf_bp(bp, "freeing ds=%llu",
+			    (u_longlong_t)ds->ds_object);
+			dsl_free(tx->tx_pool, tx->tx_txg, bp);
+		}
 
 		mutex_enter(&ds->ds_lock);
 		ASSERT(dsl_dataset_phys(ds)->ds_unique_bytes >= used ||
 		    !DS_UNIQUE_IS_ACCURATE(ds));
 		delta = parent_delta(ds, -used);
 		dsl_dataset_phys(ds)->ds_unique_bytes -= used;
 		mutex_exit(&ds->ds_lock);
+
 		dsl_dir_diduse_transfer_space(ds->ds_dir,
 		    delta, -compressed, -uncompressed, -used,
 		    DD_USED_REFRSRV, DD_USED_HEAD, tx);
+
+		if (defer)
+			dsl_dir_diduse_space(tx->tx_pool->dp_free_dir,
+			    DD_USED_HEAD, used, compressed, uncompressed, tx);
 	} else {
 		dprintf_bp(bp, "putting on dead list: %s", "");
 		if (async) {
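
The decision the new `defer` flag encodes can be modelled outside the kernel. The sketch below is a simplified stand-in, not ZFS code: `example_bp_t`, `example_action_t` and `example_classify_free()` are hypothetical names, and the three booleans merely represent the results of `BP_IS_GANG()`, `BP_GET_DEDUP()` and `brt_maybe_exists()` seen in the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the block properties the patch inspects. */
typedef struct example_bp {
	bool is_gang;		/* models BP_IS_GANG(bp) */
	bool is_dedup;		/* models BP_GET_DEDUP(bp) */
	bool maybe_cloned;	/* models brt_maybe_exists(spa, bp) */
} example_bp_t;

typedef enum example_action {
	EXAMPLE_FREE_NOW,	/* models the dsl_free() path */
	EXAMPLE_DEFER		/* models enqueueing on dp_free_bpobj */
} example_action_t;

/*
 * Defer only when the pool supports deadlists and freeing the block
 * would otherwise need a FREE IO; everything else is freed directly.
 */
static example_action_t
example_classify_free(bool pool_has_deadlists, const example_bp_t *bp)
{
	assert(bp != NULL);
	bool induces_io = bp->is_gang || bp->is_dedup || bp->maybe_cloned;

	return ((pool_has_deadlists && induces_io) ?
	    EXAMPLE_DEFER : EXAMPLE_FREE_NOW);
}
```

In this model a plain block is always freed immediately, while a gang, dedup or possibly-cloned block is deferred unless the pool predates deadlists, which mirrors the `spa_version(spa) >= SPA_VERSION_DEADLISTS` guard in the patch.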
