Skip to content

Commit a8c2560

Browse files
amotintonyhutter
authored andcommitted
ZIL: Call brt_pending_add() replaying TX_CLONE_RANGE
zil_claim_clone_range() takes references on cloned blocks before ZIL replay. Later zil_free_clone_range() drops them after replay or on dataset destroy. The total balance is neutral. It means on actual replay we must take additional references, which would stay in BRT. Without this blocks could be freed prematurely when either original file or its clone are destroyed. I've observed BRT being emptied and the feature being deactivated after ZIL replay completion, which should not have happened. With the patch I see expected stats. Reviewed-by: Kay Pedersen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Rob Norris <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes #15603
1 parent eb34de0 commit a8c2560

File tree

4 files changed

+12
-15
lines changed

4 files changed

+12
-15
lines changed

include/sys/dmu.h

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1072,8 +1072,7 @@ int dmu_offset_next(objset_t *os, uint64_t object, boolean_t hole,
10721072
int dmu_read_l0_bps(objset_t *os, uint64_t object, uint64_t offset,
10731073
uint64_t length, struct blkptr *bps, size_t *nbpsp);
10741074
int dmu_brt_clone(objset_t *os, uint64_t object, uint64_t offset,
1075-
uint64_t length, dmu_tx_t *tx, const struct blkptr *bps, size_t nbps,
1076-
boolean_t replay);
1075+
uint64_t length, dmu_tx_t *tx, const struct blkptr *bps, size_t nbps);
10771076

10781077
/*
10791078
* Initial setup and final teardown.

module/zfs/brt.c

Lines changed: 7 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -235,13 +235,13 @@
235235
* destination dataset is mounted and its ZIL replayed.
236236
* To address this situation we leverage zil_claim() mechanism where ZFS will
237237
* parse all the ZILs on pool import. When we come across TX_CLONE_RANGE
238-
* entries, we will bump reference counters for their BPs in the BRT and then
239-
* on mount and ZIL replay we will just attach BPs to the file without
240-
* bumping reference counters.
241-
* Note it is still possible that after zil_claim() we never mount the
242-
* destination, so we never replay its ZIL and we destroy it. This way we would
243-
* end up with leaked references in BRT. We address that too as ZFS gives us
244-
* a chance to clean this up on dataset destroy (see zil_free_clone_range()).
238+
* entries, we will bump reference counters for their BPs in the BRT. Then
239+
* on mount and ZIL replay we bump the reference counters once more, while the
240+
* first references are dropped during ZIL destroy by zil_free_clone_range().
241+
* It is possible that after zil_claim() we never mount the destination, so
242+
* we never replay its ZIL and just destroy it. In this case the only taken
243+
* references will be dropped by zil_free_clone_range(), since the cloning is
244+
* not going to ever take place.
245245
*/
246246

247247
static kmem_cache_t *brt_entry_cache;

module/zfs/dmu.c

Lines changed: 2 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -2267,7 +2267,7 @@ dmu_read_l0_bps(objset_t *os, uint64_t object, uint64_t offset, uint64_t length,
22672267

22682268
int
22692269
dmu_brt_clone(objset_t *os, uint64_t object, uint64_t offset, uint64_t length,
2270-
dmu_tx_t *tx, const blkptr_t *bps, size_t nbps, boolean_t replay)
2270+
dmu_tx_t *tx, const blkptr_t *bps, size_t nbps)
22712271
{
22722272
spa_t *spa;
22732273
dmu_buf_t **dbp, *dbuf;
@@ -2341,10 +2341,8 @@ dmu_brt_clone(objset_t *os, uint64_t object, uint64_t offset, uint64_t length,
23412341
* When data in embedded into BP there is no need to create
23422342
* BRT entry as there is no data block. Just copy the BP as
23432343
* it contains the data.
2344-
* Also, when replaying ZIL we don't want to bump references
2345-
* in the BRT as it was already done during ZIL claim.
23462344
*/
2347-
if (!replay && !BP_IS_HOLE(bp) && !BP_IS_EMBEDDED(bp)) {
2345+
if (!BP_IS_HOLE(bp) && !BP_IS_EMBEDDED(bp)) {
23482346
brt_pending_add(spa, bp, tx);
23492347
}
23502348
}

module/zfs/zfs_vnops.c

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1333,7 +1333,7 @@ zfs_clone_range(znode_t *inzp, uint64_t *inoffp, znode_t *outzp,
13331333
}
13341334

13351335
error = dmu_brt_clone(outos, outzp->z_id, outoff, size, tx,
1336-
bps, nbps, B_FALSE);
1336+
bps, nbps);
13371337
if (error != 0) {
13381338
dmu_tx_commit(tx);
13391339
break;
@@ -1467,7 +1467,7 @@ zfs_clone_range_replay(znode_t *zp, uint64_t off, uint64_t len, uint64_t blksz,
14671467
if (zp->z_blksz < blksz)
14681468
zfs_grow_blocksize(zp, blksz, tx);
14691469

1470-
dmu_brt_clone(zfsvfs->z_os, zp->z_id, off, len, tx, bps, nbps, B_TRUE);
1470+
dmu_brt_clone(zfsvfs->z_os, zp->z_id, off, len, tx, bps, nbps);
14711471

14721472
zfs_tstamp_update_setup(zp, CONTENT_MODIFIED, mtime, ctime);
14731473

0 commit comments

Comments
 (0)