Commit e12a85b

Improve zfs send performance by bypassing the ARC
When doing a zfs send on a dataset with a small recordsize (e.g. 8K), performance is dominated by the per-block overheads. This is especially true with `zfs send --compressed`, which further reduces the amount of data sent for the same number of blocks. Several threads are involved, but the limiting factor is the `send_prefetch` thread, which is 100% on CPU.

The main job of the `send_prefetch` thread is to issue zio's for the data that will be needed by the main thread. It does this by calling `arc_read(ARC_FLAG_PREFETCH)`. This has an immediate cost of creating an arc_hdr, which takes around 14% of one CPU. It also induces later costs by other threads:

* Since the data was only prefetched, dmu_send()->dmu_dump_write() will need to call arc_read() again to get the data. This will have to look up the arc_hdr in the hash table and copy the data from the scatter ABD in the arc_hdr to a linear ABD in an arc_buf. This takes 27% of one CPU.
* dmu_dump_write() then needs to arc_buf_destroy() that arc_buf. This takes 11% of one CPU.
* arc_adjust() will need to evict this arc_hdr, taking about 50% of one CPU.

All of these costs can be avoided by bypassing the ARC if the data is not already cached. This commit changes `zfs send` to check for the data in the ARC, and if it is not found then we directly call `zio_read()`, reading the data into a linear ABD which is used by dmu_dump_write() directly.

The performance improvement is best expressed in terms of how many blocks can be processed by `zfs send` in one second. This change increases the metric by 50%, from ~100,000 to ~150,000. When the amount of data per block is small (e.g. 2KB), there is a corresponding reduction in the elapsed time of `zfs send >/dev/null` (from 86 minutes to 58 minutes in this test case).

In addition to improving the performance of `zfs send`, this change makes `zfs send` not pollute the ARC cache. In most cases the data will not be reused, so this allows us to keep caching useful data in the MRU (hit-once) part of the ARC.

Signed-off-by: Matthew Ahrens <[email protected]>
1 parent 5a1abc4 commit e12a85b

File tree

4 files changed: +235 −151 lines


include/sys/arc.h

Lines changed: 6 additions & 0 deletions
```diff
@@ -146,6 +146,12 @@ typedef enum arc_flags
 	ARC_FLAG_COMPRESSED_ARC		= 1 << 20,
 	ARC_FLAG_SHARED_DATA		= 1 << 21,
 
+	/*
+	 * Fail this arc_read() (with ENOENT) if the data is not already present
+	 * in cache.
+	 */
+	ARC_FLAG_CACHED_ONLY		= 1 << 22,
+
 	/*
 	 * The arc buffer's compression mode is stored in the top 7 bits of the
 	 * flags field, so these dummy flags are included so that MDB can
```

include/sys/arc_impl.h

Lines changed: 1 addition & 0 deletions
```diff
@@ -554,6 +554,7 @@ typedef struct arc_stats {
 	kstat_named_t arcstat_need_free;
 	kstat_named_t arcstat_sys_free;
 	kstat_named_t arcstat_raw_size;
+	kstat_named_t arcstat_cached_only_in_progress;
 } arc_stats_t;
 
 typedef enum free_memory_reason_t {
```

module/zfs/arc.c

Lines changed: 18 additions & 1 deletion
```diff
@@ -548,7 +548,8 @@ arc_stats_t arc_stats = {
 	{ "demand_hit_prescient_prefetch", KSTAT_DATA_UINT64 },
 	{ "arc_need_free",		KSTAT_DATA_UINT64 },
 	{ "arc_sys_free",		KSTAT_DATA_UINT64 },
-	{ "arc_raw_size",		KSTAT_DATA_UINT64 }
+	{ "arc_raw_size",		KSTAT_DATA_UINT64 },
+	{ "cached_only_in_progress",	KSTAT_DATA_UINT64 },
 };
 
 #define ARCSTAT_MAX(stat, val) { \
@@ -5563,6 +5564,13 @@ arc_read(zio_t *pio, spa_t *spa, const blkptr_t *bp,
 	if (HDR_IO_IN_PROGRESS(hdr)) {
 		zio_t *head_zio = hdr->b_l1hdr.b_acb->acb_zio_head;
 
+		if (*arc_flags & ARC_FLAG_CACHED_ONLY) {
+			mutex_exit(hash_lock);
+			ARCSTAT_BUMP(arcstat_cached_only_in_progress);
+			rc = SET_ERROR(ENOENT);
+			goto out;
+		}
+
 		ASSERT3P(head_zio, !=, NULL);
 		if ((hdr->b_flags & ARC_FLAG_PRIO_ASYNC_READ) &&
 		    priority == ZIO_PRIORITY_SYNC_READ) {
@@ -5698,12 +5706,21 @@ arc_read(zio_t *pio, spa_t *spa, const blkptr_t *bp,
 		uint64_t size;
 		abd_t *hdr_abd;
 
+		if (*arc_flags & ARC_FLAG_CACHED_ONLY) {
+			rc = SET_ERROR(ENOENT);
+			if (hash_lock != NULL)
+				mutex_exit(hash_lock);
+			goto out;
+		}
+
 		/*
 		 * Gracefully handle a damaged logical block size as a
 		 * checksum error.
 		 */
 		if (lsize > spa_maxblocksize(spa)) {
 			rc = SET_ERROR(ECKSUM);
+			if (hash_lock != NULL)
+				mutex_exit(hash_lock);
 			goto out;
 		}
```
