Commit 87c25d5

ahrens authored and behlendorf committed
abd_alloc should use scatter for >1K allocations

abd_alloc() normally does scatter allocations, thus solving the problem that ABD originally set out to: the bulk of ZFS's allocations are single pages, which are faster to allocate and free, and don't suffer from internal fragmentation (and the inability to reclaim memory because some buffers in the slab are still allocated).

However, the current code does linear allocations for 4KB and smaller allocations, defeating the purpose of ABD.

Scatter ABD's use at least one page each, so sub-page allocations waste some space when allocated as scatter (e.g. 2KB scatter allocation wastes half of each page). Using linear ABD's for small allocations means that they will be put on slabs which contain many allocations. This can improve memory efficiency, but it also makes it much harder for ARC evictions to actually free pages, because all the buffers on one slab need to be freed in order for the slab (and underlying pages) to be freed. Typically, 512B and 1KB kmem caches have 16 buffers per slab, so it's possible for them to actually waste more memory than scatter (one page per buf = wasting 3/4 or 7/8th; one buf per slab = wasting 15/16th).

Spill blocks are typically 512B and are heavily used on systems running selinux with the default dnode size and the `xattr=sa` property set.

By default we will use linear allocations for 512B and 1KB, and scatter allocations for larger (1.5KB and up).

Reviewed-by: George Melikov <[email protected]>
Reviewed-by: DHE <[email protected]>
Reviewed-by: Chunwei Chen <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #8455
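
The waste fractions quoted in the commit message can be checked with a little arithmetic. The following is a minimal standalone sketch, not ZFS code: it assumes a 4KB page and 16 buffers per slab (the "typical" case the message describes), and the helper names `scatter_waste` and `slab_waste` are hypothetical.

```c
#include <assert.h>

/* Assumed values, taken from the commit message's "typical" case. */
#define	SKETCH_PAGE_BYTES	4096u
#define	SKETCH_BUFS_PER_SLAB	16u

/*
 * Fraction of a page left unused when a single sub-page buffer
 * is given a whole page of its own (the scatter case).
 */
static double
scatter_waste(unsigned buf_bytes)
{
	assert(buf_bytes <= SKETCH_PAGE_BYTES);
	return (1.0 - (double)buf_bytes / SKETCH_PAGE_BYTES);
}

/*
 * Fraction of a slab that is unused but cannot be returned to the
 * system while `live` of its buffers are still allocated (the
 * linear case).
 */
static double
slab_waste(unsigned live)
{
	assert(live >= 1 && live <= SKETCH_BUFS_PER_SLAB);
	return (1.0 - (double)live / SKETCH_BUFS_PER_SLAB);
}
```

Under these assumptions, `scatter_waste(1024)` gives 3/4, `scatter_waste(512)` gives 7/8, and `slab_waste(1)` gives 15/16, so a slab pinned by one surviving buffer can waste more than a scatter allocation of the same size.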
1 parent 3a1f2d5 commit 87c25d5

2 files changed: +43 -3 lines changed

man/man5/zfs-module-parameters.5

Lines changed: 13 additions & 1 deletion
@@ -1,6 +1,6 @@
 '\" te
 .\" Copyright (c) 2013 by Turbo Fredriksson <[email protected]>. All rights reserved.
-.\" Copyright (c) 2018 by Delphix. All rights reserved.
+.\" Copyright (c) 2019 by Delphix. All rights reserved.
 .\" Copyright (c) 2019 Datto Inc.
 .\" The contents of this file are subject to the terms of the Common Development
 .\" and Distribution License (the "License"). You may not use this file except
@@ -537,6 +537,18 @@ Min time before an active prefetch stream can be reclaimed
 Default value: \fB2\fR.
 .RE

+.sp
+.ne 2
+.na
+\fBzfs_abd_scatter_min_size\fR (uint)
+.ad
+.RS 12n
+This is the minimum allocation size that will use scatter (page-based)
+ABD's. Smaller allocations will use linear ABD's.
+.sp
+Default value: \fB1536\fR (512B and 1KB allocations will be linear).
+.RE
+
 .sp
 .ne 2
 .na

module/zfs/abd.c

Lines changed: 30 additions & 2 deletions
@@ -20,7 +20,7 @@
  */
 /*
  * Copyright (c) 2014 by Chunwei Chen. All rights reserved.
- * Copyright (c) 2016 by Delphix. All rights reserved.
+ * Copyright (c) 2019 by Delphix. All rights reserved.
  */

 /*
@@ -209,6 +209,30 @@ static abd_stats_t abd_stats = {
 int zfs_abd_scatter_enabled = B_TRUE;
 unsigned zfs_abd_scatter_max_order = MAX_ORDER - 1;

+/*
+ * zfs_abd_scatter_min_size is the minimum allocation size to use scatter
+ * ABD's.  Smaller allocations will use linear ABD's which uses
+ * zio_[data_]buf_alloc().
+ *
+ * Scatter ABD's use at least one page each, so sub-page allocations waste
+ * some space when allocated as scatter (e.g. 2KB scatter allocation wastes
+ * half of each page).  Using linear ABD's for small allocations means that
+ * they will be put on slabs which contain many allocations.  This can
+ * improve memory efficiency, but it also makes it much harder for ARC
+ * evictions to actually free pages, because all the buffers on one slab need
+ * to be freed in order for the slab (and underlying pages) to be freed.
+ * Typically, 512B and 1KB kmem caches have 16 buffers per slab, so it's
+ * possible for them to actually waste more memory than scatter (one page per
+ * buf = wasting 3/4 or 7/8th; one buf per slab = wasting 15/16th).
+ *
+ * Spill blocks are typically 512B and are heavily used on systems running
+ * selinux with the default dnode size and the `xattr=sa` property set.
+ *
+ * By default we use linear allocations for 512B and 1KB, and scatter
+ * allocations for larger (1.5KB and up).
+ */
+int zfs_abd_scatter_min_size = 512 * 3;
+
 static kmem_cache_t *abd_cache = NULL;
 static kstat_t *abd_ksp;

@@ -581,7 +605,8 @@ abd_free_struct(abd_t *abd)
 abd_t *
 abd_alloc(size_t size, boolean_t is_metadata)
 {
-	if (!zfs_abd_scatter_enabled || size <= PAGESIZE)
+	/* see the comment above zfs_abd_scatter_min_size */
+	if (!zfs_abd_scatter_enabled || size < zfs_abd_scatter_min_size)
 		return (abd_alloc_linear(size, is_metadata));

 	VERIFY3U(size, <=, SPA_MAXBLOCKSIZE);
@@ -1532,6 +1557,9 @@ abd_scatter_bio_map_off(struct bio *bio, abd_t *abd,
 module_param(zfs_abd_scatter_enabled, int, 0644);
 MODULE_PARM_DESC(zfs_abd_scatter_enabled,
 	"Toggle whether ABD allocations must be linear.");
+module_param(zfs_abd_scatter_min_size, int, 0644);
+MODULE_PARM_DESC(zfs_abd_scatter_min_size,
+	"Minimum size of scatter allocations.");
 /* CSTYLED */
 module_param(zfs_abd_scatter_max_order, uint, 0644);
 MODULE_PARM_DESC(zfs_abd_scatter_max_order,
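
To make the effect of the changed condition in abd_alloc() concrete, here is a minimal userspace sketch of the old and new decision rules. It is an illustration only, not the kernel code: `SKETCH_PAGESIZE`, `enum abd_kind`, `choose_old` and `choose_new` are hypothetical names, and a 4KB page is assumed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's PAGESIZE; 4KB is assumed. */
#define	SKETCH_PAGESIZE	4096u

enum abd_kind { ABD_LINEAR, ABD_SCATTER };

/* Rule before this commit: anything up to one page is linear. */
static enum abd_kind
choose_old(size_t size, bool scatter_enabled)
{
	if (!scatter_enabled || size <= SKETCH_PAGESIZE)
		return (ABD_LINEAR);
	return (ABD_SCATTER);
}

/*
 * Rule after this commit: only sizes strictly below min_size are
 * linear.  min_size plays the role of zfs_abd_scatter_min_size.
 */
static enum abd_kind
choose_new(size_t size, bool scatter_enabled, size_t min_size)
{
	if (!scatter_enabled || size < min_size)
		return (ABD_LINEAR);
	return (ABD_SCATTER);
}
```

With `min_size` set to the default of 1536, a 2KB request moves from linear to scatter, while 512B and 1KB requests stay linear. Because the comparison is strict, a request of exactly 1536 bytes is already a scatter allocation.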