Optimize allocation throttling. #12314

Merged: 1 commit merged into openzfs:master from throt on Jul 21, 2021

Conversation

@amotin (Member) commented on Jul 2, 2021

Remove mc_lock use from metaslab_class_throttle_*(). The math there
is based on refcounts and is therefore atomic, so the only possible race
is between zfs_refcount_count() and zfs_refcount_add(). But in most
cases metaslab_class_throttle_reserve() is called with the allocator
lock held, which covers the race. In the cases where the lock is not
held, GANG_ALLOCATION() or METASLAB_MUST_RESERVE is set, and so we
do not use zfs_refcount_count(). And even if we assume some other,
non-existent scenario, the worst that could happen from this race is
that a few more I/Os get to allocation earlier, which is not a problem.
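
As a rough illustration of the check-then-add pattern described above, here is a minimal sketch (placeholder names, types, and limits, not the actual metaslab_class_throttle_reserve() code) showing why losing the race between reading the counter and adding to it only lets a few extra reservations slip through:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
	atomic_uint_fast64_t	alloc_reserved;	/* outstanding reservations */
	uint64_t		alloc_max;	/* reservation limit */
} throttle_sketch_t;

static bool
throttle_reserve_sketch(throttle_sketch_t *t, uint64_t slots, bool must)
{
	/* Unlocked read; it may already be stale by the time we add below. */
	uint64_t cur = atomic_load(&t->alloc_reserved);

	if (must || cur + slots <= t->alloc_max) {
		/* Atomic add; no mutex needed around the counter itself. */
		atomic_fetch_add(&t->alloc_reserved, slots);
		return (true);
	}
	return (false);
}
```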

Move the locks and data of different allocators into different cache
lines to avoid false sharing. Group the spa_alloc_* arrays together
into a single array of aligned struct spa_alloc elements, spa_allocs.
Align struct metaslab_class_allocator.
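
The layout idea can be sketched as follows; the struct name, the 64-byte cache-line size, and the pthread mutex are illustrative assumptions, not the actual OpenZFS definitions:

```c
#include <pthread.h>

#define	CACHE_LINE_SIZE	64	/* assumed cache-line size */

/*
 * Each allocator's lock and queue live in one aligned element, so two
 * CPUs working on different allocators never touch the same cache line.
 */
typedef struct alloc_sketch {
	pthread_mutex_t	sa_lock;	/* per-allocator lock */
	void		*sa_tree;	/* per-allocator sorted I/O queue */
} __attribute__((aligned(CACHE_LINE_SIZE))) alloc_sketch_t;

/* One aligned element per allocator, replacing parallel per-field arrays. */
static alloc_sketch_t spa_allocs_sketch[4];
```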

How Has This Been Tested?

On an 80-thread FreeBSD system doing ~220K 16KB ZVOL writes, the profiler shows a reduction of lock contention on the allocator locks from 7.3% to 5.8%. In the case of file writes, where parallel dnode sync uses multiple allocators more actively, the difference should be more dramatic, since mc_lock is global rather than per-allocator.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@amotin amotin added the Type: Performance (Performance improvement or performance problem) and Status: Code Review Needed (Ready for review and testing) labels on Jul 2, 2021
@amotin amotin requested review from pcd1193182 and don-brady July 2, 2021 02:01
@amotin amotin force-pushed the throt branch 3 times, most recently from 64e01b9 to 10b1064, on July 2, 2021 02:27
@ahrens ahrens requested a review from grwilson July 9, 2021 04:01
@amotin (Member, Author) commented on Jul 15, 2021

Can somebody comment on this? It has been open for two weeks and it is not rocket science.

@IsaacVaughn commented

Is there any further improvement if you switch zfs_refcount_add to zfs_refcount_add_many and zfs_refcount_remove to zfs_refcount_remove_many? There is a comment claiming that refs must be added individually to be removed individually, but this seems contradictory to the implementation of zfs_refcount as an atomic uint64. Perhaps the slots were previously implemented in a different manner?

Also, perhaps add a comment somewhere documenting the possible racy behavior when the functions are called with no allocation lock held and neither GANG_ALLOCATION nor METASLAB_MUST_RESERVE is set. Although the race does appear harmless, it seems like a potential pitfall for future developers, who may be surprised by the resulting behavior.

@amotin (Member, Author) commented on Jul 16, 2021

> Is there any further improvement if you switch zfs_refcount_add to zfs_refcount_add_many and zfs_refcount_remove to zfs_refcount_remove_many? There is a comment claiming that refs must be added individually to be removed individually, but this seems contradictory to the implementation of zfs_refcount as an atomic uint64.

It would be good, but the present implementation is required for debug builds, in which the refcount code can be made to match individual allocations with frees to detect any mismatches; _many() holds would not match up. But it affects only blocks with multiple copies, so the difference should not be very dramatic.
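
To illustrate why per-hold tracking needs matching adds and removes, here is a hypothetical sketch (not the actual zfs_refcount debug code; names and types are placeholders):

```c
#include <stdlib.h>

/*
 * Hypothetical per-hold tracking, similar in spirit to what debug
 * builds do: every add records one hold, and each remove must find a
 * hold with the same tag and the same weight.  A single _many() add of
 * N therefore could not be paired with N individual removes.
 */
typedef struct hold {
	struct hold	*h_next;
	const void	*h_tag;		/* who took the hold */
	unsigned long	h_number;	/* weight of this hold */
} hold_t;

static int
hold_remove(hold_t **list, const void *tag, unsigned long number)
{
	for (hold_t **pp = list; *pp != NULL; pp = &(*pp)->h_next) {
		if ((*pp)->h_tag == tag && (*pp)->h_number == number) {
			hold_t *h = *pp;
			*pp = h->h_next;
			free(h);
			return (0);	/* matched and released */
		}
	}
	return (-1);	/* no matching hold: add/remove did not pair up */
}
```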

Remove mc_lock use from metaslab_class_throttle_*().  The math there
is based on refcounts and so atomic, so the only race possible there
is between zfs_refcount_count() and zfs_refcount_add().  But in most
cases metaslab_class_throttle_reserve() is called with the allocator
lock held, which covers the race.  In cases where the lock is not
held, GANG_ALLOCATION() or METASLAB_MUST_RESERVE are set, and so we
do not use zfs_refcount_count().  And even if we assume some other
non-existing scenario, the worst that may happen from this race is
few more I/Os get to allocation earlier, that is not a problem.

Move locks and data of different allocators into different cache
lines to avoid false sharing.  Group spa_alloc_* arrays together
into single array of aligned struct spa_alloc spa_allocs.  Align
struct metaslab_class_allocator.

Signed-off-by: Alexander Motin <[email protected]>
Sponsored-By: iXsystems, Inc.
@mmaybee mmaybee added the Status: Accepted (Ready to integrate: reviewed, tested) label and removed the Status: Code Review Needed (Ready for review and testing) label on Jul 20, 2021
@mmaybee mmaybee merged commit 1b50749 into openzfs:master Jul 21, 2021
behlendorf pushed a commit that referenced this pull request Jul 26, 2021
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Issue #12314
Closes #12419
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Aug 24, 2021
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored-By: iXsystems, Inc.
Closes openzfs#12314
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Aug 24, 2021
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Issue openzfs#12314
Closes openzfs#12419
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Aug 24, 2021
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Closes openzfs#12314
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Aug 24, 2021
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Issue openzfs#12314
Closes openzfs#12419
behlendorf pushed a commit that referenced this pull request Aug 31, 2021
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Closes #12314
behlendorf pushed a commit that referenced this pull request Aug 31, 2021
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Issue #12314
Closes #12419
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Sep 15, 2021
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Closes openzfs#12314
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Sep 15, 2021
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Issue openzfs#12314
Closes openzfs#12419
datacore-rm pushed a commit to DataCoreSoftware/openzfs that referenced this pull request Feb 9, 2024
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored-By: iXsystems, Inc.
Closes openzfs#12314
datacore-rm pushed a commit to DataCoreSoftware/openzfs that referenced this pull request Feb 28, 2024
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored-By: iXsystems, Inc.
Closes openzfs#12314
Labels: Status: Accepted (Ready to integrate: reviewed, tested); Type: Performance (Performance improvement or performance problem)
6 participants