Optimize allocation throttling. #12314

Merged: 1 commit merged into openzfs:master from throt on Jul 21, 2021

Conversation

@amotin (Member) commented on Jul 2, 2021

Remove mc_lock use from metaslab_class_throttle_*(). The math there
is based on refcounts and is therefore atomic, so the only possible race
is between zfs_refcount_count() and zfs_refcount_add(). But in most
cases metaslab_class_throttle_reserve() is called with the allocator
lock held, which covers the race. In the cases where the lock is not
held, GANG_ALLOCATION() or METASLAB_MUST_RESERVE is set, and so we
do not use zfs_refcount_count(). And even if we assume some other,
non-existent scenario, the worst that could happen from this race is
that a few more I/Os get to allocation earlier, which is not a problem.
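
As a rough illustration of the check-then-add pattern described above, here is a minimal sketch (placeholder names, types, and limits, not the actual metaslab_class_throttle_reserve() code) showing why losing the race between reading the counter and adding to it only lets a few extra reservations slip through:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
	atomic_uint_fast64_t	alloc_reserved;	/* outstanding reservations */
	uint64_t		alloc_max;	/* reservation limit */
} throttle_sketch_t;

static bool
throttle_reserve_sketch(throttle_sketch_t *t, uint64_t slots, bool must)
{
	/* Unlocked read; it may already be stale by the time we add below. */
	uint64_t cur = atomic_load(&t->alloc_reserved);

	if (must || cur + slots <= t->alloc_max) {
		/* Atomic add; no mutex needed around the counter itself. */
		atomic_fetch_add(&t->alloc_reserved, slots);
		return (true);
	}
	return (false);
}
```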

Move the locks and data of different allocators into different cache
lines to avoid false sharing. Group the spa_alloc_* arrays together
into a single array of aligned struct spa_alloc elements, spa_allocs.
Align struct metaslab_class_allocator.
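
The layout idea can be sketched as follows; the struct name, the 64-byte cache-line size, and the pthread mutex are illustrative assumptions, not the actual OpenZFS definitions:

```c
#include <pthread.h>

#define	CACHE_LINE_SIZE	64	/* assumed cache-line size */

/*
 * Each allocator's lock and queue live in one aligned element, so two
 * CPUs working on different allocators never touch the same cache line.
 */
typedef struct alloc_sketch {
	pthread_mutex_t	sa_lock;	/* per-allocator lock */
	void		*sa_tree;	/* per-allocator sorted I/O queue */
} __attribute__((aligned(CACHE_LINE_SIZE))) alloc_sketch_t;

/* One aligned element per allocator, replacing parallel per-field arrays. */
static alloc_sketch_t spa_allocs_sketch[4];
```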

How Has This Been Tested?

On an 80-thread FreeBSD system doing ~220K 16KB ZVOL writes, the profiler shows a reduction of lock contention on the allocator locks from 7.3% to 5.8%. In the case of file writes, where parallel dnode sync uses multiple allocators more actively, the difference should be more dramatic, since mc_lock is global rather than per-allocator.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@amotin amotin added the Type: Performance (Performance improvement or performance problem) and Status: Code Review Needed (Ready for review and testing) labels on Jul 2, 2021
@amotin amotin requested review from pcd1193182 and don-brady July 2, 2021 02:01
@amotin amotin force-pushed the throt branch 3 times, most recently from 64e01b9 to 10b1064, on July 2, 2021 02:27
@ahrens ahrens requested a review from grwilson July 9, 2021 04:01
@amotin (Member, Author) commented on Jul 15, 2021

Can somebody comment on this? It has been open for two weeks and it is not rocket science.

@IsaacVaughn commented

Is there any further improvement if you switch zfs_refcount_add to zfs_refcount_add_many and zfs_refcount_remove to zfs_refcount_remove_many? There is a comment claiming that refs must be added individually to be removed individually, but this seems contradictory to the implementation of zfs_refcount as an atomic uint64. Perhaps the slots were previously implemented in a different manner?

Also, perhaps add a comment somewhere documenting the possible racy behavior when the functions are called with no allocation lock held and neither GANG_ALLOCATION nor METASLAB_MUST_RESERVE is set. Although the race does appear harmless, it seems like a potential pitfall for future developers, who may be surprised by the resulting behavior.

@amotin (Member, Author) commented on Jul 16, 2021

> Is there any further improvement if you switch zfs_refcount_add to zfs_refcount_add_many and zfs_refcount_remove to zfs_refcount_remove_many? There is a comment claiming that refs must be added individually to be removed individually, but this seems contradictory to the implementation of zfs_refcount as an atomic uint64.

It would be good, but the present implementation is required for debug builds, in which the refcount code can be made to match individual allocations with frees to detect any mismatches; _many() holds would not match up. But it affects only blocks with multiple copies, so the difference should not be very dramatic.
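
To illustrate why per-hold tracking needs matching adds and removes, here is a hypothetical sketch (not the actual zfs_refcount debug code; names and types are placeholders):

```c
#include <stdlib.h>

/*
 * Hypothetical per-hold tracking, similar in spirit to what debug
 * builds do: every add records one hold, and each remove must find a
 * hold with the same tag and the same weight.  A single _many() add of
 * N therefore could not be paired with N individual removes.
 */
typedef struct hold {
	struct hold	*h_next;
	const void	*h_tag;		/* who took the hold */
	unsigned long	h_number;	/* weight of this hold */
} hold_t;

static int
hold_remove(hold_t **list, const void *tag, unsigned long number)
{
	for (hold_t **pp = list; *pp != NULL; pp = &(*pp)->h_next) {
		if ((*pp)->h_tag == tag && (*pp)->h_number == number) {
			hold_t *h = *pp;
			*pp = h->h_next;
			free(h);
			return (0);	/* matched and released */
		}
	}
	return (-1);	/* no matching hold: add/remove did not pair up */
}
```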

Remove mc_lock use from metaslab_class_throttle_*().  The math there
is based on refcounts and so atomic, so the only race possible there
is between zfs_refcount_count() and zfs_refcount_add().  But in most
cases metaslab_class_throttle_reserve() is called with the allocator
lock held, which covers the race.  In cases where the lock is not
held, GANG_ALLOCATION() or METASLAB_MUST_RESERVE are set, and so we
do not use zfs_refcount_count().  And even if we assume some other
non-existing scenario, the worst that may happen from this race is
few more I/Os get to allocation earlier, that is not a problem.

Move locks and data of different allocators into different cache
lines to avoid false sharing.  Group spa_alloc_* arrays together
into single array of aligned struct spa_alloc spa_allocs.  Align
struct metaslab_class_allocator.

Signed-off-by: Alexander Motin <[email protected]>
Sponsored-By: iXsystems, Inc.
@mmaybee mmaybee added the Status: Accepted (Ready to integrate: reviewed, tested) label and removed the Status: Code Review Needed (Ready for review and testing) label on Jul 20, 2021
@mmaybee mmaybee merged commit 1b50749 into openzfs:master Jul 21, 2021
behlendorf pushed a commit that referenced this pull request Jul 26, 2021
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Issue #12314
Closes #12419
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Aug 24, 2021
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored-By: iXsystems, Inc.
Closes openzfs#12314
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Aug 24, 2021
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Issue openzfs#12314
Closes openzfs#12419
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Aug 24, 2021
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Closes openzfs#12314
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Aug 24, 2021
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Issue openzfs#12314
Closes openzfs#12419
behlendorf pushed a commit that referenced this pull request Aug 31, 2021
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Closes #12314
behlendorf pushed a commit that referenced this pull request Aug 31, 2021
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Issue #12314
Closes #12419
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Sep 15, 2021
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Closes openzfs#12314
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Sep 15, 2021
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Issue openzfs#12314
Closes openzfs#12419
datacore-rm pushed a commit to DataCoreSoftware/openzfs that referenced this pull request Feb 9, 2024
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored-By: iXsystems, Inc.
Closes openzfs#12314
datacore-rm pushed a commit to DataCoreSoftware/openzfs that referenced this pull request Feb 28, 2024
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored-By: iXsystems, Inc.
Closes openzfs#12314
Labels: Status: Accepted (Ready to integrate: reviewed, tested); Type: Performance (Performance improvement or performance problem)
6 participants