Skip to content

Fix abd leak, kmem_free correct size of abd_t #12295

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 1, 2021

Conversation

lundman
Copy link
Contributor

@lundman lundman commented Jun 29, 2021

... for macOS and Freebsd, and improve macOS abd performance (#56)

(macOS continuation to this is in 47b408f)

Motivation and Context

Freeing incorrect size will trigger panic in SPL layer on macOS/illumos. Linux and FreeBSD unfortunately do not
use the size component in free and is not affected by freeing the wrong size.

This PR has considerable number of debug-hours in it.

Description

  • Cleanup in macos abd_os.c to fix abd_t leak

Fix a leak of abd_t that manifested mostly when using
raidzN with at least as many columns as N (e.g. a
four-disk raidz2 but not a three-disk raidz2).
Sufficiently heavy raidz use would eventually run a system
out of memory.

The leak was introduced as a fix for a panic caused by
calculating the wrong size of an abd_t at free time if the
abd_t had been made using abd_get_offset_impl, since it
carried along the unnecessary tails of large ABDs, leading
to a mismatch between abd->abd_size and the original
allocation size of the abd_t. This would feed kmem_free a
bad size, which produces a heap corruption panic.

The fix now carries only the necessary chunk pointers,
leading to smaller abd_ts (especially those of
abd_get_zeros() ABDs) and a performance gain from the
reduction in copying and allocation activity.

We now calculate the correct size for the abd_t at free time.

This requires passing the number of bytes wanted in a
scatter ABD to abd_get_offset_scatter().

Additionally:

  • Switch abd_cache arena to FIRSTFIT, which empirically
    improves perofrmance.

  • Make abd_chunk_cache more performant and debuggable.

  • Allocate the abd_zero_buf from abd_chunk_cache rather
    than the heap.

  • Don't try to reap non-existent qcaches in abd_cache arena.

  • KM_PUSHPAGE->KM_SLEEP when allocating chunks from their
    own arena

  • having fixed the abd leaks, return to using KMF_LITE,
    but leave a commented example of audit kmem debugging

  • having made this work, abd_orig_size is no longer needed
    as a way to track the size originally kmem_zalloc-ed for
    a scatter abd_t

  • Update FreeBSD abd_os.c with the fix, and let Linux build

  • Minimal change to fix FreeBSD's abd_get_offset_scatter()
    carrying too many chunks for the desired ABD size

  • A size argument is added to abd_get_offset_scatter() for
    FreeBSD and macOS, which is unused by Linux

How Has This Been Tested?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

Copy link
Member

@amotin amotin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, but seems like you may need to rebase to fix FreeBSD head build.

... for macOS and Freebsd, and improve macOS abd performance (#56)

(macOS continuation to this is in 47b408f)

* Cleanup in macos abd_os.c to fix abd_t leak

Fix a leak of abd_t that manifested mostly when using
raidzN with at least as many columns as N (e.g. a
four-disk raidz2 but not a three-disk raidz2).
Sufficiently heavy raidz use would eventually run a system
out of memory.

The leak was introduced as a fix for a panic caused by
calculating the wrong size of an abd_t at free time if the
abd_t had been made using abd_get_offset_impl, since it
carried along the unnecessary tails of large ABDs, leading
to a mismatch between abd->abd_size and the original
allocation size of the abd_t.  This would feed kmem_free a
bad size, which produces a heap corruption panic.

The fix now carries only the necessary chunk pointers,
leading to smaller abd_ts (especially those of
abd_get_zeros() ABDs) and a performance gain from the
reduction in copying and allocation activity.

We now calculate the correct size for the abd_t at free time.

This requires passing the number of bytes wanted in a
scatter ABD to abd_get_offset_scatter().

Additionally:

* Switch abd_cache arena to FIRSTFIT, which empirically
improves perofrmance.

* Make abd_chunk_cache more performant and debuggable.

* Allocate the abd_zero_buf from abd_chunk_cache rather
than the heap.

* Don't try to reap non-existent qcaches in abd_cache arena.

* KM_PUSHPAGE->KM_SLEEP when allocating chunks from their
own arena

- having fixed the abd leaks, return to using KMF_LITE,
but leave a commented example of audit kmem debugging

- having made this work, abd_orig_size is no longer needed
as a way to track the size originally kmem_zalloc-ed for
a scatter abd_t

* Update FreeBSD abd_os.c with the fix, and let Linux build

* Minimal change to fix FreeBSD's abd_get_offset_scatter()
carrying too many chunks for the desired ABD size

* A size argument is added to abd_get_offset_scatter() for
FreeBSD and macOS, which is unused by Linux

Signed-off-by: Jorgen Lundman <[email protected]>
@mmaybee mmaybee added the Status: Accepted Ready to integrate (reviewed, tested) label Jun 29, 2021
@mmaybee mmaybee merged commit c6d1112 into openzfs:master Jul 1, 2021
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Aug 23, 2021
Fix a leak of abd_t that manifested mostly when using
raidzN with at least as many columns as N (e.g. a
four-disk raidz2 but not a three-disk raidz2).
Sufficiently heavy raidz use would eventually run a system
out of memory.

Additionally:

* Switch abd_cache arena to FIRSTFIT, which empirically
improves perofrmance.

* Make abd_chunk_cache more performant and debuggable.

* Allocate the abd_zero_buf from abd_chunk_cache rather
than the heap.

* Don't try to reap non-existent qcaches in abd_cache arena.

* KM_PUSHPAGE->KM_SLEEP when allocating chunks from their
own arena

Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Jorgen Lundman <[email protected]>
Co-authored-by: Sean Doran <[email protected]>
Closes openzfs#12295
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Aug 24, 2021
Fix a leak of abd_t that manifested mostly when using
raidzN with at least as many columns as N (e.g. a
four-disk raidz2 but not a three-disk raidz2).
Sufficiently heavy raidz use would eventually run a system
out of memory.

Additionally:

* Switch abd_cache arena to FIRSTFIT, which empirically
improves perofrmance.

* Make abd_chunk_cache more performant and debuggable.

* Allocate the abd_zero_buf from abd_chunk_cache rather
than the heap.

* Don't try to reap non-existent qcaches in abd_cache arena.

* KM_PUSHPAGE->KM_SLEEP when allocating chunks from their
own arena

Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Jorgen Lundman <[email protected]>
Co-authored-by: Sean Doran <[email protected]>
Closes openzfs#12295
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Aug 24, 2021
Fix a leak of abd_t that manifested mostly when using
raidzN with at least as many columns as N (e.g. a
four-disk raidz2 but not a three-disk raidz2).
Sufficiently heavy raidz use would eventually run a system
out of memory.

Additionally:

* Switch abd_cache arena to FIRSTFIT, which empirically
improves perofrmance.

* Make abd_chunk_cache more performant and debuggable.

* Allocate the abd_zero_buf from abd_chunk_cache rather
than the heap.

* Don't try to reap non-existent qcaches in abd_cache arena.

* KM_PUSHPAGE->KM_SLEEP when allocating chunks from their
own arena

Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Jorgen Lundman <[email protected]>
Co-authored-by: Sean Doran <[email protected]>
Closes openzfs#12295
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Aug 24, 2021
Fix a leak of abd_t that manifested mostly when using
raidzN with at least as many columns as N (e.g. a
four-disk raidz2 but not a three-disk raidz2).
Sufficiently heavy raidz use would eventually run a system
out of memory.

Additionally:

* Switch abd_cache arena to FIRSTFIT, which empirically
improves perofrmance.

* Make abd_chunk_cache more performant and debuggable.

* Allocate the abd_zero_buf from abd_chunk_cache rather
than the heap.

* Don't try to reap non-existent qcaches in abd_cache arena.

* KM_PUSHPAGE->KM_SLEEP when allocating chunks from their
own arena

Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Jorgen Lundman <[email protected]>
Co-authored-by: Sean Doran <[email protected]>
Closes openzfs#12295
behlendorf pushed a commit that referenced this pull request Aug 31, 2021
Fix a leak of abd_t that manifested mostly when using
raidzN with at least as many columns as N (e.g. a
four-disk raidz2 but not a three-disk raidz2).
Sufficiently heavy raidz use would eventually run a system
out of memory.

Additionally:

* Switch abd_cache arena to FIRSTFIT, which empirically
improves perofrmance.

* Make abd_chunk_cache more performant and debuggable.

* Allocate the abd_zero_buf from abd_chunk_cache rather
than the heap.

* Don't try to reap non-existent qcaches in abd_cache arena.

* KM_PUSHPAGE->KM_SLEEP when allocating chunks from their
own arena

Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Jorgen Lundman <[email protected]>
Co-authored-by: Sean Doran <[email protected]>
Closes #12295
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Sep 15, 2021
Fix a leak of abd_t that manifested mostly when using
raidzN with at least as many columns as N (e.g. a
four-disk raidz2 but not a three-disk raidz2).
Sufficiently heavy raidz use would eventually run a system
out of memory.

Additionally:

* Switch abd_cache arena to FIRSTFIT, which empirically
improves perofrmance.

* Make abd_chunk_cache more performant and debuggable.

* Allocate the abd_zero_buf from abd_chunk_cache rather
than the heap.

* Don't try to reap non-existent qcaches in abd_cache arena.

* KM_PUSHPAGE->KM_SLEEP when allocating chunks from their
own arena

Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Jorgen Lundman <[email protected]>
Co-authored-by: Sean Doran <[email protected]>
Closes openzfs#12295
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants