assertion failed in arc_wait_for_eviction() #11397

Merged
merged 1 commit into from
Jan 8, 2021

Conversation

@ahrens (Member) commented Dec 23, 2020

Motivation and Context

If the system is very low on memory (specifically,
arc_free_memory() < arc_sys_free/2, i.e. less than 1/16th of RAM
free), arc_evict_state_impl() will defer wakeups. In this case, the
arc_evict_waiter_t entries remain on the list, even though arc_evict_count
has been incremented past their aew_count.

The problem is that arc_wait_for_eviction() assumes that if there are
waiters on the list, the count they are waiting for has not yet been
reached. However, the deferred wakeups may violate this, causing
ASSERT(last->aew_count > arc_evict_count) to fail.

Closes #11285

Description

This commit resolves the issue by having new waiters use the greater of
arc_evict_count and the last aew_count.
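
The rule described above can be sketched in isolation. This is a minimal model, not the code from the commit: `waiter_model_t`, `max_u64()` and `new_waiter_count()` are hypothetical names invented for illustration, and only `aew_count` and the idea of a global eviction count come from the description. It assumes a new waiter's target is "base count plus the amount it needs evicted", where the base is the greater of the current eviction count and the last queued waiter's target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for arc_evict_waiter_t; only the count is modeled. */
typedef struct waiter_model {
	uint64_t aew_count;	/* eviction count this waiter is waiting for */
} waiter_model_t;

static uint64_t
max_u64(uint64_t a, uint64_t b)
{
	return (a > b ? a : b);
}

/*
 * Compute the target count for a new waiter.
 * `last` is the tail of the waiter list, or NULL if the list is empty.
 * `evict_count` models arc_evict_count; `amount` is how much more must
 * be evicted before the new waiter may proceed.
 */
static uint64_t
new_waiter_count(uint64_t evict_count, const waiter_model_t *last,
    uint64_t amount)
{
	uint64_t last_count = (last != NULL) ? last->aew_count : 0;

	/*
	 * With deferred wakeups, last_count may already be below
	 * evict_count, so do not assume it is the larger of the two.
	 */
	uint64_t target = max_u64(last_count, evict_count) + amount;

	/* The new waiter always waits for a count not yet reached. */
	assert(amount == 0 || target > evict_count);
	return (target);
}
```

With a stale tail waiter (target 90, eviction count already at 100), the new waiter queues behind the current count rather than behind the stale target; with a tail waiter that is still pending (target 150), it queues behind that waiter as before.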

cc @gamanakis

How Has This Been Tested?

I was able to reproduce the issue reliably in conjunction with some other changes, while running the ZFS test suite on a system with 7 GB RAM. With the fix, the issue no longer reproduces.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)


@ahrens ahrens added the Status: Code Review Needed Ready for review and testing label Dec 23, 2020
@ahrens ahrens requested a review from grwilson December 23, 2020 19:14
@behlendorf behlendorf mentioned this pull request Dec 23, 2020
@gamanakis (Contributor) commented

Thank you for addressing this, it resolves the panic seen in #11285.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Dec 29, 2020
If the system is very low on memory (specifically,
`arc_free_memory() < arc_sys_free/2`, i.e. less than 1/16th of RAM
free), `arc_evict_state_impl()` will defer wakeups.  In this case, the
`arc_evict_waiter_t` entries remain on the list, even though `arc_evict_count`
has been incremented past their `aew_count`.

The problem is that `arc_wait_for_eviction()` assumes that if there are
waiters on the list, the count they are waiting for has not yet been
reached.  However, the deferred wakeups may violate this, causing
`ASSERT(last->aew_count > arc_evict_count)` to fail.

This commit resolves the issue by having new waiters use the greater of
`arc_evict_count` and the last `aew_count`.

Signed-off-by: Matthew Ahrens <[email protected]>
Closes openzfs#11285
@behlendorf behlendorf merged commit dc303dc into openzfs:master Jan 8, 2021
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Jan 22, 2021
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Wilson <[email protected]>
Reviewed-by: George Amanakis <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes openzfs#11285
Closes openzfs#11397
behlendorf pushed a commit that referenced this pull request Jan 23, 2021
jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
Successfully merging this pull request may close these issues.

Memory pressure may result in violated assertion in arc_wait_for_eviction()
4 participants