
do a cyclic scan for orphan objects in zstd memory pool #10969


Merged: 1 commit merged into openzfs:master on Sep 30, 2020

Conversation

@BrainSlayer (Contributor) commented Sep 23, 2020

In non-regular use cases, allocated memory might stay resident in the memory pool. This small patch checks every minute whether there are old objects which can be released from the memory pool.

Right now, with regular use, the pool is checked for old objects on each allocation attempt from this pool, so it is basically polling by use. Now consider what happens if someone writes a lot of files and then stops using the volume, or even unmounts it: the code will no longer check whether objects can be released from the pool, so whatever has been allocated stays in the pool cache. This is no big issue for common use, but someone discovered it while doing tests. Personally, I know this behaviour and I'm aware of it. It's no big issue, just an enhancement. See also #10938
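For illustration, here is a minimal, self-contained C sketch of the "polling by use" behaviour described above. The names (`pool_obj_t`, `pool_alloc`, `pool_reclaim_stale`), the pool layout, and the 60-second threshold are assumptions for the example, not the actual module/zstd code; locking is omitted for brevity. The point is that the stale-object scan only runs inside the allocation path, so once allocations stop, cached buffers are never released.

```c
/*
 * Illustrative sketch only (not the real zfs_zstd.c code): a mempool where
 * stale buffers are reclaimed solely as a side effect of new allocations.
 */
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

#define	POOL_SIZE	16
#define	STALE_SECONDS	60	/* buffers unused this long are release candidates */

typedef struct pool_obj {
	void	*mem;		/* cached allocation, NULL if slot is empty */
	size_t	size;
	time_t	last_used;	/* timestamp of the last borrow */
	bool	in_use;
} pool_obj_t;

static pool_obj_t pool[POOL_SIZE];

/* Free cached buffers that have not been touched for STALE_SECONDS. */
void
pool_reclaim_stale(void)
{
	time_t now = time(NULL);

	for (int i = 0; i < POOL_SIZE; i++) {
		pool_obj_t *o = &pool[i];

		if (o->mem != NULL && !o->in_use &&
		    now - o->last_used > STALE_SECONDS) {
			free(o->mem);
			o->mem = NULL;
			o->size = 0;
		}
	}
}

/*
 * Borrow a buffer: this is the only place the stale scan runs, so if
 * allocations stop, cached buffers stay resident indefinitely.
 */
void *
pool_alloc(size_t size)
{
	pool_reclaim_stale();	/* "polling by use" */

	for (int i = 0; i < POOL_SIZE; i++) {
		pool_obj_t *o = &pool[i];

		if (!o->in_use && o->mem != NULL && o->size >= size) {
			o->in_use = true;
			o->last_used = time(NULL);
			return (o->mem);
		}
	}
	/* No cached buffer fits; fall back to a fresh allocation. */
	return (malloc(size));
}
```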

covers #10938

Signed-off-by: Sebastian Gottschall [email protected]

Motivation and Context

Description

How Has This Been Tested?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

@mschilli87 (Contributor)

@BrainSlayer:

In non-regular use cases, allocated memory might stay resident in the memory pool.

Why is that? Do we know? Don't we care?

This small patch checks every minute whether there are old objects which can be released from the memory pool.

This seems a bit hacky IMHO. Wouldn't the proper fix be to investigate why the memory ends up orphaned in the first place and ensure it doesn't?
If you want this as a fallback safeguard for practical reasons, I think it should at least issue a warning when it has to release memory 'forgotten' previously. Otherwise this could end up masking more severe future issues with 'proper' memory release.

@BrainSlayer (Contributor, Author)

@mschilli87 let me explain. Right now, with regular use, the pool is checked for old objects on each allocation attempt from this pool, so it is basically polling by use. Now consider what happens if someone writes a lot of files and then stops using the volume, or even unmounts it: the code will no longer check whether objects can be released from the pool, so whatever has been allocated stays in the pool cache. This is no big issue for common use, but someone discovered it while doing tests. Personally, I know this behaviour and I'm aware of it. It's no big issue, just an enhancement. See also #10938

@mschilli87 (Contributor)

@BrainSlayer: Thank you for the explanation. My two cents:

  1. This explanation should be in the commit message to document the need for the change.
  2. I'd still prefer a fix closer to the root cause. Why not trigger this check when unmounting a volume instead of polling regularly in case it got unmounted? Do we keep polling unused volumes even after freeing the leftover memory? Or could we stop considering a volume after dealing with it once (within a minute of it last being used) and add it back to our list once it gets used again (e.g. in the polling-by-usage code you mentioned)?

@BrainSlayer (Contributor, Author) commented Sep 23, 2020

Copied my explanation into the description above; I will add it to the commit message once the test run is done.

Regarding the check procedure: zstd is just an algorithm. It doesn't know about any higher-level operation and does not know whether a volume is in use, mounted, or unmounted. The pool is also not used per volume; it is used by the algorithm, which can of course be shared by multiple volumes. And yes, the polling remains even after everything is freed, but it only happens once a minute and does not cause any measurable performance impact, since the checking code is very small. Forget about thinking in volumes; just think of a compression algorithm which is available globally for everything you do. The case I mentioned was just an example.

If any volume is in use and you simply don't write to it, or it isn't in use at all, the preallocated RAM stays allocated without this patch. So it's basically just an optimization to reduce memory usage when everything related to zstd is idle. Of course I could stop the thread once nothing is left and restart it on the first new allocation, but that is more complicated handling and may cause a latency spike on the first allocation attempt. Keeping it as simple as possible is the best way in my opinion. Stopping and restarting threads also needs special handling to avoid race conditions in this threaded context; the current simple approach is thread safe.
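To make that concrete, here is a hedged sketch of the idea: a single long-lived reaper that wakes once a minute and runs the same stale-object scan the allocation path performs, so memory is returned even while the pool sits idle. The names are hypothetical, pthreads stand in for the kernel thread the real module would use, and `pool_reclaim_stale()` is the illustrative helper from the sketch in the description; in real code the same lock would also guard the allocation path.

```c
/*
 * Illustrative sketch only: a long-lived reaper thread that frees stale
 * pool buffers once a minute, independent of allocation activity.
 */
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

/* Stale-object scan from the earlier sketch (hypothetical helper). */
void pool_reclaim_stale(void);

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static volatile bool reaper_exit = false;
static pthread_t reaper_thread;

static void *
pool_reaper(void *arg)
{
	(void) arg;
	while (!reaper_exit) {
		sleep(60);			/* cyclic scan interval: one minute */
		pthread_mutex_lock(&pool_lock);
		pool_reclaim_stale();		/* same scan the allocator runs */
		pthread_mutex_unlock(&pool_lock);
	}
	return (NULL);
}

/* Started once at module load and kept running: no start/stop races. */
void
pool_reaper_init(void)
{
	(void) pthread_create(&reaper_thread, NULL, pool_reaper, NULL);
}

/* Tear-down at module unload (may wait for the current sleep to finish). */
void
pool_reaper_fini(void)
{
	reaper_exit = true;
	(void) pthread_join(&reaper_thread, NULL);
}
```

Keeping the thread alive permanently, as argued above, trades a negligible once-per-minute wakeup for avoiding the start/stop races and first-allocation latency spike a dynamically managed thread would introduce.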

@mschilli87 (Contributor)

That's a good enough explanation for me, as long as it makes it into the commit message and/or a comment in the source code itself. Thanks for taking the time to walk me through.

@PrivatePuffin (Contributor) left a comment


Looks good, with a slight nitpick that the note does not refer to the 60 seconds.

Ping for @c0d3z3r0 and @allanjude to check this out if they want.

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Sep 23, 2020
@richardelling (Contributor)

ISTM that this is the wrong approach. We know when an objectset is removed from the system (umount, destroy, close) and we already have async clean to clear its data from ARC. Why not just extend it to also clear out the zstd memory pool?

@allanjude (Contributor)

ISTM that this is the wrong approach. We know when an objectset is removed from the system (umount, destroy, close) and we already have async clean to clear its data from ARC. Why not just extend it to also clear out the zstd memory pool?

This is a pool of memory allocations used for compression. It is not tied to any specific objset. It is just that after a period of high activity, the amount of memory used by this pool may be high, and it doesn't shrink when there is no activity.

@mschilli87 (Contributor)

@allanjude: That is what confused me as well until @BrainSlayer explained it.
@richardelling: Good to know I am not alone. 😉

I think this whole discussion just stresses that the change suggested here should come with good documentation.

@BrainSlayer (Contributor, Author)

ISTM that this is the wrong approach. We know when an objectset is removed from the system (umount, destroy, close) and we already have async clean to clear its data from ARC. Why not just extend it to also clear out the zstd memory pool?

Circular dependency: no call to zstd code can be made from functions within zfs. zzstd must be loaded before zfs is loaded. Such solutions are only possible with installable hooks.

@BrainSlayer (Contributor, Author)

@allanjude: That is what confused me as well until @BrainSlayer explained it.
@richardelling: Good to know I am not alone. 😉

I think this whole discussion just stresses that the change suggested here should come with good documentation.

According to what he wrote, he understood very well how it works.

@mschilli87

This comment has been minimized.

@BrainSlayer BrainSlayer force-pushed the liberator branch 3 times, most recently from 2b289f3 to be616dc Compare September 24, 2020 12:29
@BrainSlayer (Contributor, Author)

@behlendorf please check the new solution for handling the memory cleanup.

@BrainSlayer BrainSlayer force-pushed the liberator branch 3 times, most recently from 211545a to aa33676 Compare September 29, 2020 10:51
@BrainSlayer (Contributor, Author)

@behlendorf regarding when to reclaim space and when not: the pool allocator is self-managing. Waiting until memory is low, when that is avoidable, is not a good approach from my point of view. In our test case we had about 1.5 GB of unused memory left, which is no small amount. Releasing it when it has not been used for a while avoids such situations, which would otherwise slow down ZFS. So reclaiming early, without a performance impact, is a good approach.

In non-regular use cases, allocated memory might stay resident in the memory
pool. This small patch checks every minute whether there are old objects which
can be released from the memory pool.

Right now, with regular use, the pool is checked for old objects on each
allocation attempt from this pool, so it is basically polling by use. Now
consider what happens if someone writes a lot of files and stops using
the volume or even unmounts it: the code will no longer check whether
objects can be released from the pool. Already allocated objects will
still stay in the pool cache. This is no big issue for common use, but
someone discovered this issue while doing tests. Personally, I know this
behaviour and I'm aware of it. It's no big issue, just an enhancement.

Signed-off-by: Sebastian Gottschall <[email protected]>
@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Sep 30, 2020
@behlendorf behlendorf merged commit 8a171cc into openzfs:master Sep 30, 2020
behlendorf pushed a commit that referenced this pull request Oct 1, 2020
In non-regular use cases, allocated memory might stay resident in the memory
pool. This small patch checks every minute whether there are old objects which
can be released from the memory pool.

Right now, with regular use, the pool is checked for old objects on each
allocation attempt from this pool, so it is basically polling by use. Now
consider what happens if someone writes a lot of files and stops using
the volume or even unmounts it: the code will no longer check whether
objects can be released from the pool. Already allocated objects will
still stay in the pool cache. This is no big issue for common use, but
someone discovered this issue while doing tests. Personally, I know this
behaviour and I'm aware of it. It's no big issue, just an enhancement.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Kjeld Schouten-Lebbing <[email protected]>
Signed-off-by: Sebastian Gottschall <[email protected]>
Closes #10938
Closes #10969
jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
Labels
Status: Accepted Ready to integrate (reviewed, tested)
6 participants