optimize locking checks in mempool allocator #11126


Merged

merged 1 commit into openzfs:master on Nov 2, 2020

Conversation

BrainSlayer
Contributor

@BrainSlayer BrainSlayer commented Oct 29, 2020

Avoid checking the whole array of objects each time by removing the self-organized
memory reaping; this can be managed by the global memory reap callback, which is
called every 60 seconds. This significantly reduces the number of locking operations.
In addition, the memory release function has been optimized to take locks only where
really needed.

Signed-off-by: Sebastian Gottschall [email protected]

Motivation and Context

Description

How Has This Been Tested?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

@BrainSlayer
Contributor Author

just a draft right now for checking

@PrivatePuffin
Contributor

Nice work @BrainSlayer! Looking forward to the benefits! :)

@mjguzik
Contributor

mjguzik commented Oct 29, 2020

I think this is a nice step forward. Regardless, I think it should wait for #11129 and then get rebased.

@PrivatePuffin
Contributor

I think this is a nice step forward. Regardless, I think it should wait for #11129 and then get rebased.

Considering the state of this draft and #11129, I'm pretty sure that issue will solve itself ;)

@BrainSlayer BrainSlayer marked this pull request as ready for review October 30, 2020 08:29
@mjguzik
Contributor

mjguzik commented Oct 31, 2020

I'm not going to argue anymore about this patch. I think the commits combined provide a marked improvement, in particular the one I was looking for (overhead avoided if zstd is not in use). However, I presume someone else will have to accept this PR.

@BrainSlayer
Contributor Author

@mjguzik why someone else?

@mjguzik
Contributor

mjguzik commented Oct 31, 2020

@behlendorf I think the patch (once it finalizes) should land in 2.0. The zfs_zstd_cache_reap_now routine is called from arc_reap_cb_check regardless of whether zstd is in use or not. Without the patch the zstd shrinker walks the entire array, trylocking each element. Since the array is sized to 4 * ncpus this amounts to significant single-threaded overhead on bigger machines (for example my test box has 104 threads, giving 416 locks to take). With the patch the behavior is short-circuited if there are no buffers to begin with, and some of the locking is elided even if the count is != 0.
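To make the shape of that change concrete, here is a minimal userland sketch (hypothetical names and pthreads instead of the kernel mutex API, so not the actual OpenZFS code): the reap path tracks how many buffers are cached, so the walk, and most of the trylocking, can be skipped when the pool is empty.

    #include <pthread.h>
    #include <stdlib.h>

    struct pool_slot {
        pthread_mutex_t lock;
        void           *mem;     /* cached buffer, NULL when the slot is empty */
    };

    static struct pool_slot *pool;      /* sized 4 * ncpus, like the allocator's array */
    static unsigned pool_count;         /* number of slots */
    static unsigned pool_buffers;       /* slots currently holding a buffer */

    static void pool_reap(void)
    {
        unsigned i;

        /* Short-circuit: if zstd never cached anything, skip all 4 * ncpus locks. */
        if (pool_buffers == 0)
            return;

        for (i = 0; i < pool_count; i++) {
            if (pool[i].mem == NULL)            /* elide the trylock for empty slots */
                continue;
            if (pthread_mutex_trylock(&pool[i].lock) == 0) {
                if (pool[i].mem != NULL) {      /* re-check under the lock */
                    free(pool[i].mem);
                    pool[i].mem = NULL;
                    pool_buffers--;             /* the real code would keep this counter atomic */
                }
                pthread_mutex_unlock(&pool[i].lock);
            }
        }
    }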

@mjguzik
Contributor

mjguzik commented Oct 31, 2020

@mjguzik why someone else?

I did not write any of the allocator modulo the cosmetic patch. I presume a buy-in from other people who worked on this is necessary.

As for myself, I have more comments, but given the state reached they would be only getting in the way of closing this PR. I'll post them later.

@behlendorf behlendorf added the Status: Code Review Needed (Ready for review and testing) label Oct 31, 2020
@PrivatePuffin
Contributor

I did not write any of the allocator modulo the cosmetic patch. I presume a buy-in from other people who worked on this is necessary.

That's just @c0d3z3r0 (who has decided not to work on ZFS anymore) and @allanjude, for the most part...
And me, for some testing, docs and performance work...

My two cents: this patch does solve the issue as reported by @mjguzik and isn't too involved.
I agree with @mjguzik that we should all collaborate on improving this PR and that we don't have to PR all patches and tweaks at once.

I also agree that this should indeed be backported into 2.0.

Contributor

@PrivatePuffin PrivatePuffin left a comment

Just needs a little squash, but does what it is supposed to do.

@BrainSlayer
Contributor Author

@Ornias1993 so you request a squash of all 3 commits?

@PrivatePuffin
Contributor

PrivatePuffin commented Nov 1, 2020

@BrainSlayer The "one commit" "rule" still applies for small PRs like this one, so yes.
And take a look at those tests ofc, but I don't think those are relevant.

But LGTM

avoid checking the whole array of objects each time by removing the self
organized memory reaping. this can be managed by the global memory reap
callback which is called every 60 seconds. this will reduce the use of
locking operations significantly.

Signed-off-by: Sebastian Gottschall <[email protected]>
@BrainSlayer
Contributor Author

done

Contributor

@behlendorf behlendorf left a comment

I'll make sure this gets backported to 2.0.

@behlendorf behlendorf added the Status: Accepted (Ready to integrate: reviewed, tested) label and removed the Status: Code Review Needed (Ready for review and testing) label Nov 2, 2020
@behlendorf behlendorf merged commit 7eefaf0 into openzfs:master Nov 2, 2020
@PrivatePuffin
Contributor

PrivatePuffin commented Nov 2, 2020

@behlendorf Thanks!

@mjguzik and @BrainSlayer
Now that we have this out of the way, concerning the other proposed improvements:
I've been thinking about what @mjguzik said about not wanting to work on things that aren't going to be accepted (and the catch-22 that creates), so I've come up with some criteria that I think everyone with previous work on zstd-on-zfs agrees on.

If the following criteria are met, merging an enhancement of the allocator would be most easily agreed on as "acceptable":

  • Not limiting the current capabilities (supported ZSTD levels, platforms, options, logging)
  • Not negatively impacting current performance by more than a few percent (both compression ratio AND speeds)
  • Not too many platform-dependent tweaks (we've seen with Allan's proposal how much discussion that creates)
  • Passing all tests and checklists (obviously)

As long as improvements don't cross any of those lines, I personally think they would have a high chance of getting accepted/agreed upon.

I hope that clears up some of the endless discussion about your proposed changes @mjguzik, and gives you some idea of what you can expect in terms of responses :)

@BrainSlayer
Contributor Author

BrainSlayer commented Nov 3, 2020

@Ornias1993 I understand and will follow that guidance. In general I'm waiting now just for input/ideas for new enhancements. I'm not aware right now of any other issue left, except for maybe making the memory reaping polling time configurable or things like that.

@mjguzik
Contributor

mjguzik commented Nov 3, 2020

Maybe my other reply came off as a little dramatic, so let me restate my key point here:

For any compression level + blocksize there seems to be a max allocation associated with it, which is also trivial to provoke for every CPU. For the default it seems to be 1.2MB. I did a quick test with recordsize=1M and compression=zstd-19, which gave me 18MB buffers. On a 104-thread box that's 1.8GB RAM just to facilitate zstd. You can easily provoke all the allocation by running a few 'yes > somefile' processes. I don't think memory usage of this sort is sustainable in real deployments and the most viable option I see here is to reduce supported compression levels. It may be some tweaks can be made, but achievable memory usage has to go down significantly. The 1.2MB buffer per CPU may be tolerable. Short of this, I don't think any caching policy changes will be of significance.
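As a back-of-the-envelope check of those numbers (the buffer sizes are the ones quoted above; the one-buffer-per-CPU assumption is taken from the comment, not from the code):

    #include <stdio.h>

    int main(void)
    {
        unsigned ncpus = 104;        /* the test box mentioned above */
        double mb_default = 1.2;     /* ~1.2 MB buffer at the default level */
        double mb_zstd19 = 18.0;     /* ~18 MB buffer at zstd-19 with recordsize=1M */

        printf("default: ~%.0f MB\n", ncpus * mb_default);          /* ~125 MB */
        printf("zstd-19: ~%.1f GB\n", ncpus * mb_zstd19 / 1024.0);  /* ~1.8 GB */
        return 0;
    }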

That said, I'm not involved with zstd and I'm not willing to spend time debating the assessment made above. If you disagree with it, I'm just going to back off from zstd altogether.

@PrivatePuffin
Contributor

compression=zstd-19

I think we should be clearer that everything beyond ZSTD-9 is "for future use" and not actually expected to be used in any sane capacity 👍

I don't think memory usage of this sort is sustainable in real deployments

You don't have to, it's not supposed to be used by many.

The most viable option I see here is to reduce supported compression levels

If someone wants to run idiotic levels of something, we shouldn't prevent it just because we think it's weird. There is a lot worse you can do in the ZFS CLI.

It may be some tweaks can be made, but achievable memory usage has to go down significantly

You keep focussing on zstd-19, which is NOT even close to the default setting for a reason. Please focus on anything < ZSTD-9 in the discussion... We know ZSTD-19 is insane and added it anyway in case we have more of a use (and tweaks) for it in the future.

If you disagree with it, I'm just going to back off from zstd altogether.

You don't have to run ZSTD-19; we have advised against it quite frequently.
IMHO, it's up to the downstreams and users running ZSTD natively to determine up to what level of ZSTD they want to support or use, not for us to decide how insane someone wants to go.

To give you two examples:

  1. Myself, an end user:
    Even with a 104-thread system, I would NOT run ZSTD-19, because the gain/cost ratio is off in terms of ratio/CPU. But if I owned a 104-thread system, I doubt I would bother much about one or two GB more or less.

  2. TrueNAS, a downstream implementation of ZFS including ZSTD:
    I implemented just 3, 5 and 7 into TrueNAS, because anything higher didn't have much of a gain and to prevent users selecting insane levels.

Anyhow @mjguzik, just to TL;DR:
As I understand it, you are saying insanely high levels should be removed because you don't want to use them because they are insane?


@BrainSlayer

In general I'm waiting now just for input/ideas for new enhancements. I'm not aware right now of any other issue left

How about making the multiplier in pool_count into a tunable? I was thinking about that one, and for pure-storage servers or small home servers *4 is fine...
However: hyperconverged systems might have many cores, but only a fraction of them would be pressured by ZFS/ZSTD (because most would be in use by the actual services being run, not zstd compression).

@mjguzik
Contributor

mjguzik commented Nov 3, 2020

My experience is that if a setting is easily accessible, the assumption is it basically works and there will be people using it, in this case requiring the high RAM usage to be fine. That said, I don't think we have common ground here so I'm going to back off.

@BrainSlayer
Contributor Author

BrainSlayer commented Nov 3, 2020

The level is a user decision. The memory consumption for certain levels might be important for documentation. The default level has been chosen since it provides results comparable to gzip. Levels like 19 are more like xz and very slow, so it's not just about memory consumption; it's also about performance. But even if 19 has high memory consumption, zstd will handle it and skips compression if allocation fails.

Making the multiplier configurable is pointless. I chose 4 since ZFS itself creates 4 times as many I/O threads as available CPU cores/threads. So on a typical i7 with hyperthreading enabled, 32 I/O threads will run, so these slots are really required. If you lower the available mempool you will run into locking barriers during allocation under heavy load, so decreasing it will slow down zstd; increasing it does not have an impact at all and just creates bloat.
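For illustration only (a hypothetical userland sketch, names are not from the ZFS code), the sizing argument above boils down to:

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        long ncpus = sysconf(_SC_NPROCESSORS_ONLN); /* e.g. 8 on a hyperthreaded i7 */
        long io_threads = ncpus * 4;                /* per the comment: ZFS runs ~4 I/O threads per CPU thread */
        long pool_slots = io_threads;               /* one mempool slot per potential concurrent allocation */

        printf("cpu threads: %ld, io threads: %ld, pool slots: %ld\n",
               ncpus, io_threads, pool_slots);
        return 0;
    }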

@PrivatePuffin
Contributor

My experience is that if a setting is easily accessible, the assumption is it basically works

Then your experience is wrong. There are A LOT of ZFS settings that definitely don't "basically work" without significant downsides.

there will be people using it

I expect there will be, and they should be aware of the consequences. That's why @BrainSlayer and I agree on updating the documentation to reflect this.

That said, I don't think we have common ground here so I'm going to back off.

Sorry to bother you; I was under the (wrong) impression that you had more suggestions than simply lowering the number of supported ZSTD levels. That's why I tried to find out what you wanted changed.

Making the multiplier configurable is pointless. I chose 4 since ZFS itself creates 4 times as many I/O threads as available CPU cores/threads. So on a typical i7 with hyperthreading enabled, 32 I/O threads will run, so these slots are really required. If you lower the available mempool you will run into locking barriers during allocation under heavy load, so decreasing it will slow down zstd; increasing it does not have an impact at all and just creates bloat.

Ahh shoot, forgot about the IO threads, yes you're right...

behlendorf pushed a commit that referenced this pull request Nov 3, 2020
Avoid checking the whole array of objects each time by removing the self
organized memory reaping. this can be managed by the global memory reap
callback which is called every 60 seconds. this will reduce the use of
locking operations significantly.

Reviewed-by: Kjeld Schouten <[email protected]>
Reviewed-by: Mateusz Guzik <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Sebastian Gottschall <[email protected]>
Closes #11126
jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
Labels
Status: Accepted (Ready to integrate: reviewed, tested)

4 participants