optimize locking checks in mempool allocator #11126
Conversation
Just a draft right now, for checking. |
Nice work @BrainSlayer! Looking forward to the benefits! :) |
I think this is a nice step forward. Regardless, I think it should wait for #11129 and then get rebased. |
Force-pushed from 950522f to 5e1e176 (compare)
I'm not going to argue anymore about this patch. I think the commits combined provide a marked improvement, and in particular the one I was looking for (overhead avoided if zstd is not in use). However, I presume someone else will have to accept this PR. |
@mjguzik why someone else? |
Force-pushed from bf7f1e2 to 2774004 (compare)
@behlendorf I think the patch (once it finalizes) should land in 2.0. The zfs_zstd_cache_reap_now routine is called from arc_reap_cb_check regardless of whether zstd is in use or not. Without the patch the zstd shrinker walks the entire array, trylocking each element. Since the array is sized to 4 * ncpus this adds up to significant single-threaded overhead on bigger machines (for example my test box has 104 threads, giving 416 locks to take). With the patch the behavior is short-circuited if there are no buffers to begin with, and some of the locking is elided even if the count is != 0. |
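To illustrate that short-circuit in isolation, here is a minimal userspace C sketch; the names `pool_slot`, `pool_buffers`, and `pool_reap` are hypothetical placeholders, not the actual zfs_zstd.c symbols. A global buffer count lets the reap callback bail out without touching a single lock when nothing is cached, and empty slots are skipped without a trylock:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define POOL_MAX 416	/* e.g. 4 * ncpus on a 104-thread machine */

struct pool_slot {
	pthread_mutex_t barrier;
	void *mem;
	size_t size;
};

static struct pool_slot pool[POOL_MAX];
static atomic_int pool_buffers;		/* slots currently holding a cached buffer */

static void
pool_init(void)
{
	for (int i = 0; i < POOL_MAX; i++)
		pthread_mutex_init(&pool[i].barrier, NULL);
}

/* Called from the periodic memory-reap callback. */
static void
pool_reap(void)
{
	/* Short-circuit: nothing is cached, so no lock needs to be taken. */
	if (atomic_load(&pool_buffers) == 0)
		return;

	for (int i = 0; i < POOL_MAX; i++) {
		struct pool_slot *s = &pool[i];

		if (s->mem == NULL)
			continue;	/* unsynchronized peek: skip empty slots without locking */
		if (pthread_mutex_trylock(&s->barrier) != 0)
			continue;	/* busy: leave it for the next reap cycle */
		if (s->mem != NULL) {	/* re-check under the lock */
			free(s->mem);
			s->mem = NULL;
			s->size = 0;
			atomic_fetch_sub(&pool_buffers, 1);
		}
		pthread_mutex_unlock(&s->barrier);
	}
}
```

With a compression setting other than zstd in use, the counter stays at 0, so the periodic callback costs a single atomic load instead of 4 * ncpus trylocks.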
I did not write any of the allocator, modulo the cosmetic patch. I presume a buy-in from other people who worked on this is necessary. As for myself, I have more comments, but given the state reached they would only get in the way of closing this PR. I'll post them later. |
That's just @c0d3z3r0 (who has decided not to work on zfs anymore) and @allanjude, for the most part... My two cents: this patch does solve the issue as reported by @mjguzik and isn't too involved. I also agree that this should indeed be backported into 2.0. |
Just needs a little squash, but does what it is supposed to do.
@Ornias1993 so you're requesting a squash of all 3 commits? |
@BrainSlayer The "one commit" "rule" still applies for small PRs like this one, so yes. But LGTM |
Avoid checking the whole array of objects each time by removing the self-organized memory reaping. This can be managed by the global memory reap callback, which is called every 60 seconds. This will reduce the use of locking operations significantly. Signed-off-by: Sebastian Gottschall <[email protected]>
Force-pushed from 2774004 to 4d1df4d (compare)
done |
I'll make sure this gets backported to 2.0.
@behlendorf Thanks! @mjguzik and @BrainSlayer If the following criteria are met, merging an enhancement of the allocator would be most easily agreed on as "acceptable":
As long as improvements don't cross any of those, I personally think they would have a high chance of getting accepted/agreed upon. I hope that clears up some of the endless discussion about your proposed changes @mjguzik and gives you some idea of what you can expect in terms of responses :) |
@Ornias1993 I understand and will follow that guidance. In general I'm now just waiting for input/ideas for new enhancements; I'm not aware of any other issue left right now, except for maybe making the memory-reaping polling time configurable or things like that. |
Maybe my other reply came off as a little dramatic, so let me re-state my key point here: for any compression level + blocksize there seems to be a maximum allocation associated with it, which is also trivial to provoke for every CPU. For the default it seems to be 1.2MB. I did a quick test with recordsize=1M and compression=zstd-19; that gave me 18MB buffers. On a 104-thread box that's 1.8GB RAM just to facilitate zstd. You can easily provoke all the allocation by running a few 'yes > somefile' processes. I don't think memory usage of this sort is sustainable in real deployments, and the most viable option I see here is to reduce the supported compression levels. It may be that some tweaks can be made, but achievable memory usage has to go down significantly. The 1.2MB buffer per CPU may be tolerable. Short of this, I don't think any caching policy changes will be of significance. That said, I'm not involved with zstd and I'm not willing to spend time debating the assessment made above. If you disagree with it, I'm just going to back off from zstd altogether. |
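Spelling out the arithmetic behind those figures (taking the quoted per-CPU buffer sizes at face value, with one cached buffer per CPU):

$$18\ \text{MB} \times 104\ \text{CPUs} = 1872\ \text{MB} \approx 1.8\ \text{GB}, \qquad 1.2\ \text{MB} \times 104\ \text{CPUs} \approx 125\ \text{MB}$$

which is why the default-level footprint is called tolerable while the zstd-19 one is not.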
I think we should be clearer that everything beyond ZSTD-9 is "for future use" and not actually expected to be used in any sane capacity 👍
You don't have to, it's not supposed to be used by many.
If someone wants to run idiotic levels of something, we shouldn't prevent it just because we think it's weird. There is a lot worse you can do in the ZFS CLI.
You keep focusing on zstd-19, which is NOT even close to the default setting, for a reason. Please focus on anything < ZSTD-9 in the discussion... We know ZSTD-19 is insane and added it anyway in case we have more of a use (and tweaks) for it in the future.
You don't have to run ZSTD-19; we advised against it quite frequently. To give you two examples:
Anyhow @mjguzik just to TLDR:
How about making the multiplier in |
My experience is that if a setting is easily accessible, the assumption is that it basically works and there will be people using it, in this case requiring the high RAM usage to be acceptable. That said, I don't think we have common ground here, so I'm going to back off. |
The level is a user decision. The memory consumption for certain levels might be important for the documentation. The default level has been chosen since it provides compression comparable with gzip; levels like 19 are more like xz and very slow, so it's not just about memory consumption, it's also about performance. But even if 19 has high memory consumption, zstd will handle it and skips compression if the allocation fails. Making the multiplier configurable is pointless. I chose 4 since zfs itself creates 4 times as many I/O threads as available CPU cores/threads, so on a typical i7 with hyperthreading enabled, 32 I/O threads will run, and these slots are really required. If you lower the available mempool you will run into locking barriers during allocation under heavy load, so decreasing it will slow down zstd; increasing it does not have an impact at all and just creates bloat. |
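A hedged sketch of the allocation side of that argument, reusing the hypothetical `pool_slot` layout from the earlier sketch (the helper names and the simplified fallback behaviour are illustrative, not the actual zfs_zstd.c code): each concurrent allocator trylocks its way to a free slot, so if the slot count is smaller than the number of concurrent I/O threads, some of them fall through to a plain allocation and the cache gives no benefit for those requests.

```c
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

struct pool_slot {
	pthread_mutex_t barrier;
	void *mem;
	size_t size;
};

static struct pool_slot *slots;
static int nslots;

static void
slots_init(void)
{
	/* Mirror the sizing argument above: ~4 I/O threads per online CPU. */
	nslots = 4 * (int)sysconf(_SC_NPROCESSORS_ONLN);
	slots = calloc(nslots, sizeof (*slots));
	for (int i = 0; i < nslots; i++)
		pthread_mutex_init(&slots[i].barrier, NULL);
}

/*
 * Grab the first idle slot via trylock and hand out its cached buffer
 * (the slot stays locked while the buffer is in use; the matching release
 * would unlock it).  If no idle slot holds a suitable cached buffer,
 * fall back to a fresh allocation.
 */
static void *
slot_alloc(size_t size)
{
	for (int i = 0; i < nslots; i++) {
		struct pool_slot *s = &slots[i];

		if (pthread_mutex_trylock(&s->barrier) != 0)
			continue;
		if (s->mem != NULL && s->size >= size)
			return (s->mem);
		pthread_mutex_unlock(&s->barrier);
	}
	return (malloc(size));
}
```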
Then your experience is wrong. There are A LOT of ZFS settings that definitely don't "basically work" without significant downsides.
I expect there will be, and they should be aware of the consequences. That's why @BrainSlayer and I agree on updating the documentation to reflect this.
Sorry to bother you, I was under the (wrong) impression that you had more suggestions than simply lowering the number of supported ZSTD levels; that's why I tried to find out what you wanted changed.
Ahh shoot, forgot about the IO threads, yes you're right... |
Avoid checking the whole array of objects each time by removing the self-organized memory reaping. This can be managed by the global memory reap callback, which is called every 60 seconds. This will reduce the use of locking operations significantly. Reviewed-by: Kjeld Schouten <[email protected]> Reviewed-by: Mateusz Guzik <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Sebastian Gottschall <[email protected]> Closes #11126
Avoid checking the whole array of objects each time by removing the self-organized
memory reaping. This can be managed by the global memory reap callback, which is
called every 60 seconds. This will reduce the use of locking operations
significantly. In addition, the memory release function has been optimized to
take locks only where really needed.
Signed-off-by: Sebastian Gottschall [email protected]
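As a rough illustration of what "take locks only where really needed" on the release side can look like (hypothetical helper names again, continuing the `pool_slot` sketches above; the 2-minute cut-off is an assumed value, not necessarily what the patch uses): releasing a buffer only stamps its own slot and drops that slot's lock, while the periodic reap callback is the only place that ages buffers out.

```c
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

#define SLOT_TIMEOUT_SEC 120	/* assumed idle cut-off before a cached buffer is freed */

struct pool_slot {
	pthread_mutex_t barrier;
	void *mem;
	size_t size;
	time_t last_used;
};

/*
 * Release path: the caller still holds the slot's lock from allocation time.
 * No array scan, no extra locks -- just stamp the slot and unlock it.
 */
static void
slot_release(struct pool_slot *s)
{
	s->last_used = time(NULL);
	pthread_mutex_unlock(&s->barrier);
}

/*
 * Called for each slot the periodic reap callback managed to trylock:
 * free the cached buffer only if it has sat unused past the cut-off.
 */
static void
slot_expire(struct pool_slot *s)
{
	if (s->mem != NULL && time(NULL) - s->last_used > SLOT_TIMEOUT_SEC) {
		free(s->mem);
		s->mem = NULL;
		s->size = 0;
	}
}
```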