-
Notifications
You must be signed in to change notification settings - Fork 1.9k
ARC reads from pools without an l2arc should not be accounted as l2arc misses #10921
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
434ceb2
to
fa1c0c2
Compare
@gamanakis would you mind looking at this. |
module/zfs/arc.c
Outdated
/* | ||
* Check if the spa even has l2 configured; | ||
* | ||
* NOTE #1: not holding spa_namespace_lock - |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't need holding spa_namespace_lock because we are not making any changes to the pool configuration here.
We lock out L2ARC device removal with spa_config_tryenter() a few lines above. I think you can remove NOTE#1 entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good - I was hoping that was the case, the NOTEs were articulate my concerns so I could be corrected. :D
module/zfs/arc.c
Outdated
* only affects accounting. | ||
*/ | ||
const boolean_t spa_has_l2 = | ||
l2arc_ndev != 0 && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line can be moved to the previous one.
module/zfs/arc.c
Outdated
@@ -6265,17 +6285,26 @@ arc_read(zio_t *pio, spa_t *spa, const blkptr_t *bp, | |||
spa_config_exit(spa, SCL_L2ARC, vd); | |||
} | |||
} else { | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This empty line is not needed here.
@adamdmoss Thank you for this. I left comments in the code. Since you mention you have test-workflow in the description it would be worthwhile to add a test in |
Thanks for the review - will address feedback soon. |
It is a bit unfortunate that @gamanakis and @adamdmoss were not at the OpenZFS Leadership meeting on 16-Sep-2020. The team had a good discussion around per-pool ARC statistics and discussed several ways to implement them. See https://www.youtube.com/watch?v=fVcNnFHDwHY beginning at 5:10. |
fa1c0c2
to
640d4a1
Compare
I believe I've addressed all the review feedback @gamanakis - cheers. |
@richardelling Thank you for triggering this discussion, interesting indeed. I will take a look into the different types of stats mentioned, it will be useful in the case of L2ARC as you pointed out. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing this case. I'd also recommend we add a new test case to tests/zfs-tests/functional/l2arc
I'd like to add a test case, it'll take a little while again due to the day job; is there a doc out there about running the test suite locally or should I rely on the upstream buildbot? |
@adamdmoss you can run zfs-tests via:
|
Thanks - I'm scared of borking my local system since I don't have multiple scratch disks, keep meaning to set up a VM for zfs dev. The github action thing is quite intriguing! |
Sorry, I may be more precise, you should use VM on local run, zfs-tests may easily kill something. |
My workflow for running a part of the ZTS suite (in a VM) is the following: compile: Take a look in the tests in the |
640d4a1
to
a9ee28d
Compare
(sorry for a bit of churn - github action in my repo was a very mixed success so I'm leaning on buildbot while I iterate on the new test) |
Resolves openzfs#10921 Includes new test - l2arc_l2miss_pos.ksh Signed-off-by: Adam Moss <[email protected]>
23b2aff
to
984f3ce
Compare
The new test passes on CentOS8 now, FWIW, Any idea on the 'random' other failing tests? If they're relevant I'll fix them, but it looks unlikely on a cursory look. Is there a test dashboard that'll let me visualize if there's a pattern? |
Ahh, I see a strong correlation with commonly-failing tests from here http://build.zfsonlinux.org/known-issues.html |
I've gone ahead and resubmitted those builds. It looks like they were caused by an issue with the CI. I suggest focusing on the test case you're adding. |
Includes new test - l2arc_l2miss_pos.ksh Signed-off-by: Adam Moss <[email protected]> Closes openzfs#10921
7a7cf9e
to
88a0790
Compare
|
Looks like all test platforms are succeeding apart from Fedora which doesn't compile (unrelated), and CentOS which is failing a lot of stuff (but not this test). |
Includes new test - l2arc_l2miss_pos.ksh Signed-off-by: Adam Moss <[email protected]> Closes openzfs#10921
88a0790
to
5439f80
Compare
Includes new test - l2arc_l2miss_pos.ksh Signed-off-by: Adam Moss <[email protected]> Closes openzfs#10921
5439f80
to
eb7ca77
Compare
Includes new test - l2arc_l2miss_pos.ksh Signed-off-by: Adam Moss <[email protected]> Closes openzfs#10921
eb7ca77
to
990b070
Compare
Includes new test - l2arc_l2miss_pos.ksh Signed-off-by: Adam Moss <[email protected]> Closes openzfs#10921
990b070
to
c1ff0c6
Compare
Includes new test - l2arc_l2miss_pos.ksh Signed-off-by: Adam Moss <[email protected]> Closes openzfs#10921
c1ff0c6
to
26893f9
Compare
Includes new test - l2arc_l2miss_pos.ksh Signed-off-by: Adam Moss <[email protected]> Closes openzfs#10921
26893f9
to
e22e6d8
Compare
(ping) |
Looks good to me! |
The current l2_misses accounting behavior treats all reads to pools without a configured l2arc as an l2arc miss, IFF there is at least one other pool on the system which does have an l2arc configured. This makes it extremely hard to tune for an improved l2arc hit/miss ratio because this ratio will be modulated by reads from pools which do not (and should not) have l2arc devices; its upper limit will depend on the ratio of reads from l2arc'd pools and non-l2arc'd pools. This PR prevents ARC reads affecting l2arc stats (n.b. l2_misses is the only relevant one) where the target spa doesn't have an l2arc. Includes new test - l2arc_l2miss_pos.ksh Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Amanakis <[email protected]> Signed-off-by: Adam Moss <[email protected]> Closes openzfs#10921
The current l2_misses accounting behavior treats all reads to pools without a configured l2arc as an l2arc miss, IFF there is at least one other pool on the system which does have an l2arc configured. This makes it extremely hard to tune for an improved l2arc hit/miss ratio because this ratio will be modulated by reads from pools which do not (and should not) have l2arc devices; its upper limit will depend on the ratio of reads from l2arc'd pools and non-l2arc'd pools. This PR prevents ARC reads affecting l2arc stats (n.b. l2_misses is the only relevant one) where the target spa doesn't have an l2arc. Includes new test - l2arc_l2miss_pos.ksh Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Amanakis <[email protected]> Signed-off-by: Adam Moss <[email protected]> Closes openzfs#10921
The other half of the fix for issue #10913 ; this PR only eliminates l2 arcstat mis-counting for pools without l2arcs. (Versus the intersecting-but-separate PR #10914 which only ignores misses from datasets with secondarycache!=all)
Note, this change doesn't affect situations where all the pools on a system have an l2arc or all the pools on a system don't have l2arc; it only changes the mixed case.
Motivation and Context
IMHO the current l2_misses accounting behavior is a bug; it treats all reads to pools without a configured l2arc as an l2arc miss, IFF there is at least one other pool on the system which does have an l2arc configured.
This makes it extremely hard to tune for an improved l2arc hit/miss ratio because this ratio will be modulated by reads from pools which do not (and should not) have l2arc devices; its upper limit will depend on the ratio of reads from l2arc'd pools and non-l2arc'd pools.
Description
This PR prevents ARC reads affecting l2arc stats (n.b. l2_misses is the only relevant one) where the target spa doesn't have an l2arc.
Note I've added a couple of paragraphs of hand-wringing around arc.c:6170 to clarify(?) exactly what's being checked for / ignored for the sake of cheapness.
How Has This Been Tested?
I've tested on a mix of pools where some have l2arc and some do not. L2 hits and misses are accounted correctly when reading from l2arc-backed pools, L2 misses (and hits) do not affect accounting when reading from non-l2arc-backed pools. An upper limit of ~100% l2_hits/l2_misses ratio is attainable with this change regardless of traffic on pools which do not use l2arc.
I've tested the behavior of secondarycache=... and it remains unaffected. (That possibly-more-controversial change is covered by PR #10914 )
Ubuntu 18.04.05, Ubuntu HWE kernel 5.4.0-42-generic, git zfs zfs-2.0-release @ 8e7fe49 + PR cherrypick.
Types of changes
Checklist:
Signed-off-by
.