
ARC reads from pools without an l2arc should not be accounted as l2arc misses #10921


Merged
merged 1 commit into openzfs:master on Oct 20, 2020

Conversation

adamdmoss
Contributor

@adamdmoss adamdmoss commented Sep 12, 2020

The other half of the fix for issue #10913; this PR only eliminates l2 arcstat mis-counting for pools without l2arcs. (Versus the intersecting-but-separate PR #10914, which only ignores misses from datasets with secondarycache!=all.)

Note, this change doesn't affect situations where all the pools on a system have an l2arc or all the pools on a system don't have l2arc; it only changes the mixed case.

Motivation and Context

IMHO the current l2_misses accounting behavior is a bug; it treats all reads to pools without a configured l2arc as an l2arc miss, IFF there is at least one other pool on the system which does have an l2arc configured.

This makes it extremely hard to tune for an improved l2arc hit/miss ratio because this ratio will be modulated by reads from pools which do not (and should not) have l2arc devices; its upper limit will depend on the ratio of reads from l2arc'd pools and non-l2arc'd pools.

Description

This PR prevents ARC reads affecting l2arc stats (n.b. l2_misses is the only relevant one) where the target spa doesn't have an l2arc.

Note I've added a couple of paragraphs of hand-wringing around arc.c:6170 to clarify(?) exactly what's being checked for / ignored for the sake of cheapness.

How Has This Been Tested?

I've tested on a mix of pools where some have l2arc and some do not. L2 hits and misses are accounted correctly when reading from l2arc-backed pools; reads from non-l2arc-backed pools no longer affect L2 accounting at all. An upper limit of ~100% l2_hits/l2_misses ratio is attainable with this change regardless of traffic on pools which do not use l2arc.

I've tested the behavior of secondarycache=... and it remains unaffected. (That possibly-more-controversial change is covered by PR #10914 )

Ubuntu 18.04.05, Ubuntu HWE kernel 5.4.0-42-generic, git zfs zfs-2.0-release @ 8e7fe49 + PR cherrypick.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

@adamdmoss adamdmoss force-pushed the pr/l2checkmiss branch 3 times, most recently from 434ceb2 to fa1c0c2 on September 12, 2020 03:28
@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Sep 12, 2020
@behlendorf
Contributor

@gamanakis would you mind taking a look at this?

module/zfs/arc.c Outdated
/*
* Check if the spa even has l2 configured;
*
* NOTE #1: not holding spa_namespace_lock -
Contributor


We shouldn't need to hold spa_namespace_lock because we are not making any changes to the pool configuration here.
We lock out L2ARC device removal with spa_config_tryenter() a few lines above. I think you can remove NOTE #1 entirely.

Contributor Author


Good, I was hoping that was the case; the NOTEs were there to articulate my concerns so I could be corrected. :D

module/zfs/arc.c Outdated
* only affects accounting.
*/
const boolean_t spa_has_l2 =
l2arc_ndev != 0 &&
Contributor


This line can be moved to the previous one.

module/zfs/arc.c Outdated
@@ -6265,17 +6285,26 @@ arc_read(zio_t *pio, spa_t *spa, const blkptr_t *bp,
spa_config_exit(spa, SCL_L2ARC, vd);
}
} else {

Contributor


This empty line is not needed here.

@gamanakis
Contributor

@adamdmoss Thank you for this. I left comments in the code. Since you mention you have a test workflow in the description, it would be worthwhile to add a test in tests/zfs-tests/tests/functional/l2arc.

@adamdmoss
Contributor Author

Thanks for the review - will address feedback soon.

@richardelling
Contributor

It is a bit unfortunate that @gamanakis and @adamdmoss were not at the OpenZFS Leadership meeting on 16-Sep-2020. The team had a good discussion around per-pool ARC statistics and discussed several ways to implement them. See https://www.youtube.com/watch?v=fVcNnFHDwHY beginning at 5:10.

@adamdmoss
Contributor Author

I believe I've addressed all the review feedback @gamanakis - cheers.

@adamdmoss
Contributor Author

n.b. as the reporter of #10913 I'd be happy enough to call that closed with just this PR; I don't feel very strongly about the alternative solution/workaround in #10914.

@gamanakis
Contributor

@richardelling Thank you for triggering this discussion, interesting indeed. I will take a look into the different types of stats mentioned, it will be useful in the case of L2ARC as you pointed out.
Regarding the present PR, I think it is a step in the right direction.

Contributor

@behlendorf behlendorf left a comment


Thanks for addressing this case. I'd also recommend we add a new test case to tests/zfs-tests/functional/l2arc

@adamdmoss
Contributor Author

I'd like to add a test case; it'll take a little while again due to the day job. Is there a doc out there about running the test suite locally, or should I rely on the upstream buildbot?

@gmelikov
Member

gmelikov commented Sep 21, 2020

@adamdmoss you can run zfs-tests via:

@adamdmoss
Contributor Author

Thanks. I'm scared of borking my local system since I don't have multiple scratch disks; I keep meaning to set up a VM for ZFS dev. The GitHub Action thing is quite intriguing!

@gmelikov
Member

Sorry, I should have been more precise: you should use a VM for local runs, as zfs-tests may easily kill something.

@gamanakis
Contributor

My workflow for running a part of the ZTS suite (in a VM) is the following:

  • compile: ./autogen.sh; ./configure --enable-debug --enable-debuginfo --enable-debug-kmem --enable-debug-kmem-tracking; make -j8
  • load modules: sudo ./scripts/zfs.sh -v
  • run part of ZTS: ./scripts/zfs-tests.sh -T l2arc
  • unload modules: sudo ./scripts/zfs.sh -vu

Take a look in the tests in the tests/zfs-tests/tests/functional/l2arc directory (this is where the test for this PR should go).

adamdmoss added a commit to adamdmoss/zfs that referenced this pull request Sep 26, 2020
adamdmoss added a commit to adamdmoss/zfs that referenced this pull request Sep 26, 2020
@adamdmoss
Contributor Author

(sorry for a bit of churn - github action in my repo was a very mixed success so I'm leaning on buildbot while I iterate on the new test)

adamdmoss added a commit to adamdmoss/zfs that referenced this pull request Sep 26, 2020
adamdmoss added a commit to adamdmoss/zfs that referenced this pull request Sep 26, 2020
Resolves openzfs#10921
Includes new test - l2arc_l2miss_pos.ksh

Signed-off-by: Adam Moss <[email protected]>
@adamdmoss
Contributor Author

The new test passes on CentOS 8 now, FWIW. Any idea on the 'random' other failing tests? If they're relevant I'll fix them, but on a cursory look it seems unlikely.

Is there a test dashboard that'll let me visualize if there's a pattern?

@adamdmoss
Contributor Author

Ahh, I see a strong correlation with the commonly-failing tests listed at http://build.zfsonlinux.org/known-issues.html

@behlendorf
Contributor

If they're relevant I'll fix them

I've gone ahead and resubmitted those builds. It looks like they were caused by an issue with the CI. I suggest focusing on the test case you're adding.

adamdmoss added a commit to adamdmoss/zfs that referenced this pull request Sep 30, 2020
Includes new test - l2arc_l2miss_pos.ksh

Signed-off-by: Adam Moss <[email protected]>
Closes openzfs#10921
@adamdmoss
Contributor Author

buildbot/Fedora 32 x86_64 (TEST) failure appears unrelated - failing to compile!

@adamdmoss
Contributor Author

Looks like all test platforms are succeeding apart from Fedora which doesn't compile (unrelated), and CentOS which is failing a lot of stuff (but not this test).

adamdmoss added a commit to adamdmoss/zfs that referenced this pull request Oct 2, 2020
adamdmoss added a commit to adamdmoss/zfs that referenced this pull request Oct 7, 2020
adamdmoss added a commit to adamdmoss/zfs that referenced this pull request Oct 7, 2020
adamdmoss added a commit to adamdmoss/zfs that referenced this pull request Oct 10, 2020
adamdmoss added a commit to adamdmoss/zfs that referenced this pull request Oct 12, 2020
@adamdmoss
Contributor Author

(ping)

@gamanakis
Contributor

Looks good to me!

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Oct 20, 2020
@behlendorf behlendorf merged commit 666aa69 into openzfs:master Oct 20, 2020
jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
The current l2_misses accounting behavior treats all reads to pools 
without a configured l2arc as an l2arc miss, IFF there is at least 
one other pool on the system which does have an l2arc configured.

This makes it extremely hard to tune for an improved l2arc hit/miss 
ratio because this ratio will be modulated by reads from pools which 
do not (and should not) have l2arc devices; its upper limit will 
depend on the ratio of reads from l2arc'd pools and non-l2arc'd pools.

This PR prevents ARC reads affecting l2arc stats (n.b. l2_misses is 
the only relevant one) where the target spa doesn't have an l2arc.

Includes new test - l2arc_l2miss_pos.ksh

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Amanakis <[email protected]>
Signed-off-by: Adam Moss <[email protected]>
Closes openzfs#10921
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021