
ZTS: Remove ashift setting from dedup_quota test #17250

Merged
tonyhutter merged 1 commit into openzfs:master from the dedup_quota_ashift branch on Apr 18, 2025

Conversation

@amotin (Member) commented Apr 16, 2025

The test writes 1M of 1KB blocks, which may produce up to 1GB of dirty data. On top of that, ashift=12 likely produces an additional 4GB of ZIO buffers during the sync process, since each 1KB block is padded out to a 4KB allocation. On top of that, we likely need some (up to 4GB of) page cache, since the pool resides on files. And finally we need to cache the DDT. It is not surprising that the test regularly ends up in OOMs, possibly depending on TXG size variations.

Also replace fio, with its pretty strange parameter set, with a set of dd writes and TXG commits, which is just what we need here.

While here, remove compression. It has nothing to do with this test and only wastes CI CPU time.
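
For illustration, the replacement workload is conceptually along these lines. This is only a sketch in ZTS style: the dataset path, chunk count, and chunk size are placeholders rather than the exact parameters of the test, and it assumes the dataset uses a 1K recordsize so that each 1KB write becomes its own block.

# Write ~1M unique 1KB blocks in chunks, forcing a TXG commit after each
# chunk so dirty data stays bounded while the DDT grows predictably.
typeset -i i=0
while (( i < 64 )); do
        log_must dd if=/dev/urandom of=/$POOL/$TESTFS/file bs=1k \
            count=16384 seek=$((i * 16384)) conv=notrunc
        log_must zpool sync $POOL
        (( i += 1 ))
done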

How Has This Been Tested?

Very briefly, by hand. Let's see what the CI says about it.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@amotin (Member, Author) commented Apr 17, 2025

Hooray! It passed on all platforms on the first attempt, and seems faster than before, even though the variation is significant, from ~1.5 to ~8 minutes.

@amotin amotin added the Status: Code Review Needed Ready for review and testing label Apr 17, 2025
@amotin amotin requested review from pcd1193182 and tonyhutter April 17, 2025 14:50
@pcd1193182 (Contributor) commented Apr 17, 2025

While it passed on all the CI platforms, it now fails intermittently when I run it locally (where previously it did not):

PR change:		 6/10 pass
remove compression:	10/10 pass
remove ashift:		 7/10 pass
original:		10/10 pass

Removing the ashift from the pool appears to make the test flaky, at least on my system. Removing compression alone had no negative effect. I am investigating the failure to see if I can get insight into why this is causing problems.

@pcd1193182 (Contributor) commented:

Okay, I noticed that the failing runs always used a dedup device, while the passing runs used a special device in ddt_dedup_vdev_limit, so I think something is going wrong there. Investigating further.

@amotin (Member, Author) commented Apr 17, 2025

I've restarted the CI to try my luck again. @pcd1193182 I wonder how exactly it fails for you.

@pcd1193182 (Contributor) commented:

So the failure mode in my runs is that the number of entries ends up greater than the target of 650000 in the ddt_dedup_vdev_limit test. And from looking at the zdb output, it looks like the on-disk size is larger than the disk space allocated for the dedup class. It seems like we're not respecting that limit for some reason.
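
For context, the on-disk DDT numbers referenced here come from zdb's dedup statistics, which can be dumped with something like the following (the pool name is a placeholder):

# Dump dedup (DDT) statistics: entry counts and on-disk/in-core sizes
zdb -DD $POOL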

@amotin (Member, Author) commented Apr 17, 2025

@pcd1193182 In the CI logs I regularly saw a big discrepancy between the dedup_table_size property and the output of zpool list -v. I don't know why, though it might be a question of when TXG commits update the data, since the test obviously passed. I actually wonder where the 650000 comes from. Why not a million, matching the number of written blocks? The value of 800,000 entries in the comment is even more mysterious to me.
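
For reference, the two numbers being compared can be read with something like the following (using the test's $POOL as a stand-in for the pool name; the property and command come straight from the discussion above):

# DDT size as tracked by the pool property, in bytes
zpool get -Hp -o value dedup_table_size $POOL
# Space actually allocated per vdev, including the dedup/special class
zpool list -v $POOL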

@pcd1193182 (Contributor) commented Apr 17, 2025

@amotin By expanding the dbgmsg output we gather from the test, it looks like we flush ~400k log entries before we start spilling into the normal class, and then we keep flushing another ~380k after that. It seems like the log kept building up while we were doing the writes, and it didn't actually notice we were over quota (and so stop adding things to the log) until the dedup device filled up.

My guess is that this is because the dedup log is stored on the special device (as a metadata object), which means that if we're using a special device we notice we're over dedup quota once the log + DDT is big enough to fill the object. Whereas if we are using a dedup device, we only store the actual ddt entries there, and so it doesn't trip the quota until it's too late, and the dedup log has hundreds of thousands of extra entries to handle.

As for why removing the 4k made this appear... my guess is that the dirty data inflation was actually helping us by forcing things to spread into more TXGs. That slowed it down enough that the dedup code noticed that we were at the limit before it added the extra entries to the log.

EDIT: It also makes sense why this was only appearing in my runs and not in the CI; I'm probably using a significantly beefier VM than the CI runners, and the problem only appears if we manage to shove all the extra stuff into the log before we can flush enough out to trip the quota.
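
If it helps, one rough way to check the "more TXGs" theory on Linux is to diff the latest TXG number from the txgs kstat before and after the write phase. This is only a sketch; it assumes the Linux procfs kstat layout, and the pool name is a placeholder.

# The latest TXG number is the first column of the last line of the kstat.
txg_now() {
        tail -1 /proc/spl/kstat/zfs/$POOL/txgs | awk '{print $1}'
}
before=$(txg_now)
# ... run the write workload here ...
after=$(txg_now)
echo "TXGs elapsed: $((after - before))"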

@amotin (Member, Author) commented Apr 17, 2025

@pcd1193182 IIRC we introduced zp_storage_type specifically to make sure DDT logs are stored the same way as the DDT. But considering that this test uses a 1KB recordsize, I wonder if the special vdev might see enough additional usage from indirect blocks to affect the results. Actually, I quickly ran a manual test with both dedup and special vdevs and got this:

# zpool list -v
NAME        SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
optane      254G  1.27G   253G        -         -     0%     0%  1.00x    ONLINE  -
  nvd2p1    256G  1.00G   253G        -         -     0%  0.39%      -    ONLINE
dedup          -      -      -        -         -      -      -      -         -
  md1       200M   179M  12.9M        -         -    27%  93.3%      -    ONLINE
special        -      -      -        -         -      -      -      -         -
  md0       200M  96.5M  95.5M        -         -    23%  50.3%      -    ONLINE

So the special vdev usage by metadata (I think) is significant enough here to affect the result.

PS: Maybe in the "dedup" case we could reduce the size to 100MB to compensate?

@pcd1193182 (Contributor) commented:

Ah interesting, I didn't know about that feature! But you're right that indirect blocks would also make a significant difference, at this recordsize. We could try having a smaller dedup device than special to account for that. I also tried the simple hack of increasing DEDUP_LOG_FLUSH_ENTRIES_MIN to 500000 for this test, and with that it now passes consistently on my VM.
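
For reference, that kind of per-test override would normally go through the ZTS tunable helpers, roughly as sketched below; it assumes DEDUP_LOG_FLUSH_ENTRIES_MIN is wired into the ZTS tunables table and mirrors the hack described above, not the change actually committed:

# Raise the dedup log flush minimum for this test only, restore on cleanup.
log_must save_tunable DEDUP_LOG_FLUSH_ENTRIES_MIN
log_must set_tunable32 DEDUP_LOG_FLUSH_ENTRIES_MIN 500000
# ... run the test body ...
log_must restore_tunable DEDUP_LOG_FLUSH_ENTRIES_MIN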

@pcd1193182 (Contributor) commented:

Okay, with the size set to 100M for dedup and 200M for special, the test also passes consistently.

diff --git a/tests/zfs-tests/tests/functional/dedup/dedup_quota.ksh b/tests/zfs-tests/tests/functional/dedup/dedup_quota.ksh
index 3d5b80599..d62ac434e 100755
--- a/tests/zfs-tests/tests/functional/dedup/dedup_quota.ksh
+++ b/tests/zfs-tests/tests/functional/dedup/dedup_quota.ksh
@@ -189,10 +189,12 @@ function ddt_dedup_vdev_limit
        # add a dedicated dedup/special VDEV and enable an automatic quota
        if (( RANDOM % 2 == 0 )) ; then
                class="special"
+               size="200M"
        else
                class="dedup"
+               size="100M"
        fi
-       log_must truncate -s 200M $VDEV_DEDUP
+       log_must truncate -s $size $VDEV_DEDUP
        log_must zpool add $POOL $class $VDEV_DEDUP
        log_must zpool set dedup_table_quota=auto $POOL

@amotin force-pushed the dedup_quota_ashift branch from dabba94 to 0755384 on April 17, 2025 19:23
@tonyhutter (Contributor) left a comment:

Nice detective work!

@amotin force-pushed the dedup_quota_ashift branch from 0755384 to 3a36804 on April 18, 2025 01:30
@amotin (Member, Author) commented Apr 18, 2025

And of course the final version triggered OOMs on two platforms again. Lacking additional ideas about the OOMs, I decided to replace fio, with its honestly very weird set of parameters, with a loop of dd and zpool sync calls. If nothing else, it should generate a very predictable workload, which is just what we need for this test. Let's see what happens now.

@amotin force-pushed the dedup_quota_ashift branch from 3a36804 to cd75ac8 on April 18, 2025 01:47
The test writes 1M of 1KB blocks, which may produce up to 1GB of
dirty data.  On top of that, ashift=12 likely produces an additional
4GB of ZIO buffers during the sync process.  On top of that we
likely need some page cache, since the pool resides on files.  And
finally we need to cache the DDT.  It is not surprising that the
test regularly ends up in OOMs, possibly depending on TXG size
variations.

Also replace fio, with its pretty strange parameter set, with a set
of dd writes and TXG commits, which is just what we need here.

While here, remove compression.  It has nothing to do with this
test and only wastes CI CPU time.

Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
@amotin force-pushed the dedup_quota_ashift branch from cd75ac8 to 2e3d984 on April 18, 2025 13:12
@amotin (Member, Author) commented Apr 18, 2025

It passed this time. Rebasing and restarting once more just in case.

Meanwhile, while the test formally passes, looking at the logs I still see a big discrepancy between the reported DDT size of 153105408 and the dedup device ALLOC of 40.1M. I wonder if we somehow leak part of the DDT into the main pool. Though it may be complicated by the very small sizes of the special/dedup vdevs.

@amotin amotin requested review from pcd1193182 and tonyhutter April 18, 2025 13:17
@tonyhutter tonyhutter merged commit 1d8f625 into openzfs:master Apr 18, 2025
22 of 24 checks passed
@amotin amotin deleted the dedup_quota_ashift branch April 19, 2025 00:35
@amotin amotin added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Apr 19, 2025
snajpa pushed a commit to vpsfreecz/zfs that referenced this pull request May 2, 2025
robn pushed a commit to robn/zfs that referenced this pull request May 23, 2025
@robn robn mentioned this pull request May 23, 2025
tonyhutter pushed a commit that referenced this pull request May 28, 2025