
RAIDZ: Use cache blocking during parity math #15448


Merged
behlendorf merged 1 commit into openzfs:master from the raidz_cache branch on Oct 30, 2023

Conversation

@amotin (Member) commented Oct 24, 2023

RAIDZ parity is calculated by adding data one column at a time. That works fine for small blocks, but for large blocks the result of the previous addition may already have been evicted from the CPU caches to main memory, so on top of the extra memory write, an extra read is needed to bring it back.

This patch splits large parity operations into 64KB chunks, which should in most cases fit into the L2 caches of CPUs from the last decade. I haven't touched the more complicated data reconstruction cases, to avoid overcomplicating the code; those should be relatively rare.
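
For context, here is a minimal, self-contained sketch of the cache-blocking idea in plain C. The function names, the `PARITY_CHUNK` constant, and the XOR-only (RAIDZ1-style P parity) math are illustrative assumptions for this writeup, not the actual OpenZFS vdev_raidz code, which also performs the Galois-field math for the higher parities and has SIMD implementations. The point is only the loop restructuring: instead of sweeping each data column over the full block length, the block is walked in 64KB chunks so the parity chunk being accumulated stays hot in L2 while every column is added to it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Blocking factor; illustrative stand-in for the 64KB chunk size in the patch. */
#define	PARITY_CHUNK	(64 * 1024)

/*
 * Naive column-at-a-time parity: each data column is swept over the full
 * block length.  Once the block outgrows L2, every sweep re-reads the
 * parity buffer from memory and writes it back.
 */
void
parity_xor_naive(uint8_t *p, uint8_t *const *cols, size_t ncols, size_t len)
{
	memcpy(p, cols[0], len);
	for (size_t c = 1; c < ncols; c++) {
		for (size_t i = 0; i < len; i++)
			p[i] ^= cols[c][i];
	}
}

/*
 * Cache-blocked parity: walk the block in 64KB chunks and add every column
 * to the parity chunk before moving on, so the chunk stays resident in L2.
 */
void
parity_xor_blocked(uint8_t *p, uint8_t *const *cols, size_t ncols, size_t len)
{
	for (size_t off = 0; off < len; off += PARITY_CHUNK) {
		size_t n = len - off;
		if (n > PARITY_CHUNK)
			n = PARITY_CHUNK;
		memcpy(p + off, cols[0] + off, n);
		for (size_t c = 1; c < ncols; c++) {
			for (size_t i = 0; i < n; i++)
				p[off + i] ^= cols[c][off + i];
		}
	}
}
```

With this blocking, the per-step working set is roughly one parity chunk plus one data chunk (about 128KB in this XOR-only sketch, a few hundred KB with two or three parities), which comfortably fits the L2 cache sizes discussed above.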

My tests on a Xeon Gold 6242R CPU with 1MB of L2 cache per core show up to 10%/20% memory traffic reduction when writing blocks of ~4MB and up to 4-wide RAIDZ/RAIDZ2. Older CPUs with 256KB of L2 cache should see the effect even on smaller blocks. Wider vdevs may need bigger blocks before they benefit.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@amotin
Copy link
Member Author

amotin commented Oct 24, 2023

Writing ~6.8GB/s to 4-wide RAIDZ with different block sizes:
[benchmark chart]

Writing ~4.8GB/s to 4-wide RAIDZ2 with different block sizes:
[benchmark chart]

@behlendorf added the "Type: Performance" and "Status: Code Review Needed" labels on Oct 24, 2023
@amotin force-pushed the raidz_cache branch 2 times, most recently from 3c1266e to 585b9a3, on October 24, 2023 21:47
RAIDZ parity is calculated by adding data one column at a time.  It
works OK for small blocks, but for large blocks results of previous
addition may already be evicted from CPU caches to main memory, and
in addition to extra memory write require extra read to get it back.

This patch splits large parity operations into 64KB chunks, that
should in most cases fit into CPU L2 caches from the last decade.
I haven't touched more complicated cases of data reconstruction to
not overcomplicate the code.  Those should be relatively rare.

My tests on Xeon Gold 6242R CPU with 1MB of L2 cache per core show
up to 10/20% memory traffic reduction when writing to 4-wide RAIDZ/
RAIDZ2 blocks of ~4MB and up.  Older CPUs with 256KB of L2 cache
should see the effect even on smaller blocks.  Wider vdevs may need
bigger blocks to be affected.

Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
@behlendorf added the "Status: Accepted" label and removed the "Status: Code Review Needed" label on Oct 30, 2023
@behlendorf merged commit 05a7348 into openzfs:master on Oct 30, 2023
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request on Dec 12, 2023
RAIDZ parity is calculated by adding data one column at a time.  It
works OK for small blocks, but for large blocks results of previous
addition may already be evicted from CPU caches to main memory, and
in addition to extra memory write require extra read to get it back.

This patch splits large parity operations into 64KB chunks, that
should in most cases fit into CPU L2 caches from the last decade.
I haven't touched more complicated cases of data reconstruction to
not overcomplicate the code.  Those should be relatively rare.

My tests on Xeon Gold 6242R CPU with 1MB of L2 cache per core show
up to 10/20% memory traffic reduction when writing to 4-wide RAIDZ/
RAIDZ2 blocks of ~4MB and up.  Older CPUs with 256KB of L2 cache
should see the effect even on smaller blocks.  Wider vdevs may need
bigger blocks to be affected.

Reviewed-by: Brian Atkinson <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#15448
@amotin deleted the raidz_cache branch on March 22, 2024 17:03
Labels
Status: Accepted · Type: Performance
3 participants