Add AVX512BW variant of fletcher #9517

rdolbeau · 2019-10-27T09:23:37Z

It is much faster than AVX512F when byteswapping on Skylake-SP and newer, as we can do the byteswap in a single vshufb insteads of many instructions.

Signed-off-by: Romain Dolbeau [email protected]

Motivation and Context

Faster Fletcher on AXV512BW-capable machine. Performance only.

Description

This is the same code as AVX512, but changing the byteswap to the same principle as in the AVX (Intel) code: single pshufb instead of multiple instructions.

All AVX512-capable machine are also AVX512BW-capable, except for the Knight Landings processor (Intel Xeon Phi x1xx), so this should be generally useful.

How Has This Been Tested?

Pretty much only 'ztest' for validity.

Performance on a Xeon Platinum 8170:

̀cat /proc/spl/kstat/zfs/fletcher_4_bench 
0 0 0x01 -1 0 4687155613279 5306314643165
implementation   native         byteswap       
scalar           6598567882     2914387751     
superscalar      8619575466     3807285484     
superscalar4     7909377800     4574394878     
sse2             14823627397    8382296968     
ssse3            14871946829    13268853915    
avx2             22377596138    20723574960    
avx512f          33799684523    12442937636    
avx512bw         33785735010    29645697974    
fastest          avx512f        avx512bw

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
[ x] Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (a change to man pages or other documentation)

Checklist:

[x ] My code follows the ZFS on Linux code style requirements.
[x ] I have updated the documentation accordingly.
[ x] I have read the contributing document.
I have added tests to cover my changes.
[ x] All new and existing tests passed.
[ x] All commit messages are properly formatted and contain Signed-off-by.

codecov · 2019-10-27T17:50:11Z

Codecov Report

Merging #9517 into master will decrease coverage by 0.05%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master    #9517      +/-   ##
==========================================
- Coverage   79.16%    79.1%   -0.06%     
==========================================
  Files         416      416              
  Lines      123661   123667       +6     
==========================================
- Hits        97895    97827      -68     
- Misses      25766    25840      +74

Flag	Coverage Δ
#kernel	`79.73% <0%> (-0.01%)`	⬇️
#user	`66.83% <0%> (-0.11%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d46f0de...2c365a4. Read the comment docs.

behlendorf

Thanks! This is definitely a nice improvement for the byteswap case, and I agree I'm sure this will be helpful to some systems. The code and approach taken look correct to me. I haven't yet been able to verify the patch on the appropriate hardware, but I'll see if I can round something up.

ironMann

Looks good! (I cannot try the code atm)

Perhaps avoid duplicating the code by adding the new byteswap method to zfs_fletcher_avx512.c, like it was done for sse case.

rdolbeau · 2019-10-29T18:24:43Z

@behlendorf @ironMann Thanks; I'll try to follow @ironMann suggestion and merge with avx512, as indeed that's a lot of duplicated code.

It is much faster than AVX512F when byteswapping on Skylake-SP and newer, as we can do the byteswap in a single vshufb insteads of many instructions. Signed-off-by: Romain Dolbeau <[email protected]>

behlendorf · 2019-10-30T18:43:58Z

I've manually tested the patched with ztest and everything looks good. This is ready to go. Also, just another data point for future reference:

$ cat /proc/spl/kstat/zfs/fletcher_4_bench 
0 0 0x01 -1 0 18336380950 24249484731
implementation   native         byteswap       
scalar           6228223463     4987307307     
superscalar      8169752501     4502635203     
superscalar4     7460699360     6105662245     
sse2             14031638315    7912012361     
ssse3            14027234462    12551782074    
avx2             21087205076    19234057034    
avx512f          32509506092    11911328563    
avx512bw         32521718573    28283999556    
fastest          avx512bw       avx512bw

It is much faster than AVX512F when byteswapping on Skylake-SP and newer, as we can do the byteswap in a single vshufb instead of many instructions. Reviewed by: Gvozden Neskovic <[email protected]> Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Romain Dolbeau <[email protected]> Closes openzfs#9517

rdolbeau force-pushed the simd-avx512bw-fletcher branch from d41614d to 6b14f8a Compare October 27, 2019 09:36

behlendorf added the Status: Code Review Needed Ready for review and testing label Oct 28, 2019

behlendorf approved these changes Oct 28, 2019

View reviewed changes

behlendorf requested review from ironMann and tuxoko October 28, 2019 23:11

ironMann approved these changes Oct 29, 2019

View reviewed changes

Add AVX512BW variant of fletcher

2c365a4

It is much faster than AVX512F when byteswapping on Skylake-SP and newer, as we can do the byteswap in a single vshufb insteads of many instructions. Signed-off-by: Romain Dolbeau <[email protected]>

rdolbeau force-pushed the simd-avx512bw-fletcher branch from 6b14f8a to 2c365a4 Compare October 30, 2019 07:52

ironMann approved these changes Oct 30, 2019

View reviewed changes

tuxoko approved these changes Oct 30, 2019

View reviewed changes

behlendorf approved these changes Oct 30, 2019

View reviewed changes

behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Oct 30, 2019

behlendorf merged commit 0b2a642 into openzfs:master Oct 30, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add AVX512BW variant of fletcher #9517

Add AVX512BW variant of fletcher #9517

Uh oh!

rdolbeau commented Oct 27, 2019 •

edited

Loading

Uh oh!

codecov bot commented Oct 27, 2019 •

edited

Loading

Uh oh!

behlendorf left a comment

Uh oh!

ironMann left a comment

Uh oh!

rdolbeau commented Oct 29, 2019

Uh oh!

behlendorf commented Oct 30, 2019

Uh oh!

Uh oh!

Add AVX512BW variant of fletcher #9517

Add AVX512BW variant of fletcher #9517

Uh oh!

Conversation

rdolbeau commented Oct 27, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation and Context

Description

How Has This Been Tested?

Types of changes

Checklist:

Uh oh!

codecov bot commented Oct 27, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

behlendorf left a comment

Choose a reason for hiding this comment

Uh oh!

ironMann left a comment

Choose a reason for hiding this comment

Uh oh!

rdolbeau commented Oct 29, 2019

Uh oh!

behlendorf commented Oct 30, 2019

Uh oh!

Uh oh!

rdolbeau commented Oct 27, 2019 •

edited

Loading

codecov bot commented Oct 27, 2019 •

edited

Loading