-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Add AVX512BW variant of fletcher #9517
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
d41614d
to
6b14f8a
Compare
Codecov Report
@@ Coverage Diff @@
## master #9517 +/- ##
==========================================
- Coverage 79.16% 79.1% -0.06%
==========================================
Files 416 416
Lines 123661 123667 +6
==========================================
- Hits 97895 97827 -68
- Misses 25766 25840 +74
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This is definitely a nice improvement for the byteswap case, and I agree I'm sure this will be helpful to some systems. The code and approach taken look correct to me. I haven't yet been able to verify the patch on the appropriate hardware, but I'll see if I can round something up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! (I cannot try the code atm)
Perhaps avoid duplicating the code by adding the new byteswap method to zfs_fletcher_avx512.c
, like it was done for sse case.
@behlendorf @ironMann Thanks; I'll try to follow @ironMann suggestion and merge with avx512, as indeed that's a lot of duplicated code. |
It is much faster than AVX512F when byteswapping on Skylake-SP and newer, as we can do the byteswap in a single vshufb insteads of many instructions. Signed-off-by: Romain Dolbeau <[email protected]>
6b14f8a
to
2c365a4
Compare
I've manually tested the patched with
|
It is much faster than AVX512F when byteswapping on Skylake-SP and newer, as we can do the byteswap in a single vshufb instead of many instructions. Reviewed by: Gvozden Neskovic <[email protected]> Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Romain Dolbeau <[email protected]> Closes openzfs#9517
It is much faster than AVX512F when byteswapping on Skylake-SP and newer, as we can do the byteswap in a single vshufb insteads of many instructions.
Signed-off-by: Romain Dolbeau [email protected]
Motivation and Context
Faster Fletcher on AXV512BW-capable machine. Performance only.
Description
This is the same code as AVX512, but changing the byteswap to the same principle as in the AVX (Intel) code: single pshufb instead of multiple instructions.
All AVX512-capable machine are also AVX512BW-capable, except for the Knight Landings processor (Intel Xeon Phi x1xx), so this should be generally useful.
How Has This Been Tested?
Pretty much only 'ztest' for validity.
Performance on a Xeon Platinum 8170:
Types of changes
Checklist:
Signed-off-by
.