SIMD: faster vint4 load/store with unsigned char conversion
vint4::load from an unsigned char pointer now has a pre-SSE4 code path.
Tested on Ryzen 5950X / VS2022 (with only SSE2 enabled in the build):
- vint4 load from unsigned char[]: 946.1 -> 4232.8 Mvals/sec
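For illustration only, here is a minimal sketch of one common way to widen
four bytes into four 32-bit lanes using nothing newer than SSE2: interleave
with zero twice. This is not necessarily the code in this commit. The helper
name load_u8x4_to_i32x4 and the SKETCH_HAS_SSE2 macro are hypothetical, and a
scalar fallback is included so the sketch builds on non-x86 targets.

```cpp
#include <cstdint>
#include <cstring>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define SKETCH_HAS_SSE2 1
#else
#define SKETCH_HAS_SSE2 0
#endif

// Hypothetical helper (not part of the library): read 4 unsigned bytes
// from src and zero-extend each one into a 32-bit integer in out.
inline void load_u8x4_to_i32x4(const unsigned char* src, int32_t out[4])
{
#if SKETCH_HAS_SSE2
    // Fetch the 4 bytes as one 32-bit value; memcpy avoids aliasing
    // and alignment problems.
    int32_t packed;
    std::memcpy(&packed, src, sizeof(packed));

    __m128i v = _mm_cvtsi32_si128(packed);
    const __m128i zero = _mm_setzero_si128();

    // Interleaving with zero widens the values: 8-bit -> 16-bit,
    // then 16-bit -> 32-bit. Both intrinsics are plain SSE2.
    v = _mm_unpacklo_epi8(v, zero);
    v = _mm_unpacklo_epi16(v, zero);

    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), v);
#else
    // Scalar fallback with the same result.
    for (int i = 0; i < 4; ++i)
        out[i] = src[i];
#endif
}
```

With SSE4.1 available, a single _mm_cvtepu8_epi32 would do the same widening;
the two-step unpack is the usual substitute when only SSE2 can be assumed.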
vint4::store to an unsigned char pointer now has a simpler/faster SSE
code path and a NEON code path. It also gains correctness test
coverage, including for values outside the unsigned char range
(current behavior keeps only the lowest byte of each integer lane,
i.e. it does not clamp).
- vint4 store to unsigned char[]: 3489.8 -> 3979.3 Mvals/sec
- vint8 store to unsigned char[]: 5516.9 -> 7325.3 Mvals/sec
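The masking behavior described above can be written down as a scalar
reference model. This is a sketch of the semantics only, not the SIMD code in
this commit; the function name store_lanes_as_u8_reference is hypothetical.

```cpp
#include <cstdint>

// Hypothetical scalar model (not part of the library): store four 32-bit
// lanes as bytes by keeping only the lowest 8 bits of each lane.
// Out-of-range values wrap; they are NOT clamped to [0, 255].
inline void store_lanes_as_u8_reference(const int32_t lanes[4], unsigned char* dst)
{
    for (int i = 0; i < 4; ++i)
    {
        // Go through uint32_t so negative lanes are masked on their
        // two's-complement bit pattern.
        dst[i] = static_cast<unsigned char>(static_cast<uint32_t>(lanes[i]) & 0xFFu);
    }
}
```

Under this model a lane holding 256 stores 0 and a lane holding -1 stores
255, whereas a clamping store would give 255 and 0 respectively.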
NEON code path as tested on Mac M1 Max (clang 15):
- vint4 store to unsigned char[]: 4137.2 -> 6074.8 Mvals/sec
Signed-off-by: Aras Pranckevicius <[email protected]>