
Commit c726be7

ARM: Fix signed/unsigned simd mismatch in vbool4::load
Fixes 3721. Please read the comments in 3721.

This is the "local" fix for the build break caused by the type mismatch. It is a band-aid, but it may be the best solution for the 2.4 branch if we don't want to risk breaking ABIs by changing the definition of any public types.

Still pending is the question of whether it was a mistake to define vbool4 storage for NEON as uint32x4_t, or whether we should change it to int32x4_t to better match the non-SIMD reference implementation. After debating that (and identifying somebody with access to an ARM-based machine to test the solution for us), we may return to tackle this more fundamental change.
1 parent fc2c261 commit c726be7
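
The mismatch described above is between signed and unsigned lane types, not between bit patterns. As a minimal portable sketch (the helper names below are hypothetical and are not part of OpenImageIO), the signed mask convention `-int(b)` and the unsigned one `uint32_t(b ? -1 : 0)` can be checked to produce identical bits:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration only; not OpenImageIO API.

// Signed convention, as used by the scalar fallback: -1 for true, 0 for false.
inline int32_t mask_signed(bool b) { return -int32_t(b); }

// Unsigned convention, as used by the patched NEON insert: all ones or zero.
inline uint32_t mask_unsigned(bool b) { return uint32_t(b ? -1 : 0); }

// Copy the bits of a signed lane into an unsigned one without changing them.
inline uint32_t bits_of(int32_t v) {
    uint32_t u;
    std::memcpy(&u, &v, sizeof u);
    return u;
}

// True when both conventions yield the same bit pattern for true and false.
inline bool mask_conventions_agree() {
    return bits_of(mask_signed(true)) == mask_unsigned(true)
        && bits_of(mask_signed(false)) == mask_unsigned(false);
}

// Small self-check that can be called from any test or main().
inline void check_mask_conventions() {
    assert(mask_conventions_agree());
    assert(mask_unsigned(true) == 0xFFFFFFFFu);
}
```

If that holds, switching the NEON intrinsics from the `_s32` to the `_u32` variants should change only the types the compiler sees, not the stored mask values.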

1 file changed: +11 -4 lines changed


src/include/OpenImageIO/simd.h

@@ -3261,7 +3261,9 @@ OIIO_FORCEINLINE void vbool4::load (bool a, bool b, bool c, bool d) {
     m_simd = _mm_castsi128_ps(_mm_set_epi32(-int(d), -int(c), -int(b), -int(a)));
 #elif OIIO_SIMD_NEON
     int values[4] = { -int(a), -int(b), -int(c), -int(d) };
-    m_simd = vld1q_s32 (values);
+    m_simd = vld1q_u32((const uint32_t*)values);
+    // this if we were using int:
+    // m_simd = vld1q_s32(values);
 #else
     m_val[0] = -int(a);
     m_val[1] = -int(b);
@@ -3501,7 +3503,9 @@ OIIO_FORCEINLINE bool extract (const vbool4& a) {
 #if OIIO_SIMD_SSE >= 4
     return _mm_extract_epi32(_mm_castps_si128(a.simd()), i);  // SSE4.1 only
 #elif OIIO_SIMD_NEON
-    return vgetq_lane_s32(a, i);
+    return vgetq_lane_u32(a, i);
+    // this if we were using int:
+    // return vgetq_lane_s32(a, i);
 #else
     return a[i];
 #endif
@@ -3514,8 +3518,11 @@ OIIO_FORCEINLINE vbool4 insert (const vbool4& a, bool val) {
     int ival = -int(val);
     return _mm_castsi128_ps (_mm_insert_epi32 (_mm_castps_si128(a), ival, i));
 #elif OIIO_SIMD_NEON
-    int ival = -int(val);
-    return vld1q_lane_s32(&ival, a, i);
+    uint32_t ival = uint32_t(val ? -1 : 0);
+    return vld1q_lane_u32(&ival, a, i);
+    // this if we were using int:
+    // int ival = -int(val);
+    // return vld1q_lane_s32(&ival, a, i);
 #else
     vbool4 tmp = a;
     tmp[i] = -int(val);
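
For readers without the full header at hand, the three patched operations can be sketched in isolation. This is a rough illustration under stated assumptions, not the OpenImageIO implementation: `Bool4Sketch`, `load4`, `extract_lane`, and `insert_lane` are hypothetical names, the NEON path is selected with the compiler's `__ARM_NEON` macro rather than `OIIO_SIMD_NEON`, and a scalar fallback stands in on other targets. Unlike the patch, the sketch builds a `uint32_t` array directly, so no pointer cast is needed before `vld1q_u32`.

```cpp
#include <cstdint>

#if defined(__ARM_NEON)
#  include <arm_neon.h>
#endif

// Hypothetical stand-in for vbool4 with unsigned lane storage.
struct Bool4Sketch {
#if defined(__ARM_NEON)
    uint32x4_t v;    // NEON storage, unsigned lanes as in the patched code
#else
    uint32_t v[4];   // scalar fallback for non-NEON builds
#endif
};

// Build a mask from four bools: all bits set for true, zero for false.
inline Bool4Sketch load4(bool a, bool b, bool c, bool d) {
    const uint32_t values[4] = { uint32_t(a ? -1 : 0), uint32_t(b ? -1 : 0),
                                 uint32_t(c ? -1 : 0), uint32_t(d ? -1 : 0) };
    Bool4Sketch r;
#if defined(__ARM_NEON)
    r.v = vld1q_u32(values);             // unsigned load matches uint32x4_t
#else
    for (int i = 0; i < 4; ++i) r.v[i] = values[i];
#endif
    return r;
}

// Read lane i as a bool; the lane index must be a compile-time constant.
template<int i>
inline bool extract_lane(const Bool4Sketch& m) {
    static_assert(i >= 0 && i < 4, "lane index out of range");
#if defined(__ARM_NEON)
    return vgetq_lane_u32(m.v, i) != 0;
#else
    return m.v[i] != 0;
#endif
}

// Return a copy of m with lane i replaced by the mask for val.
template<int i>
inline Bool4Sketch insert_lane(const Bool4Sketch& m, bool val) {
    static_assert(i >= 0 && i < 4, "lane index out of range");
    const uint32_t ival = uint32_t(val ? -1 : 0);
    Bool4Sketch r = m;
#if defined(__ARM_NEON)
    r.v = vld1q_lane_u32(&ival, m.v, i); // unsigned single-lane load
#else
    r.v[i] = ival;
#endif
    return r;
}
```

A call such as `insert_lane<1>(load4(true, false, true, false), true)` should leave lanes 0 and 2 set and turn lane 1 on. The NEON branch has the same caveat as the commit itself: it still needs to be verified on ARM hardware.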
