Based on an SSE reduction of a float vector, I tried to sum an array of unsigned long long, but unfortunately without any success.
uint64_t vsum_uint64 (uint64_t *a, int n)
{
    uint64_t sum; // lets say sum fits
    __m128 vsum = _mm_set1_ps(0);
    for (int i = 0; i < n; i += 2) { // 2 unit64 in single __m128
        __m128 v = _mm_loadl_epi64(&a[i]);
        vsum = _mm_add_epi64(vsum, v);
    }
    _mm_store_ss(&sum, vsum);
    uint64_t *p = &vsum;
    sum+=*(p+1);
    // vsum = _mm_hadd_ps(vsum, vsum);
    // vsum = _mm_hadd_ps(vsum, vsum);
    return sum;
}
I think this should be correct, but gcc still refuses to compile it. I searched for an answer but did not find one.
Here is what gcc says:
main.cpp: In function ‘uint64_t vsum_uint64(const uint64_t*, int)’:
main.cpp:73:35: error: cannot convert ‘const uint64_t* {aka const long unsigned int*}’ to ‘const __m128i* {aka const __vector(2) long long int*}’ for argument ‘1’ to ‘__m128i _mm_loadl_epi64(const __m128i*)’
main.cpp:74:31: error: cannot convert ‘__m128 {aka __vector(4) float}’ to ‘__m128i {aka __vector(2) long long int}’ for argument ‘1’ to ‘__m128i _mm_add_epi64(__m128i, __m128i)’
main.cpp:77:25: error: cannot convert ‘uint64_t* {aka long unsigned int*}’ to ‘float*’ for argument ‘1’ to ‘void _mm_store_ss(float*, __m128)’
main.cpp:78:17: error: cannot convert ‘__m128* {aka __vector(4) float*}’ to ‘uint64_t* {aka long unsigned int*}’ in initialization
Could you help me out? I would really appreciate it.
Thanks
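Here are a few things to fix:

- Use __m128i instead of __m128. _mm_add_epi64 operates on integer vectors, and gcc does not convert implicitly between the float and integer vector types, which is what the second error is about.

- You can zero-initialize vsum with
  __m128i vsum = _mm_setzero_si128();

- For loading the data, cast the pointer to the appropriate __m128i pointer type and use the full 128-bit load (_mm_loadl_epi64 only loads a single 64-bit integer into the low half). So, either
  for (int i = 0; i < n; i += 2) { // 2 uint64 in a single __m128i
      __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(&a[i]));
      vsum = _mm_add_epi64(vsum, v);
  }
  or
  const __m128i* pa = reinterpret_cast<const __m128i*>(a);
  for (int i = 0; i < n; i += 2) { // 2 uint64 in a single __m128i
      __m128i v = _mm_loadu_si128(pa);
      pa++;
      vsum = _mm_add_epi64(vsum, v);
  }
  Both variants read two elements per iteration, so they assume n is even.

- Finally, you can use
  sum = vsum.m128i_u64[0] + vsum.m128i_u64[1];
  to assign to sum, provided __m128i is defined as a union with those members. That is the case on Windows/Visual Studio, but you are using a different environment: with gcc, __m128i is a plain vector type without such members, so you would store the vector to memory instead (for example with _mm_storeu_si128) and add the two 64-bit values from there.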
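
Putting the points above together, here is a minimal sketch of how the whole function could look with gcc. It is not the only way to write it. The name vsum_uint64_sse2, the std::size_t length parameter, the lanes array and the scalar handling of an odd trailing element are my own assumptions and are not part of the original code. The sum wraps modulo 2^64 on overflow, as ordinary uint64_t addition does.

```cpp
#include <cstddef>
#include <cstdint>
#include <emmintrin.h> // SSE2 integer intrinsics

// Hypothetical name; sums n unsigned 64-bit integers using SSE2.
uint64_t vsum_uint64_sse2(const uint64_t *a, std::size_t n)
{
    __m128i vsum = _mm_setzero_si128(); // two 64-bit partial sums, both zero

    std::size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        // Unaligned 128-bit load of a[i] and a[i + 1].
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i *>(&a[i]));
        vsum = _mm_add_epi64(vsum, v); // lane-wise 64-bit addition
    }

    // Portable horizontal sum: store both lanes to memory and add them.
    // (Assumption: used here in place of the MSVC-only m128i_u64 member.)
    uint64_t lanes[2];
    _mm_storeu_si128(reinterpret_cast<__m128i *>(lanes), vsum);
    uint64_t sum = lanes[0] + lanes[1];

    // If n is odd, one element is left over; add it with scalar code.
    if (i < n)
        sum += a[i];

    return sum;
}
```

Storing the vector to a small array avoids both the compiler-specific union members and the pointer cast from the question (uint64_t *p = &vsum), which gcc rejects. On x86-64, SSE2 is always available, so no extra compiler flag should be needed; on 32-bit x86 you would likely have to compile with -msse2.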