Armadillo vs. a loop for vector multiplication

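I need to multiply two complex vectors element-wise, and I wanted to check how Armadillo performs. I wrote a simple test that measures the processing time. The multiplication is implemented in two ways: Armadillo's element-wise multiplication and a simple loop over std::vector. Here is the test source: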

#include <armadillo>
#include <chrono>
#include <complex>
#include <cstdint>
#include <iostream>
#include <vector>

using namespace std;
using namespace arma;
using namespace std::chrono;

#define VEC_SIZE 204800

int main(int argc, char** argv) {
    const int iterations = 1000;
    cout << "Armadillo version: " << arma_version::as_string() << endl;
    //duration<double> lib_cnt, vec_cnt;
    uint32_t lib_cnt = 0, vec_cnt = 0;  // accumulated whole milliseconds
    for (int it = 0; it < iterations; it++) {
        // init input vectors
        std::vector<complex<float>> vf1(VEC_SIZE);
        std::fill(vf1.begin(), vf1.end(), complex<float>(4., 6.));
        std::vector<complex<float>> vf2(VEC_SIZE);
        std::fill(vf2.begin(), vf2.end(), 5.);
        std::vector<complex<float>> vf_res(VEC_SIZE);
        // init arma vectors
        Col<complex<float>> vec1(vf1);
        Col<complex<float>> vec2(vf2);

        // time for loop duration
        auto t0 = high_resolution_clock::now();
        for (int vec_idx = 0; vec_idx < VEC_SIZE; vec_idx++) {
            vf_res[vec_idx] = vf1[vec_idx] * vf2[vec_idx];
        }
        auto t1 = high_resolution_clock::now();
        vec_cnt += duration_cast<milliseconds>(t1 - t0).count();
        // read the results back (no observable effect, so the optimizer may drop this loop)
        for (int vec_idx = 0; vec_idx < VEC_SIZE; vec_idx++) {
            complex<float> s = vf_res[vec_idx];
        }

        Col<complex<float>> mul_res(VEC_SIZE);
        // time arma element wise duration
        t0 = high_resolution_clock::now();
        mul_res = vec1 % vec2;
        t1 = high_resolution_clock::now();
        lib_cnt += duration_cast<milliseconds>(t1 - t0).count();
    }
    cout << "for loop time " << vec_cnt << " msec\n";
    cout << "arma time " << lib_cnt << " msec\n";
    return 0;
}

The results are as follows:

$ g++ example1.cpp -o example1 -O2 -larmadillo 
$ ./example1
Armadillo version: 9.200.5 (Carpe Noctem)
for loop time 2060 msec
arma time 3049 msec

I expected Armadillo's multiplication to be faster than a simple loop. Or am I wrong? Is a for loop expected to multiply two vectors faster?
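A side note on the measurement itself, offered as a sketch rather than an explanation: the test converts each iteration's elapsed time to whole milliseconds before adding it to the counter, and duration_cast truncates toward zero, so any sub-millisecond remainder is discarded on every iteration. The helpers below (time_once, sum_truncated_ms and sum_exact_ms are names made up for this illustration, not part of Armadillo or of the test above) compare that scheme with accumulating at nanosecond resolution and converting once at the end.

```cpp
#include <chrono>
#include <cstdint>
#include <vector>

// Hypothetical helpers for illustration; none of these names come from Armadillo.
using bench_clock = std::chrono::steady_clock;  // monotonic clock

// Run f once and return the elapsed time at nanosecond resolution.
template <typename F>
std::chrono::nanoseconds time_once(F&& f) {
    const auto t0 = bench_clock::now();
    f();
    const auto t1 = bench_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0);
}

// The scheme used in the test: truncate every sample to whole
// milliseconds, then add. Samples shorter than 1 ms contribute 0.
std::int64_t sum_truncated_ms(const std::vector<std::chrono::nanoseconds>& samples) {
    std::int64_t total = 0;
    for (const auto& s : samples) {
        total += std::chrono::duration_cast<std::chrono::milliseconds>(s).count();
    }
    return total;
}

// Alternative: add at full resolution and convert to milliseconds once.
double sum_exact_ms(const std::vector<std::chrono::nanoseconds>& samples) {
    std::chrono::nanoseconds total{0};
    for (const auto& s : samples) {
        total += s;
    }
    return std::chrono::duration<double, std::milli>(total).count();
}
```

For instance, 1000 samples of 900 microseconds each add up to 0 under the first scheme and to 900 ms under the second. In the benchmark, each timed region could be wrapped in a lambda and passed to time_once, with the returned durations summed and converted only when printing.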

This is not an answer to the question, more of an observation. If you restructure the code into two separate loops:

// Includes and using-directives as in the question's listing.
#define VEC_SIZE 204800

int main(int argc, char** argv)
{
    const int iterations = 1000;
    cout << "Armadillo version: " << arma_version::as_string() << endl;
    //duration<double> lib_cnt, vec_cnt;
    uint32_t lib_cnt = 0, vec_cnt = 0;  // accumulated whole milliseconds
    // init input vectors (once, outside the timed loops)
    std::vector<complex<float>> vf1(VEC_SIZE);
    std::fill(vf1.begin(), vf1.end(), complex<float>(4., 6.));
    std::vector<complex<float>> vf2(VEC_SIZE);
    std::fill(vf2.begin(), vf2.end(), 5.);
    std::vector<complex<float>> vf_res(VEC_SIZE);
    // init arma vectors
    Col<complex<float>> vec1(vf1);
    Col<complex<float>> vec2(vf2);
    Col<complex<float>> mul_res(VEC_SIZE);
    high_resolution_clock::time_point t0, t1;
    for (int it = 0; it < iterations; it++) {
        // time for loop duration
        t0 = high_resolution_clock::now();
        for (int vec_idx = 0; vec_idx < VEC_SIZE; vec_idx++) {
            vf_res[vec_idx] = vf1[vec_idx] * vf2[vec_idx];
        }
        t1 = high_resolution_clock::now();
        vec_cnt += duration_cast<milliseconds>(t1 - t0).count();
#if 1  // 1: two separate loops; 0: both measurements inside one loop
    }
    for (int it = 0; it < iterations; it++) {
#endif
        // time arma element wise duration
        t0 = high_resolution_clock::now();
        mul_res = vec1 % vec2;
        t1 = high_resolution_clock::now();
        lib_cnt += duration_cast<milliseconds>(t1 - t0).count();
    }
    cout << "for loop time " << vec_cnt << " msec\n";
    cout << "arma time " << lib_cnt << " msec\n";
    return 0;
}

then the results go from

Armadillo version: 8.500.1 (Caffeine Raider)
for loop time 169 msec
arma time 244 msec

to

Armadillo version: 8.500.1 (Caffeine Raider)
for loop time 187 msec
arma time 22 msec

This looks more like the expected result. However, I cannot explain why...

Compiled with gcc 7.3.0 and OpenBLAS on a Core i5 M520, Ubuntu 18.04.
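One way to narrow this down, sketched here under assumptions rather than as a diagnosis: run each variant a few times untimed before measuring, so that first-touch costs such as page faults and cold caches stay out of the totals, and fold the output into a checksum that is printed or compared, so the compiler cannot discard the multiplication. The names multiply_loop, checksum and timed_total are hypothetical, and the sketch uses only the standard library; the Armadillo expression mul_res = vec1 % vec2 could be passed to timed_total in the same way as the loop.

```cpp
#include <chrono>
#include <complex>
#include <cstddef>
#include <vector>

using cvec = std::vector<std::complex<float>>;

// Element-wise product, the same operation as the hand-written loop.
// Assumes a, b and out all have the same size.
void multiply_loop(const cvec& a, const cvec& b, cvec& out) {
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = a[i] * b[i];
    }
}

// Fold a result vector into a single value. Printing or comparing it
// makes the computed products observable.
std::complex<float> checksum(const cvec& v) {
    std::complex<float> sum(0.0f, 0.0f);
    for (const auto& x : v) {
        sum += x;
    }
    return sum;
}

// Call f `warmup` times untimed, then `iterations` times timed, and
// return the accumulated time at nanosecond resolution.
template <typename F>
std::chrono::nanoseconds timed_total(F&& f, int warmup, int iterations) {
    using bench_clock = std::chrono::steady_clock;
    for (int i = 0; i < warmup; ++i) {
        f();  // untimed: touches the memory and fills the caches
    }
    std::chrono::nanoseconds total{0};
    for (int i = 0; i < iterations; ++i) {
        const auto t0 = bench_clock::now();
        f();
        const auto t1 = bench_clock::now();
        total += std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0);
    }
    return total;
}
```

Calling timed_total once per variant, then again in the opposite order, and comparing the checksums afterwards would show whether the gap depends on which variant runs first.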
