Strange benchmark results

I wrote the following benchmark:

#include <iostream> // cout
#include <math.h>   // pow
#include <cstdint>  // int64_t
#include <chrono>   // high_resolution_clock
using namespace std;
using namespace std::chrono;
int64_t calculate(int);
int main()
{
    high_resolution_clock::time_point t1, t2;
    // Test 1
    t1 = high_resolution_clock::now();
    calculate(200);
    t2 = high_resolution_clock::now();
    cout << "RUNTIME TEST 1 = " <<  duration_cast<nanoseconds>(t2 - t1).count() << " nano seconds" << endl;
    // Test 2
    t1 = high_resolution_clock::now();
    calculate(200000);
    t2 = high_resolution_clock::now();
    cout << "RUNTIME TEST 2 = " <<  duration_cast<nanoseconds>(t2 - t1).count() << " nano seconds" << endl;
}
int64_t calculate(const int max_exponent)
{
    int64_t num = 0;
    for(int i = 0; i < max_exponent; i++)
    {
        num += pow(2, i);
    }
    return num;
}

Running this benchmark on an Odroid XU3 produces the following output (8 runs):

RUNTIME TEST 1 = 1250 nano seconds
RUNTIME TEST 2 = 1041 nano seconds
RUNTIME TEST 1 = 1292 nano seconds
RUNTIME TEST 2 = 1042 nano seconds
RUNTIME TEST 1 = 1250 nano seconds
RUNTIME TEST 2 = 1083 nano seconds
RUNTIME TEST 1 = 1292 nano seconds
RUNTIME TEST 2 = 1083 nano seconds
RUNTIME TEST 1 = 1209 nano seconds
RUNTIME TEST 2 = 1084 nano seconds
RUNTIME TEST 1 = 1166 nano seconds
RUNTIME TEST 2 = 1083 nano seconds
RUNTIME TEST 1 = 1292 nano seconds
RUNTIME TEST 2 = 1042 nano seconds
RUNTIME TEST 1 = 1166 nano seconds
RUNTIME TEST 2 = 1250 nano seconds
RUNTIME TEST 1 = 1250 nano seconds
RUNTIME TEST 2 = 1250 nano seconds

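The second exponent is 1000 times larger than the first. Why does the second call sometimes finish faster?

I am using GCC (4.8) as the compiler, with the -Ofast flag.

Update: I can reproduce similar behavior on an i7 4770k.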

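The simple answer is "dead code elimination". The compiler sees that you never use the result of the function call (and that the function has no side effects), so it simply eliminates the call.

Print out the result of the function, and things change somewhat. For example: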

Ignore: -9223372036854775808    RUNTIME = 0 nano seconds
Ignore: -9223372036854775808    RUNTIME = 23001300 nano seconds

The modified code, if you care:

#include <iostream> // cout
#include <math.h>   // pow
#include <cstdint>  // int64_t
#include <chrono>   // high_resolution_clock
using namespace std;
using namespace std::chrono;
int64_t calculate(int);
int main() {
    high_resolution_clock::time_point t1, t2;
    // Test 1
    t1 = high_resolution_clock::now();
    auto a = calculate(200);
    t2 = high_resolution_clock::now();
    std::cout << "Ignore: " << a << "\t";
    cout << "RUNTIME = " << duration_cast<nanoseconds>(t2 - t1).count() << " nano seconds" << endl;
    // Test 2   
    t1 = high_resolution_clock::now();
    auto b = calculate(200000);
    t2 = high_resolution_clock::now();
    std::cout << "Ignore: " << b << "\t";
    cout << "RUNTIME = " << duration_cast<nanoseconds>(t2 - t1).count() << " nano seconds" << endl;
}
int64_t calculate(const int max_exponent) {
    int64_t num = 0;
    for (int i = 0; i < max_exponent; i++) {
        num += pow(2, i);
    }
    return num;
}

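From there, you have the minor detail that you overflow the range of int64_t (many times over), which gives undefined behavior. But at least with this change, the printed times plausibly reflect the time taken to carry out the specified computation.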
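
To make the overflow concrete: in exact arithmetic the sum 2^0 + ... + 2^(n-1) equals 2^n - 1, so it fits in int64_t only for n <= 63. Here is a minimal sketch that reports where the running sum stops fitting. The function name first_overflowing_exponent is made up for this illustration, and it uses integer shifts instead of pow, so it is not the code from the question:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: returns the first exponent i for which adding 2^i
// to the running sum would no longer fit in int64_t, or -1 if every term
// below max_exponent fits.
int first_overflowing_exponent(int max_exponent) {
    assert(max_exponent >= 0);
    const std::int64_t max = std::numeric_limits<std::int64_t>::max();
    std::int64_t sum = 0;
    for (int i = 0; i < max_exponent; ++i) {
        // 2^63 is already larger than the biggest int64_t value.
        if (i >= 63) return i;
        const std::int64_t term = std::int64_t{1} << i;
        // Defensive check, written so that the test itself cannot overflow.
        if (term > max - sum) return i;
        sum += term;
    }
    return -1;
}
```

With either argument from the question (200 or 200000) this returns 63. The original loop converts through double on every step, so rounding can push it out of range slightly earlier than that.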

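This might be happening with some help from the CPU cache or, more likely, because of compiler optimization. Try disabling optimization with -O0 and compare the results. I repeated the test on my machine with and without "-O0" and got completely different results.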
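
If you want to keep optimization on and still measure the call, one common trick is to store the result in a volatile variable, so the compiler has to treat it as used. The following is only a sketch under some assumptions: the names sum_powers, g_sink and time_ns are invented here, the sum is kept in a double to sidestep the int64_t overflow, and steady_clock is swapped in for high_resolution_clock. A volatile store keeps the call alive, but it does not strictly guarantee that the whole computation stays between the two clock reads, so treat the numbers with some care.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstdint>

// Hypothetical variant of calculate(): sums 2^i as a double, so there is
// no out-of-range conversion to int64_t.
double sum_powers(int max_exponent) {
    assert(max_exponent >= 0);
    double sum = 0.0;
    for (int i = 0; i < max_exponent; ++i) {
        sum += std::pow(2.0, i);
    }
    return sum;
}

// Writes to a volatile object are observable behavior, so the compiler
// cannot discard the value that gets stored here.
volatile double g_sink = 0.0;

// Times one call to sum_powers() and returns the elapsed nanoseconds.
std::int64_t time_ns(int max_exponent) {
    using clock = std::chrono::steady_clock;
    const auto t1 = clock::now();
    g_sink = sum_powers(max_exponent);
    const auto t2 = clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1).count();
}
```

Call time_ns(200) and time_ns(200000) and print the two results. It is also worth passing the exponent in from a run-time value (for example a command-line argument) so that the compiler cannot precompute the sum.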
