Question about optimizing division code with a type cast



Suppose I have code like this:

int value, store;
store = value / 2;

As far as I know, division is more expensive than multiplication, so as an optimization I changed the code as follows:

store = value * 0.5f;

However, since value is an integer, I have to cast the result:

store = (int)(value * 0.5f);

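But as far as I know, a type cast is also considered an expensive operation. So I wrote a test that compares float and integer arithmetic, with and without a type cast: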

#include <cstdint>   // INT16_MAX
#include <iostream>
#include <Windows.h>

int main()
{
    LARGE_INTEGER StartingTime, EndingTime, ElapsedMicroseconds = { 0 };
    LARGE_INTEGER Frequency;
    int store[INT16_MAX] = { 0, }, value = 10;
    SetProcessPriorityBoost(GetCurrentProcess(), true);
    // get frequency
    QueryPerformanceFrequency(&Frequency);

    // Test Case 1: int * float, result cast back to int
    // get starttime
    QueryPerformanceCounter(&StartingTime);
    for (int i = 0; i < INT16_MAX; i++)
        store[i] = (int)(value * 0.5f);
    // get endtime
    QueryPerformanceCounter(&EndingTime);
    // get process time (ticks -> microseconds)
    ElapsedMicroseconds.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart;
    ElapsedMicroseconds.QuadPart *= 1000000;
    ElapsedMicroseconds.QuadPart /= Frequency.QuadPart;
    std::cout << "Test Case 1 " << std::endl;
    std::cout << "I * f with Type Cast : " << ElapsedMicroseconds.QuadPart << std::endl << std::endl;

    // Test Case 2: int / int, no cast
    QueryPerformanceCounter(&StartingTime);
    for (int i = 0; i < INT16_MAX; i++)
        store[i] = value / 2;
    QueryPerformanceCounter(&EndingTime);
    ElapsedMicroseconds.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart;
    ElapsedMicroseconds.QuadPart *= 1000000;
    ElapsedMicroseconds.QuadPart /= Frequency.QuadPart;
    std::cout << "Test Case 2 " << std::endl;
    std::cout << "I / i with No Type Cast   : " << ElapsedMicroseconds.QuadPart << std::endl << std::endl;

    // Test Case 3: float * float, result cast to int
    float store2 = 0.f;
    float value2 = 10.f;
    QueryPerformanceCounter(&StartingTime);
    for (int i = 0; i < INT16_MAX; i++)
    {
        store2 = value2 * 0.5f;
        store[i] = (int)store2;
    }
    QueryPerformanceCounter(&EndingTime);
    ElapsedMicroseconds.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart;
    ElapsedMicroseconds.QuadPart *= 1000000;
    ElapsedMicroseconds.QuadPart /= Frequency.QuadPart;
    std::cout << "Test Case 3 " << std::endl;
    std::cout << "f * f with Type Cast   : " << ElapsedMicroseconds.QuadPart << std::endl << std::endl;

    // Test Case 4: float / float, result cast to int
    QueryPerformanceCounter(&StartingTime);
    for (int i = 0; i < INT16_MAX; i++)
    {
        store2 = value2 / 2.f;
        store[i] = (int)store2;
    }
    QueryPerformanceCounter(&EndingTime);
    ElapsedMicroseconds.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart;
    ElapsedMicroseconds.QuadPart *= 1000000;
    ElapsedMicroseconds.QuadPart /= Frequency.QuadPart;
    std::cout << "Test Case 4 " << std::endl;
    std::cout << "f / f with Type Cast   : " << ElapsedMicroseconds.QuadPart << std::endl << std::endl;

    // Test Case 5: float / int, result cast to int
    QueryPerformanceCounter(&StartingTime);
    for (int i = 0; i < INT16_MAX; i++)
    {
        store2 = value2 / 2;
        store[i] = (int)store2;
    }
    QueryPerformanceCounter(&EndingTime);
    ElapsedMicroseconds.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart;
    ElapsedMicroseconds.QuadPart *= 1000000;
    ElapsedMicroseconds.QuadPart /= Frequency.QuadPart;
    std::cout << "Test Case 5 " << std::endl;
    std::cout << "f / i with Type Cast   : " << ElapsedMicroseconds.QuadPart << std::endl << std::endl;
    return 0;
}

Run 1
Test Case 2
I / i with No Type Cast   : 63
Test Case 3
f * f with Type Cast   : 56

Run 2
Test Case 2
I / i with No Type Cast   : 48
Test Case 3
f * f with Type Cast   : 55

But the results show that sometimes case 2 is faster and sometimes case 3 is faster. The other cases take much longer.

What do you think about this? Is this test appropriate, or did I make a mistake somewhere?

For tight loops and small cases like this, it is usually best to look at the generated assembly (and remember to turn optimizations on).

https://godbolt.org/z/a63g4y

If you look at the output, you will see that the compiler/optimizer has done its job and the end result is the same.
