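I ran the following program on an 8-core Windows PC. The function (FUNCTION) runs for a very short time (a few hundred microseconds). However, the time measured for thread 3 is usually about 1.5 times the time measured for thread 1. What explains this?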
#include <chrono>
#include <cmath>
#include <future>
#include <iostream>

using namespace std::chrono;

double FUNCTION() {
    double result = 0.0;
    for (int i = 0; i < 20'000; i++) {
        result = result + std::sqrt(i);
    }
    return result;
}

int main()
{
    // All three start times are taken before any thread is launched.
    auto start1 = steady_clock::now();
    auto start2 = steady_clock::now();
    auto start3 = steady_clock::now();
    auto thread1 = std::async(std::launch::async, FUNCTION);
    auto thread2 = std::async(std::launch::async, FUNCTION);
    auto thread3 = std::async(std::launch::async, FUNCTION);
    // The results are collected in order, so stop3 is taken only after
    // thread1.get() and thread2.get() have also returned.
    double res1 = thread1.get();
    auto stop1 = steady_clock::now();
    double res2 = thread2.get();
    auto stop2 = steady_clock::now();
    double res3 = thread3.get();
    auto stop3 = steady_clock::now();
    auto duration1 = duration_cast<microseconds>(stop1 - start1);
    std::cout << "Duration Thread 1: " << duration1.count() << std::endl;
    auto duration2 = duration_cast<microseconds>(stop2 - start2);
    std::cout << "Duration Thread 2: " << duration2.count() << std::endl;
    auto duration3 = duration_cast<microseconds>(stop3 - start3);
    std::cout << "Duration Thread 3: " << duration3.count() << std::endl;
    return 0;
}
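Is this caused by the time it takes to create and manage the threads?
If so, is there a rough estimate of how long the function has to run before calling it in parallel becomes worthwhile?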
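When you take the end time of the last thread, you are measuring all of the calls to std::async plus the wait for all of the results: start3 is recorded before any thread is launched, and stop3 is recorded only after thread1.get() and thread2.get() have returned as well.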
I suggest timing one thread at a time, storing the times, and then reporting them separately.
Perhaps something like this:
#include <chrono>
#include <cstddef>
#include <future>
#include <iostream>
#include <utility>
#include <vector>

double FUNCTION();  // defined as in the question

int main()
{
    // steady_clock is monotonic, which makes it the safer choice for measuring intervals
    // (high_resolution_clock may be an alias for system_clock, which can be adjusted).
    using Clock = std::chrono::steady_clock;
    constexpr std::size_t number_of_threads = 3;
    std::vector<std::pair<Clock::time_point, Clock::time_point>> times(number_of_threads);
    for (std::size_t t = 0; t < number_of_threads; ++t)
    {
        auto start = Clock::now();
        // Start the thread and wait for it to finish
        auto thread = std::async(std::launch::async, FUNCTION);
        (void) thread.get();
        auto end = Clock::now();
        // Store the times
        times[t] = std::make_pair(start, end);
    }
    // All threads are now finished, report the times
    for (std::size_t t = 0; t < number_of_threads; ++t)
    {
        auto duration = std::chrono::duration_cast<std::chrono::microseconds>(times[t].second - times[t].first);
        std::cout << "Duration thread #" << (t + 1) << ": " << duration.count() << " us\n";
    }
    return 0;
}