std::for_each with std::execution::par_unseq does not parallelize with GCC, but does with MSVC



I wanted to parallelize a for loop and learned about std::for_each and its execution policies. Surprisingly, it did not parallelize when compiled with GCC:

#include <iostream>
#include <algorithm>
#include <execution>
#include <chrono>
#include <thread>
#include <random>
#include <vector>

int main() {
    std::vector<int> foo;
    foo.reserve(1000);
    for (int i = 0; i < 1000; i++) {
        foo.push_back(i);
    }
    std::for_each(std::execution::par_unseq,
                  foo.begin(), foo.end(),
                  [](auto &&item) {
                      std::cout << item << std::endl;
                      // Sleep for a random 10-100 ms to simulate some work
                      std::random_device dev;
                      std::mt19937 rng(dev());
                      std::uniform_int_distribution<std::mt19937::result_type> dist6(10, 100);
                      std::this_thread::sleep_for(std::chrono::milliseconds(dist6(rng)));
                      std::cout << "Thread ID: " << std::this_thread::get_id() << std::endl;
                  });
}

This code still runs sequentially.

With MSVC, the code is parallelized and finishes faster.

GCC:

$ gcc --version
gcc (Ubuntu 10.1.0-2ubuntu1~18.04) 10.1.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

MSVC:

>cl.exe
Microsoft (R) C/C++ Optimizing Compiler Version 19.27.29112 for x86
Copyright (C) Microsoft Corporation.  All rights reserved.
usage: cl [ option... ] filename... [ /link linkoption... ]

CMakeLists.txt:

cmake_minimum_required(VERSION 3.17)
project(ParallelTesting)
set(CMAKE_CXX_STANDARD 20)
add_executable(ParallelTesting main.cpp)

Is there anything else I need to do to get parallel execution with GCC?

ldd output for my binary:

$ ldd my_binary
linux-vdso.so.1 (0x00007ffe9e6b9000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f79efaa0000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f79ef881000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f79ef4ad000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f79ef295000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f79eeea4000)
/lib64/ld-linux-x86-64.so.2 (0x00007f79f041a000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f79eeb06000)

The debug and release builds of the binary produce the same ldd output.

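I fixed it by first upgrading my WSL Ubuntu distribution from 18.04 to 20.04, because after installing TBB with sudo apt install gcc libtbb-dev I was still getting the following error: #error Intel(R) Threading Building Blocks 2018 is required; older versions are not supported. This was caused by the installed TBB being too old.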

Now that TBB 2020.1-2 is installed, it works as expected:

$ sudo apt install libtbb-dev
[sudo] password for ubuntu:
Reading package lists... Done
Building dependency tree
Reading state information... Done
libtbb-dev is already the newest version (2020.1-2).
0 upgraded, 0 newly installed, 0 to remove and 10 not upgraded.

This answer describes all the details nicely.

Since I'm using CMake, I also had to add the following lines to CMakeLists.txt:

# Link against the dependency of Intel TBB (for parallel C++17 algorithms)
target_link_libraries(${PROJECT_NAME} tbb)

I ran into the same problem. @BullyWiiPlaza's answer helped me use the required library and verify what the compiler was doing.

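Another problem I faced was that the library considered the work I was passing to for_each(execution::par_unseq, … too small to parallelize. My assumption had been that the library would have each thread call the function many times, on different parts of the iterator sequence.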

I worked around this by creating larger chunks myself.

typedef pair<micro_work_type::iterator, micro_work_type::iterator> work_type;

// Process one chunk [first, second) of the micro-work sequence
void
worker(work_type &be)
{
    for (auto v = be.first; v != be.second; v++) {
        // Work on *v
    }
}
[…]
// Split micro_work into chunks of BATCH_SIZE elements each
vector<work_type> chunks;
auto pos = micro_work.begin();
auto begin = pos;
for (size_t i = 0; i < micro_work.size(); i++, pos++) {
    if (i > 0 && i % BATCH_SIZE == 0) {
        chunks.push_back(pair{begin, pos});
        begin = pos;
    }
}
// Add the final chunk, also when size() is an exact multiple of BATCH_SIZE
if (begin != pos)
    chunks.push_back(pair{begin, pos});
for_each(execution::par_unseq, chunks.begin(), chunks.end(), worker);
