在C++中使用OpenMP实现递归函数的并行化



我有以下递归程序,我想使用OpenMP:进行并行处理

#include <iostream>
#include <cmath>
#include <numeric>
#include <vector>
#include <algorithm>
#include <thread>
#include <omp.h>

// Determines if a point of dimension point.size() is within the sphere
bool isPointWithinSphere(std::vector<int> point, const double &radius) {
// Since we know that the sphere is centered at the origin, we can simply
// find the euclidean distance (square root of the sum of squares) and check to
// see if it is less than or equal to the length of the radius 
//square each element inside the point vector
std::transform(point.begin(), point.end(), point.begin(), [](auto &x){return std::pow(x,2);});
//find the square root of the sum of squares and check if it is less than or equal to the radius
return std::sqrt(std::accumulate(point.begin(), point.end(), 0, std::plus<int>())) <= radius;    
}
// Counts the number of lattice points inside the sphere( all points (x1 .... xn) such that xi is an integer )
// The algorithm: If the radius is a floating point value, first find the floor of the radius and cast it to 
// an integer. For example, if the radius is 2.43 then the only integer points we must check are those between
// -2 and 2. We generate these points by simulating n - nested loops using recursion and passing each point
// in to the boolean function isPointWithinSphere(...), if the function returns true, we add one to the count
// (we have found a lattice point on the sphere). 
int countLatticePoints(std::vector<int> point, const double radius, const int dimension, int count = 0) {
const int R = static_cast<int>(std::floor(radius));
#pragma omp parallel for
for(int i = -R; i <= R; i++) {
point.push_back(i);
if(point.size() == dimension){
if(isPointWithinSphere(point, radius)) count++;
}else count = countLatticePoints(point, radius, dimension, count);
point.pop_back();
}
return count;
}
int main(int argc, char ** argv) {
std::vector<int> vec;
#pragma omp parallel
std::cout << countLatticePoints(vec, 5, 7) << std::endl;   
return 0;
}

我曾尝试在主函数中添加一个并行区域,并在countLatticePoints中并行化for循环,但与顺序运行算法相比,并行化几乎没有任何改进。对于我可以使用的其他OpenMP策略,任何帮助/建议都将不胜感激。

我将采取建议路线。在尝试使用线程使程序更快之前,您首先希望在单线程的情况下使程序更快。你可以做一些改进。您正在对点向量进行大量复制,这会导致大量昂贵的内存分配。

point传递到isPointWithinSphere中作为参考。然后,使用一个循环对point中的每个元素进行平方和累加,而不是使用两个循环。然后,在检查半径时,比较距离的平方,而不是距离。这避免了sqrt调用,并将其替换为一个简单的正方形。

countLatticePoints也应参考point。与其调用point.size(),不如每次递归时从dimension中减去1,然后只检查dimension == 1,而不是计算大小。

尽管如此,如果您仍然希望/需要引入线程,那么由于逐点传递引用,您需要进行一些调整。countLatticePoint需要有两个变体,一个是包含OpenMP指令的初始调用,另一个是不包含它们的递归调用。

main中的#pragma omp parallel不会执行任何操作,因为只有一个代码块要执行。

最新更新