c - Difference between two methods of OpenMP parallelization

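I want to evaluate the integral of a function by summation with OpenMP, in two ways: using an array to hold the value computed at each step and then taking the sum of all values; and summing directly without an array.

The code is: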

double f(double x)
{
    return sin(x)*sin(x)/(x*x+1);
}

Method 1

long i = 0;
const long NUM_STEP = 100000;
double sum[NUM_STEP];
double from = 0.0, to = 1.0;
double step = (to - from)/NUM_STEP;
double result = 0;
#pragma omp parallel for shared(sum) num_threads(4)
for(i=0; i<NUM_STEP; i++)
    sum[i] = step*f(from+i*step);
for(i=0; i<NUM_STEP; i++)
    result += sum[i];
printf("%lf", result);

Method 2

long i = 0;
const long NUM_STEP = 100000;
double from = 0.0, to = 1.0;
double step = (to - from)/NUM_STEP;
double result = 0;
#pragma omp parallel for shared(result) num_threads(4)
for(i=0; i<NUM_STEP; i++)
    result += step*f(from+i*step);
printf("%lf", result);

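But the results differ too much. Method 1 gives a stable value, whereas Method 2 gives a value that changes from run to run. Here is an example:

Method 1: 0.178446

Method 2: 0.158738

The value from Method 1 is correct (checked with another tool).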

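TL;DR: The first method has no race condition, whereas the second one does.

The first method does not have a race condition, whereas the second one does. Namely, in the first method: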

#pragma omp parallel for shared(sum) num_threads(4)
for(i=0; i<NUM_STEP; i++)
    sum[i] = step*f(from+i*step);
for(i=0; i<NUM_STEP; i++)
    result += sum[i];

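each thread saves the result of the operation step*f(from+i*step) in a different position of the array, sum[i], so no two threads ever write to the same element. After that, the master thread sequentially reduces the values saved in the array sum, namely: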

for(i=0; i<NUM_STEP; i++)
    result += sum[i];

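In fact, you can make a small improvement to this version: instead of allocating the array sum with NUM_STEP elements, allocate it with one element per thread, and have each thread accumulate into the position equal to its ID, namely: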

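/* requires #include <omp.h> */
int total_threads = 4;
double sum[total_threads];
for(i=0; i<total_threads; i++)
    sum[i] = 0.0;                 /* partial sums must start at zero */
#pragma omp parallel num_threads(total_threads)
{
    int thread_id = omp_get_thread_num();
    #pragma omp for               /* split the iterations among the threads */
    for(i=0; i<NUM_STEP; i++)
        sum[thread_id] += step*f(from+i*step);
}
/* combine the partial sums once, after the parallel region */
for(i=0; i<total_threads; i++)
    result += sum[i];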

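Nonetheless, the best approach would be to actually fix the second method.

In the second method, there is a race condition on the update of the variable result: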

#pragma omp parallel for shared(result) num_threads(4)
for(i=0; i<NUM_STEP; i++)
    result += step*f(from+i*step);

because the result variable is being updated concurrently by multiple threads in a non-thread-safe manner.
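To see why the concurrent update is unsafe, note that `result += x` is a read, an add, and a write. The sketch below is a purely serial simulation of one unlucky interleaving between two threads; it uses no real threads, and `lost_update_demo` is a hypothetical name chosen for illustration.

```c
/* Serial simulation of two threads, A and B, each executing
   `result += 1.0` on a shared variable with an unlucky interleaving.
   No real threads are involved; the interleaving is written out by hand. */
double lost_update_demo(void)
{
    double result = 0.0;

    double seen_by_a = result;   /* thread A reads result (0.0) */
    double seen_by_b = result;   /* thread B reads result (0.0) before A writes */

    result = seen_by_a + 1.0;    /* A writes 1.0 */
    result = seen_by_b + 1.0;    /* B writes 1.0, overwriting A's update */

    return result;               /* 1.0, although two additions were performed */
}
```

Because every term step*f(from+i*step) is non-negative here, lost updates of this kind can only lower the total, which is consistent with Method 2 reporting a value below that of Method 1.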

To solve this race condition, you need to use the reduction clause:

#pragma omp parallel for reduction(+:result) num_threads(4)
for(i=0; i<NUM_STEP; i++)
    result += step*f(from+i*step);
