c - OpenMP: the serial version of the code is faster than the parallel version; how can I fix it?



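Here is a program that does some calculations. I am trying to make it run faster with threads, but I cannot get it to run faster than the serial version.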

The serial output is

Function: y(x) = sin(x) [note that x is in radians]
Limits of integration: x1 = 7.000000, x2 = 2.000000 
Riemann sum with 5000 steps, a.k.a. dx=-0.001000 
Computed integral: 1.170175 
Exact integral:    1.170049 
Percent error:     0.010774 % 
Work took 0.182533 milliseconds

The parallel output is

Function: y(x) = sin(x) [note that x is in radians]
Limits of integration: x1 = 7.000000, x2 = 2.000000 
Riemann sum with 5000 steps, a.k.a. dx=-0.001000 
Computed integral: 1.170175 
Exact integral:    1.170049 
Percent error:     0.010774 % 
Work took 0.667334 milliseconds

Here is the code

#include <stddef.h>  // for size_t
#include <stdio.h>
#include <stdlib.h>     /* atoi, atof */
#include <math.h>
#include "omp.h" // just used for timing
int main(int argc, char *argv[]) {
    double start, end;
    start = omp_get_wtime(); // Start our work timer
    // BEGIN TIMED CODE BLOCK
    double x, y;
    double x1, x2; // Limits of integration
    double dx;
    double ysum, integral;
    size_t i;
    size_t nsteps;
    // Read in command line arguments
    x1 = atof(argv[1]); // lower x limit
    x2 = atof(argv[2]); // upper x limit
    nsteps = atof(argv[3]); // number of steps in Riemann sum
    omp_set_num_threads(2);
    // Compute delta x
    dx = (x2 - x1)/nsteps; // delta x for the Riemann sum
    // Perform numeric integration via Riemann sum
    ysum = 0;
    // Temporary variable to hold the sum prior to multiplication by dx
    #pragma omp parallel shared(ysum) private(x,y)
    {
        #pragma omp for
        for (i=0; i<nsteps; i++) {
            x = x1 + i*dx; // x value at this step
            y = sin(x); // y(x) at this step; note that x is always in radians
            #pragma omp critical
            ysum += y; // summation of y(x)
        }
        #pragma omp critical
        integral = ysum * dx; // Our computed integral: the summation of y(x)*dx
        // END TIMED CODE BLOCK
    }

    end = omp_get_wtime(); // Stop our work timer
    double analytic = -cos(x2) + cos(x1); // The known, exact answer to this integration problem
    printf("Function: y(x) = sin(x) [note that x is in radians]\n");
    printf("Limits of integration: x1 = %lf, x2 = %lf \n", x1, x2);
    printf("Riemann sum with %ld steps, a.k.a. dx=%lf \n", nsteps, dx);
    printf("Computed integral: %lf \n", integral);
    printf("Exact integral:    %lf \n", analytic);
    printf("Percent error:     %lf %% \n", fabs((integral - analytic) / analytic)*100);
    printf("Work took %f milliseconds\n", 1000 * (end - start));
    return 0;
}

When I remove the critical sections, the output changes, so I think I am doing the right thing.

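Every use of #pragma omp critical is an obstacle to effective multithreading: only one thread at a time may execute a critical section, so the threads end up taking turns to update ysum and the loop is effectively serialized. You can parallelize the loop with a #pragma omp parallel for directive and a reduction clause instead.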

#pragma omp parallel for reduction(+:ysum)
for (size_t i = 0; i < nsteps; ++i) {
    double x = x1 + i * dx;
    double y = sin(x);
    ysum += y;
}
integral = ysum * dx;

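The temporary variables used in the loop are declared inside it, so each thread gets its own copy (the loop body could be rewritten so that x and y are not needed at all). The reduction clause will, in this case, keep a separate ysum value for each thread and then add all of those values together at the end.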
