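I have a program that performs a numerical calculation, and I am trying to use threads to make it run faster, but I cannot get it to run any faster than the serial version.
The serial output is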
Function: y(x) = sin(x) [note that x is in radians]
Limits of integration: x1 = 7.000000, x2 = 2.000000
Riemann sum with 5000 steps, a.k.a. dx=-0.001000
Computed integral: 1.170175
Exact integral: 1.170049
Percent error: 0.010774 %
Work took 0.182533 milliseconds
The parallel output is
Function: y(x) = sin(x) [note that x is in radians]
Limits of integration: x1 = 7.000000, x2 = 2.000000
Riemann sum with 5000 steps, a.k.a. dx=-0.001000
Computed integral: 1.170175
Exact integral: 1.170049
Percent error: 0.010774 %
Work took 0.667334 milliseconds
Here is the code
#include <stddef.h> // for size_t
#include <stdio.h>
#include <stdlib.h> /* atoi, atof */
#include <math.h>
#include "omp.h" // just used for timing
int main(int argc, char *argv[]) {
    double start, end;
    start = omp_get_wtime(); // Start our work timer
    // BEGIN TIMED CODE BLOCK
    double x, y;
    double x1, x2; // Limits of integration
    double dx;
    double ysum, integral;
    size_t i;
    size_t nsteps;
    // Read in command line arguments
    x1 = atof(argv[1]); // lower x limit
    x2 = atof(argv[2]); // upper x limit
    nsteps = atof(argv[3]); // number of steps in Riemann sum
    omp_set_num_threads(2);
    // Compute delta x
    dx = (x2 - x1)/nsteps; // delta x for the Riemann sum
    // Perform numeric integration via Riemann sum
    ysum = 0;
    // Temporary variable to hold the sum prior to multiplication by dx
    #pragma omp parallel shared(ysum) private(x,y)
    {
        #pragma omp for
        for (i=0; i<nsteps; i++) {
            x = x1 + i*dx; // x value at this step
            y = sin(x); // y(x) at this step; note that x is always in radians
            #pragma omp critical
            ysum += y; // summation of y(x)
        }
        #pragma omp critical
        integral = ysum * dx; // Our computed integral: the summation of y(x)*dx
        // END TIMED CODE BLOCK
    }
    end = omp_get_wtime(); // Stop our work timer
    double analytic = -cos(x2) + cos(x1); // The known, exact answer to this integration problem
    printf("Function: y(x) = sin(x) [note that x is in radians]\n");
    printf("Limits of integration: x1 = %lf, x2 = %lf \n", x1, x2);
    printf("Riemann sum with %ld steps, a.k.a. dx=%lf \n", nsteps, dx);
    printf("Computed integral: %lf \n", integral);
    printf("Exact integral: %lf \n", analytic);
    printf("Percent error: %lf %% \n", fabs((integral - analytic) / analytic)*100);
    printf("Work took %f milliseconds\n", 1000 * (end - start));
    return 0;
}
When I remove the critical sections, the output changes, so I assumed I was doing the right thing.
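Every use of #pragma omp critical is an obstacle to effective multithreading: only one thread at a time can execute the protected statement, so a critical section inside the loop body forces the threads to take turns on every iteration. Instead, you can parallelize the loop with the #pragma omp parallel for directive and a reduction clause: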
#pragma omp parallel for reduction(+:ysum)
for (int i = 0; i < nsteps; ++i) {
    double x = x1 + i * dx; // declared inside the loop, so private to each thread
    double y = sin(x);
    ysum += y;
}
integral = ysum * dx;
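The temporary variables used in the loop are declared inside the loop body, so each thread gets its own copy (the loop body could also be rewritten so that x and y are not needed at all). The reduction clause will (in this case) keep a separate ysum value for each thread, then add all of those values together at the end.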