How to fully parallelise sequential inner loops in Fortran with OpenMP



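I am trying to parallelise some old fixed-format Fortran code with OpenMP. I cannot work out how to fully parallelise the following nested loop structure, which has one outer loop and two sequential inner loops: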

do y = 1,ny
  do x = 1,nx
    calculation 1
  enddo
  intermediate calculation (calculation 1)
  do x = 1,nx
    calculation 2 (intermediate calculation)
  enddo
enddo

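calculation 2 cannot go in the first inner loop, and it must stay inside the outer loop rather than in a separate outer loop. This is because it depends on the intermediate calculation, which itself depends on all the values from calculation 1.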

I am compiling with gfortran and setting the environment variable OMP_NUM_THREADS=4. The code below demonstrates the 3 attempts I have tested:

c     Test of using OpenMP to parallelise sequential inner loops and an
c       outer loop
c     Use export OMP_NUM_THREADS=4
      program parallelTest
c     declarations
      implicit none
      integer nx, ny
      parameter (nx = 2, ny = 5)
      integer omp_get_thread_num, i, j, A(nx,ny), B(nx,ny), C(nx,ny),
     >    D(nx,ny), E(nx,ny), F(nx,ny)
c     initialisation
      A = 7
      B = 7
      C = 7
      D = 7
      E = 7
      F = 7
c     attempt 1: just executes first loop, 2nd loop is ignored
c$omp parallel do shared(A,B) private(i,j) schedule(static) collapse(2)
      do j = 1,ny
        do i = 1,nx
          A(i,j) = omp_get_thread_num()
        enddo
        do i = 1,nx
          B(i,j) = omp_get_thread_num()
        enddo
      enddo
c$omp end parallel do
c     attempt 2: only parallelises outer loop
c$omp parallel do shared(C,D) private(i,j) schedule(static)
      do j = 1,ny
        do i = 1,nx
          C(i,j) = omp_get_thread_num()
        enddo
        do i = 1,nx
          D(i,j) = omp_get_thread_num()
        enddo
      enddo
c$omp end parallel do
c     attempt 3: only parallelises inner loops
c$omp parallel shared(E,F) private(i,j)
      do j = 1,ny
c$omp   do schedule(static)
        do i = 1,nx
          E(i,j) = omp_get_thread_num()
        enddo
c$omp   end do
c$omp   do schedule(static)
        do i = 1,nx
          F(i,j) = omp_get_thread_num()
        enddo
c$omp   end do
      enddo
c$omp end parallel
c     print output to terminal
      do i = 1,nx
        print *, 'A(', i, ',:) = ', A(i,:)
      enddo
      print *
      do i = 1,nx
        print *, 'B(', i, ',:) = ', B(i,:)
      enddo
      print *
      print *
      do i = 1,nx
        print *, 'C(', i, ',:) = ', C(i,:)
      enddo
      print *
      do i = 1,nx
        print *, 'D(', i, ',:) = ', D(i,:)
      enddo
      print *
      print *
      do i = 1,nx
        print *, 'E(', i, ',:) = ', E(i,:)
      enddo
      print *
      do i = 1,nx
        print *, 'F(', i, ',:) = ', F(i,:)
      enddo
      end

This produces the following output:

A(           1 ,:) =            0           0           1           2           3
A(           2 ,:) =            0           1           1           2           3
B(           1 ,:) =            7           7           7           7           7
B(           2 ,:) =            7           7           7           7           7

C(           1 ,:) =            0           0           1           2           3
C(           2 ,:) =            0           0           1           2           3
D(           1 ,:) =            0           0           1           2           3
D(           2 ,:) =            0           0           1           2           3

E(           1 ,:) =            0           0           0           0           0
E(           2 ,:) =            1           1           1           1           1
F(           1 ,:) =            0           0           0           0           0
F(           2 ,:) =            1           1           1           1           1

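In attempt 1, the outer loop and the first inner loop are parallelised as I expected, but the second inner loop is not executed (perhaps because its do does not immediately follow a c$omp do directive?). In attempt 2, only the outer loop is parallelised. In attempt 3, only the inner loops are parallelised.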

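My question is: how do I make matrix B the same as matrix A? This seems like it should be a simple, straightforward task. I assume there is an OpenMP clause or construct that I am not using, and I would appreciate being pointed in the right direction to search (I am new to OpenMP!).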

If the information flow is as you describe, I would rewrite your loops as

do y = 1,ny
  do x = 1,nx
    calculation 1
  enddo
enddo
do y = 1,ny
  intermediate calculation (calculation 1)
enddo
do y = 1,ny
  do x = 1,nx
    calculation 2 (intermediate calculation)
  enddo
enddo

and then parallelise each of the y loops separately.
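The restructuring above could look something like the following in the question's fixed-format style. This is only a sketch: the program name splitLoops, the array S and the arithmetic are hypothetical stand-ins for calculation 1, the intermediate calculation and calculation 2. It assumes the intermediate result can be stored once per y (here in S(j)), which is what allows the three loops to be separated. The first and third loop nests are perfectly nested, so collapse(2) can be applied to them.

```fortran
c     Sketch: split the loop nest into three separately parallelised
c     loops. The arithmetic below is a hypothetical placeholder for
c     the real calculations.
      program splitLoops
      implicit none
      integer nx, ny
      parameter (nx = 2, ny = 5)
      integer i, j, A(nx,ny), S(ny), B(nx,ny)
c     calculation 1: perfectly nested, so collapse(2) is allowed
c$omp parallel do private(i,j) schedule(static) collapse(2)
      do j = 1,ny
        do i = 1,nx
          A(i,j) = i + j
        enddo
      enddo
c$omp end parallel do
c     intermediate calculation: needs all of A(:,j), stored per j
c$omp parallel do private(j) schedule(static)
      do j = 1,ny
        S(j) = sum(A(:,j))
      enddo
c$omp end parallel do
c     calculation 2: uses the stored intermediate result S(j)
c$omp parallel do private(i,j) schedule(static) collapse(2)
      do j = 1,ny
        do i = 1,nx
          B(i,j) = A(i,j) * S(j)
        enddo
      enddo
c$omp end parallel do
c     print the result of calculation 2
      do i = 1,nx
        print *, 'B(', i, ',:) = ', B(i,:)
      enddo
      end
```

Assuming the file is saved as splitLoops.f (a hypothetical name), it can be built with gfortran -fopenmp splitLoops.f. The implicit barrier at the end of each parallel do ensures every value from one stage is complete before the next stage reads it.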
