R程序中Fortran中的OpenMP似乎毫无理由地挂起了



我在R中有一个程序,它调用了两个启用了openMP的Fortran例程。有两个Fortran例程sub_1sub_2。在R函数中,第一个被调用两次,而第二个被调用一次。除了一些小事情外,两种程序几乎完全相同。我调用第一个例程,然后调用第二个例程,再调用第一个。然而,如果我启用了这两个openMP,当函数第二次使用第一个fortran例程时,它会停止执行任何操作(没有错误或停止执行,只是停留在那里)。

如果我在sub_1中禁用openMP,那么一切都会正常运行。如果我在sub_2中禁用了openMP,那么它在第二次使用sub_1时再次以相同的方式挂起。这很奇怪,因为它显然通过了第一次使用。

我认为这可能是由于线程没有正确关闭或其他原因(我对openMP了解不多)。然而,另一个奇怪的地方是,调用这三个例程的R函数被调用了四次,如果我只在sub_2中启用openMP,那么这很好(即,对sub_2的第二次、第三次等调用没有挂起)。我只是不知道它为什么会这么做!供参考,这是sub_1:的代码

subroutine correlation_dd_rad(s_bins,min_s,end_s,n,pos1,dd,r)   
!!! INTENT IN !!!!!!!!
integer             :: s_bins       !Number of separation bins
integer             :: N            !Number of objects
real(8)             :: pos1(3,N)    !Cartesian Positions of particles
real(8)             :: min_s        !The smallest separation calculated.
real(8)             :: end_s        !The largest separation calculated.
real(8)             :: r(N)         !The radii of each particle (ascending)
!!! INTENT OUT !!!!!!!
real(8)             :: dd(N,s_bins)         !The binned data.
!!! LOCAL !!!!!!!!!!!!
integer             :: i,j      !Iterators
integer             :: bin
real(8)             :: d            !Distance between particles.
real(8)             :: dr,mins,ends
real(8),parameter   :: pi = 3.14159653589
integer             :: counter
dd(:,:) = 0.d0
dr = (end_s-min_s)/s_bins
!Perform the separation binning
mins = min_s**2
ends = end_s**2
counter = 1000
!$OMP parallel do private(d,bin,j)
do i=1,N
!$omp critical (count_it)
counter = counter - 1
!$omp end critical (count_it)
if(counter==0)then
counter = 1000
write(*,*) "Another Thousand"
end if
do j=i+1,N
if(r(j)-r(i) .GT. end_s)then
exit
end if
d=(pos1(1,j)-pos1(1,i))**2+&
&(pos1(2,j)-pos1(2,i))**2+&
&(pos1(3,j)-pos1(3,i))**2
if(d.LT.ends .AND. d.GT.mins)then
d = Sqrt(d)
bin = Floor((d-min_s)/dr)+1
dd(i,bin) = dd(i,bin)+1.d0
dd(j,bin) = dd(j,bin)+1.d0
end if
end do
end do
!$OMP end parallel do
write(*,*) "done"
end subroutine

有人知道为什么会发生这种事吗??

干杯。

我将添加一个我能想到的最小的例子,它确实再现了这个问题(顺便说一句,这一定是一个R问题——我在这里给出的类型的一个小例子,但用fortran编写很好)。所以我在fortran中有上面的代码和下面的代码,编译到共享对象correlate.so:

subroutine correlation_dr_rad(s_bins,min_s,end_s,n,pos1,n2,pos2,dd,r1,r2)
!!! INTENT IN !!!!!!!!
integer             :: s_bins       !Number of separation bins
integer             :: N            !Number of objects
integer             :: n2
real(8)             :: pos1(3,N)    !Cartesian Positions of particles
real(8)             :: pos2(3,n2)   !random particles
real(8)             :: end_s        !The largest separation calculated.
real(8)             :: min_s        !The smallest separation
real(8)             :: r1(N),r2(N2) !The radii of particles (ascending)
!!! INTENT OUT !!!!!!!
real(8)             :: dd(N,s_bins)         !The binned data.
!!! LOCAL !!!!!!!!!!!!
integer             :: i,j      !Iterators
integer             :: bin
real(8)             :: d            !Distance between particles.
real(8)             :: dr,mins,ends
real(8),parameter   :: pi = 3.14159653589
integer             :: counter
dd(:,:) = 0.d0
dr = (end_s-min_s)/s_bins
!Perform the separation binning
mins = min_s**2
ends = end_s**2
write(*,*) "Got just before parallel dr"
counter = 1000
!$OMP parallel do private(d,bin,j)
do i=1,N
!$OMP critical (count)
counter = counter - 1
!$OMP end critical (count)
if(counter==0)then
write(*,*) "Another thousand"
counter = 1000
end if
do j=1,N2

if(r2(j)-r1(i) .GT. end_s)then
exit
end if
d=(pos1(1,j)-pos2(1,i))**2+&
&(pos1(2,j)-pos2(2,i))**2+&
&(pos1(3,j)-pos2(3,i))**2
if(d.GT.mins .AND. d.LT.ends)then
d = Sqrt(d)
bin = Floor((d-min_s)/dr)+1
dd(i,bin) = dd(i,bin)+1.d0
end if
end do
end do
!$OMP end parallel do
write(*,*) "Done"
end subroutine

然后在R中,我有以下函数——前两个函数只是包装了上面的fortran代码。第三个调用它的方式与我实际的代码类似:

correlate_dd_rad = function(pos,r,min_r,end_r,bins){
#A wrapper for the fortran routine of the same name.
dyn.load('correlate.so')
out = .Fortran('correlation_dd_rad',
s_bins = as.integer(bins),
min_s = as.double(min_r),
end_s = as.double(end_r),
n = as.integer(length(r)),
pos = as.double(t(pos)),
dd = matrix(0,length(r),bins), #The output matrix.
r = as.double(r))
dyn.unload('correlate.so')
return(out$dd)
}
correlate_dr_rad = function(pos1,r1,pos2,r2,min_r,end_r,bins){
#A wrapper for the fortran routine of the same name
N = length(r1)
N2 = length(r2)
dyn.load('correlate.so')
out = .Fortran('correlation_dr_rad',
s_bins = as.integer(bins),
min_s = as.double(min_r),
end_s = as.double(end_r),
n = N,
pos1 = as.double(t(pos1)),
n2 = N2,
pos2 = as.double(t(pos2)),
dr = matrix(0,nrow=N,ncol=bins),
r1 = as.double(r1),
r2 = as.double(r2))
dyn.unload('correlate.so')
return(out$dr)
}
the_calculation = function(){
#Generate some data to use
pos1 = matrix(rnorm(30000),10000,3)
pos2 = matrix(rnorm(30000),10000,3)
#Find the radii
r1 = sqrt(pos1[,1]^2 + pos1[,2]^2+pos1[,3]^2)
r2 = sqrt(pos2[,1]^2 + pos2[,2]^2+pos2[,3]^2)
#usually sort them but it doesn't matter here.
#Now call the functions
print("Calculating the data-data pairs")
dd = correlate_dd_rad(pos=pos1,r=r1,min_r=0.001,end_r=0.8,bins=15)
print("Calculating the data-random pairs")
dr = correlate_dr_rad(pos1,r1,pos2,r2,min_r=0.001,end_r=0.8,bins=15)
print("Calculating the random-random pairs")
rr = correlate_dd_rad(pos=pos2,r=r2,min_r=0.001,end_r=0.8,bins=15)
#Now we would do something with it but I don't care in this example.
print("Done")
}

运行此操作,我得到输出:

[1] "Calculating the data-data pairs"
Another Thousand
Another Thousand
Another Thousand
Another Thousand
Another Thousand
Another Thousand
Another Thousand
Another Thousand
Another Thousand
Another Thousand
done
[1] "Calculating the data-random pairs"
Got just before parallel dr
Another thousand
Another thousand

然后它就停在那里。。。事实上,运行它几次表明,它每次挂起的位置都会发生变化。有时,它在对correlate_dd_rad的第二次调用中完成了大部分,而在其他情况下,它只完成了对correlate_dr_rad的调用的一半。

我不确定这是否能解决您的问题,但这确实是一个bug。在子程序correlation_dd_rad中,当您打算关闭平行区域时,您实际上放置了一个注释。为了更清楚地显示:

!OMP end parallel do

应转换为:

!$OMP end parallel do

作为旁注:

  1. 如果不调用库函数,则不需要use omp_lib
  2. 可以使用atomic构造(请参阅最新OpenMP规范的2.8.5节)来原子访问特定的存储位置,而不是使用critical构造
  3. 始终将critical构造命名为(规范第2.8.2节)

所有没有名称的关键构造都被认为具有相同的未指定名称。

最新更新