Fortran OpenACC: derived type with an allocatable component



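>I have read that it is possible to manually deep copy Fortran derived types, but the following simple test program fails at runtime. It compiles cleanly with PGI v16.10. What is wrong?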

program Test
  implicit none
  type dt
    integer :: n
    real, dimension(:), allocatable :: xm
  end type dt
  type(dt) :: grid
  integer :: i
  grid%n = 10
  allocate(grid%xm(grid%n))
  !$acc enter data copyin(grid)
  !$acc enter data pcreate(grid%xm)
  !$acc kernels
  do i = 1, grid%n
    grid%xm(i) = i * i
  enddo
  !$acc end kernels
  print*,grid%xm
end program Test

The error I get is:

call to cuStreamSynchronize returned error 700: Illegal address during kernel execution
call to cuMemFreeHost returned error 700: Illegal address during kernel execution

You just need to add a "present(grid)" clause to the kernels directive.

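Here is a version of the program with the fix, plus a few other additions, such as updating the host copy of the data so that it can be printed on the host.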

% cat test.f90
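program Test
  implicit none
  type dt
    integer :: n
    real, dimension(:), allocatable :: xm
  end type dt
  type(dt) :: grid
  integer :: i
  grid%n = 10
  allocate(grid%xm(grid%n))
  !$acc enter data copyin(grid)          ! copy the parent derived type to the device first
  !$acc enter data create(grid%xm)       ! then create device storage for the allocatable member
  !$acc kernels present(grid)            ! grid is already on the device
  do i = 1, grid%n
    grid%xm(i) = i * i
  enddo
  !$acc end kernels
  !$acc update host(grid%xm)             ! copy the results back before printing on the host
  print*,grid%xm
  !$acc exit data delete(grid%xm, grid)  ! release the device copies
  deallocate(grid%xm)
end program Test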
% pgf90 -acc test.f90 -Minfo=accel -ta=tesla -V16.10; a.out
test:
16, Generating enter data copyin(grid)
17, Generating enter data create(grid%xm(:))
18, Generating present(grid)
19, Loop is parallelizable
Accelerator kernel generated
Generating Tesla code
19, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
23, Generating update self(grid%xm(:))
1.000000        4.000000        9.000000        16.00000
25.00000        36.00000        49.00000        64.00000
81.00000        100.0000

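Note that PGI 17.7 will include beta support for true deep copy in Fortran, as opposed to the manual deep copy you are doing above. Here is an example using true deep copy: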

% cat test_deep.f90
program Test
  implicit none
  type dt
    integer :: n
    real, dimension(:), allocatable :: xm
  end type dt
  type(dt) :: grid
  integer :: i
  grid%n = 10
  allocate(grid%xm(grid%n))
  !$acc enter data copyin(grid)   ! with -ta=tesla:deepcopy, grid%xm is copied along with grid
  !$acc kernels present(grid)
  do i = 1, grid%n
    grid%xm(i) = i * i
  enddo
  !$acc end kernels
  !$acc update host(grid)         ! updating the parent also updates its allocatable member
  print*,grid%xm
  !$acc exit data delete(grid)
  deallocate(grid%xm)
end program Test
% pgf90 -acc test_deep.f90 -Minfo=accel -ta=tesla:deepcopy -V17.7 ; a.out
test:
16, Generating enter data copyin(grid)
17, Generating present(grid)
18, Loop is parallelizable
Accelerator kernel generated
Generating Tesla code
18, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
22, Generating update self(grid)
1.000000        4.000000        9.000000        16.00000
25.00000        36.00000        49.00000        64.00000
81.00000        100.0000
