Improving formatted I/O with lots of small files in Fortran

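Let's assume I have the following requirements for writing monitor files from a simulation:

A large number of individual files has to be written, typically on the order of 10000.
The files must be human-readable, i.e. formatted I/O.
Periodically, a new line is added to each file, typically every 50 seconds.
The new data has to be accessible almost instantly, so large manual write buffers are not an option.
We are on a Lustre file system, which seems to be optimized for just the opposite: sequential writes to a small number of large files.

I did not come up with these requirements, and there is no point in discussing them here. I just want to find the best possible solution under the prerequisites above. I put together a small working example to test a few implementations. This is the best I could do so far: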

!===============================================================!
! program to test some I/O implementations for many small files !
!===============================================================!
PROGRAM iotest
    use types
    use omp_lib
    implicit none
    INTEGER(I4B), PARAMETER :: steps = 1000
    INTEGER(I4B), PARAMETER :: monitors = 1000
    INTEGER(I4B), PARAMETER :: cachesize = 10
    INTEGER(I8B) :: counti, countf, count_rate, counti_global, countf_global
    REAL(DP) :: telapsed, telapsed_global
    REAL(DP), DIMENSION(:,:), ALLOCATABLE :: density, pressure, vel_x, vel_y, vel_z
    INTEGER(I4B) :: n, t, unitnumber, c, i, thread
    CHARACTER(LEN=100) :: dummy_char, number
    REAL(DP), DIMENSION(:,:,:), ALLOCATABLE :: writecache_real
    call system_clock(counti_global,count_rate)
    ! allocate cache
    allocate(writecache_real(5,cachesize,monitors))
    writecache_real = 0.0_dp
    ! fill values
    allocate(density(steps,monitors), pressure(steps,monitors), vel_x(steps,monitors), vel_y(steps,monitors), vel_z(steps,monitors))
    do n=1, monitors
        do t=1, steps
            call random_number(density(t,n))
            call random_number(pressure(t,n))
            call random_number(vel_x(t,n))
            call random_number(vel_y(t,n))
            call random_number(vel_z(t,n))
        end do
    end do
    ! create files
    do n=1, monitors
        write(number,'(I0.8)') n
        dummy_char = 'monitor_' // trim(adjustl(number)) // '.dat'
        open(unit=20, file=trim(adjustl(dummy_char)), status='replace', action='write')
        close(20)
    end do
    call system_clock(counti)
    ! write data
    c = 0
    do t=1, steps
        c = c + 1
        do n=1, monitors
            writecache_real(1,c,n) = density(t,n)
            writecache_real(2,c,n) = pressure(t,n)
            writecache_real(3,c,n) = vel_x(t,n)
            writecache_real(4,c,n) = vel_y(t,n)
            writecache_real(5,c,n) = vel_z(t,n)
        end do
        if(c .EQ. cachesize .OR. t .EQ. steps) then
            !$OMP PARALLEL DEFAULT(SHARED) PRIVATE(n,number,dummy_char,unitnumber, thread)
            thread = OMP_get_thread_num()
            unitnumber = thread + 20
            !$OMP DO
            do n=1, monitors
                write(number,'(I0.8)') n
                dummy_char = 'monitor_' // trim(adjustl(number)) // '.dat'
                ! buffered='yes' is a non-standard Intel Fortran extension
                open(unit=unitnumber, file=trim(adjustl(dummy_char)), status='old', &
                     action='write', position='append', buffered='yes')
                write(unitnumber,'(5ES25.15)') writecache_real(:,1:c,n)
                close(unitnumber)
            end do
            !$OMP END DO
            !$OMP END PARALLEL
            c = 0
        end if
    end do
    call system_clock(countf)
    call system_clock(countf_global)
    telapsed=real(countf-counti,kind=dp)/real(count_rate,kind=dp)
    telapsed_global=real(countf_global-counti_global,kind=dp)/real(count_rate,kind=dp)
    write(*,*)
    write(*,'(A,F15.6,A)') ' elapsed wall time for I/O: ', telapsed, ' seconds'
    write(*,'(A,F15.6,A)') ' global elapsed wall time:  ', telapsed_global, ' seconds'
    write(*,*)
END PROGRAM iotest

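The main features are OpenMP parallelization and a manual write buffer. Here are some timings on the Lustre file system with 16 threads:

cachesize = 5: elapsed wall time for I/O: 991.627404 seconds
cachesize = 10: elapsed wall time for I/O: 415.456265 seconds
cachesize = 20: elapsed wall time for I/O: 93.842964 seconds
cachesize = 50: elapsed wall time for I/O: 79.859099 seconds
cachesize = 100: elapsed wall time for I/O: 23.937832 seconds
cachesize = 1000: elapsed wall time for I/O: 10.472421 seconds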

For reference, here are the results on a local workstation HDD with the HDD cache deactivated, 16 threads:

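cachesize = 1: elapsed wall time for I/O: 5.543722 seconds
cachesize = 2: elapsed wall time for I/O: 2.791811 seconds
cachesize = 3: elapsed wall time for I/O: 1.752962 seconds
cachesize = 4: elapsed wall time for I/O: 1.630385 seconds
cachesize = 5: elapsed wall time for I/O: 1.174099 seconds
cachesize = 10: elapsed wall time for I/O: 0.700624 seconds
cachesize = 20: elapsed wall time for I/O: 0.433936 seconds
cachesize = 50: elapsed wall time for I/O: 0.425782 seconds
cachesize = 100: elapsed wall time for I/O: 0.227552 seconds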

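As you can see, the implementation is still embarrassingly slow on the Lustre file system compared to a normal HDD, and I would need huge buffer sizes to reduce the I/O overhead to a tolerable level. That makes the output lag behind, which violates the requirements stated earlier. Another promising approach was to leave the units open between consecutive writes. Unfortunately, the number of units that can be open simultaneously is typically limited to 1024-4096 without root privileges. So this is not an option, because the number of files can exceed this limit.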
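
The open-file limit mentioned above can be checked from the shell before ruling out the "keep units open" approach. This is a minimal sketch, assuming a bash-like shell; the variable names are illustrative only:

```shell
#!/bin/bash
# Soft limit: the value currently enforced for processes started from this shell.
soft_limit=$(ulimit -Sn)
# Hard limit: the ceiling up to which an unprivileged user may raise the soft limit.
hard_limit=$(ulimit -Hn)

echo "soft limit on open files: $soft_limit"
echo "hard limit on open files: $hard_limit"
```

If the hard limit is higher than the soft limit, `ulimit -n <value>` raises the soft limit for the current shell without root privileges. Whether that is enough depends on how many monitor files the simulation writes.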

How can the I/O overhead be reduced further while still fulfilling the requirements?

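Edit 1: From the discussion with Gilles I learned that the Lustre file system can be tuned even with normal user privileges. So I tried setting the stripe count to the suggested value (which was already the default setting) and decreased the stripe size to the minimum supported value of 64k (the default was 1M). However, this did not improve the I/O performance of my test case. If anyone has further hints on more suitable file system settings, please let me know.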
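
The striping settings described in this edit can be applied per directory with `lfs setstripe`. This is a minimal sketch, assuming a Lustre client with the `lfs` tool on the PATH. The directory name `monitor_output` is hypothetical, and the stripe count of 1 is an assumption, since the suggested value is not spelled out here:

```shell
#!/bin/bash
# Hypothetical directory that will hold the monitor files.
target_dir="monitor_output"
mkdir -p "$target_dir"

if ! command -v lfs >/dev/null 2>&1; then
    # Not a Lustre client: nothing to tune.
    echo "lfs not found: this machine is not a Lustre client" >&2
    lfs_status="skipped"
elif lfs setstripe -c 1 -S 64k "$target_dir"; then
    # -c sets the stripe count (1 is an assumption), -S sets the stripe size.
    # Print the default layout that new files in the directory will inherit.
    lfs getstripe -d "$target_dir"
    lfs_status="applied"
else
    echo "lfs setstripe failed: is $target_dir on a Lustre mount?" >&2
    lfs_status="failed"
fi
echo "striping: $lfs_status"
```

New files created in the directory inherit its layout. Files that already exist keep the layout they were created with, so the setting has to be in place before the monitor files are created.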

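For everyone suffering from poor small-file performance: the new Lustre release 2.11 allows small files to be stored directly on the MDT, which improves access times for these files.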

http://cdn.opensfs.org/wp-content/uploads/2018/04/leers-lustre-data_on_mdt_mdt_an_early_early_look_look_ddn.pdf

lfs setstripe -E 1M -L mdt -E -1 fubar will store the first megabyte of every file in the directory fubar on the MDT.
