How to correctly interpret IMB benchmark results



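>Hello, I am currently using InfiniBand and measuring its performance with the IMB benchmarks. I am running the parallel transfer benchmarks and would like to know whether the results really reflect the parallel performance of 8 processes.

The explanation of the results is too vague for me to follow. Since every result mentions extra processes (for example, "6 additional processes waiting in MPI_Barrier"), I suspect that each test only runs 2 processes?

The throughput and t_avg[usec] columns seem to show plausible results, but I need to make sure I am reading them correctly.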

#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 8
#-----------------------------------------------------------------------------

Does the header above mean that I am running 8 processes in parallel?

#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 4
# ( 4 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------

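Does this one mean that 4 processes are running in parallel? I would really appreciate help from anyone familiar with the IMB benchmarks. Thank you.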

Here is the full output:

# np - 8
#------------------------------------------------------------
#    Intel (R) MPI Benchmarks 2018, MPI-1 part
#------------------------------------------------------------
# Date                  : Mon Oct 16 14:14:20 2017
# Machine               : x86_64
# System                : Linux
# Release               : 4.4.0-96-generic
# Version               : #119-Ubuntu SMP Tue Sep 12 14:59:54 UTC 2017
# MPI Version           : 3.0
# MPI Thread Environment:

# Calling sequence was:
# ./IMB-MPI1 Sendrecv Exchange
# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM
#
#
# List of Benchmarks to run:
# Sendrecv
# Exchange
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 2
# ( 6 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000        13.85        13.85        13.85         0.00
            1         1000        12.22        12.22        12.22         0.16
            2         1000        10.08        10.08        10.08         0.40
            4         1000         9.43         9.43         9.43         0.85
            8         1000         8.89         8.91         8.90         1.80
           16         1000         8.70         8.71         8.71         3.67
           32         1000         9.00         9.00         9.00         7.11
           64         1000         8.82         8.82         8.82        14.51
          128         1000         8.90         8.90         8.90        28.77
          256         1000         8.98         8.98         8.98        56.99
          512         1000         9.78         9.78         9.78       104.75
         1024         1000        12.65        12.65        12.65       161.91
         2048         1000        18.31        18.32        18.31       223.63
         4096         1000        20.05        20.05        20.05       408.52
         8192         1000        21.15        21.16        21.16       774.11
        16384         1000        27.46        27.47        27.46      1193.05
        32768         1000        36.93        36.94        36.93      1774.31
        65536          640        60.56        60.59        60.57      2163.39
       131072          320       117.62       117.63       117.63      2228.57
       262144          160       202.67       202.68       202.67      2586.78
       524288           80       323.86       324.28       324.07      3233.56
      1048576           40       615.05       615.47       615.26      3407.42
      2097152           20      1214.74      1216.89      1215.82      3446.74
      4194304           10      2471.83      2488.45      2480.14      3371.02
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 4
# ( 4 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000        11.14        11.15        11.15         0.00
            1         1000        11.16        11.16        11.16         0.18
            2         1000        11.11        11.12        11.12         0.36
            4         1000        11.10        11.11        11.10         0.72
            8         1000        11.03        11.04        11.03         1.45
           16         1000        11.21        11.22        11.22         2.85
           32         1000        11.81        11.81        11.81         5.42
           64         1000        11.58        11.58        11.58        11.05
          128         1000        11.77        11.78        11.78        21.72
          256         1000        11.88        11.89        11.89        43.05
          512         1000        13.03        13.03        13.03        78.57
         1024         1000        14.73        14.74        14.74       138.92
         2048         1000        19.37        19.39        19.38       211.24
         4096         1000        21.31        21.34        21.33       383.96
         8192         1000        26.19        26.22        26.20       624.84
        16384         1000        32.65        32.69        32.67      1002.26
        32768         1000        48.71        48.78        48.75      1343.52
        65536          640        75.14        75.22        75.18      1742.63
       131072          320       174.66       175.15       174.94      1496.65
       262144          160       301.22       302.02       301.44      1735.95
       524288           80       539.40       542.68       540.78      1932.21
      1048576           40      1015.45      1026.34      1020.59      2043.32
      2097152           20      1959.53      1985.57      1971.34      2112.39
      4194304           10      3549.00      3641.61      3590.76      2303.55
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 8
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000        12.81        12.83        12.82         0.00
            1         1000        12.82        12.84        12.83         0.16
            2         1000        12.73        12.75        12.74         0.31
            4         1000        12.82        12.85        12.84         0.62
            8         1000        12.87        12.88        12.87         1.24
           16         1000        12.83        12.86        12.84         2.49
           32         1000        13.25        13.28        13.26         4.82
           64         1000        13.44        13.46        13.45         9.51
          128         1000        13.49        13.51        13.50        18.94
          256         1000        13.72        13.74        13.73        37.27
          512         1000        13.69        13.71        13.70        74.72
         1024         1000        15.73        15.75        15.74       130.07
         2048         1000        20.72        20.76        20.74       197.28
         4096         1000        22.68        22.74        22.72       360.28
         8192         1000        29.48        29.52        29.50       555.04
        16384         1000        39.89        39.95        39.92       820.31
        32768         1000        57.38        57.48        57.43      1140.24
        65536          640        95.23        95.34        95.29      1374.78
       131072          320       214.61       215.16       214.83      1218.38
       262144          160       365.75       368.39       367.28      1423.18
       524288           80       679.82       687.10       683.13      1526.08
      1048576           40      1277.18      1309.22      1295.65      1601.83
      2097152           20      2292.99      2377.56      2339.35      1764.12
      4194304           10      4617.95      4919.67      4778.37      1705.12
#-----------------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 2
# ( 6 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000        12.41        12.42        12.42         0.00
            1         1000        12.47        12.48        12.47         0.32
            2         1000        11.93        11.94        11.94         0.67
            4         1000        11.95        11.96        11.95         1.34
            8         1000        11.91        11.92        11.92         2.69
           16         1000        11.97        11.98        11.97         5.34
           32         1000        12.80        12.81        12.80        10.00
           64         1000        12.84        12.84        12.84        19.93
          128         1000        12.90        12.91        12.91        39.67
          256         1000        12.90        12.91        12.91        79.34
          512         1000        14.04        14.04        14.04       145.82
         1024         1000        17.13        17.14        17.13       239.02
         2048         1000        21.06        21.06        21.06       389.05
         4096         1000        23.32        23.33        23.32       702.41
         8192         1000        28.07        28.07        28.07      1167.45
        16384         1000        37.81        37.82        37.82      1732.64
        32768         1000        55.23        55.24        55.24      2372.75
        65536          640       101.04       101.06       101.05      2593.84
       131072          320       212.88       212.88       212.88      2462.84
       262144          160       362.37       362.38       362.37      2893.62
       524288           80       668.88       668.89       668.88      3135.26
      1048576           40      1286.48      1287.81      1287.15      3256.92
      2097152           20      2463.56      2464.13      2463.84      3404.29
      4194304           10      4845.24      4854.75      4849.99      3455.83
#-----------------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 4
# ( 4 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000        16.46        16.46        16.46         0.00
            1         1000        16.42        16.43        16.42         0.24
            2         1000        16.17        16.17        16.17         0.49
            4         1000        16.17        16.17        16.17         0.99
            8         1000        16.19        16.20        16.20         1.98
           16         1000        16.21        16.22        16.22         3.94
           32         1000        17.20        17.21        17.20         7.44
           64         1000        17.09        17.10        17.10        14.97
          128         1000        17.24        17.25        17.25        29.68
          256         1000        17.40        17.41        17.40        58.83
          512         1000        17.59        17.61        17.60       116.32
         1024         1000        21.43        21.45        21.44       190.95
         2048         1000        29.49        29.50        29.49       277.71
         4096         1000        31.63        31.66        31.64       517.58
         8192         1000        36.70        36.72        36.71       892.41
        16384         1000        49.50        49.53        49.52      1323.07
        32768         1000        68.35        68.36        68.36      1917.38
        65536          640       108.80       108.85       108.82      2408.31
       131072          320       314.38       314.72       314.56      1665.91
       262144          160       521.71       522.24       521.94      2007.84
       524288           80       930.03       933.47       931.82      2246.62
      1048576           40      1729.81      1738.30      1734.66      2412.87
      2097152           20      3384.33      3414.99      3403.61      2456.41
      4194304           10      6972.50      7058.12      7028.16      2377.01
#-----------------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 8
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000        18.91        18.93        18.92         0.00
            1         1000        19.06        19.08        19.07         0.21
            2         1000        18.91        18.92        18.92         0.42
            4         1000        19.07        19.09        19.08         0.84
            8         1000        18.81        18.83        18.82         1.70
           16         1000        19.02        19.03        19.03         3.36
           32         1000        19.85        19.85        19.85         6.45
           64         1000        19.76        19.78        19.77        12.94
          128         1000        19.94        19.96        19.95        25.65
          256         1000        20.16        20.18        20.17        50.75
          512         1000        20.50        20.51        20.50        99.86
         1024         1000        24.52        24.55        24.54       166.83
         2048         1000        36.35        36.39        36.37       225.14
         4096         1000        38.77        38.81        38.79       422.20
         8192         1000        44.79        44.82        44.81       731.12
        16384         1000        59.28        59.33        59.31      1104.68
        32768         1000        86.39        86.47        86.42      1515.87
        65536          640       142.47       142.60       142.53      1838.29
       131072          320       402.11       402.98       402.57      1301.04
       262144          160       648.90       650.30       649.68      1612.44
       524288           80      1209.17      1213.71      1211.74      1727.89
      1048576           40      2332.69      2355.17      2344.35      1780.89
      2097152           20      4686.88      4767.48      4733.77      1759.55
      4194304           10      9457.18      9674.69      9567.31      1734.13

# All processes entering MPI_Finalize

A single IMB run tests:

  • various benchmarks (here Sendrecv and Exchange)
  • various message sizes (here from 0 up to 4 MB)
  • various communicator sizes (here 2, 4 and 8)
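Each row in the tables above is one combination of these three dimensions. As a rough sanity check on the Mbytes/sec column, the sketch below assumes that Sendrecv counts 2 message transfers per process per iteration, that Exchange counts 4, that the rate is taken against t_max[usec], and that 1 MB means 10^6 bytes. These assumptions are not stated in the output itself; they are simply what the figures above are consistent with. The names MESSAGES_PER_ITERATION and mbytes_per_sec are made up for this illustration.

```python
# Assumption: message transfers per process per iteration that each
# benchmark counts toward its reported throughput.
MESSAGES_PER_ITERATION = {"Sendrecv": 2, "Exchange": 4}


def mbytes_per_sec(benchmark: str, nbytes: int, t_max_usec: float) -> float:
    """Estimate the Mbytes/sec column from #bytes and t_max[usec].

    Bytes per microsecond is numerically equal to 10^6 bytes per second,
    so no further unit conversion is needed under the stated assumptions.
    """
    if t_max_usec <= 0:
        raise ValueError("t_max_usec must be positive")
    return MESSAGES_PER_ITERATION[benchmark] * nbytes / t_max_usec


if __name__ == "__main__":
    # Last row of the Sendrecv table for #processes = 8.
    print(f"{mbytes_per_sec('Sendrecv', 4194304, 4919.67):.2f}")
    # Last row of the Exchange table for #processes = 2.
    print(f"{mbytes_per_sec('Exchange', 4194304, 4854.75):.2f}")
```

Because the printed t_max values are themselves rounded to two decimals, the recomputed figures can differ from the table in the last digit.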

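Since mpirun -np 8 is invoked only once, 8 MPI tasks are created. The block headed #processes = 8 therefore has all 8 tasks communicating in parallel. When a communicator of size 2 is tested, an additional communicator of size 6 is created under the hood, and its 6 MPI tasks simply wait in MPI_Barrier, hence the message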

# #processes = 2
# ( 6 additional processes waiting in MPI_Barrier)
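The bookkeeping behind these headers can be sketched as follows. The sketch assumes the group sizes follow the pattern visible in the output above (2, 4, ... doubling while below the total, then the total itself). The helper names imb_process_groups and header_lines are hypothetical and not part of IMB.

```python
def imb_process_groups(total: int, p_min: int = 2) -> list[tuple[int, int]]:
    """Return (active, idle) process counts for each benchmark block.

    Assumed schedule: p_min, 2*p_min, 4*p_min, ... while below `total`,
    followed by `total` itself. Idle processes are the ones that would
    wait in MPI_Barrier while the active group runs the benchmark.
    """
    if total < 1 or p_min < 1:
        raise ValueError("total and p_min must be at least 1")
    sizes = []
    p = p_min
    while p < total:
        sizes.append(p)
        p *= 2
    sizes.append(total)
    return [(active, total - active) for active in sizes]


def header_lines(active: int, idle: int) -> list[str]:
    """Format the header lines in the style of the output above."""
    lines = [f"# #processes = {active}"]
    if idle > 0:
        lines.append(f"# ( {idle} additional processes waiting in MPI_Barrier)")
    return lines


if __name__ == "__main__":
    for active, idle in imb_process_groups(8):
        print("\n".join(header_lines(active, idle)))
```

For a run with 8 tasks this gives three blocks, and only the last one has no idle processes, which matches the headers in the question.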
