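>Hello, I am currently using InfiniBand and measuring its performance with the IMB benchmarks. I am running the parallel transfer tests and would like to know whether the results really reflect the parallel performance of 8 processes.
The way the results are presented is too vague for me to interpret. Since the results mention "6 additional processes waiting in MPI_Barrier", I suspect that only 2 processes are actually running in those tests?
The values in the t_avg[usec] and throughput columns look plausible, but I need to make sure I am reading them correctly.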
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 8
#-----------------------------------------------------------------------------
Does the header above mean that 8 processes are running in parallel?
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 4
# ( 4 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
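And does this one mean that 4 processes are running in parallel? I would really appreciate help from anyone familiar with the IMB benchmarks. Thanks.
The full results follow.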
# np - 8
#------------------------------------------------------------
# Intel (R) MPI Benchmarks 2018, MPI-1 part
#------------------------------------------------------------
# Date : Mon Oct 16 14:14:20 2017
# Machine : x86_64
# System : Linux
# Release : 4.4.0-96-generic
# Version : #119-Ubuntu SMP Tue Sep 12 14:59:54 UTC 2017
# MPI Version : 3.0
# MPI Thread Environment:
# Calling sequence was:
# ./IMB-MPI1 Sendrecv Exchange
# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#
# List of Benchmarks to run:
# Sendrecv
# Exchange
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 2
# ( 6 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 13.85 13.85 13.85 0.00
1 1000 12.22 12.22 12.22 0.16
2 1000 10.08 10.08 10.08 0.40
4 1000 9.43 9.43 9.43 0.85
8 1000 8.89 8.91 8.90 1.80
16 1000 8.70 8.71 8.71 3.67
32 1000 9.00 9.00 9.00 7.11
64 1000 8.82 8.82 8.82 14.51
128 1000 8.90 8.90 8.90 28.77
256 1000 8.98 8.98 8.98 56.99
512 1000 9.78 9.78 9.78 104.75
1024 1000 12.65 12.65 12.65 161.91
2048 1000 18.31 18.32 18.31 223.63
4096 1000 20.05 20.05 20.05 408.52
8192 1000 21.15 21.16 21.16 774.11
16384 1000 27.46 27.47 27.46 1193.05
32768 1000 36.93 36.94 36.93 1774.31
65536 640 60.56 60.59 60.57 2163.39
131072 320 117.62 117.63 117.63 2228.57
262144 160 202.67 202.68 202.67 2586.78
524288 80 323.86 324.28 324.07 3233.56
1048576 40 615.05 615.47 615.26 3407.42
2097152 20 1214.74 1216.89 1215.82 3446.74
4194304 10 2471.83 2488.45 2480.14 3371.02
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 4
# ( 4 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 11.14 11.15 11.15 0.00
1 1000 11.16 11.16 11.16 0.18
2 1000 11.11 11.12 11.12 0.36
4 1000 11.10 11.11 11.10 0.72
8 1000 11.03 11.04 11.03 1.45
16 1000 11.21 11.22 11.22 2.85
32 1000 11.81 11.81 11.81 5.42
64 1000 11.58 11.58 11.58 11.05
128 1000 11.77 11.78 11.78 21.72
256 1000 11.88 11.89 11.89 43.05
512 1000 13.03 13.03 13.03 78.57
1024 1000 14.73 14.74 14.74 138.92
2048 1000 19.37 19.39 19.38 211.24
4096 1000 21.31 21.34 21.33 383.96
8192 1000 26.19 26.22 26.20 624.84
16384 1000 32.65 32.69 32.67 1002.26
32768 1000 48.71 48.78 48.75 1343.52
65536 640 75.14 75.22 75.18 1742.63
131072 320 174.66 175.15 174.94 1496.65
262144 160 301.22 302.02 301.44 1735.95
524288 80 539.40 542.68 540.78 1932.21
1048576 40 1015.45 1026.34 1020.59 2043.32
2097152 20 1959.53 1985.57 1971.34 2112.39
4194304 10 3549.00 3641.61 3590.76 2303.55
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 8
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 12.81 12.83 12.82 0.00
1 1000 12.82 12.84 12.83 0.16
2 1000 12.73 12.75 12.74 0.31
4 1000 12.82 12.85 12.84 0.62
8 1000 12.87 12.88 12.87 1.24
16 1000 12.83 12.86 12.84 2.49
32 1000 13.25 13.28 13.26 4.82
64 1000 13.44 13.46 13.45 9.51
128 1000 13.49 13.51 13.50 18.94
256 1000 13.72 13.74 13.73 37.27
512 1000 13.69 13.71 13.70 74.72
1024 1000 15.73 15.75 15.74 130.07
2048 1000 20.72 20.76 20.74 197.28
4096 1000 22.68 22.74 22.72 360.28
8192 1000 29.48 29.52 29.50 555.04
16384 1000 39.89 39.95 39.92 820.31
32768 1000 57.38 57.48 57.43 1140.24
65536 640 95.23 95.34 95.29 1374.78
131072 320 214.61 215.16 214.83 1218.38
262144 160 365.75 368.39 367.28 1423.18
524288 80 679.82 687.10 683.13 1526.08
1048576 40 1277.18 1309.22 1295.65 1601.83
2097152 20 2292.99 2377.56 2339.35 1764.12
4194304 10 4617.95 4919.67 4778.37 1705.12
#-----------------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 2
# ( 6 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 12.41 12.42 12.42 0.00
1 1000 12.47 12.48 12.47 0.32
2 1000 11.93 11.94 11.94 0.67
4 1000 11.95 11.96 11.95 1.34
8 1000 11.91 11.92 11.92 2.69
16 1000 11.97 11.98 11.97 5.34
32 1000 12.80 12.81 12.80 10.00
64 1000 12.84 12.84 12.84 19.93
128 1000 12.90 12.91 12.91 39.67
256 1000 12.90 12.91 12.91 79.34
512 1000 14.04 14.04 14.04 145.82
1024 1000 17.13 17.14 17.13 239.02
2048 1000 21.06 21.06 21.06 389.05
4096 1000 23.32 23.33 23.32 702.41
8192 1000 28.07 28.07 28.07 1167.45
16384 1000 37.81 37.82 37.82 1732.64
32768 1000 55.23 55.24 55.24 2372.75
65536 640 101.04 101.06 101.05 2593.84
131072 320 212.88 212.88 212.88 2462.84
262144 160 362.37 362.38 362.37 2893.62
524288 80 668.88 668.89 668.88 3135.26
1048576 40 1286.48 1287.81 1287.15 3256.92
2097152 20 2463.56 2464.13 2463.84 3404.29
4194304 10 4845.24 4854.75 4849.99 3455.83
#-----------------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 4
# ( 4 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 16.46 16.46 16.46 0.00
1 1000 16.42 16.43 16.42 0.24
2 1000 16.17 16.17 16.17 0.49
4 1000 16.17 16.17 16.17 0.99
8 1000 16.19 16.20 16.20 1.98
16 1000 16.21 16.22 16.22 3.94
32 1000 17.20 17.21 17.20 7.44
64 1000 17.09 17.10 17.10 14.97
128 1000 17.24 17.25 17.25 29.68
256 1000 17.40 17.41 17.40 58.83
512 1000 17.59 17.61 17.60 116.32
1024 1000 21.43 21.45 21.44 190.95
2048 1000 29.49 29.50 29.49 277.71
4096 1000 31.63 31.66 31.64 517.58
8192 1000 36.70 36.72 36.71 892.41
16384 1000 49.50 49.53 49.52 1323.07
32768 1000 68.35 68.36 68.36 1917.38
65536 640 108.80 108.85 108.82 2408.31
131072 320 314.38 314.72 314.56 1665.91
262144 160 521.71 522.24 521.94 2007.84
524288 80 930.03 933.47 931.82 2246.62
1048576 40 1729.81 1738.30 1734.66 2412.87
2097152 20 3384.33 3414.99 3403.61 2456.41
4194304 10 6972.50 7058.12 7028.16 2377.01
#-----------------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 8
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 18.91 18.93 18.92 0.00
1 1000 19.06 19.08 19.07 0.21
2 1000 18.91 18.92 18.92 0.42
4 1000 19.07 19.09 19.08 0.84
8 1000 18.81 18.83 18.82 1.70
16 1000 19.02 19.03 19.03 3.36
32 1000 19.85 19.85 19.85 6.45
64 1000 19.76 19.78 19.77 12.94
128 1000 19.94 19.96 19.95 25.65
256 1000 20.16 20.18 20.17 50.75
512 1000 20.50 20.51 20.50 99.86
1024 1000 24.52 24.55 24.54 166.83
2048 1000 36.35 36.39 36.37 225.14
4096 1000 38.77 38.81 38.79 422.20
8192 1000 44.79 44.82 44.81 731.12
16384 1000 59.28 59.33 59.31 1104.68
32768 1000 86.39 86.47 86.42 1515.87
65536 640 142.47 142.60 142.53 1838.29
131072 320 402.11 402.98 402.57 1301.04
262144 160 648.90 650.30 649.68 1612.44
524288 80 1209.17 1213.71 1211.74 1727.89
1048576 40 2332.69 2355.17 2344.35 1780.89
2097152 20 4686.88 4767.48 4733.77 1759.55
4194304 10 9457.18 9674.69 9567.31 1734.13
# All processes entering MPI_Finalize
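As a sanity check on how to read the columns above, the Mbytes/sec values can be reproduced from #bytes and t_max[usec]. The sketch below is inferred from the numbers in this output, not taken from the IMB source: it assumes a factor of 2 messages per process for Sendrecv and 4 for Exchange, and 1 Mbyte = 10^6 bytes. The function name and the lookup table are made up for illustration.

```python
# Messages handled per process in one sample.
# ASSUMPTION: factors inferred by fitting the table above, not read from IMB code.
MESSAGES_PER_SAMPLE = {"Sendrecv": 2, "Exchange": 4}


def mbytes_per_sec(benchmark, nbytes, t_max_usec):
    """Estimate the Mbytes/sec column from #bytes and t_max[usec].

    bytes per microsecond is numerically equal to 10^6 bytes per second,
    so no extra unit conversion is needed.
    """
    if t_max_usec <= 0:
        return 0.0
    return MESSAGES_PER_SAMPLE[benchmark] * nbytes / t_max_usec


if __name__ == "__main__":
    # Last row of the 2-process Sendrecv table (reported: 3371.02)
    print(round(mbytes_per_sec("Sendrecv", 4194304, 2488.45), 2))
    # Last row of the 8-process Sendrecv table (reported: 1705.12)
    print(round(mbytes_per_sec("Sendrecv", 4194304, 4919.67), 2))
```

Because the printed times are rounded to two decimals, a recomputed value can differ from the table in the last digit. Note that the column is a per-process rate based on the slowest process (t_max), so it drops as more processes share the same link.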
A single IMB run tests
 - various MPI communication patterns (here the Sendrecv and Exchange benchmarks)
 - various message sizes (here from 0 to 4 MB)
 - various communicator sizes (here 2, 4 and 8)
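The sweep described in the list above can be sketched as follows. This is only an illustration of the pattern visible in this output (process counts doubling from 2 up to the -np value, message sizes 0 and then powers of two up to 4194304 bytes). The function names are hypothetical, and the assumption that the same pattern holds for other -np values is mine.

```python
def imb_process_counts(np_total):
    """Communicator sizes tested: 2, 4, 8, ... and finally np_total itself.

    ASSUMPTION: generalised from the np = 8 run shown in the question.
    """
    counts = []
    n = 2
    while n < np_total:
        counts.append(n)
        n *= 2
    counts.append(np_total)
    return counts


def imb_message_sizes(max_bytes=4194304):
    """Message sizes tested: 0, then 1, 2, 4, ... up to max_bytes."""
    sizes = [0]
    n = 1
    while n <= max_bytes:
        sizes.append(n)
        n *= 2
    return sizes


if __name__ == "__main__":
    print(imb_process_counts(8))      # the three table groups per benchmark
    print(len(imb_message_sizes()))   # number of rows in each table
```

With np_total = 8 this yields the three communicator sizes seen in the output, and 24 message sizes, which matches the 24 rows in each table.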
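Since mpirun was invoked once with -np 8, 8 MPI tasks were created. So when a communicator of size 2 is being tested, an extra communicator of size 6 is created under the hood, and its 6 MPI tasks simply wait in MPI_Barrier, hence the message
# #processes = 2
# ( 6 additional processes waiting in MPI_Barrier)
By the same reasoning, the blocks headed "# #processes = 8", which have no MPI_Barrier line, are the ones in which all 8 tasks communicate at the same time.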
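To make the grouping concrete, here is a small simulation of that split without any MPI involved. It assumes the lowest-numbered ranks form the active communicator, which is a guess for illustration and not something the output confirms. The function name is hypothetical.

```python
def describe_groups(np_total, active):
    """Split ranks 0..np_total-1 into an active group and a waiting group,
    and build the header lines IMB prints for that configuration.

    ASSUMPTION: the first `active` ranks are the ones that communicate.
    """
    ranks = list(range(np_total))
    active_ranks = ranks[:active]
    waiting_ranks = ranks[active:]

    header = [f"# #processes = {active}"]
    if waiting_ranks:
        header.append(
            f"# ( {len(waiting_ranks)} additional processes waiting in MPI_Barrier)"
        )
    return active_ranks, waiting_ranks, header


if __name__ == "__main__":
    for size in (2, 4, 8):
        active_ranks, waiting_ranks, header = describe_groups(8, size)
        print("\n".join(header))
        print("active:", active_ranks, "waiting:", waiting_ranks)
```

For size 8 the waiting group is empty, so only the "# #processes = 8" line is produced, which is why that block in the output has no MPI_Barrier line.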