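So I'm running TestDFSIO on my cluster to measure the throughput and average IO rate of read and write operations. I run 4 tests: 4 files of 256 MB each (1 GB total), 2 files of 256 MB each (512 MB total), 2 files of 128 MB each (256 MB total), and 1 file of 50 MB (50 MB total).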
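I run them on Hadoop clusters ranging from 1 to 5 nodes. The block size is 256 MB, and the replication factor matches the cluster size (1 node = replication 1, 2 nodes = replication 2, and so on).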
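A quick sketch of what this setup implies for writes, assuming (as stated above) that the replication factor always equals the node count. HDFS places each replica of a block on a different DataNode, so with replication equal to the node count every node ends up storing a full copy of the data. The function names here are illustrative only.

```python
# Test configurations from the setup: (number of files, MB per file).
TESTS = [(4, 256), (2, 256), (2, 128), (1, 50)]

def total_mb(n_files, mb_per_file):
    """Logical data size of one test run."""
    return n_files * mb_per_file

def replicated_write_mb(logical_mb, nodes):
    """Assumes replication factor == node count, as in the setup.
    Returns (MB written cluster-wide, MB written per DataNode)."""
    replication = nodes
    cluster_mb = logical_mb * replication
    # Replicas of a block go to distinct DataNodes, so with
    # replication == nodes every node receives every block.
    per_node_mb = cluster_mb / nodes
    return cluster_mb, per_node_mb

print([total_mb(f, s) for f, s in TESTS])  # [1024, 512, 256, 50]
for nodes in range(1, 6):
    cluster, per_node = replicated_write_mb(1024, nodes)
    print(f"{nodes} node(s): {cluster} MB written cluster-wide, {per_node:.0f} MB per node")
```

Under this assumption, adding nodes never reduces the amount each node has to write for the 1 GB test; it only adds more copies to ship through the write pipeline.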
Here are the results of the 1 GB test:
1 node
----- TestDFSIO ----- : write
Date & time: Thu Jun 20 11:38:21 WIB 2019
Number of files: 4
Total MBytes processed: 1024.0
Throughput mb/sec: 8.503288381053611
Average IO rate mb/sec: 8.507380485534668
IO rate std deviation: 0.18595730311606032
Test exec time sec: 84.876
----- TestDFSIO ----- : read
Date & time: Thu Jun 20 11:39:52 WIB 2019
Number of files: 4
Total MBytes processed: 1024.0
Throughput mb/sec: 14.351786965662228
Average IO rate mb/sec: 14.422638893127441
IO rate std deviation: 1.0515649052955383
Test exec time sec: 61.371
2 nodes
----- TestDFSIO ----- : write
Date & time: Thu Jun 20 11:15:52 WIB 2019
Number of files: 4
Total MBytes processed: 1024.0
Throughput mb/sec: 2.557167936510315
Average IO rate mb/sec: 2.5574562549591064
IO rate std deviation: 0.027311795003682558
Test exec time sec: 150.506
----- TestDFSIO ----- : read
Date & time: Thu Jun 20 11:18:04 WIB 2019
Number of files: 4
Total MBytes processed: 1024.0
Throughput mb/sec: 9.567321617101587
Average IO rate mb/sec: 9.673456192016602
IO rate std deviation: 1.0593562755825534
Test exec time sec: 79.333
3 nodes
----- TestDFSIO ----- : write
Date & time: Thu Jun 20 10:42:47 WIB 2019
Number of files: 4
Total MBytes processed: 1024.0
Throughput mb/sec: 2.343067129788529
Average IO rate mb/sec: 2.3866918087005615
IO rate std deviation: 0.3233444726530288
Test exec time sec: 167.593
----- TestDFSIO ----- : read
Date & time: Thu Jun 20 10:47:33 WIB 2019
Number of files: 4
Total MBytes processed: 1024.0
Throughput mb/sec: 11.901164547546546
Average IO rate mb/sec: 12.255699157714844
IO rate std deviation: 2.2415787547598667
Test exec time sec: 69.29
4 nodes
----- TestDFSIO ----- : write
Date & time: Thu Jun 20 10:23:19 WIB 2019
Number of files: 4
Total MBytes processed: 1024.0
Throughput mb/sec: 1.6539390885245053
Average IO rate mb/sec: 1.6625666618347168
IO rate std deviation: 0.12093049037575003
Test exec time sec: 205.164
----- TestDFSIO ----- : read
Date & time: Thu Jun 20 10:25:23 WIB 2019
Number of files: 4
Total MBytes processed: 1024.0
Throughput mb/sec: 19.842653954966476
Average IO rate mb/sec: 20.02923583984375
IO rate std deviation: 1.9719328195872965
Test exec time sec: 57.25
5 nodes
----- TestDFSIO ----- : write
Date & time: Thu Jun 13 12:50:12 WIB 2019
Number of files: 4
Total MBytes processed: 1024.0
Throughput mb/sec: 1.5617159964556366
Average IO rate mb/sec: 1.573684573173523
IO rate std deviation: 0.14426118715726127
Test exec time sec: 219.959
----- TestDFSIO ----- : read
Date & time: Thu Jun 13 14:01:01 WIB 2019
Number of files: 4
Total MBytes processed: 1024.0
Throughput mb/sec: 18.00692844707827
Average IO rate mb/sec: 18.323461532592773
IO rate std deviation: 2.501963465819598
Test exec time sec: 64.316
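One rough way to read the write numbers against the replication setting: multiply each per-task write throughput by its replication factor (the node count). This is not a figure TestDFSIO reports; it is just a sketch using the values from the logs above, and it ignores job overhead and network effects.

```python
# Per-task write throughput (MB/s) from the 1 GB runs above, keyed by node count.
# Replication factor == node count, as in the setup.
write_throughput = {
    1: 8.503288381053611,
    2: 2.557167936510315,
    3: 2.343067129788529,
    4: 1.6539390885245053,
    5: 1.5617159964556366,
}

def replica_mb_per_sec(results):
    """Scale each per-task write throughput by its replication factor."""
    return {nodes: tp * nodes for nodes, tp in results.items()}

scaled = replica_mb_per_sec(write_throughput)
for nodes in sorted(scaled):
    print(f"{nodes} node(s): raw {write_throughput[nodes]:.2f} MB/s, "
          f"x{nodes} replicas = {scaled[nodes]:.2f} MB/s")
```

The raw write figure falls by more than a factor of 5 from 1 to 5 nodes, while the replica-scaled figure stays between roughly 5 and 8.5 MB/s, which suggests much of the drop tracks the growing number of copies being written.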
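I thought that with more nodes the work would be more parallelized and throughput would increase. Why does write throughput drop so sharply when I add nodes?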
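A note on what the summary lines measure, as I understand TestDFSIO's aggregation: each file is handled by one map task, "Throughput mb/sec" is total data divided by the sum of per-task times, and "Average IO rate mb/sec" is the plain mean of per-task rates. The sketch below reproduces those formulas in MB and seconds (TestDFSIO itself works in bytes and milliseconds, which cancels out); the task sizes and times in the example are hypothetical.

```python
import math

def summarize(tasks):
    """tasks: list of (mb_processed, seconds), one entry per map task.
    Approximates how TestDFSIO computes its summary lines."""
    n = len(tasks)
    total_mb = sum(mb for mb, _ in tasks)
    total_sec = sum(sec for _, sec in tasks)      # summed task time, not wall-clock time
    rates = [mb / sec for mb, sec in tasks]
    throughput = total_mb / total_sec             # "Throughput mb/sec"
    avg_io_rate = sum(rates) / n                  # "Average IO rate mb/sec"
    std_dev = math.sqrt(abs(sum(r * r for r in rates) / n - avg_io_rate ** 2))
    return throughput, avg_io_rate, std_dev

# Hypothetical example: two tasks, each writing 256 MB, in 32 s and 64 s.
print(summarize([(256, 32), (256, 64)]))
```

Because the denominator is summed task time, running the same tasks side by side on more nodes does not by itself raise the reported throughput. A rough aggregate figure would be that throughput times the number of map tasks running concurrently, and with 4 files there are at most 4.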
The data size is too small. A single machine can easily handle 1 GB of data. Since that is the largest size you tested, these results are not surprising.
Raise it to something on the order of 100 GB to 1 TB; otherwise, performance results from this kind of test are meaningless.
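To put those sizes in perspective with the 256 MB block size from the question, here is a small calculation of how many blocks each data size produces. It assumes files are sized in whole multiples of the block size, so the total divides evenly; smaller or odd-sized files would produce more, partially filled blocks.

```python
import math

BLOCK_MB = 256  # block size used in the question

def block_count(total_mb, block_mb=BLOCK_MB):
    """Number of HDFS blocks for total_mb of data, assuming block-aligned files."""
    return math.ceil(total_mb / block_mb)

for label, mb in [("1 GB", 1024), ("100 GB", 100 * 1024), ("1 TB", 1024 * 1024)]:
    blocks = block_count(mb)
    print(f"{label}: {blocks} blocks, {blocks / 5:.1f} per node on a 5-node cluster")
```

The 1 GB test is only 4 blocks (and, with 4 files, only 4 map tasks), which is fewer than the 5 nodes available, whereas 100 GB gives 400 blocks and 1 TB gives 4096, enough to spread work across every node.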