In MapReduce (HDFS), what do under-replicated and over-replicated blocks mean, and how does HDFS balance too many replicas against too few?
As you probably already know, the default replication factor is 3.
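An over-replicated block is a block that has more replicas than the target replication of the file it belongs to. Over-replication is normally not a problem: HDFS automatically deletes the excess replicas. That is how it balances this case.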
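An under-replicated block is a block that has fewer replicas than the target replication of the file it belongs to.
To balance this, HDFS automatically creates new replicas of under-replicated blocks until they meet the target replication.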
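The balancing rule above boils down to comparing each block's live replica count with its file's target replication. Here is a toy model of that decision in Python; the `Block` class and `plan_action` function are hypothetical names for illustration, not the NameNode's actual code (the real NameNode also weighs rack placement, DataNode load, and replication priorities).

```python
from dataclasses import dataclass


@dataclass
class Block:
    block_id: str
    live_replicas: int       # replicas currently reported by DataNodes
    target_replication: int  # the file's replication factor (3 by default)


def plan_action(block: Block) -> tuple[str, int]:
    """Toy model: decide how to bring one block back to its target replication."""
    diff = block.live_replicas - block.target_replication
    if diff > 0:
        # Over-replicated: HDFS removes the excess replicas.
        return ("delete_excess", diff)
    if diff < 0:
        # Under-replicated: HDFS schedules new copies until the target is met.
        return ("replicate", -diff)
    return ("ok", 0)


if __name__ == "__main__":
    blocks = [Block("blk_1", 3, 3), Block("blk_2", 5, 3), Block("blk_3", 1, 3)]
    for b in blocks:
        print(b.block_id, plan_action(b))
```

Running it prints one planned action per block: `blk_2` has two replicas too many and `blk_3` is missing two.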
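You can get information about blocks that are being replicated (or are waiting to be replicated) with
hdfs dfsadmin -metasave <filename>
which saves the NameNode's primary data structures to <filename> in the directory given by the hadoop.log.dir property.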
If you run the command below, you will get detailed statistics:
hdfs fsck /
......................
Status: HEALTHY
Total size: 511799225 B
Total dirs: 10 Total files: 22
Total blocks (validated): 22 (avg. block size 23263601 B)
Minimally replicated blocks: 22 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 4
Number of racks: 1
The filesystem under path '/' is HEALTHY
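If you want to watch these counters from a script (for example, to alert when under-replicated blocks appear), a minimal sketch is to parse the summary text that `hdfs fsck /` prints, as in the output above. The function name `parse_fsck_summary` is hypothetical, and the regexes assume the summary labels look exactly like the ones shown here; other Hadoop versions may format them differently.

```python
import re

# Label patterns taken from the fsck summary shown above.
PATTERNS = {
    "over_replicated": r"Over-replicated blocks:\s+(\d+)",
    "under_replicated": r"Under-replicated blocks:\s+(\d+)",
    "mis_replicated": r"Mis-replicated blocks:\s+(\d+)",
    "corrupt": r"Corrupt blocks:\s+(\d+)",
    "missing_replicas": r"Missing replicas:\s+(\d+)",
    "default_replication": r"Default replication factor:\s+(\d+)",
}


def parse_fsck_summary(text: str) -> dict[str, int]:
    """Extract replication-related counters from `hdfs fsck` summary text."""
    result = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            result[key] = int(match.group(1))
    return result


SAMPLE = """\
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
"""

if __name__ == "__main__":
    print(parse_fsck_summary(SAMPLE))
```

In practice you would capture the real output (for example with `subprocess.run(["hdfs", "fsck", "/"], capture_output=True, text=True)`) and pass its `stdout` to the parser; a nonzero `under_replicated` count means HDFS is still creating replicas.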