My cluster has 2 datanodes and 1 namenode. I can see that my file emails.json is replicated on both datanodes. According to the fsck output, the replicas are located at:
- 192.168.99.1:50010 192.168.99.100:50010
I can find the physical location of the file on one of the servers.
Server #1
[raviramadoss@karaba ~]$ cat $HADOOP_CONF_DIR/hdfs-site.xml | grep datanode
<name>dfs.datanode.data.dir</name>
<value>/home/raviramadoss/datadir/datanode/dir1, /home/raviramadoss/datadir/datanode/dir2, /home/raviramadoss/datadir/datanode/dir3</value>
[raviramadoss@karaba ~]$ pwd
/home/raviramadoss
[raviramadoss@karaba ~]$ find . | xargs grep 'his email should be filtered out' 2> /dev/null
./datadir/datanode/dir1/current/BP-277552337-172.20.10.2-1470405150547/current/finalized/subdir0/subdir0/blk_1073741829:{"from":"chien@williamsonoma.com","to":"ravir@pro.com","body":"This email should be filtered out"}
But I cannot find the replica on the second datanode.
Server #2
ravis-MacBook-Pro:datadir raviramadoss$ cat /Users/raviramadoss/Downloads/hadoop-2.7.2/etc/hadoop/hdfs-site.xml | grep datanode
<name>dfs.datanode.data.dir</name>
<value>/Users/raviramadoss/datadir/datanode/dir1, Users/raviramadoss/datadir/datanode/dir2, Users/raviramadoss/datadir/datanode/dir3</value>
ravis-MacBook-Pro:datadir raviramadoss$ cat $HADOOP_CONF_DIR/hdfs-site.xml | grep datanode
<name>dfs.datanode.data.dir</name>
<value>/Users/raviramadoss/datadir/datanode/dir1, Users/raviramadoss/datadir/datanode/dir2, Users/raviramadoss/datadir/datanode/dir3</value>
ravis-MacBook-Pro:datadir raviramadoss$ pwd
/Users/raviramadoss/datadir
ravis-MacBook-Pro:datadir raviramadoss$ find . | xargs grep 'his email should be filtered out' 2> /dev/null
FSCK command output
hadoop fsck /users/raviramadoss/emails.json -locations -files -blocks
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
16/08/06 16:41:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connecting to namenode via http://kirikou.worldofthe.com:50070/fsck?ugi=raviramadoss&locations=1&files=1&blocks=1&path=%2Fusers%2Fraviramadoss%2Femails.json
FSCK started by raviramadoss (auth:SIMPLE) from /192.168.99.1 for path /users/raviramadoss/emails.json at Sat Aug 06 16:41:03 IST 2016
/users/raviramadoss/emails.json 207 bytes, 1 block(s): Under replicated BP-277552337-172.20.10.2-1470405150547:blk_1073741829_1005. Target Replicas is 3 but found 2 replica(s).
0. BP-277552337-172.20.10.2-1470405150547:blk_1073741829_1005 len=207 repl=2 [DatanodeInfoWithStorage[192.168.99.1:50010,DS-69e0ae16-85b8-4a7b-ae82-bd9e195aa946,DISK], DatanodeInfoWithStorage[192.168.99.100:50010,DS-2d2d3e94-58a9-465c-860d-90188838b675,DISK]]
Status: HEALTHY
Total size: 207 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 207 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 1 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 1 (33.333332 %)
Number of data-nodes: 2
Number of racks: 1
FSCK ended at Sat Aug 06 16:41:03 IST 2016 in 1 milliseconds
The filesystem under path '/users/raviramadoss/emails.json' is HEALTHY
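Since fsck reports the block ID (blk_1073741829 here), a replica can also be located by file name instead of grepping file contents. The following is only a sketch: find_block is a hypothetical helper name, and it assumes the datanode stores the block as a file named after the block ID with a companion .meta file, as the listing from server #1 suggests.

```shell
# Hypothetical helper: list the block file and its .meta file for one block ID
# under one or more directories.
# Usage: find_block BLOCK_ID DIR [DIR...]
find_block() {
    block_id=$1
    shift
    # Match the block file itself (blk_N) and its metadata file (blk_N_GENSTAMP.meta).
    find "$@" -type f \( -name "$block_id" -o -name "${block_id}_*.meta" \) 2>/dev/null
}
```

For example, `find_block blk_1073741829 /Users/raviramadoss` would search the whole home directory on server #2. If the configured data directories come up empty, widening the search root (for instance to the directory the datanode was started from) may help.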
It turned out that I was missing a leading / in dfs.datanode.data.dir on server #2. Once that was fixed, I could find the replica.
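A quick check like the one below could catch this kind of mistake earlier. It is a minimal sketch, not part of Hadoop: check_data_dirs is a hypothetical helper that takes the comma-separated value of dfs.datanode.data.dir and prints every entry that is neither an absolute path nor a file: URI. My assumption is that entries without a leading / are treated as relative paths, so the blocks end up somewhere other than the intended directory.

```shell
# Hypothetical helper: report entries of a comma-separated
# dfs.datanode.data.dir value that are not absolute paths or file: URIs.
# Exits with status 1 if any such entry is found, 0 otherwise.
# The body runs in a subshell so IFS and the variables do not leak.
check_data_dirs() (
    set -f          # no filename globbing while splitting the value
    IFS=','         # split the value on commas
    status=0
    for entry in $1; do
        # Strip whitespace around each entry, e.g. the space after a comma.
        entry=$(printf '%s' "$entry" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')
        case "$entry" in
            ''|/*|file:/*) ;;   # empty, absolute path, or file: URI: accepted
            *) printf 'not absolute: %s\n' "$entry"; status=1 ;;
        esac
    done
    exit "$status"
)
```

Run against the value from server #2, this flags the dir2 and dir3 entries. On a live node the value can be fetched with `hdfs getconf -confKey dfs.datanode.data.dir` and passed as the single argument.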