Failed to sync data to the Hadoop cluster (failed to place enough replicas)

I installed a Hadoop cluster on Ubuntu 20.04, and after the installation I ran into a serious problem: the data-sync pipeline from the client to the Hadoop cluster is broken. The data I want to transfer to the cluster is about 18 GB. Although my DFS Remaining is about 65%, transferring files through the pipeline from the client to the cluster fails. I tried formatting the NameNode and the DataNodes, but it still fails. Can anyone help me fix this? Here is some information:
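
For reference, the (re)format steps I tried were roughly the following; the data directories match my hdfs-site.xml below, and the scripts assume $HADOOP_HOME/sbin is on the PATH:

stop-dfs.sh                                   # stop HDFS daemons
rm -rf /opt/hadoop/dfsdata/nameNode/*         # on the NameNode
rm -rf /opt/hadoop/dfsdata/dataNode/*         # on every DataNode
hdfs namenode -format                         # re-initialize the NameNode
start-dfs.sh                                  # restart HDFS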

The failure log:

2022-06-09 10:46:08,507 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Not enough replicas was chosen. Reason: {NOT_ENOUGH_STORAGE_SPACE=3, NO_REQUIRED_STORAGE_TYPE=1}
2022-06-09 10:46:08,507 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Not enough replicas was chosen. Reason: {NO_REQUIRED_STORAGE_TYPE=1}
2022-06-09 10:46:08,508 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 3 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and org.apache.hadoop.net.NetworkTopology
2022-06-09 10:46:08,508 WARN org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough replicas: expected size is 3 but only 0 storage types can be selected (replication=3, selected=[], unavailable=[DISK], removed=[DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
2022-06-09 10:46:08,508 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 3 to reach 3 (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) All required storage types are unavailable:  unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
2022-06-09 10:46:08,511 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on default port 9000, call Call#1744126 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 172.24.217.203:38560
java.io.IOException: File /.reserved/.inodes/30284/.9DFE31A1-742F11EC-981DADC9-CEAF568F@172.24.219.148.wav.C5j1SD could only be written to 0 of the 1 minReplication nodes. There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2315)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2960)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:904)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:593)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:604)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:572)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:556)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1093)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1043)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:971)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2976)
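
As the WARN lines suggest, DEBUG logging can be enabled at runtime with Hadoop's daemonlog tool (assuming the NameNode web UI listens on the default port 9870):

hadoop daemonlog -setlevel hadoop-master:9870 org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy DEBUG
hadoop daemonlog -setlevel hadoop-master:9870 org.apache.hadoop.net.NetworkTopology DEBUG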

My DFS admin report:

Name: 172.24.217.204:9866 (hadoop-node1)
Hostname: hadoop-node1
Decommission Status : Normal
Configured Capacity: 104091082752 (96.94 GB)
DFS Used: 19833880576 (18.47 GB)
Non DFS Used: 11562889216 (10.77 GB)
DFS Remaining: 67362725888 (62.74 GB)
DFS Used%: 19.05%
DFS Remaining%: 64.72%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Thu Jun 09 11:26:29 ICT 2022
Last Block Report: Thu Jun 09 09:54:17 ICT 2022
Num of Blocks: 9033

Name: 172.24.217.205:9866 (hadoop-node2)
Hostname: hadoop-node2
Decommission Status : Normal
Configured Capacity: 104091082752 (96.94 GB)
DFS Used: 19833790464 (18.47 GB)
Non DFS Used: 11416346624 (10.63 GB)
DFS Remaining: 67509358592 (62.87 GB)
DFS Used%: 19.05%
DFS Remaining%: 64.86%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Thu Jun 09 11:26:29 ICT 2022
Last Block Report: Thu Jun 09 10:02:23 ICT 2022
Num of Blocks: 9033

Name: 172.24.217.206:9866 (hadoop-node3)
Hostname: hadoop-node3
Decommission Status : Normal
Configured Capacity: 104091082752 (96.94 GB)
DFS Used: 19833802752 (18.47 GB)
Non DFS Used: 10835709952 (10.09 GB)
DFS Remaining: 68089982976 (63.41 GB)
DFS Used%: 19.05%
DFS Remaining%: 65.41%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Thu Jun 09 11:26:29 ICT 2022
Last Block Report: Thu Jun 09 09:54:17 ICT 2022
Num of Blocks: 9033
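
The report above was produced with the standard admin command; local disk usage on each DataNode can be cross-checked against the data directory configured in hdfs-site.xml below:

hdfs dfsadmin -report                  # the report shown above
df -h /opt/hadoop/dfsdata/dataNode     # run on each DataNode to verify local free space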

My core-site.xml config:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop-master:9000</value>
</property>
<!-- HTTP/dfs proxy -->
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
</configuration>
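
To rule out the client resolving a different NameNode endpoint, the effective value can be checked with hdfs getconf (fs.default.name is the deprecated alias of fs.defaultFS):

hdfs getconf -confKey fs.defaultFS     # should print hdfs://hadoop-master:9000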

My hdfs-site.xml config:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/opt/hadoop/dfsdata/nameNode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>[DISK]file:///opt/hadoop/dfsdata/dataNode</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
</property>
<property>
<name>dfs.datanode.use.datanode.hostname</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.redundancy.considerLoad</name>
<value>false</value>
</property>

<!-- HDFS NFS gateway -->
<property>
<name>dfs.namenode.accesstime.precision</name>
<value>3600000</value>
<description>
The access time for HDFS file is precise up to this value. 
The default value is 1 hour. Setting a value of 0 disables
access times for HDFS.
</description>
</property>
<property>
<name>dfs.storage.policy.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.nfs3.dump.dir</name>
<value>/tmp/.hdfs-nfs</value>
</property>
<property>
<name>dfs.nfs.exports.allowed.hosts</name>
<value>172.24.217.0/24 rw ; 172.30.12.0/24 rw ; 172.24.216.0/24 rw</value>
</property>
<property>
<name>nfs.export.point</name>
<value>/</value>
</property>
<property>
<name>dfs.namenode.redundancy.considerLoad.factor</name>
<value>3</value>
</property>
</configuration>
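
Since the log reports NO_REQUIRED_STORAGE_TYPE even though dfs.storage.policy.enabled is false, the effective storage policies can be inspected with the standard storagepolicies tool:

hdfs storagepolicies -listPolicies               # available policies (HOT expects DISK)
hdfs storagepolicies -getStoragePolicy -path /   # policy actually applied to the tree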

I deployed a Hadoop cluster on three dev VMs and ran into a problem similar to yours.

Following the hint in the log, I enabled the DEBUG log level on org.apache.hadoop.net.NetworkTopology and got this output:

2022-08-31 10:43:32,576 DEBUG org.apache.hadoop.net.NetworkTopology: Choosing random from 2 available nodes on node /default-rack, scope=/default-rack, excludedScope=null, excludeNodes=[]. numOfDatanodes=2.
2022-08-31 10:43:32,576 DEBUG org.apache.hadoop.net.NetworkTopology: chooseRandom returning x.x.x.x:9866
2022-08-31 10:43:32,577 DEBUG org.apache.hadoop.net.NetworkTopology: Failed to find datanode (scope="" excludedScope="/default-rack"). numOfDatanodes=0
2022-08-31 10:43:32,577 DEBUG org.apache.hadoop.net.NetworkTopology: No node to choose.

I suspected a network problem. I went to the machine and tried to telnet port 9866, and it really did not work. Then I opened that port, and after opening it my Hadoop problem was solved.
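
On Ubuntu 20.04 the check and the fix looked roughly like this (ufw and the hadoop-node1 hostname are assumptions here; use whatever firewall and hosts your nodes actually run):

telnet hadoop-node1 9866       # test the DataNode transfer port (or: nc -zv hadoop-node1 9866)
sudo ufw allow 9866/tcp        # on each DataNode, open the port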

I hope this helps you.