I have a three-node hadoop cluster running. For some reason, when the datanodes start up, they register with an IP address that doesn't even exist in my network. Here is my hostname-to-IP mapping:
nodes:
  - hostname: hadoop-master
    ip: 192.168.51.4
  - hostname: hadoop-data1
    ip: 192.168.52.4
  - hostname: hadoop-data2
    ip: 192.168.52.6
As you can see below, the hadoop-master node comes up fine, but of the other two nodes only one ever shows up as a live datanode, and whichever one it is, the reported IP is always 192.168.51.1, which, as you can see above, doesn't even exist in my network.
hadoop@hadoop-master:~$ hdfs dfsadmin -report
Safe mode is ON
Configured Capacity: 84482326528 (78.68 GB)
Present Capacity: 75735965696 (70.53 GB)
DFS Remaining: 75735281664 (70.53 GB)
DFS Used: 684032 (668 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (2):
Name: 192.168.51.1:50010 (192.168.51.1)
Hostname: hadoop-data2
Decommission Status : Normal
Configured Capacity: 42241163264 (39.34 GB)
DFS Used: 303104 (296 KB)
Non DFS Used: 4305530880 (4.01 GB)
DFS Remaining: 37935329280 (35.33 GB)
DFS Used%: 0.00%
DFS Remaining%: 89.81%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Sep 25 13:54:23 UTC 2015
Name: 192.168.51.4:50010 (hadoop-master)
Hostname: hadoop-master
Decommission Status : Normal
Configured Capacity: 42241163264 (39.34 GB)
DFS Used: 380928 (372 KB)
Non DFS Used: 4440829952 (4.14 GB)
DFS Remaining: 37799952384 (35.20 GB)
DFS Used%: 0.00%
DFS Remaining%: 89.49%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Sep 25 13:54:21 UTC 2015
I did try adding dfs.datanode.address explicitly for each host, but in that case the node doesn't even show up as live. Here is what my hdfs-site.xml looks like (note that I have tried it both with and without dfs.datanode.address set).
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
    </description>
  </property>
  <property>
    <name>dfs.namenode.rpc-bind-host</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>dfs.datanode.address</name>
    <value>192.168.51.4:50010</value>
  </property>
  <property>
    <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/hadoop-data/hdfs/namenode</value>
    <description>Determines where on the local filesystem the DFS name node should store the name table (fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.</description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/hadoop-data/hdfs/datanode</value>
    <description>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
  </property>
</configuration>
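One detail worth pointing out in the file above: dfs.datanode.address is hardcoded to the master's IP, so if this same hdfs-site.xml is distributed to all three hosts, the two data nodes are being told to bind an address that isn't theirs. If the property is kept at all, each node presumably needs its own value, or simply the bind-everything wildcard (0.0.0.0:50010 is the stock Hadoop 2.x default for this property), roughly:

<!-- Sketch only: a value that is safe to share across hosts, since each
     datanode then binds all of its own interfaces. 0.0.0.0:50010 is the
     Hadoop default for dfs.datanode.address. -->
<property>
  <name>dfs.datanode.address</name>
  <value>0.0.0.0:50010</value>
</property>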
Why is hadoop associating each datanode with an IP that doesn't even exist? Or, more importantly, how do I get the nodes working properly?
Update: the /etc/hosts file is identical on all nodes:
192.168.51.4 hadoop-master
192.168.52.4 hadoop-data1
192.168.52.6 hadoop-data2
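A quick way to double-check which addresses actually resolve and which are actually assigned on each box (hypothetical session; getent and hostname are standard Linux tools, nothing Hadoop-specific):

hadoop@hadoop-data1:~$ getent hosts hadoop-data2   # name lookup as this node sees it
hadoop@hadoop-data1:~$ hostname -I                 # IPs actually assigned to this host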
Here is the content of my slaves file:
hadoop@hadoop-master:~$ cat /usr/local/hadoop/etc/hadoop/slaves
hadoop-master
hadoop-data1
hadoop-data2
Datanode logs:
https://gist.github.com/dwatrous/7241bb804a9be8f9303f
https://gist.github.com/dwatrous/bcd85cda23d6eca3a68b
https://gist.github.com/dwatrous/922c4f773aded0137fa3
Namenode log:
https://gist.github.com/dwatrous/dafaa7695698f36a5d93
After reviewing every possible cause, this turned out to be related to some combination of Vagrant and VirtualBox. I was trying to run the master node on one subnet and the data nodes on another. It turns out that, with the way the networking was configured, I could communicate between those subnets, but some kind of hidden gateway was causing the wrong IP address to be used.
The solution was to change my Vagrantfile to put all three hosts on the same subnet. After that, everything worked.
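For anyone hitting the same thing, here is a minimal sketch of the networking part of a fixed Vagrantfile. The box name and the exact data-node IPs are illustrative, not the originals (the data nodes were on 192.168.52.x); the point is simply that all three private_network addresses now sit in the same /24 as the master:

# Sketch of the fixed Vagrantfile: all three hosts on one subnet.
# Box name and the .5/.6 data-node addresses are made up for illustration.
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"

  {
    "hadoop-master" => "192.168.51.4",
    "hadoop-data1"  => "192.168.51.5",
    "hadoop-data2"  => "192.168.51.6",
  }.each do |name, ip|
    config.vm.define name do |node|
      node.vm.hostname = name
      # Host-only network; every node gets an address in 192.168.51.0/24.
      node.vm.network "private_network", ip: ip
    end
  end
end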
Could you post your entire datanode logs? Try setting the property below to the name of the interface whose IP you want to bind to:
dfs.client.local.interfaces = eth0
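In hdfs-site.xml form, that would look like the snippet below; eth0 is just a placeholder for whichever interface actually carries your 192.168.5x.x addresses:

<!-- dfs.client.local.interfaces accepts a comma-separated list of
     interface names; eth0 here is a placeholder, not a known value. -->
<property>
  <name>dfs.client.local.interfaces</name>
  <value>eth0</value>
</property>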