HDFS HA cluster: standby node does not become active when the active NameNode's server is shut down



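I have configured HDFS in HA mode. I have one "active" node and one "standby" node, and I have started the ZKFCs. If I stop the ZKFC on the active node, the standby node changes state and becomes the "active" node. The problem is that when I shut down the whole server that hosts the active NameNode and its ZKFC, the standby server does not change state and stays in standby.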

My core-site.xml

<configuration>
 <property>
    <name>fs.default.name</name>
    <value>hdfs://auto-ha</value>
 </property>
</configuration>

My hdfs-site.xml

<configuration>
<property>
  <name>dfs.namenode.rpc-bind-host</name>
  <value>0.0.0.0</value>
  <description>
    The actual address the RPC server will bind to. If this optional address is
    set, it overrides only the hostname portion of dfs.namenode.rpc-address.
    It can also be specified per name node or name service for HA/Federation.
    This is useful for making the name node listen on all interfaces by
    setting it to 0.0.0.0.
  </description>
</property>
<property>
  <name>dfs.namenode.servicerpc-bind-host</name>
  <value>0.0.0.0</value>
  <description>
    The actual address the service RPC server will bind to. If this optional address is
    set, it overrides only the hostname portion of dfs.namenode.servicerpc-address.
    It can also be specified per name node or name service for HA/Federation.
    This is useful for making the name node listen on all interfaces by
    setting it to 0.0.0.0.
  </description>
</property>
<property>
  <name>dfs.namenode.http-bind-host</name>
  <value>0.0.0.0</value>
  <description>
    The actual address the HTTP server will bind to. If this optional address
    is set, it overrides only the hostname portion of dfs.namenode.http-address.
    It can also be specified per name node or name service for HA/Federation.
    This is useful for making the name node HTTP server listen on all
    interfaces by setting it to 0.0.0.0.
  </description>
</property>
<property>
  <name>dfs.namenode.https-bind-host</name>
  <value>0.0.0.0</value>
  <description>
    The actual address the HTTPS server will bind to. If this optional address
    is set, it overrides only the hostname portion of dfs.namenode.https-address.
    It can also be specified per name node or name service for HA/Federation.
    This is useful for making the name node HTTPS server listen on all
    interfaces by setting it to 0.0.0.0.
  </description>
</property>
<property>
  <name>dfs.replication</name>
  <value>2</value>
 </property>
 <property>
  <name>dfs.name.dir</name>
  <value>file:///hdfs/name</value>
 </property>
 <property>
  <name>dfs.data.dir</name>
  <value>file:///hdfs/data</value>
 </property>
 <property>
  <name>dfs.permissions</name>
  <value>false</value>
 </property>
 <property>
  <name>dfs.nameservices</name>
  <value>auto-ha</value>
 </property>
 <property>
  <name>dfs.ha.namenodes.auto-ha</name>
  <value>nn01,nn02</value>
 </property>
<property>
  <name>dfs.namenode.rpc-address.auto-ha.nn01</name>
  <value>master1:8020</value>
</property>
 <property>
  <name>dfs.namenode.http-address.auto-ha.nn01</name>
  <value>master1:50070</value>
 </property>
 <property>
  <name>dfs.namenode.rpc-address.auto-ha.nn02</name>
  <value>master2:8020</value>
 </property>
 <property>
  <name>dfs.namenode.http-address.auto-ha.nn02</name>
  <value>master2:50070</value>
 </property>
 <property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://master1:8485;master2:8485;master3:8485/auto-ha</value>
 </property>
 <property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/hdfs/journalnode</value>
 </property>
 <property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/ikerlan/.ssh/id_rsa</value>
 </property>
 <property>
  <name>dfs.ha.automatic-failover.enabled.auto-ha</name>
  <value>true</value>
 </property>
 <property>
   <name>ha.zookeeper.quorum</name>
   <value>master1:2181,master2:2181,master3:2181</value>
 </property>
<property>
 <name>dfs.client.failover.proxy.provider.auto-ha</name>
 <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
 <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
 <value>false</value>
</property>
</configuration>

I have checked the logs, and the problem is that the fencing attempt fails with the following error:

2017-02-24 12:46:29,389 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master2/172.16.8.232:8020. Already tried 0 time$
2017-02-24 12:46:49,399 WARN org.apache.hadoop.ha.FailoverController: Unable to gracefully make NameNode at master2/172.16.8.232:8020 $
org.apache.hadoop.net.ConnectTimeoutException: Call From master1/172.16.8.231 to master2:8020 failed on socket timeout exception: org.$
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:751)
        at org.apache.hadoop.ipc.Client.call(Client.java:1479)
        at org.apache.hadoop.ipc.Client.call(Client.java:1412)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
        at com.sun.proxy.$Proxy9.transitionToStandby(Unknown Source)
        at org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB.transitionToStandby(HAServiceProtocolClientSideTran$
        at org.apache.hadoop.ha.FailoverController.tryGracefulFence(FailoverController.java:172)
        at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:514)
        at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:505)
        at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:61)
        at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:892)
        at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:910)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:809)
        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch :$
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)

I have just added the following properties, and now it works fine:

hdfs-site.xml

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/bin/true)</value>
</property>  

core-site.xml

<property>
   <name>ha.zookeeper.quorum</name>
   <value>master1:2181,master2:2181,master3:2181</value>
</property>

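The problem was that sshfence could not connect to the old active node because that server was shut down, so fencing never succeeded and the standby never took over. With shell(/bin/true) the fencing step always reports success, and failover works.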
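
As a possible refinement, dfs.ha.fencing.methods accepts a newline-separated list of methods that are tried in order until one reports success. The sketch below keeps sshfence as the first attempt and falls back to shell(/bin/true) only when SSH fails, for example because the host is powered off. It assumes the Quorum Journal Manager setup shown in the question, where only one NameNode can write to the JournalNodes at a time. The timeout value is illustrative, not taken from the original configuration.

```xml
<!-- hdfs-site.xml: try sshfence first, then fall back to a no-op fence -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence
shell(/bin/true)</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/ikerlan/.ssh/id_rsa</value>
</property>
<!-- Assumed value: give up on the SSH connection after 10 seconds -->
<property>
  <name>dfs.ha.fencing.ssh.connect-timeout</name>
  <value>10000</value>
</property>
```

With this ordering, the old active NameNode is still killed over SSH when its host is reachable, and the failover is not blocked when it is not.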
