Deadlock when connecting to HBase from Spark

I am trying to connect to HBase from a Spark application. I am using HDP (Hortonworks Data Platform), versions: Spark 1.5.2, HBase 1.1.2. My code is as follows:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.HBaseAdmin

object HBaseSource {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("HBase Test")
    val sc = new SparkContext(sparkConf)
    val sqlContext = new SQLContext(sc)

    // HBaseConfiguration.create never returns null; this check only
    // confirms the call completed.
    val hConf = HBaseConfiguration.create
    if (hConf == null) {
      println("null pointer")
    } else {
      println("not null pointer")
    }

    // Point the client at the secure znode and the ZooKeeper ensemble.
    hConf.set("zookeeper.znode.parent", "/hbase-secure")
    hConf.set("hbase.zookeeper.quorum", "zookeeper quorum's IP address")
    hConf.set("hbase.master", "master's IP address")
    hConf.set("hbase.zookeeper.property.clientPort", "2181")

    // Load the cluster's site files from the local filesystem. Note that
    // addResource(String) treats its argument as a classpath resource,
    // so filesystem paths must be wrapped in org.apache.hadoop.fs.Path.
    hConf.addResource(new Path("/usr/hdp/current/hbase-master/conf/hbase-site.xml"))
    hConf.addResource(new Path("/usr/hdp/current/hbase-master/conf/hdfs-site.xml"))
    hConf.addResource(new Path("/usr/hdp/current/hbase-master/conf/core-site.xml"))

    println("beginning of the test")
    // Throws MasterNotRunningException (or a similar exception) if HBase
    // cannot be reached.
    HBaseAdmin.checkHBaseAvailable(hConf)
    sc.stop()
  }
}

I use HBaseAdmin.checkHBaseAvailable(hConf) to check whether HBase is reachable. I build with Maven; my POM file looks like this:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.kpmg.tabi</groupId>
  <artifactId>kshc1</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <build>
    <plugins>
      <!-- any other plugins -->
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
      </plugin>
    </plugins>
  </build>
  <dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.5.2</version>
    </dependency>  
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>1.5.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-client</artifactId>
      <version>1.1.2</version>
      <exclusions>
        <exclusion>
          <groupId>asm</groupId>
          <artifactId>asm</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.jboss.netty</groupId>
          <artifactId>netty</artifactId>
        </exclusion>
        <exclusion>
          <groupId>io.netty</groupId>
          <artifactId>netty</artifactId>
        </exclusion>
        <exclusion>
          <groupId>commons-logging</groupId>
          <artifactId>commons-logging</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.jruby</groupId>
          <artifactId>jruby-complete</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
  </dependencies>
</project>

I submit the job with spark-submit; the command is spark-submit --class HBaseSource appName-jar-with-dependencies.jar. The application then deadlocks; below is part of the log output. After this point it hangs and never makes progress.

16/08/15 17:03:02 INFO Executor: Starting executor ID driver on host localhost
16/08/15 17:03:02 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 56566.
16/08/15 17:03:02 INFO NettyBlockTransferService: Server created on 56566
16/08/15 17:03:02 INFO BlockManagerMaster: Trying to register BlockManager
16/08/15 17:03:02 INFO BlockManagerMasterEndpoint: Registering block manager localhost:56566 with 530.0 MB RAM, BlockManagerId(driver, localhost, 56566)
16/08/15 17:03:02 INFO BlockManagerMaster: Registered BlockManager
not null pointer
beginning of the test
16/08/15 17:03:02 INFO RecoverableZooKeeper: Process identifier=hconnection-0x5c60b0a0 connecting to ZooKeeper ensemble=10.1.188.121:2181
16/08/15 17:03:02 INFO ZooKeeper: Client environment:zookeeper.version=3.4.6-4--1, built on 02/11/2016 06:47 GMT
16/08/15 17:03:02 INFO ZooKeeper: Client environment:host.name=useomlxd00007.nix.us.kworld.kpmg.com
16/08/15 17:03:02 INFO ZooKeeper: Client environment:java.version=1.8.0_71
16/08/15 17:03:02 INFO ZooKeeper: Client environment:java.vendor=Oracle Corporation
16/08/15 17:03:02 INFO ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.71-1.b15.el6_7.x86_64/jre
16/08/15 17:03:02 INFO ZooKeeper: Client environment:java.class.path=/usr/hdp/current/spark-client/conf/:/usr/hdp/2.3.4.7-4/spark/lib/spark-assembly-1.5.2.2.3.4.7-4-hadoop2.7.1.2.3.4.7-4.jar:/usr/hdp/2.3.4.7-4/spark/lib/datanucleus-rdbms-3.2.9.jar:/usr/hdp/2.3.4.7-4/spark/lib/datanucleus-api-jdo-3.2.6.jar:/usr/hdp/2.3.4.7-4/spark/lib/datanucleus-core-3.2.10.jar:/usr/hdp/current/hadoop-client/conf/:/usr/hdp/2.3.4.7-4/hadoop/lib/hadoop-lzo-0.6.0.2.3.4.7-4.jar
16/08/15 17:03:02 INFO ZooKeeper: Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
16/08/15 17:03:02 INFO ZooKeeper: Client environment:java.io.tmpdir=/tmp
16/08/15 17:03:02 INFO ZooKeeper: Client environment:java.compiler=<NA>
16/08/15 17:03:02 INFO ZooKeeper: Client environment:os.name=Linux
16/08/15 17:03:02 INFO ZooKeeper: Client environment:os.arch=amd64
16/08/15 17:03:02 INFO ZooKeeper: Client environment:os.version=2.6.32-573.8.1.el6.x86_64
16/08/15 17:03:02 INFO ZooKeeper: Client environment:user.name=username
16/08/15 17:03:02 INFO ZooKeeper: Client environment:user.home=homePath
16/08/15 17:03:02 INFO ZooKeeper: Client environment:user.dir=dirPath
16/08/15 17:03:02 INFO ZooKeeper: Initiating client connection, connectString=ipAddr:2181 sessionTimeout=90000 watcher=hconnection-0x5c60b0a00x0, quorum=ipAddr:2181, baseZNode=/hbase-secure
16/08/15 17:03:02 INFO ClientCnxn: Opening socket connection to server ipAddr/ipAddr:2181. Will not attempt to authenticate using SASL (unknown error)
16/08/15 17:03:02 INFO ClientCnxn: Socket connection established to 10.1.188.121/10.1.188.121:2181, initiating session
16/08/15 17:03:02 INFO ClientCnxn: Session establishment complete on server 10.1.188.121/10.1.188.121:2181, sessionid = 0x1568e5a3fde008c, negotiated timeout = 40000
16/08/15 17:03:03 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
16/08/15 17:03:03 INFO ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x1568e5a3fde008c
16/08/15 17:03:03 INFO ZooKeeper: Session: 0x1568e5a3fde008c closed
16/08/15 17:03:03 INFO ClientCnxn: EventThread shut down

I am sure the IP addresses and port numbers are correct, because I can connect to HBase successfully with the plain Java API. My question is why this simple little Spark application deadlocks. Is there some other way to connect to HBase from Spark?
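
For reference, the same availability check can be written against the newer Connection API, since HBaseAdmin.checkHBaseAvailable is deprecated in HBase 1.x. A minimal sketch, assuming the same cluster settings (the object name HBaseAvailabilityCheck is just illustrative):

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.ConnectionFactory

object HBaseAvailabilityCheck {
  def main(args: Array[String]): Unit = {
    val hConf = HBaseConfiguration.create
    hConf.set("zookeeper.znode.parent", "/hbase-secure")
    hConf.set("hbase.zookeeper.quorum", "zookeeper quorum's IP address")
    hConf.set("hbase.zookeeper.property.clientPort", "2181")
    // createConnection throws an IOException if the cluster cannot be reached.
    val connection = ConnectionFactory.createConnection(hConf)
    try {
      val admin = connection.getAdmin
      try {
        // A lightweight RPC to the master; fails if HBase is not healthy.
        admin.listTableNames().foreach(t => println(t.getNameAsString))
      } finally {
        admin.close()
      }
    } finally {
      connection.close()
    }
  }
}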

This error was caused by a version mismatch. I built my project with the dependencies declared in the POM file and bundled those external jars into my application jar using this plugin:

      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
      </plugin>

When I removed this plugin, so that the jars declared in Maven were left out and the libraries provided on the cluster were used instead, the problem went away. But I would still like to know why this happens, given that the library versions declared in Maven are the same as the versions provided on the cluster.
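
If the fat jar is still wanted for other dependencies, a common alternative to removing the assembly plugin is to mark the cluster-provided artifacts with provided scope, so they are compiled against but excluded from the jar-with-dependencies. A sketch (the scope elements are my addition, not in the original POM):

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.5.2</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-client</artifactId>
      <version>1.1.2</version>
      <scope>provided</scope>
    </dependency>

Running mvn dependency:tree -Dverbose can also help spot where two copies of the same library (for example netty or zookeeper) end up on the classpath.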

In any case, the problem seems to be a library version mismatch.
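
With the assembly plugin removed, the application jar contains only my own classes, so the cluster's HBase jars have to be visible at submit time. One way to do that (the HDP paths below are illustrative; adjust them to your installation):

spark-submit --class HBaseSource \
  --driver-class-path "/usr/hdp/current/hbase-client/lib/*" \
  --conf spark.executor.extraClassPath="/usr/hdp/current/hbase-client/lib/*" \
  kshc1-0.0.1-SNAPSHOT.jar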
