我正在尝试在Spark应用程序中连接HBase。我使用HDP(hortonworks数据平台)。版本:Spark 1.5.2, HBase 1.1.2。我的代码如下:
import org.apache.spark.{SparkConf, Logging, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.{HColumnDescriptor, HTableDescriptor, TableName, HBaseConfiguration}
import org.apache.hadoop.hbase.client.{HBaseAdmin, Put, ConnectionFactory, Table}
object HBaseSource {
def main(args: Array[String]){
val sparkConf = new SparkConf().setAppName("HBase Test")
val sc = new SparkContext(sparkConf)
val sqlContext = new SQLContext(sc)
import sqlContext._
import sqlContext.implicits._
val hConf = HBaseConfiguration.create
if(hConf == null){
println("null pointer")
}else{
println("not null pointer")
}
hConf.set("zookeeper.znode.parent", "/hbase-secure")
hConf.set("hbase.zookeeper.quorum", "zookeeper quorum's IP address")
hConf.set("hbase.master", "master's IP address")
hConf.set("hbase.zookeeper.property.clientPort", "2181")
hConf.addResource("/usr/hdp/curent/hbase-master/conf/hbase-site.xml")
hConf.addResource("/usr/hdp/current/hbase-master/conf/hdfs-site.xml")
hConf.addResource("/usr/hdp/current/hbase-master/conf/core-site.xml")
println("beginning of the test")
HBaseAdmin.checkHBaseAvailable(hConf)
}
}
我正在使用HBaseAdmin.checkHBaseAvailable(hConf)来检查HBase是否良好。我用maven构建,我的maven POM文件如下所示:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.kpmg.tabi</groupId>
<artifactId>kshc1</artifactId>
<version>0.0.1-SNAPSHOT</version>
<build>
<plugins>
<!-- any other plugins -->
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
</plugin>
</plugins>
</build>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.5.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>1.5.2</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.1.2</version>
<exclusions>
<exclusion>
<groupId>asm</groupId>
<artifactId>asm</artifactId>
</exclusion>
<exclusion>
<groupId>org.jboss.netty</groupId>
<artifactId>netty</artifactId>
</exclusion>
<exclusion>
<groupId>io.netty</groupId>
<artifactId>netty</artifactId>
</exclusion>
<exclusion>
<groupId>commons-logging</groupId>
<artifactId>commons-logging</artifactId>
</exclusion>
<exclusion>
<groupId>org.jruby</groupId>
<artifactId>jruby-complete</artifactId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
</project>
我使用spark-submit
来执行我的任务。要提交的命令为spark-submit --class HBaseSource appName-jar-with-dependencies.jar
。然后出现死锁,这是日志信息的一部分。在这之后,这个应用程序就卡住了,无法继续运行。
16/08/15 17:03:02 INFO Executor: Starting executor ID driver on host localhost
16/08/15 17:03:02 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 56566.
16/08/15 17:03:02 INFO NettyBlockTransferService: Server created on 56566
16/08/15 17:03:02 INFO BlockManagerMaster: Trying to register BlockManager
16/08/15 17:03:02 INFO BlockManagerMasterEndpoint: Registering block manager localhost:56566 with 530.0 MB RAM, BlockManagerId(driver, localhost, 56566)
16/08/15 17:03:02 INFO BlockManagerMaster: Registered BlockManager
not null pointer
beginning of the test
16/08/15 17:03:02 INFO RecoverableZooKeeper: Process identifier=hconnection-0x5c60b0a0 connecting to ZooKeeper ensemble=10.1.188.121:2181
16/08/15 17:03:02 INFO ZooKeeper: Client environment:zookeeper.version=3.4.6-4--1, built on 02/11/2016 06:47 GMT
16/08/15 17:03:02 INFO ZooKeeper: Client environment:host.name=useomlxd00007.nix.us.kworld.kpmg.com
16/08/15 17:03:02 INFO ZooKeeper: Client environment:java.version=1.8.0_71
16/08/15 17:03:02 INFO ZooKeeper: Client environment:java.vendor=Oracle Corporation
16/08/15 17:03:02 INFO ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.71-1.b15.el6_7.x86_64/jre
16/08/15 17:03:02 INFO ZooKeeper: Client environment:java.class.path=/usr/hdp/current/spark-client/conf/:/usr/hdp/2.3.4.7-4/spark/lib/spark-assembly-1.5.2.2.3.4.7-4-hadoop2.7.1.2.3.4.7-4.jar:/usr/hdp/2.3.4.7-4/spark/lib/datanucleus-rdbms-3.2.9.jar:/usr/hdp/2.3.4.7-4/spark/lib/datanucleus-api-jdo-3.2.6.jar:/usr/hdp/2.3.4.7-4/spark/lib/datanucleus-core-3.2.10.jar:/usr/hdp/current/hadoop-client/conf/:/usr/hdp/2.3.4.7-4/hadoop/lib/hadoop-lzo-0.6.0.2.3.4.7-4.jar
16/08/15 17:03:02 INFO ZooKeeper: Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
16/08/15 17:03:02 INFO ZooKeeper: Client environment:java.io.tmpdir=/tmp
16/08/15 17:03:02 INFO ZooKeeper: Client environment:java.compiler=<NA>
16/08/15 17:03:02 INFO ZooKeeper: Client environment:os.name=Linux
16/08/15 17:03:02 INFO ZooKeeper: Client environment:os.arch=amd64
16/08/15 17:03:02 INFO ZooKeeper: Client environment:os.version=2.6.32-573.8.1.el6.x86_64
16/08/15 17:03:02 INFO ZooKeeper: Client environment:user.name=username
16/08/15 17:03:02 INFO ZooKeeper: Client environment:user.home=homePath
16/08/15 17:03:02 INFO ZooKeeper: Client environment:user.dir=dirPath
16/08/15 17:03:02 INFO ZooKeeper: Initiating client connection, connectString=ipAddr:2181 sessionTimeout=90000 watcher=hconnection-0x5c60b0a00x0, quorum=ipAddr:2181, baseZNode=/hbase-secure
16/08/15 17:03:02 INFO ClientCnxn: Opening socket connection to server ipAddr/ipAddr:2181. Will not attempt to authenticate using SASL (unknown error)
16/08/15 17:03:02 INFO ClientCnxn: Socket connection established to 10.1.188.121/10.1.188.121:2181, initiating session
16/08/15 17:03:02 INFO ClientCnxn: Session establishment complete on server 10.1.188.121/10.1.188.121:2181, sessionid = 0x1568e5a3fde008c, negotiated timeout = 40000
16/08/15 17:03:03 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
16/08/15 17:03:03 INFO ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x1568e5a3fde008c
16/08/15 17:03:03 INFO ZooKeeper: Session: 0x1568e5a3fde008c closed
16/08/15 17:03:03 INFO ClientCnxn: EventThread shut down
我确信IP地址和端口号是正确的,因为我可以使用Java API成功连接到HBase。我的问题是为什么这个简单的小spark应用程序会出现死锁。spark还有其他连接HBase的方式吗?
此错误是由于版本不匹配。我用POM文件中声明的依赖关系来构建我的项目,并通过使用这个插件将外部jar包含到我的项目中。
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
</plugin>
当我删除这个插件以排除在maven中声明的jar并使用集群中提供的库时,问题就解决了。但是我仍然想知道为什么会发生这种情况,因为maven中声明的库版本与集群中提供的库版本相同。
无论如何这个问题似乎lib版本不匹配