I am trying to fetch records from an HBase table in a Java Spark program invoked through a Jersey REST API, and I get the error mentioned below. However, when I access the HBase table by submitting the Spark jar directly, the code executes without errors.
I have 2 worker nodes for HBase and 2 worker nodes for Spark, maintained by the same master.
WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 172.31.16.140): java.lang.IllegalStateException: unread block data
    at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:69)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:95)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:194)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Well, I think I know your problem, because I just went through it myself.
The cause is most likely some missing HBase jars: while the job runs, Spark's executors need the HBase jars to read the data, and if they are not on the classpath, exceptions like this one are thrown. What to do? It is easy.
Before submitting the job, add the --jars parameter with the jars below (a complete example command is sketched after the list):
--jars /ROOT/server/hive/lib/hive-hbase-handler-1.2.1.jar,
/ROOT/server/hbase/lib/hbase-client-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/hbase-common-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/hbase-server-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/hbase-hadoop2-compat-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/guava-12.0.1.jar,
/ROOT/server/hbase/lib/hbase-protocol-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/htrace-core-2.04.jar
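Put together, a complete submit command might look like the sketch below; the master URL, main class, and application jar name are placeholders, and note that the --jars list must be comma-separated with no spaces:
spark-submit --master yarn --class com.example.YourMainClass \
  --jars /ROOT/server/hive/lib/hive-hbase-handler-1.2.1.jar,/ROOT/server/hbase/lib/hbase-client-0.98.12-hadoop2.jar,/ROOT/server/hbase/lib/hbase-common-0.98.12-hadoop2.jar,/ROOT/server/hbase/lib/hbase-server-0.98.12-hadoop2.jar,/ROOT/server/hbase/lib/hbase-hadoop2-compat-0.98.12-hadoop2.jar,/ROOT/server/hbase/lib/guava-12.0.1.jar,/ROOT/server/hbase/lib/hbase-protocol-0.98.12-hadoop2.jar,/ROOT/server/hbase/lib/htrace-core-2.04.jar \
  your-app.jar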
If it works for you, enjoy!
I ran into the same problem on CDH 5.4.0 when submitting a Spark job implemented with the Java API. Here are my solutions:
Solution 1: with spark-submit:
--jars zookeeper-3.4.5-cdh5.4.0.jar,
hbase-client-1.0.0-cdh5.4.0.jar,
hbase-common-1.0.0-cdh5.4.0.jar,
hbase-server-1.0.0-cdh5.4.0.jar,
hbase-protocol-1.0.0-cdh5.4.0.jar,
htrace-core-3.1.0-incubating.jar,
// custom jars which are needed in the spark executors
Solution 2: set the jars on the SparkConf in your code:
SparkConf sparkConf = new SparkConf();
sparkConf.setJars(new String[]{"zookeeper-3.4.5-cdh5.4.0.jar",
        "hbase-client-1.0.0-cdh5.4.0.jar",
        "hbase-common-1.0.0-cdh5.4.0.jar",
        "hbase-server-1.0.0-cdh5.4.0.jar",
        "hbase-protocol-1.0.0-cdh5.4.0.jar",
        "htrace-core-3.1.0-incubating.jar",
        // custom jars which are needed in the Spark executors
});
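For context, here is a minimal sketch of how Solution 2 fits into a Java job that scans an HBase table; the class name, app name, table name "test_table", and jar paths are placeholders, and it assumes the HBase client and TableInputFormat shipped with CDH 5.4:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class HBaseScanExample {
    public static void main(String[] args) {
        // Ship the HBase client jars to the executors (same list as above; paths are placeholders).
        SparkConf sparkConf = new SparkConf()
                .setAppName("hbase-scan-example")
                .setJars(new String[]{
                        "zookeeper-3.4.5-cdh5.4.0.jar",
                        "hbase-client-1.0.0-cdh5.4.0.jar",
                        "hbase-common-1.0.0-cdh5.4.0.jar",
                        "hbase-server-1.0.0-cdh5.4.0.jar",
                        "hbase-protocol-1.0.0-cdh5.4.0.jar",
                        "htrace-core-3.1.0-incubating.jar"
                });
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);

        // HBaseConfiguration.create() picks up hbase-site.xml when it is on the classpath.
        Configuration hbaseConf = HBaseConfiguration.create();
        hbaseConf.set(TableInputFormat.INPUT_TABLE, "test_table"); // placeholder table name

        // Read the table as an RDD of (row key, row result) pairs.
        JavaPairRDD<ImmutableBytesWritable, Result> rows = jsc.newAPIHadoopRDD(
                hbaseConf, TableInputFormat.class,
                ImmutableBytesWritable.class, Result.class);
        System.out.println("rows read: " + rows.count());

        jsc.stop();
    }
}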
To summarize:
The problem is caused by jars missing from the Spark application's classpath. You need to add these jars to the project classpath and, in addition, use one of the two solutions above to distribute them to the Spark cluster.
CDP/CDH:
Step 1: Copy the hbase-site.xml file into the /etc/spark/conf/ directory:
cp /opt/cloudera/parcels/CDH/lib/hbase/conf/hbase-site.xml /etc/spark/conf/
Step 2: Add the following libraries to spark-submit/spark-shell:
/opt/cloudera/parcels/CDH/jars/hive-hbase-handler-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-client-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-common-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-server-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop2-compat-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol-*.jar
/opt/cloudera/parcels/CDH/jars/guava-28.1-jre.jar
/opt/cloudera/parcels/CDH/jars/htrace-core-3.2.0-incubating.jar
spark-shell:
sudo -u hive spark-shell --master yarn --jars /opt/cloudera/parcels/CDH/jars/hive-hbase-handler-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-client-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-common-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-server-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop2-compat-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol-*.jar,/opt/cloudera/parcels/CDH/jars/guava-28.1-jre.jar,/opt/cloudera/parcels/CDH/jars/htrace-core-3.2.0-incubating.jar --files /etc/spark/conf/hbase-site.xml