Spark job fails with Guava error: java.lang.NoSuchMethodError



We have set up an open-source Apache Hadoop cluster with the following components.


Hadoop - 3.1.4
Spark - 3.3.1
Hive - 3.1.3

When we try to run the Spark example job using the command below, it fails with the following exception.

/opt/spark-3.3.1/bin/spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode cluster --master yarn --num-executors 1 --driver-memory 1G --executor-memory 1G --executor-cores 1  /opt/spark-3.3.1/examples/jars/spark-examples_2.12-3.3.1.jar

Error:

[2022-12-09 00:05:02.747]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/hdfsdata2/yarn/local/usercache/spark/filecache/70/__spark_libs__3692263374412677830.zip/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-3.1.4/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2022-12-09 00:05:02,137 INFO util.SignalUtils: Registering signal handler for TERM
2022-12-09 00:05:02,139 INFO util.SignalUtils: Registering signal handler for HUP
2022-12-09 00:05:02,139 INFO util.SignalUtils: Registering signal handler for INT
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
at org.apache.spark.deploy.SparkHadoopUtil$.$anonfun$appendHiveConfigs$1(SparkHadoopUtil.scala:477)
at org.apache.spark.deploy.SparkHadoopUtil$.$anonfun$appendHiveConfigs$1$adapted(SparkHadoopUtil.scala:476)
at scala.collection.immutable.Stream.foreach(Stream.scala:533)
at org.apache.spark.deploy.SparkHadoopUtil$.appendHiveConfigs(SparkHadoopUtil.scala:476)
at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:430)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:894)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
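
For context, the descriptor in the error, (ZLjava/lang/String;Ljava/lang/Object;)V, corresponds to Preconditions.checkArgument(boolean, String, Object), an overload that newer Guava releases provide but Guava 14 does not (Guava 14 only has the varargs template form), so Hadoop 3.1.4 code compiled against a newer Guava fails at runtime when an older Guava wins on the classpath. One way to confirm which overloads each jar actually contains is to dump them with javap; the paths below are a sketch based on the locations mentioned in this post and may differ on your cluster:

# Inspect the checkArgument overloads in each Guava jar (jar paths are assumptions)
javap -cp /opt/hadoop-3.1.4/share/hadoop/common/lib/guava-27.0-jre.jar com.google.common.base.Preconditions | grep checkArgument
javap -cp /opt/spark-3.3.1/jars/guava-14.0-jre.jar com.google.common.base.Preconditions | grep checkArgument
# The first listing should include checkArgument(boolean, java.lang.String, java.lang.Object);
# the second (Guava 14) only offers the varargs form checkArgument(boolean, java.lang.String, java.lang.Object...).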

After debugging, the error appears to be related to Guava and its dependency jars.

Hadoop has guava-27.0-jre.jar, while Spark has guava-14.0-jre.jar.

I removed Spark's Guava jar and copied Guava and its dependency jars from the Hadoop lib location into the Spark jars folder (a command sketch follows the list). Below is the list of all the Guava and dependency jars:

/opt/spark-3.3.1/jars/animal-sniffer-annotations-1.17.jar
/opt/spark-3.3.1/jars/failureaccess-1.0.jar
/opt/spark-3.3.1/jars/error_prone_annotations-2.2.0.jar
/opt/spark-3.3.1/jars/checker-qual-2.5.2.jar
/opt/spark-3.3.1/jars/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
/opt/spark-3.3.1/jars/jsr305-3.0.2.jar
/opt/spark-3.3.1/jars/j2objc-annotations-1.1.jar
/opt/spark-3.3.1/jars/guava-27.0-jre.jar
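
For reference, a rough sketch of the swap described above, assuming the Hadoop copies of these jars live under /opt/hadoop-3.1.4/share/hadoop/common/lib (verify the source directory on your install first):

# Remove the Guava jar bundled with Spark (filename as reported above)
rm /opt/spark-3.3.1/jars/guava-14.0-jre.jar
# Copy Guava 27 and its companion jars from the Hadoop installation
HADOOP_LIB=/opt/hadoop-3.1.4/share/hadoop/common/lib
cp $HADOOP_LIB/guava-27.0-jre.jar \
   $HADOOP_LIB/failureaccess-1.0.jar \
   $HADOOP_LIB/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar \
   $HADOOP_LIB/checker-qual-2.5.2.jar \
   $HADOOP_LIB/error_prone_annotations-2.2.0.jar \
   $HADOOP_LIB/j2objc-annotations-1.1.jar \
   $HADOOP_LIB/animal-sniffer-annotations-1.17.jar \
   $HADOOP_LIB/jsr305-3.0.2.jar \
   /opt/spark-3.3.1/jars/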

But the error still persists.

Interestingly, when I run the sample Spark job below, it succeeds.

/opt/spark-3.3.1/bin/spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode cluster --master yarn --num-executors 1 --driver-memory 1G --executor-memory 1G --executor-cores 1  /opt/spark-3.3.1/examples/jars/spark-examples_2.12-3.3.1.jar 50

So the observation is that any value smaller than 50 passed at the end of the command makes the job fail, while larger values let it succeed. I am not sure of the reason behind this.

@Saurav Suman, to avoid this confusion of Spark looking for the Hadoop/YARN-specific jars, the Apache Spark documentation provides a clear resolution.

If you have downloaded Spark without the Hadoop binaries and have set up Hadoop yourself, then you should set SPARK_DIST_CLASSPATH=$(hadoop classpath) as described in the following link: https://spark.apache.org/docs/latest/hadoop-provided.html
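
A minimal sketch of that setup, assuming the standard Spark layout and that the hadoop binary is on the PATH of the user running Spark:

# Add to /opt/spark-3.3.1/conf/spark-env.sh (copy it from spark-env.sh.template if it does not exist)
export SPARK_DIST_CLASSPATH=$(hadoop classpath)

With this set, Spark resolves the Hadoop classes (and their dependencies such as Guava) from your Hadoop installation rather than from jars bundled with Spark.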

This takes care of Spark picking up the Hadoop-related binaries. I suppose Spark does not come with its own Guava. The above error particularly arises when you have an outdated Guava, or when multiple conflicting Guava files are present and Spark cannot decide which one to pick. Hope this helps.
