Runtime exception java.lang.NoSuchMethodError: com.google.common.base.Optional.toJavaUtil() when connecting Spark to BigQuery

I am currently trying to connect to BigQuery from Spark. I built a fat jar with the sbt assembly plugin and launched the job in local mode with spark-submit. As soon as the Spark job starts, I hit a java.lang.NoSuchMethodError: com.google.common.base.Optional.toJavaUtil()Ljava/util/Optional; exception.

Here is the exception trace:

Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Optional.toJavaUtil()Ljava/util/Optional;
at com.google.cloud.spark.bigquery.SparkBigQueryConfig.getOption(SparkBigQueryConfig.java:265)
at com.google.cloud.spark.bigquery.SparkBigQueryConfig.getOption(SparkBigQueryConfig.java:256)
at com.google.cloud.spark.bigquery.SparkBigQueryConfig.lambda$getOptionFromMultipleParams$7(SparkBigQueryConfig.java:273)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1812)
at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126)
at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:499)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:486)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.FindOps$FindOp.evaluateSequential(FindOps.java:152)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.findFirst(ReferencePipeline.java:464)
at com.google.cloud.spark.bigquery.SparkBigQueryConfig.getOptionFromMultipleParams(SparkBigQueryConfig.java:275)
at com.google.cloud.spark.bigquery.SparkBigQueryConfig.from(SparkBigQueryConfig.java:119)
at com.google.cloud.spark.bigquery.BigQueryRelationProvider.createSparkBigQueryConfig(BigQueryRelationProvider.scala:133)
at com.google.cloud.spark.bigquery.BigQueryRelationProvider.createRelationInternal(BigQueryRelationProvider.scala:71)
at com.google.cloud.spark.bigquery.BigQueryRelationProvider.createRelation(BigQueryRelationProvider.scala:45)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)
at com.bigquery.OwnDataSetReader$.delayedEndpoint$com$bigquery$OwnDataSetReader$1(OwnDataSetReader.scala:18)
at com.bigquery.OwnDataSetReader$delayedInit$body.apply(OwnDataSetReader.scala:6)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at com.bigquery.OwnDataSetReader$.main(OwnDataSetReader.scala:6)
at com.bigquery.OwnDataSetReader.main(OwnDataSetReader.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

After researching the exception, I found that it can occur when multiple versions of the Guava library end up on the classpath: Optional.toJavaUtil() only exists since Guava 21, so this error means an older Guava is being loaded first. I made sure there was no such conflict inside the final assembled jar, and I even verified this by decompiling the jar file. No conflict was observed, but the problem persisted :(.
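One quick way to check which jar the conflicting class is actually loaded from (a diagnostic sketch I am adding here, not part of the original job) is to print its code source from a small probe class, submitted with the same spark-submit command:

import com.google.common.base.Optional

object GuavaProbe extends App {
  // Print the jar from which com.google.common.base.Optional is loaded;
  // if it points at a Guava older than 21, that explains the NoSuchMethodError.
  val source = classOf[Optional[_]].getProtectionDomain.getCodeSource
  println(if (source == null) "bootstrap classpath" else source.getLocation)
}

Below is the build.sbt snippet I was using: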

name := "bigquer-connector"
version := "0.1"
scalaVersion := "2.11.8"

test in assembly := {}
assemblyJarName in assembly := "BigQueryConnector.jar"

assemblyMergeStrategy in assembly := {
  case x if x.startsWith("META-INF") => MergeStrategy.discard
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}

libraryDependencies += ("com.google.cloud.spark" %% "spark-bigquery" % "0.18.0")
  .exclude("com.google.guava", "guava")
  .exclude("org.glassfish.jersey.bundles.repackaged", "jersey-guava")

libraryDependencies += "com.google.guava" % "guava" % "30.0-jre"

libraryDependencies += ("org.apache.spark" % "spark-core_2.11" % "2.3.1")
  .exclude("com.google.guava", "guava")
  .exclude("org.glassfish.jersey.bundles.repackaged", "jersey-guava")

libraryDependencies += ("org.apache.spark" % "spark-sql_2.11" % "2.3.1")
  .exclude("com.google.guava", "guava")
  .exclude("org.glassfish.jersey.bundles.repackaged", "jersey-guava")

And here is the main class:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object OwnDataSetReader extends App {
  val session = SparkSession.builder()
    .appName("big-query-connector")
    .config(getConf)
    .getOrCreate()

  session.read
    .format("com.google.cloud.spark.bigquery")
    .option("viewsEnabled", true)
    .option("parentProject", "my_gcp_project")
    .option("credentialsFile", "<path to private json file>")
    .load("my_gcp_data_set.my_gcp_view")
    .show(2)

  private def getConf: SparkConf = {
    val sparkConf = new SparkConf
    sparkConf.setAppName("big-query-connector")
    sparkConf.setMaster("local[*]")
    sparkConf
  }
}

The command used to launch Spark from my local terminal: spark-submit --deploy-mode client --class com.bigquery.OwnDataSetReader BigQueryConnector.jar. I am using Spark version 2.3.x on my local machine.
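For completeness: when the stale Guava comes from the Spark distribution itself rather than from the fat jar, a common workaround in client mode is to prepend a newer Guava to the driver's classpath. A sketch, assuming guava-30.0-jre.jar sits in the working directory:

spark-submit --deploy-mode client \
  --driver-class-path guava-30.0-jre.jar \
  --class com.bigquery.OwnDataSetReader BigQueryConnector.jar

In my case, however, the fix turned out to be in the assembly itself, as described below.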

I was able to resolve the issue. The culprit was the merge strategy in my build.sbt file:

assemblyMergeStrategy in assembly := {
  case x if x.startsWith("META-INF") => MergeStrategy.discard
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}

I was discarding every file under the META-INF folder. But the spark-bigquery connector keeps configuration files in its META-INF folder (such as the service registrations under META-INF/services) that are used while the library bootstraps itself. So instead of discarding them wholesale, I switched to the strategy below, which discards only manifests, signatures, and license files while merging the distinct lines of service registration files:

case PathList("META-INF", xs @ _*) =>
(xs map {_.toLowerCase}) match {
case ("manifest.mf" :: Nil) | ("index.list" :: Nil) | ("dependencies" :: Nil) | ("license" :: Nil) | ("licence.txt" :: Nil) | ("notice.txt" :: Nil) | ("notice" :: Nil)=>
MergeStrategy.discard
case ps @ (x :: xs) if ps.last.endsWith(".sf") || ps.last.endsWith(".dsa") || ps.contains("license") || ps.contains("notice") =>
MergeStrategy.discard
case "plexus" :: xs =>
MergeStrategy.discard
case "services" :: xs =>
MergeStrategy.filterDistinctLines
case _ => MergeStrategy.last
}

The error may also stem from version mismatches among the transitive dependencies of com.google.cloud.spark:spark-bigquery_2.11:0.18.1. I addressed that by switching to com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.18.1, which bundles all of the dependent libraries in a single artifact.
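In build.sbt terms the switch looks something like this (with this artifact, the Guava excludes and the pinned guava 30.0-jre above should no longer be needed, since it carries its own dependencies):

libraryDependencies += "com.google.cloud.spark" %% "spark-bigquery-with-dependencies" % "0.18.1"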
