Spark: Fat-JAR crashing Zeppelin with NullPointerException



Note: This is not a duplicate of "NullPointerException when running Spark code in Zeppelin 0.7.1".


I've hit this roadblock with Apache Zeppelin on Amazon EMR. I'm trying to load a fat JAR (located on Amazon S3) into the Spark interpreter. Once the fat JAR is loaded, Zeppelin's Spark interpreter refuses to start, throwing the following stack trace:

java.lang.NullPointerException
	at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:38)
	at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:33)
	at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext_2(SparkInterpreter.java:398)
	at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:387)
	at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:146)
	at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:843)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:491)
	at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
	at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Even a simple Scala statement like val str: String = "sample string", which doesn't touch anything in the jar, produces the above error log. Removing the jar from the interpreter's dependencies fixes the problem, so it is clearly related to the jar alone.
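For illustration, this is the kind of trivial paragraph that fails while the fat JAR is attached (the %spark prefix is just Zeppelin's default Spark paragraph binding):

%spark
// even this trivial statement, which does not use the jar at all,
// fails with the NullPointerException above once the fat JAR is attached
val str: String = "sample string"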

The offending fat JAR is generated by Jenkins using sbt assembly. The project (whose fat JAR I'm loading) consists of two submodules wrapped in a parent module.
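Roughly, the module layout looks like this (a sketch only; the module names are made up for illustration, not taken from the actual project):

lazy val parent = (project in file("."))
  .aggregate(submoduleA, submoduleB)   // the parent only aggregates the two submodules

lazy val submoduleA = project in file("submodule-a")
lazy val submoduleB = project in file("submodule-b")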


While it is impractical to share the complete build.sbt and dependency files of all 3 modules, below is an exhaustive list of all the dependencies and settings used in the submodules.

AWS dependencies

  • "com.amazonaws" % "aws-java-sdk-s3" % "1.11.218"
  • "com.amazonaws" % "aws-java-sdk-emr" % "1.11.218"
  • "com.amazonaws" % "aws-java-sdk-ec2" % "1.11.218"

Spark dependencies (all marked as provided, via allSparkdependencies.map(_ % "provided"))

  • "org.apache.spark" %% "spark-core" % "2.2.0"
  • "org.apache.spark" %% "spark-sql" % "2.2.0"
  • "org.apache.spark" %% "spark-hive" % "2.2.0"
  • "org.apache.spark" %% "spark-streaming" % "2.2.0"

Test dependencies

  • "org.scalatest" %% "scalatest" % "3.0.3" % Test
  • "com.holdenkarau" %% "spark-testing-base" % "2.2.0_0.7.2" % "test"

Other dependencies

  • "com.github.scopt" %% "scopt" % "3.7.0"
  • "com.typesafe" % "config" % "1.3.1"
  • "com.typesafe.play" %% "play-json" % "2.6.6"
  • "joda-time" % "joda-time" % "2.9.9"
  • "mysql" % "mysql-connector-java" % "5.1.41"
  • "com.github.gilbertw1" %% "slack-scala-client" % "0.2.2"
  • "org.scalaj" %% "scalaj-http" % "2.3.0"

Framework versions

  • Scala v2.11.11
  • SBT v1.0.3
  • Spark v2.2.0
  • Zeppelin v0.7.3

SBT configuration

// cache options
offline := false
updateOptions := updateOptions.value.withCachedResolution(true)
// aggregate options
aggregate in assembly := false
aggregate in update := false
// fork options
fork in Test := true
// merge strategy
assemblyMergeStrategy in assembly := {
case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
case m if m.startsWith("META-INF") => MergeStrategy.discard
case PathList("javax", "servlet", _@_*) => MergeStrategy.first
case PathList("org", "apache", _@_*) => MergeStrategy.first
case PathList("org", "jboss", _@_*) => MergeStrategy.first
case "about.html" => MergeStrategy.rename
case "reference.conf" => MergeStrategy.concat
case "application.conf" => MergeStrategy.concat
case _ => MergeStrategy.first
}
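For context, the assembly task and the assemblyMergeStrategy key above come from the sbt-assembly plugin; a minimal project/plugins.sbt would look something like this (the version shown is illustrative, not necessarily the one from the original build):

// project/plugins.sbt
// sbt-assembly provides the `assembly` task and the merge-strategy setting used above
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")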

While the issue got resolved, I honestly couldn't get to the bottom of its root cause (and hence its real solution). After fruitlessly yet rigorously scouring forums, I ended up manually comparing (and re-aligning) my code (git diff) against the last known working build. (!)

It's been a while since then, and when I now check my git history, I find that the commit that resolved the issue contains either refactoring or build-related changes. My best guess is therefore that this is a build-related problem. I'm writing down below all the changes I made to build.sbt.

I reiterate that I cannot establish with certainty that the problem was resolved because of these particular modifications, so keep digging. I'll keep this question open until the conclusive cause (and solution) is found.


Mark the following dependencies as provided, as described here:

"org.apache.spark" %% "spark-core" % sparkVersion
"org.apache.spark" %% "spark-sql" % sparkVersion
"org.apache.spark" %% "spark-hive" % sparkVersion
"org.apache.spark" %% "spark-streaming" % sparkVersion
"com.holdenkarau" %% "spark-testing-base" % "2.2.0_0.7.2" % "test"

Override the following fasterxml.jackson dependencies (2.6.5 is the Jackson version Spark 2.2.0 itself ships with, so this most likely keeps a newer Jackson, pulled in transitively by other dependencies, out of the assembly):

dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-core" % "2.6.5"
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.6.5"
dependencyOverrides += "com.fasterxml.jackson.module" % "jackson-module-scala_2.11" %
"2.6.5"

One thing I'd like to point out: the following Logback dependency, which I originally suspected to be the culprit, actually has nothing to do with this (we've had Logback troubles in the past, so it suited us to blame it). Although we removed it at the time of the fix, we've since added it back.

"ch.qos.logback" % "logback-classic" % "1.2.3"

Latest update