Spark Cassandra connector sbt assembly failure



I'm fairly new to the Scala environment. I'm getting a deduplicate error when trying to assemble a Scala Spark job with the DataStax connector. I'd appreciate any suggestions for resolving this.

My setup:

  • Latest Scala (2.11.7), installed via brew
  • Latest Spark (2.10.5), installed via brew
  • Latest SBT (0.13.9), installed via brew
  • sbt-assembly plugin installed
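For context, the assembly plugin is typically enabled in `project/plugins.sbt`. Mine looks something like the sketch below; the plugin version shown is an assumption from that era of sbt 0.13, so use whatever release matches your setup:

```scala
// project/plugins.sbt
// sbt-assembly builds a single "fat" jar containing the project classes
// plus all (non-provided) dependencies, which is what `sbt assembly` runs.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.1")
```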

My build.sbt:

name := "spark-test"
version := "0.0.1"
scalaVersion := "2.11.7"
// additional libraries
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0" % "provided"
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.5.0-M3"

Console:

$ sbt assembly
...
[error] 353 errors were encountered during merge
java.lang.RuntimeException: deduplicate: different file contents found in the following:
/Users/bob/.ivy2/cache/io.netty/netty-all/jars/netty-all-4.0.29.Final.jar:META-INF/io.netty.versions.properties 
... 

As I said in the comments, this happens because sbt doesn't know how to handle duplicate files in the jars it is merging. It is usually caused by two of your dependencies pulling in different versions of the same library. So you need to decide which strategy to use — check the sbt-assembly documentation, but the options are along the lines of "keep first", "keep last", and so on.

For reference, here is the merge strategy block I use in a Spark project that doesn't have many dependencies:

assemblyMergeStrategy in assembly := {
  case x if x.endsWith(".class") => MergeStrategy.last
  case x if x.endsWith(".properties") => MergeStrategy.last
  case x if x.contains("/resources/") => MergeStrategy.last
  case x if x.startsWith("META-INF/mailcap") => MergeStrategy.last
  case x if x.startsWith("META-INF/mimetypes.default") => MergeStrategy.first
  case x if x.startsWith("META-INF/maven/org.slf4j/slf4j-api/pom.") => MergeStrategy.first
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    if (oldStrategy == MergeStrategy.deduplicate)
      MergeStrategy.first
    else
      oldStrategy(x)
}
// this jar caused issues so I just exclude it completely
assemblyExcludedJars in assembly := {
  val cp = (fullClasspath in assembly).value
  cp filter {_.data.getName == "jetty-util-6.1.26.jar"}
}
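An alternative worth trying before reaching for broad merge strategies: when the conflict comes from two versions of the same library being pulled in transitively (here `netty-all`), you can sometimes pin a single version so only one copy reaches the assembly. A sketch, assuming the `4.0.29.Final` version from your error message is the one you want to keep:

```scala
// build.sbt — force a single netty-all version across the dependency graph.
// In sbt 0.13, dependencyOverrides is a Set[ModuleID]; any transitive
// requirement for a different netty-all version is overridden to this one.
dependencyOverrides += "io.netty" % "netty-all" % "4.0.29.Final"
```

This only helps when the duplicates really are version conflicts of one library; unrelated jars that happen to ship the same resource path still need a merge strategy.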
