I'm fairly new to the Scala ecosystem. I'm getting a deduplicate error while trying to assemble a Scala Spark job with the DataStax connector. Any suggestions for resolving this would be much appreciated.
My system:
- Latest Scala (2.11.7), installed via brew
- Latest Spark (2.10.5), installed via brew
- Latest SBT (0.13.9), installed via brew
- SBT Assembly plugin installed (see the plugins.sbt sketch below)
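For reference, the assembly plugin is wired in through project/plugins.sbt; mine is roughly the following (the version number is just a placeholder, use whichever sbt-assembly release matches your SBT):

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.1")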
My build.sbt:
name := "spark-test"
version := "0.0.1"
scalaVersion := "2.11.7"
// additional libraries
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0" % "provided"
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.5.0-M3"
Console:
$ sbt assembly
...
[error] 353 errors were encountered during merge
java.lang.RuntimeException: deduplicate: different file contents found in the following:
/Users/bob/.ivy2/cache/io.netty/netty-all/jars/netty-all-4.0.29.Final.jar:META-INF/io.netty.versions.properties
...
As I said in the comments, this happens because sbt doesn't know how to handle duplicate files, which is usually caused by two dependencies pulling in different versions of the same library. So you need to decide which strategy to use for each conflicting path; check the sbt-assembly documentation, but they boil down to things like "keep the first", "keep the last", and so on.
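For the specific netty-all conflict in your error output, a minimal sketch would be to resolve just the file named in the error and leave everything else on the default strategy (assuming sbt-assembly 0.13+/0.14 key names):

assemblyMergeStrategy in assembly := {
  // keep one copy of the netty version marker; it differs between netty jars
  case PathList("META-INF", "io.netty.versions.properties") => MergeStrategy.first
  case x =>
    // everything else keeps the plugin's default behaviour
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}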
For reference, here is the merge strategy block I use in a Spark project that doesn't have too many dependencies:
assemblyMergeStrategy in assembly := {
  // conflicting class files, properties and resources: keep the copy
  // from the last jar on the classpath
  case x if x.endsWith(".class") => MergeStrategy.last
  case x if x.endsWith(".properties") => MergeStrategy.last
  case x if x.contains("/resources/") => MergeStrategy.last
  case x if x.startsWith("META-INF/mailcap") => MergeStrategy.last
  case x if x.startsWith("META-INF/mimetypes.default") => MergeStrategy.first
  case x if x.startsWith("META-INF/maven/org.slf4j/slf4j-api/pom.") => MergeStrategy.first
  case x =>
    // anything else: fall back to the default strategy, but downgrade
    // deduplicate (which aborts the build) to "keep the first copy"
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    if (oldStrategy == MergeStrategy.deduplicate)
      MergeStrategy.first
    else
      oldStrategy(x)
}
// this jar caused issues so I just exclude it completely
assemblyExcludedJars in assembly := {
  val cp = (fullClasspath in assembly).value
  cp filter { _.data.getName == "jetty-util-6.1.26.jar" }
}
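Alternatively, if you'd rather fix the version conflict at its source than paper over it at merge time, you can pin the conflicting library to a single version with dependencyOverrides. A minimal sketch for the netty-all case from the error above (the version shown is just the one from your error message; pick whatever your dependency tree actually needs):

// force one netty-all version across all transitive dependencies (sbt 0.13 syntax)
dependencyOverrides += "io.netty" % "netty-all" % "4.0.29.Final"

Note this only affects version resolution; if two different artifacts ship the same file, you still need a merge strategy for it.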