AWS libraries compatible with Spark 3.1.1



I have a Spark operator job with sparkVersion: "3.1.1", and I want to use it for structured streaming to MinIO. However, I have not found a compatible combination of libraries for anything newer than Hadoop 2.7.0 (which does not support the new s3a:// paths).

Is there a compatible set of Spark/Hadoop/AWS libraries for Spark 3.1.1?

My current dependencies in sbt, which should be compatible according to https://mvnrepository.com/, are listed below, but they are not (NoSuchMethodError):

scalaVersion := "2.12.0"

lazy val Versions = new {
  val spark = "3.1.1"
  val hadoop = "3.2.0"
  val scalatest = "3.0.4"
}

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % Versions.spark % Provided
  , "org.apache.spark" %% "spark-sql" % Versions.spark % Provided
  , "org.apache.spark" %% "spark-hive" % Versions.spark % Provided
  , "org.scalatest" %% "scalatest" % Versions.scalatest % Test
  , "org.apache.hadoop" % "hadoop-aws" % Versions.hadoop
  , "org.apache.hadoop" % "hadoop-common" % Versions.hadoop
  , "org.apache.hadoop" % "hadoop-mapreduce-client-core" % Versions.hadoop
  , "org.apache.hadoop" % "hadoop-client" % Versions.hadoop
  , "com.typesafe" % "config" % "1.3.1"
  , "com.github.scopt" %% "scopt" % "3.7.0"
  , "com.github.melrief" %% "purecsv" % "0.1.1"
  , "joda-time" % "joda-time" % "2.9.9"
)

Thanks a lot for any help.

This combination of libraries works:

"org.apache.spark" %% "spark-core" % "3.1.1" % Provided,
"org.apache.spark" %% "spark-sql" % "3.1.1" % Provided,
"org.apache.hadoop" % "hadoop-aws" % "3.2.0",
"org.apache.hadoop" % "hadoop-common" % "3.2.0",
"org.apache.hadoop" % "hadoop-client" % "3.2.0",
"org.apache.hadoop" % "hadoop-mapreduce-client-core" % "3.2.0",
"org.apache.hadoop" % "hadoop-minikdc" % "3.2.0",
"com.amazonaws" % "aws-java-sdk-bundle" % "1.11.375",
"com.typesafe" % "config" % "1.3.1"
, "joda-time" % "joda-time" % "2.9.9"

The trick was to use the image gcr.io/spark-operator/spark:v3.1.1-hadoop3 for the Spark operator, because the default image still ships with Hadoop 2.7 even for Spark 3.1.1.
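
For reference, once those jars (and the Hadoop 3 image) are in place, a minimal sketch of the S3A settings that point Spark at a MinIO endpoint; the endpoint, credentials, and bucket below are placeholders, not from the original setup:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("minio-s3a-example")
  .getOrCreate()

val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("fs.s3a.endpoint", "http://minio.example.local:9000")      // placeholder MinIO endpoint
hadoopConf.set("fs.s3a.access.key", sys.env.getOrElse("MINIO_ACCESS_KEY", ""))
hadoopConf.set("fs.s3a.secret.key", sys.env.getOrElse("MINIO_SECRET_KEY", ""))
hadoopConf.set("fs.s3a.path.style.access", "true")        // MinIO is usually addressed path-style
hadoopConf.set("fs.s3a.connection.ssl.enabled", "false")  // adjust if MinIO is behind TLS
hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")

// With this in place, s3a:// paths resolve against MinIO, e.g.
// spark.read.parquet("s3a://my-bucket/some/prefix")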
