Saving a DataFrame from Scala in IntelliJ throws an exception



I am trying to load a CSV or an XML file into a pre-existing Hive table using Spark Scala from IntelliJ, and the last step, saving the DataFrame, throws the exceptions below.

Ironically, the code below runs fine in spark-shell, with no issue at all in all four cases.

1. When I use HiveContext and insertInto().

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sparkConf = new SparkConf().setAppName("TEST")
val sc = new SparkContext(sparkConf)
val hiveContext = new HiveContext(sc)
hiveContext.setConf("hive.exec.dynamic.partition", "true")
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
println("CONFIG DONE!!!!!")
val xml = hiveContext.read.format("com.databricks.spark.xml").option("rowTag","employee").load("/PUBLIC_TABLESPACE/updatedtest1.xml")
println("XML LOADED!!!!!!")
xml.write.format("parquet").mode("overwrite").partitionBy("designation").insertInto("test2")
println("TABLE SAVED!!!!!!!")

Exception in thread "main" java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(org.apache.hadoop.fs.Path, java.lang.String, java.util.Map, boolean, int, boolean, boolean, boolean)
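
Note: a NoSuchMethodException on Hive.loadDynamicPartitions usually points to a Hive client version mismatch: Spark 1.6 bundles a Hive 1.2.1 client, and clusters that ship a different Hive build (for example CDH's Hive 1.1.0) have a different signature for this method. That would also explain why spark-shell works, since it runs on the cluster's own, version-matched classpath, while the IntelliJ build resolves its own jars. Below is a minimal sketch of pinning Spark's Hive client to the cluster's version; the version string and jar path are placeholder assumptions, not values from the question.

// Sketch only: both config keys exist in Spark 1.6, but the version and
// path below must be replaced with the cluster's actual values.
val sparkConf = new SparkConf()
  .setAppName("TEST")
  .set("spark.sql.hive.metastore.version", "1.1.0")        // assumed cluster Hive version
  .set("spark.sql.hive.metastore.jars", "/opt/hive/lib/*") // assumed location of that Hive's jars
val sc = new SparkContext(sparkConf)
val hiveContext = new HiveContext(sc)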

2. When I use HiveContext and saveAsTable().

val sparkConf = new SparkConf().setAppName("TEST")
val sc = new SparkContext(sparkConf)
val hiveContext = new HiveContext(sc)
hiveContext.setConf("hive.exec.dynamic.partition", "true")
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
println("CONFIG DONE!!!!!")
val xml = hiveContext.read.format("com.databricks.spark.xml").option("rowTag","employee").load("/PUBLIC_TABLESPACE/updatedtest1.xml")
println("XML LOADED!!!!!!")
xml.write.format("parquet")
.mode("overwrite")
.partitionBy("designation")
.saveAsTable("test2")

Exception in thread "main" java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(org.apache.hadoop.fs.Path, java.lang.String, java.util.Map, boolean, int, boolean, boolean, boolean)
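
If aligning the Hive versions is not an option, one workaround sketch is to sidestep the loadDynamicPartitions call entirely: write the partitioned parquet files under the table's location yourself and let Hive discover the partition directories afterwards. The path below is an assumption; it must match the table's actual LOCATION.

// Workaround sketch, assuming test2's data lives under /PUBLIC_TABLESPACE/test2:
xml.write.mode("overwrite").partitionBy("designation").parquet("/PUBLIC_TABLESPACE/test2")
hiveContext.sql("MSCK REPAIR TABLE test2") // let Hive register the new partition directories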

3. When I use SQLContext and insertInto().

import org.apache.spark.sql.SQLContext

val sparkConf = new SparkConf().setAppName("TEST")
val sc = new SparkContext(sparkConf)
val hiveContext = new SQLContext(sc)
hiveContext.setConf("hive.exec.dynamic.partition", "true")
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
println("CONFIG DONE!!!!!")
val xml = hiveContext.read.format("com.databricks.spark.xml").option("rowTag","employee").load("/PUBLIC_TABLESPACE/updatedtest1.xml")
println("XML LOADED!!!!!!")
xml.write.format("parquet").mode("overwrite").partitionBy("designation").insertInto("test2")
println("TABLE SAVED!!!!!!!")

Exception in thread "main" org.apache.spark.sql.AnalysisException: Table not found: test2;
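
This case is expected behavior rather than a bug: a plain SQLContext keeps an in-memory catalog and never consults the Hive metastore, so the pre-existing test2 table is simply invisible to it. A short sketch of the distinction, reusing the xml DataFrame from above (test2_temp is a hypothetical name):

// The SQLContext (here confusingly named hiveContext) only sees tables
// registered in its own catalog:
xml.registerTempTable("test2_temp")                        // temp table, visible to this context only
hiveContext.sql("SELECT COUNT(*) FROM test2_temp").show()
// The pre-existing Hive table can only be resolved through a HiveContext:
val realHiveContext = new HiveContext(sc)
realHiveContext.table("test2")                             // works; a plain SQLContext cannot do this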

4. When I use SQLContext and saveAsTable().

val sparkConf = new SparkConf().setAppName("TEST")
val sc = new SparkContext(sparkConf)
val hiveContext = new SQLContext(sc)
hiveContext.setConf("hive.exec.dynamic.partition", "true")
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
println("CONFIG DONE!!!!!") 
val xml = hiveContext.read.format("com.databricks.spark.xml").option("rowTag","employee").load("/PUBLIC_TABLESPACE/updatedtest1.xml")
println("XML LOADED!!!!!!")
xml.write.format("parquet").mode("overwrite").partitionBy("designation").saveAsTable("test2")
println("TABLE SAVED!!!!!!!")

Exception in thread "main" java.lang.RuntimeException: Tables created with SQLContext must be TEMPORARY. Use a HiveContext instead.
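
The message is self-describing: with no metastore behind it, a plain SQLContext has nowhere to persist a table definition, so saveAsTable only allows temporary tables. The only SQLContext-native alternative is a temp table, sketched below; persisting into the real Hive table requires the HiveContext route of cases 1 and 2.

// A plain SQLContext can only create a temporary table:
xml.registerTempTable("test2") // lives in this SQLContext only; this is not the Hive table test2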

EDIT: using the build.sbt file:

name := "testonSpark"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.6.0"
libraryDependencies += "com.databricks" % "spark-csv_2.10" % "1.5.0"
libraryDependencies += "org.apache.spark" % "spark-hive_2.10" % "1.6.0"

Also tried with the sbt file as:

val sparkVersion = "1.6.0"

resolvers ++= Seq(
  "apache-snapshots" at "http://repository.apache.org/snapshots/"
)

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql" % sparkVersion,
  "org.apache.spark" %% "spark-hive" % sparkVersion,
  "org.apache.spark" %% "spark-mllib" % sparkVersion
)

libraryDependencies += "com.databricks" % "spark-csv_2.10" % "1.5.0"
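
Two things stand out in both build files. First, the code loads the format "com.databricks.spark.xml", yet neither file declares the spark-xml artifact, so the dependency list is incomplete. Second, when the jar is submitted to a cluster, the Spark artifacts are usually marked "provided" so that the cluster's own, version-matched Spark and Hive jars are used at runtime; that is one common way this kind of NoSuchMethodException disappears. A consolidated sketch, assuming Spark 1.6.0 on Scala 2.10 and spark-xml 0.3.3 (an assumed 2.10-compatible release):

name := "testonSpark"

version := "1.0"

scalaVersion := "2.10.4"

val sparkVersion = "1.6.0"

libraryDependencies ++= Seq(
  // "provided": compile against these, but defer to the cluster's jars at runtime
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql"  % sparkVersion % "provided",
  "org.apache.spark" %% "spark-hive" % sparkVersion % "provided",
  // the code reads format "com.databricks.spark.xml", but spark-xml was never declared
  "com.databricks" %% "spark-xml" % "0.3.3", // assumed version; verify against the cluster
  "com.databricks" %% "spark-csv" % "1.5.0"
)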

LATEST UPDATE