Error creating a Hive table from a DataFrame: 'java.lang.IllegalArgumentException: Wrong FS: file:/tmp/spark expected: hdfs://nameservic



I am new to Spark. I am trying to develop an application that saves JSON data to a Hive table using Spark 1.6. Here is my code:

val rdd = sc.parallelize(Seq(arr.toString)) //arr is the Json array
val dataframe = hiveContext.read.json(rdd)
dataframe.registerTempTable("RiskRecon_tmp")
hiveContext.sql("DROP TABLE IF EXISTS RiskRecon_TOES")
hiveContext.sql("CREATE TABLE RiskRecon_TOES as select * from RiskRecon_tmp")

When I run this, I get the following error:

Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: file:/tmp/spark-2c2e53f5-6b5f-462a-afa2-53b8cf5e53f1/scratch_hive_2017-07-12_07-41-07_146_1120449530614050587-1, expected: hdfs://nameservice1
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:660)
at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:480)
at org.apache.hadoop.hive.ql.Context.getStagingDir(Context.java:229)
at org.apache.hadoop.hive.ql.Context.getExternalScratchDir(Context.java:359)
at org.apache.hadoop.hive.ql.Context.getExternalTmpPath(Context.java:437)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:132)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:127)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:276)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
at org.apache.spark.sql.hive.execution.CreateTableAsSelect.run(CreateTableAsSelect.scala:89)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
at test$.main(test.scala:25)
at test.main(test.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

The error is raised by the CREATE TABLE statement.

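What does this error mean? Am I doing this the right way, or is there a better way to save a DataFrame to a table? Also, if this code works, will the created table be an internal table? Ideally, I need an external table for the data.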

Any help would be appreciated. Thanks.

Assuming df holds the data from the JSON file as a DataFrame:

val df = sqlContext.read.json(rdd)

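You can then use saveAsTable to load it into a Hive table. Note that the Hive table you load into should already exist at the desired location, so you can create it as an EXTERNAL table if that is what you need. Your Spark user must also have permission to write data to the corresponding folder.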

df.write.mode("append").saveAsTable("database.table_name")
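For illustration, the "create the EXTERNAL table first, then write into it" pattern could look roughly like the sketch below. Everything specific here is an assumption rather than something from the question: the table name riskrecon_toes_ext, the columns id and name, the sample JSON record, and the HDFS path /data/riskrecon_toes_ext are all hypothetical. The sketch also uses insertInto instead of saveAsTable, because insertInto writes into an already defined table by column position; it shows the table layout only and does not by itself change the filesystem configuration behind the "Wrong FS" message.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object SaveJsonToExternalTable {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SaveJsonToExternalTable"))
    val hiveContext = new HiveContext(sc)

    // Hypothetical sample record standing in for the real JSON payload.
    val json = """{"id": 1, "name": "a"}"""
    val df = hiveContext.read.json(sc.parallelize(Seq(json)))

    // Define the external table up front; the schema and LOCATION are assumptions.
    hiveContext.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS riskrecon_toes_ext (id BIGINT, name STRING)
        |STORED AS PARQUET
        |LOCATION '/data/riskrecon_toes_ext'""".stripMargin)

    // insertInto matches columns by position, so select them in table order.
    df.select("id", "name").write.mode("append").insertInto("riskrecon_toes_ext")

    sc.stop()
  }
}
```

Because the table is declared EXTERNAL, dropping it later removes only the metastore entry and leaves the files under the LOCATION directory in place.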

Depending on your requirements, you can use any of the available write modes, such as append, overwrite, and so on.
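As a rough sketch of what the write modes look like in code, the helper object below wraps each one using the SaveMode enum, which is equivalent to passing the strings "append", "overwrite", "error", and "ignore" to mode. The object name WriteModes and its method names are hypothetical and exist only for this example.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

object WriteModes {
  // Append: add the DataFrame's rows to whatever the table already holds.
  def append(df: DataFrame, table: String): Unit =
    df.write.mode(SaveMode.Append).saveAsTable(table)

  // Overwrite: existing data is replaced by the contents of the DataFrame.
  def overwrite(df: DataFrame, table: String): Unit =
    df.write.mode(SaveMode.Overwrite).saveAsTable(table)

  // ErrorIfExists (the default): throw if the table already exists.
  def failIfExists(df: DataFrame, table: String): Unit =
    df.write.mode(SaveMode.ErrorIfExists).saveAsTable(table)

  // Ignore: do nothing if the table already exists.
  def skipIfExists(df: DataFrame, table: String): Unit =
    df.write.mode(SaveMode.Ignore).saveAsTable(table)
}
```

For example, WriteModes.append(df, "database.table_name") behaves like the df.write.mode("append").saveAsTable("database.table_name") call shown earlier.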
