It looks like this fails:
df.write()
.option("mode", "DROPMALFORMED")
.option("compression", "snappy")
.mode("overwrite")
.bucketBy(32,"column")
.sortBy("column")
.parquet("s3://....");
with this error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: 'save' does not support bucketing right now; at org.apache.spark.sql.DataFrameWriter.assertNotBucketed(DataFrameWriter.scala:314)
I see that saveAsTable("myfile") is still supported, but it only writes locally. Once the job has finished, how would I take that saveAsTable(...) output and put it on S3?
You can do it like this:
df
.write()
.option("mode", "DROPMALFORMED")
.option("compression", "snappy")
.option("path","s3://....")
.mode("overwrite")
.format("parquet")
.bucketBy(32,"column").sortBy("column")
.saveAsTable("tableName");
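This creates an external table that points to the S3 location. The key part is .option("path","s3://...."): with an explicit path, saveAsTable writes the bucketed files to that location instead of the local warehouse directory.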
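As a side note on what bucketBy(32,"column").sortBy("column") does to the rows: Spark hashes the bucketing column, takes a non-negative modulus by the number of buckets, and sorts the rows inside each bucket. Below is a rough, Spark-free sketch of that idea. The class and method names (BucketSketch, bucketFor, assign) are made up for illustration, and it uses String.hashCode() as a stand-in, whereas Spark uses a Murmur3-based hash, so the real bucket ids will not match these.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BucketSketch {

    // Hypothetical helper: maps a key to a bucket id in [0, numBuckets).
    // ASSUMPTION: String.hashCode() stands in for Spark's Murmur3-based hash,
    // so these ids only illustrate the idea and differ from Spark's.
    static int bucketFor(String key, int numBuckets) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), numBuckets);
    }

    // Hypothetical helper: groups keys by bucket id (the bucketBy part),
    // then sorts the keys inside each bucket (the sortBy part).
    static Map<Integer, List<String>> assign(List<String> keys, int numBuckets) {
        Map<Integer, List<String>> buckets = new TreeMap<>();
        for (String key : keys) {
            buckets.computeIfAbsent(bucketFor(key, numBuckets), b -> new ArrayList<>())
                   .add(key);
        }
        for (List<String> rows : buckets.values()) {
            Collections.sort(rows);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("alice", "bob", "carol", "alice", "dave");
        // Prints one line per non-empty bucket: bucket id, then its sorted keys.
        assign(keys, 32).forEach((bucket, rows) ->
                System.out.println(bucket + " -> " + rows));
    }
}
```

Equal keys always land in the same bucket, which is why the bucket layout has to be recorded in table metadata. That is consistent with the error above: a plain path-based save has nowhere to store the bucketing spec, while saveAsTable does.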