Apache Spark - How to specify the path where saveAsTable saves files

Updated: 2023-08-21

I'm trying to save a DataFrame to S3 from pyspark in Spark 1.4, using DataFrameWriter:

df = sqlContext.read.format("json").load("s3a://somefile")
df_writer = pyspark.sql.DataFrameWriter(df)
# The outer parentheses let the chained call continue onto the next line.
(df_writer.partitionBy('col1')
          .saveAsTable('test_table', format='parquet', mode='overwrite'))

The Parquet files go to "/tmp/hive/warehouse/....", which is a local tmp directory on my driver.

I did set hive.metastore.warehouse.dir in hive-site.xml to an "s3a://...." location, but Spark doesn't seem to respect my Hive warehouse setting.

Use the path option:

# Passing path tells saveAsTable where to write the table's files.
(df_writer.partitionBy('col1')
          .saveAsTable('test_table', format='parquet', mode='overwrite',
                       path='s3a://bucket/foo'))
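
The same write can also be spelled with the builder-style methods on df.write. The following is only a sketch: the helper name save_table_to_path and its parameters are made up for illustration, and it assumes df is a Spark DataFrame attached to a Hive-enabled context.

```python
def save_table_to_path(df, table_name, path, partition_col):
    """Save df as a partitioned Parquet table whose files live under `path`.

    Hypothetical helper, not part of Spark. `df` is assumed to be a
    pyspark.sql.DataFrame; only DataFrameWriter methods are called on it.
    """
    (df.write
       .partitionBy(partition_col)   # one sub-directory per distinct value
       .format("parquet")
       .mode("overwrite")            # replace any existing table data
       .option("path", path)         # explicit location instead of the warehouse dir
       .saveAsTable(table_name))
```

With the values from the question, the call would look like save_table_to_path(df, 'test_table', 's3a://bucket/foo', 'col1').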

Since 1.4 you can also use insertInto(tablename) to overwrite an existing table.
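
A minimal sketch of that insertInto variant follows. It assumes the target table already exists with a matching schema; the helper name overwrite_existing_table is hypothetical, while overwrite=True is the keyword argument of DataFrameWriter.insertInto.

```python
def overwrite_existing_table(df, table_name):
    """Replace the contents of an already-created table with the rows of df.

    Hypothetical helper. `df` is assumed to be a pyspark.sql.DataFrame and
    `table_name` an existing table known to the metastore.
    """
    # overwrite=True replaces the existing data instead of appending to it.
    df.write.insertInto(table_name, overwrite=True)
```

Because the table already exists, its files stay at whatever location the table was created with, so no path is given here.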
