How to write to HDFS using Kedro

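I am trying to write the output of my Kedro pipeline to the HDFS file system, but I could not find out how to do this online or in the Kedro documentation. If anyone has configured HDFS in the Kedro catalog, please share some sample code.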

Also, how do I connect to HDFS securely using credentials?

My data is in a pandas DataFrame.

What would the catalog entry for this look like, and where do I specify the credentials?

In your catalog, you can define a file path of the form hdfs://user@server:port/path/to/data

https://kedro.readthedocs.io/en/stable/data/data_catalog.html#specifying-the-location-of-the-dataset
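
To make the parts of such a path explicit, here is a minimal sketch that uses only the Python standard library and does not call Kedro. The host name, the port 8020 and the path are placeholder values, not details of a real cluster. It shows that the user name travels inside the URL itself, which is one reason to keep sensitive values in a separate credentials file instead.

```python
from urllib.parse import urlsplit

# Placeholder URL in the form suggested above; 8020 is an assumed port.
url = "hdfs://user@server:8020/path/to/data"
parts = urlsplit(url)

# Break the URL into the pieces a file system client would need.
components = {
    "protocol": parts.scheme,
    "user": parts.username,
    "host": parts.hostname,
    "port": parts.port,
    "path": parts.path,
}
print(components)
```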

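Assuming you can already write to HDFS from outside Kedro (with standalone Spark), doing the same from within Kedro should be straightforward.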

Use SparkDataSet in the catalog file and define properties such as the Hive metastore in spark.yml; that should be all you need.

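Then, as Rahul mentioned above, you need to specify the full path of the HDFS location you want to write to. If you still run into problems, please share some screenshots.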

dataset_name:
  type: spark.SparkDataSet
  filepath: hdfs://your_bucket/location/file.parq
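
On the credentials question: in Kedro, a catalog entry can carry a `credentials` key whose value names a block in conf/local/credentials.yml, and Kedro substitutes that block when it builds the catalog. The sketch below only mimics that lookup with plain Python dicts so the relationship is visible; it is not Kedro's own code. The key name `hdfs_creds`, the `user` field and the helper `resolve_credentials` are all hypothetical, and which fields are actually accepted depends on the dataset type and the HDFS client behind it.

```python
import copy

# Mirrors conf/base/catalog.yml; dataset name, path and key name are placeholders.
catalog = {
    "dataset_name": {
        "type": "spark.SparkDataSet",
        "filepath": "hdfs://your_bucket/location/file.parq",
        "credentials": "hdfs_creds",  # hypothetical key, looked up below
    }
}

# Mirrors conf/local/credentials.yml, which should stay out of version control.
# The fields are assumptions; check what your dataset type accepts.
credentials = {"hdfs_creds": {"user": "your_user"}}


def resolve_credentials(catalog, credentials):
    """Hypothetical helper: swap each credentials key for its values."""
    resolved = copy.deepcopy(catalog)  # leave the original config untouched
    for name, entry in resolved.items():
        key = entry.get("credentials")
        if isinstance(key, str):
            if key not in credentials:
                raise KeyError(f"No credentials named {key!r} for dataset {name!r}")
            entry["credentials"] = credentials[key]
    return resolved


resolved = resolve_credentials(catalog, credentials)
print(resolved["dataset_name"]["credentials"])
```

Keeping only the key name in the catalog means the catalog file can be committed while the actual values stay in the local, uncommitted credentials file.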
