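PyArrow does not recognize the map data type

I have some data stored in the Parquet file format that I want to read with Dask. Unfortunately, it fails to interpret the map data type. Is there a way to read this data without relying on Spark? I am using pyarrow==6.0.1.

Example: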

import dask.dataframe as dd
df = dd.read_parquet("s3://data/part=0", engine='pyarrow')
df.compute()

Error:

ArrowNotImplementedError: Not implemented type for Arrow list to pandas: map<string, double>

Not sure whether this applies to your case (a reproducible snippet would help), but a basic delayed wrapper might do the job, for example:

from dask import delayed
import dask.dataframe as dd

@delayed
def custom_load(file_path):
    # xx could be pandas, pyarrow, or anything else that opens the file without a problem
    df = xx.open_file(file_path)
    ...
    return df

# list_files: the paths of the individual Parquet files
df = dd.from_delayed([custom_load(f) for f in list_files])
