小贝子编程

在将DataFrame转换为字典列表时，如何避免这种火花问题

本文关键字：何避免问题火花转换 DataFrame 字典列表在将 python apache-spark pyspark
更新时间 : 2023-09-21
英文 : How do I avoid this spark issue when converting a DataFrame into a list of dictionaries?

我想将我的spark DataFrame转换为字典列表。new_df = list(map(lambda row: row.asDict(), df_base.collect()))

但是当我运行上面的程序时，我不断地得到以下错误。

org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 5 tasks (4.3 GiB) is bigger than spark.driver.maxResultSize 4.0 GiB.

我该如何解决这个问题？有可能做我想做的事吗？

简单的答案是使用df_base.toLocalIterator()而不是collect((。但您真的需要将超过4GB的数据加载到本地python列表中吗？您是考虑df_base.toPandas((还是使用spark来运行所有代码。

new_df = list(map(lambda row: row.asDict(), df_base.toLocalIterator()))

在将DataFrame转换为字典列表时，如何避免这种火花问题

相关内容

最新更新

热门标签：