
zipWithIndex fails in PySpark

Updated: 2023-09-08

I have an RDD like this:

>>> termCounts.collect()
[(2, 'good'), (2, 'big'), (1, 'love'), (1, 'sucks'), (1, 'sachin'), (1, 'formulas'), (1, 'batsman'), (1, 'time'), (1, 'virat'), (1, 'modi')]

When I zip it with an index to create a dictionary, it gives me what looks like random output:

>>> vocabulary = termCounts.map(lambda x: x[1]).zipWithIndex().collectAsMap()
>>> vocabulary
{'formulas': 5, 'good': 0, 'love': 2, 'modi': 9, 'big': 1, 'batsman': 6, 'sucks': 3, 'time': 7, 'virat': 8, 'sachin': 4}

Is this the expected output? I want to create a dictionary with each word as the key and its count as the value.

You need to map each element to a (word, count) pair, like this:

vocabulary = termCounts.map(lambda x: (x[1], x[0])).collectAsMap()

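By the way, the code you wrote maps each word to its index in the list, not to its count, so the output you got is expected rather than random: 'good' is at position 0, 'big' at position 1, and so on.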
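To make the difference concrete, here is a minimal plain-Python sketch that needs no Spark session. It is only an analogy: `enumerate` stands in for `zipWithIndex`, dict comprehensions stand in for `collectAsMap`, and the names `term_counts`, `word_to_index`, and `word_to_count` are made up for illustration.

```python
# Same data as termCounts.collect() in the question.
term_counts = [
    (2, 'good'), (2, 'big'), (1, 'love'), (1, 'sucks'), (1, 'sachin'),
    (1, 'formulas'), (1, 'batsman'), (1, 'time'), (1, 'virat'), (1, 'modi'),
]

# Analogue of .map(lambda x: x[1]).zipWithIndex().collectAsMap():
# each word is paired with its position in the list, not its count.
words = [word for _count, word in term_counts]
word_to_index = {word: index for index, word in enumerate(words)}

# Analogue of .map(lambda x: (x[1], x[0])).collectAsMap():
# swap each (count, word) tuple so the word becomes the key.
word_to_count = {word: count for count, word in term_counts}

print(word_to_index['formulas'])  # 5, the position of 'formulas'
print(word_to_count['good'])      # 2, the count of 'good'
```

The first dictionary reproduces the values shown in the question, which confirms they are list positions. The second one is the word-to-count mapping that was actually wanted.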
