I have a tuple:
wordsTuple = [(('431949',['python',
'print',
'hellow',
'world',
'at',
'py',
'file',
...]
I want to turn it into [(python, 1), (print, 1) ...]. How can I do this with a single line of code, or with some PySpark function?
counts = wordsTuple._________________
If you really want a fixed 1 as the second item of every tuple, then it is:
wordsTuple = ('431949',['python', 'print', 'hellow', 'world', 'at', 'py', 'file'])
counts = [(x,1) for x in wordsTuple[1]]
counts
[('python', 1), ('print', 1), ('hellow', 1), ('world', 1), ('at', 1), ('py', 1), ('file', 1)]
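The question shows wordsTuple as a list, so there may be several (id, words) records rather than just one. A minimal sketch under that assumption (the name `records` and the second record are made up for illustration) flattens all of them with one nested comprehension:

```python
# Hypothetical data: the second record is invented for illustration only.
records = [
    ('431949', ['python', 'print', 'hellow']),
    ('000000', ['world', 'at']),
]

# Emit one (word, 1) pair per word, across every record.
pairs = [(word, 1) for _id, words in records for word in words]
print(pairs)
```

If the data lives in an RDD rather than a local list, the same idea would probably be written as rdd.flatMap(lambda rec: [(w, 1) for w in rec[1]]); that line is untested here.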
If you are looking for the number of times each word occurs, take a look at the collections.Counter class.
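A small sketch of the Counter approach, assuming the words are already in a plain Python list. The extra 'python' at the end is added here only so that one count differs from 1:

```python
from collections import Counter

# Word list from the question, plus one repeated word for illustration.
words = ['python', 'print', 'hellow', 'world', 'at', 'py', 'file', 'python']

# Counter maps each distinct word to how many times it appears.
counts = Counter(words)

print(counts['python'])       # 2
print(list(counts.items()))   # (word, count) pairs, like the format asked for
```

counts.items() gives the (word, count) pairs directly, so no manual loop is needed.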