I have a column of text in a dataset:
Text
This is a long string of words
words have many types
each type represents one thing
thing are different
where are these words
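I want to count the frequency of the words in each row across the whole column. The result I expect looks like this, or some other format: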
Text                              Count
this is a long string of words    this:1, is:1, a:1, long:1.....
words have many types             words:3, have:1....
each type represents one thing    ......
thing are different               thing:2, are:2
where are these words             .......
How can I do this with Python?
Try with Counter:
from collections import Counter
# Lowercase each row, split it on whitespace, then count the words within that row
df["Count"] = df["Text"].str.lower().str.split().apply(Counter)
>>> df
                             Text  Count
0  This is a long string of words  {'this': 1, 'is': 1, 'a': 1, 'long': 1, 'strin...
1           words have many types  {'words': 1, 'have': 1, 'many': 1, 'types': 1}
2  each type represents one thing  {'each': 1, 'type': 1, 'represents': 1, 'one':...
3             thing are different  {'thing': 1, 'are': 1, 'different': 1}
4           where are these words  {'where': 1, 'are': 1, 'these': 1, 'words': 1}
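
Note that the expected output in the question shows words:3 and thing:2, which suggests the counts are meant to be taken over the whole column rather than within each row. If that is the intent, a minimal sketch could look like the following. The names tokens and total are illustrative and not part of the original answer, and the DataFrame is rebuilt from the sample rows in the question so the snippet runs on its own.

```python
from collections import Counter

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Text": [
        "This is a long string of words",
        "words have many types",
        "each type represents one thing",
        "thing are different",
        "where are these words",
    ]
})

# Tokenize each row: lowercase, then split on whitespace
tokens = df["Text"].str.lower().str.split()

# Count every word across the whole column
total = Counter(word for row in tokens for word in row)

# For each row, look up the column-wide count of each of its words
df["Count"] = tokens.apply(lambda row: {word: total[word] for word in row})
```

Splitting on whitespace keeps punctuation attached to words, so real text may need extra cleaning before counting.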