How to count word frequencies for each row in a dataset



I have a column of text in a dataset:

Text
This is a long string of words
words have many types
each type represents one thing
thing are different
where are these words

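I want to compute the word frequencies for each row across the entire column. My expected result looks like this, or some other format: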

Text                               Count
this is a long string of words     this:1, is:1, a:1, long:1.....
words have many types              words:3, have:1....
each type represents one thing     ......
thing are different                thing:2, are:2
where are these words              .......

How can I do this with Python?

Try with Counter:

from collections import Counter

# Lowercase each row, split it on whitespace, then count the words within that row
df["Count"] = df['Text'].str.lower().str.split().apply(Counter)
>>> df
Text                                              Count
0  This is a long string of words  {'this': 1, 'is': 1, 'a': 1, 'long': 1, 'strin...
1           words have many types     {'words': 1, 'have': 1, 'many': 1, 'types': 1}
2  each type represents one thing  {'each': 1, 'type': 1, 'represents': 1, 'one':...
3             thing are different             {'thing': 1, 'are': 1, 'different': 1}
4           where are these words     {'where': 1, 'are': 1, 'these': 1, 'words': 1}
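
The expected output in the question shows words:3 and thing:2, which matches counts taken over the whole column rather than within a single row. If that is what is wanted, one possible sketch is below. It assumes the same sample data in a Text column; the names tokens and total are only illustrative.

```python
from collections import Counter

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"Text": [
    "This is a long string of words",
    "words have many types",
    "each type represents one thing",
    "thing are different",
    "where are these words",
]})

# Lowercase and split each row into a list of words
tokens = df["Text"].str.lower().str.split()

# Count every word across the whole column
total = Counter(word for row in tokens for word in row)

# For each row, look up the column-wide count of each of its words
df["Count"] = tokens.apply(lambda row: {word: total[word] for word in row})

print(df["Count"][3])
```

With this data the row "thing are different" maps to thing:2, are:2, different:1, as in the question. Splitting on whitespace keeps "type" and "types" as separate words, and any punctuation would stay attached to its word.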
