How to count the frequency of words in a sorted list

from textblob import TextBlob as tb  # assumed: tb is TextBlob, df2 is a pandas DataFrame

npl = []
for i in df2['Review']:
    npl.extend(tb(i).noun_phrases)

This is the code I used to generate the list. The list is very long:

['omg',
'populus scramble',
'coffee',
'hack',
'snagged',
'burpple',
'ice cream melts',
'sweet prize 😋',
...]

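Basically, how can I write code that loops through the list, counts the frequency of each word, and displays the words? Something like Counter. I have spent hours searching this site for code that works for me, without success.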

npl.count('coffee')

The code above works, but only for one word at a time.
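One way to extend npl.count to every entry is to call it once per distinct phrase. The sketch below uses a small hypothetical sample list in place of npl. This version rescans the whole list for each distinct phrase, which is why Counter, which tallies everything in a single pass, is the better fit for a long list:

```python
from collections import Counter

# Hypothetical stand-in for npl (the real list comes from TextBlob noun phrases)
sample = ['coffee', 'hack', 'coffee', 'ice cream melts', 'coffee', 'hack']

# One list.count() call per distinct phrase: simple, but rescans the list each time
by_count = {phrase: sample.count(phrase) for phrase in set(sample)}

# Counter produces the same tally in a single pass over the list
by_counter = Counter(sample)

print(by_count == dict(by_counter))  # True
```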

The expected output looks like this:

{'coffee': 45,
'snagged': 23,
'ice cream melts': 13}

You can use Counter from the collections library on the DataFrame, like this:

from collections import Counter

word_counter = Counter()
df2['Review'].str.split(" ").apply(word_counter.update)
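The apply(word_counter.update) call works because update adds the counts from each review's list of tokens into the same Counter. Here is a stdlib-only sketch of the same loop, with a hypothetical reviews list standing in for df2['Review']. Splitting on spaces counts single words, not the multi-word noun phrases that TextBlob extracts:

```python
from collections import Counter

# Hypothetical stand-in for the df2['Review'] column
reviews = ["omg coffee", "coffee and ice cream", "ice cream melts"]

word_counter = Counter()
for review in reviews:
    # Same effect as df2['Review'].str.split(" ").apply(word_counter.update):
    # update() adds each token of this review to the running totals
    word_counter.update(review.split(" "))

print(word_counter.most_common(2))  # [('coffee', 2), ('ice', 2)]
```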

But since you already have a list, you can apply it to the list directly:

word_counter = Counter(npl)  # pass the list itself, not a string
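The argument needs to be the list object itself. Counter counts whatever the iterable yields, and a string yields single characters. A short sketch contrasting the two, with a hypothetical sample list:

```python
from collections import Counter

npl = ['coffee', 'hack', 'coffee']  # hypothetical sample list

# Passing a string iterates over its characters
letters = Counter("coffee")
print(letters)  # Counter({'f': 2, 'e': 2, 'c': 1, 'o': 1})

# Passing the list counts whole phrases
phrases = Counter(npl)
print(phrases)  # Counter({'coffee': 2, 'hack': 1})
```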

After that, you can loop through the keys and remove stop words if you like. You can also see the most common words with word_counter.most_common(10).
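The stop-word step can be sketched like this, assuming a small hypothetical STOP_WORDS set (a real project would more likely take a list from an NLP library). The loop runs over list(word_counter), a copy of the keys, because deleting from a dict while iterating over it directly raises a RuntimeError:

```python
from collections import Counter

# Hypothetical stop-word set, for illustration only
STOP_WORDS = {"omg", "the", "a"}

word_counter = Counter(["omg", "coffee", "coffee", "hack", "omg", "the"])

# Iterate over a copy of the keys so entries can be deleted safely
for word in list(word_counter):
    if word in STOP_WORDS:
        del word_counter[word]

print(word_counter.most_common(10))  # [('coffee', 2), ('hack', 1)]
```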

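But be careful: your results won't be perfect, because the same word can be written in several forms, such as house or houses. The best approach is to tokenize first and apply a stemmer.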
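A real stemmer, such as NLTK's PorterStemmer, is the better choice here. The sketch below only shows where tokenizing and stemming fit before counting. It uses a hypothetical crude_stem function that just strips a trailing "s", which is far weaker than a real stemmer:

```python
import re
from collections import Counter

def crude_stem(token):
    # Hypothetical, deliberately naive stand-in for a real stemmer:
    # strips a trailing "s" (but not "ss") from tokens longer than three letters
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token

def count_stems(texts):
    counter = Counter()
    for text in texts:
        # Tokenize: lowercase, then keep runs of letters only
        tokens = re.findall(r"[a-z]+", text.lower())
        counter.update(crude_stem(t) for t in tokens)
    return counter

stems = count_stems(["House prices", "two houses", "my house"])
print(stems.most_common(1))  # [('house', 3)]
```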

EDIT 1

If you want a dict of the most common words, you can do this:

dict(word_counter.most_common(10))
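most_common(n) returns (item, count) pairs sorted by descending count. The result can be turned into a dict as above, or printed line by line, which is closer to the display asked for in the question. A sketch with a hypothetical sample list:

```python
from collections import Counter

# Hypothetical sample standing in for npl
npl = ['coffee', 'snagged', 'coffee', 'ice cream melts', 'coffee', 'snagged']
word_counter = Counter(npl)

# Since Python 3.7, dicts keep insertion order, so this stays sorted by count
top = dict(word_counter.most_common(10))
print(top)  # {'coffee': 3, 'snagged': 2, 'ice cream melts': 1}

# Print one phrase and its count per line
for phrase, count in word_counter.most_common(10):
    print(f"{phrase}: {count}")
```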

Here is my snippet:

from collections import Counter

inp = list("abcaddabcabadbcabdabdcbaziyutoigkfdshjkbvaoiuhgbgjkvd^giohdfb")
word_counter = Counter(inp)  # counts each character in inp
print(dict(word_counter.most_common(4)))  # {'b': 10, 'a': 9, 'd': 8, 'c': 4}

