How to implement a function that calculates a word's frequency in a text as a percentage



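I want a function that calculates how often a given word occurs in a text and expresses the result as a percentage. I want to read from a file and then return the frequent words along with their percentages.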

import re

words = re.findall(r"\w+", text)
frequencies = most_common(words)
percentages = [(instance, count / len(words)) for instance, count in frequencies]
for word, percentage in percentages:
    print("%s %.2f%%" % (word, percentage * 100))

NameError: name 'most_common' is not defined

I want to pass any word to the function and have it compute that word's frequency in the text file.
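The NameError occurs because most_common is a method of collections.Counter, not a standalone function. A minimal sketch of the snippet with that one change is shown here. It assumes the text is already in memory as a string; the function name top_word_percentages, the parameter n, and the sample sentence are made up for illustration.

```python
import re
from collections import Counter

def top_word_percentages(text, n=10):
    """Return the n most common words with their share of all words, in percent."""
    words = re.findall(r"\w+", text)
    if not words:
        # Empty text: nothing to count, and avoids dividing by zero.
        return []
    total = len(words)
    # Counter(words).most_common(n) yields (word, count) pairs, most frequent first.
    return [(word, count / total * 100) for word, count in Counter(words).most_common(n)]

sample = "the cat and the hat and the bat"  # made-up sample text
for word, percentage in top_word_percentages(sample, n=2):
    print("%s %.2f%%" % (word, percentage))
# prints:
# the 37.50%
# and 25.00%
```

Note that this counts "The" and "the" as different words; lowercasing the text first would merge them.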

You can try the following:

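import re
from collections import Counter

def frequency_in_text(word, text):
    # Split the text into words (runs of letters, digits, and underscores).
    words = re.findall(r"\w+", text)
    total_len = len(words)
    # Map each distinct word to its share of all words, in percent.
    frequencies = dict()
    for string, freq in Counter(words).items():
        frequencies[string] = freq / total_len * 100
    # A word that never occurs has a frequency of 0 percent.
    return frequencies.get(word, 0.0)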
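For the file-based case mentioned in the question, one possible sketch follows. It assumes a UTF-8 text file and case-insensitive matching, neither of which the question specifies. The names word_percentage and word_percentage_in_file are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

def word_percentage(word, text):
    """Percentage of the words in `text` that equal `word`, ignoring case."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        # No words at all: report 0 percent rather than dividing by zero.
        return 0.0
    # A Counter returns 0 for a missing key, so absent words yield 0.0.
    return Counter(words)[word.lower()] / len(words) * 100

def word_percentage_in_file(word, path, encoding="utf-8"):
    """Read the whole file at `path` (assumed to fit in memory) and measure `word`."""
    return word_percentage(word, Path(path).read_text(encoding=encoding))

print(word_percentage("the", "The cat sat on the mat all day"))  # prints 25.0
```

Looking up a single word directly in the Counter avoids building a percentage table for every word when only one result is needed.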

You can use the pandas.Series.value_counts() method:

import re
import pandas as pd

def word_counter(text):
    words = pd.Series(re.findall(r"\w+", text))
    # normalize=True returns fractions of the total; multiply by 100 for percentages.
    frequencies = words.value_counts(normalize=True)
    return frequencies
