Find the unique words in a list and count them while keeping track of another variable



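Given a txt file like the following:

```text
This movie was amazing!
4 Best indie movie of the year.
3 The movie was okay.
Worst acting ever
5 Oscar movie
2 Rotten stench from start to finish
1 Worst movie
```

I need to output:

```text
Enter a phrase to test: movie was amazing
* "movie" appears 4 times with an average score of 3.5
* "was" appears 2 times with an average score of 4.0
* "amazing" appears 1 time with an average score of 5.0
The average score for this phrase is: 4.1666666666667
This is a positive sentence.
```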
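The expected phrase score is the plain mean of the three per-word averages, not an average taken over every individual occurrence of those words:

$$
\frac{3.5 + 4.0 + 5.0}{3} = \frac{12.5}{3} \approx 4.1667
$$

Weighting each word by its occurrence count would instead give $(4 \cdot 3.5 + 2 \cdot 4.0 + 1 \cdot 5.0)/7 = 27/7 \approx 3.857$, which does not match the expected output, so the per-word averages are combined without weights.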

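So far I've managed to build a list of the unique words, but I've tried several ways of setting up the counter variable without success. I also don't know how to use the index to keep track of the ratings, because I've split everything into individual words. Is there a way to count the unique words without turning all the reviews into one big pile of words?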

```python
# Read the whole review file into one string
# (the filename 'text.txt' is the one used in the answer below)
with open('text.txt', 'r') as f:
    data = f.read()

# remove punctuation and store the cleaned data
clean_data = ''
punc = '''!"#$%&'()*+,-./:;<=>?@[]^_`{|}~'''
split_data = data.split('\n')
# iterate over every character to check if it's punctuation
for character in data:
    if character not in punc:
        clean_data += character
    else:
        clean_data += ''

# store the cleaned reviews into a list of reviews
cleaned_str = clean_data.split('\n')

# separate the ratings and reviews into two separate lists
# they can be matched via index
score = []
reviews = []
for i in cleaned_str:
    if not i:
        continue  # skip blank lines (e.g. the trailing newline)
    score.append(int(i[0]))
    new_string = i[2:]
    reviews.append(new_string)


words = {}
unique = []
repeats = []
counter = 0
# add all the words in all the reviews to one long string
# then turn into a list with all the words
reviews_words = ''
for i in reviews:
    reviews_words += str(i) + ' '
word_list = reviews_words.split()
for word in word_list:
    if word not in unique:
        unique.append(word)
        counter += 1
        repeats.append(counter)

    else:
        counter += 1
        repeats.append(counter)
```
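A minimal sketch of the approach the question asks about, assuming the `score` and `reviews` lists have already been built as in the code above (the sample values below are hypothetical). `zip` walks the two lists in step, so each review's words are processed together with that review's rating, and the reviews never have to be joined into one big string:

```python
# Hypothetical sample data in the same shape as the `score` and `reviews`
# lists built by the question's code (one rating per review, same index).
score = [5, 3, 1]
reviews = ["The movie was amazing", "The movie was okay", "Worst movie"]

totals = {}  # word -> sum of the ratings of every occurrence of the word
counts = {}  # word -> how many times the word occurs across all reviews

# zip pairs each rating with the review at the same index
for rating, review in zip(score, reviews):
    for word in review.split():
        totals[word] = totals.get(word, 0) + rating
        counts[word] = counts.get(word, 0) + 1

# Print each unique word with its count and average rating
for word in counts:
    print(word, counts[word], totals[word] / counts[word])
```

The keys of `counts` are exactly the unique words, so no separate `unique` list or running counter is needed.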

Any tips would be greatly appreciated! I'd rather not solve this with imported packages :(

Try this:

```python
PUNCTUATION = r'''!"#$%&'()*+,-./:;<=>?@[]^_`{|}~'''

def remove_punctuation(string):
    # Keep only the characters that are not punctuation
    new_string = []
    for character in string:
        if character not in PUNCTUATION:
            new_string.append(character)
    return "".join(new_string)

scores = {}  # word -> running total of ratings (turned into an average below)
counts = {}  # word -> number of occurrences

with open('text.txt', 'r') as data:
    lines = data.readlines()

for line in lines:
    line = remove_punctuation(line)
    words = line.split()
    if not words:
        continue  # skip blank lines
    # Removes first 'word' from line which is the number rating
    score = int(words.pop(0))
    for word in words:
        if word in scores:
            scores[word] += score
            counts[word] += 1
        else:
            scores[word] = score
            counts[word] = 1

# Turn each word's total into its average rating
for word in scores:
    scores[word] /= counts[word]
```

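This gives you two dictionaries holding the score and the count for each unique word (just print `scores` and `counts` to see them). Let me know whether it works; I don't have the text file you're using, so I couldn't really test it.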
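To get from those two dictionaries to the output the question asks for, something like the following could work. It is a sketch, not part of the original answer: `report` is a hypothetical helper, the phrase score is the plain mean of the per-word averages (which reproduces 4.1666... for the example), words that never appear in the reviews are skipped, and the 3.0 cut-off for "positive" is an assumption, since the question doesn't say how that is decided.

```python
def report(phrase, scores, counts, threshold=3.0):
    """Print per-word stats and return the phrase's average score.

    `scores` maps word -> average rating and `counts` maps word -> number
    of occurrences, as produced by the answer's code. `threshold` is an
    assumed cut-off; the question doesn't define "positive".
    """
    known = []  # averages of the phrase words that appear in the reviews
    for word in phrase.split():
        if word in scores:
            unit = "time" if counts[word] == 1 else "times"
            print(f'* "{word}" appears {counts[word]} {unit} '
                  f'with an average score of {scores[word]}')
            known.append(scores[word])
    if not known:
        print("None of these words appear in the reviews.")
        return None
    # Unweighted mean of the per-word averages
    average = sum(known) / len(known)
    print("The average score for this phrase is:", average)
    if average > threshold:
        print("This is a positive sentence.")
    else:
        print("This is a negative sentence.")
    return average


# Hypothetical averages and counts matching the question's example output
scores = {"movie": 3.5, "was": 4.0, "amazing": 5.0}
counts = {"movie": 4, "was": 2, "amazing": 1}
report("movie was amazing", scores, counts)
```

Replacing the hard-coded phrase with `input("Enter a phrase to test: ")` gives the interactive prompt from the question; running the phrase through `remove_punctuation` first would keep it consistent with how the reviews were cleaned.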
