Trying to read a text file and count words in defined groups

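I'm a new Python user. I'm trying to create a program that reads a text file and searches that text for certain groups of words (which I read in from a CSV). For example, if I want to create my own definition of "positive" containing the words "excited", "happy" and "optimistic", the CSV would contain those terms. I know the following is messy. The TXT file I'm reading has 7 occurrences of the three "positive" tester words I read from the CSV, but the result prints as 25. I think it's returning a character count rather than a word count. Code: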

import csv
import string
import re
from collections import Counter
remove = dict.fromkeys(map(ord, '\n' + string.punctuation))
# Read the .txt file to analyze.
with open("test.txt", "r") as f:
    textanalysis = f.read()
    textresult = textanalysis.lower().translate(remove).split()
# Read the CSV list of terms.
with open("positivetest.csv", "r") as senti_file:
    reader = csv.reader(senti_file)
    positivelist = list(reader)
# Convert term list into flat chain.
from itertools import chain
newposlist = list(chain.from_iterable(positivelist))
# Convert chain list into string.
posstring = ' '.join(str(e) for e in newposlist)
posstring2 = posstring.split(' ')
posstring3 = ', '.join('"{}"'.format(word) for word in posstring2)
# Count number of words as defined in list category
def positive(str):
  counts = dict()
  for word in posstring3:
    if word in counts:
      counts[word] += 1
    else:
      counts[word] = 1
  total = sum (counts.values())
  return total

# Print result; will write to CSV eventually
print ("Positive: ", positive(textresult))

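To check the character-count guess: iterating over a string in Python yields one character at a time, so the loop in positive() walks the characters of posstring3 and never reads its argument at all. A minimal sketch, using three made-up words in place of the real CSV and a made-up token list in place of test.txt:

```python
# Made-up stand-ins for positivetest.csv and test.txt.
positive_words = ["excited", "happy", "optimistic"]
textresult = "i am happy and excited today so happy".split()

# Rebuild posstring3 the same way the question's code does.
posstring3 = ", ".join('"{}"'.format(word) for word in positive_words)

# Same loop as positive(): iterating over a string yields characters.
counts = dict()
for word in posstring3:
    counts[word] = counts.get(word, 0) + 1
char_total = sum(counts.values())
print(char_total, len(posstring3))  # 32 32

# Counting words instead: check each token of the text against a set of terms.
positive_set = set(positive_words)
word_total = sum(1 for token in textresult if token in positive_set)
print(word_total)  # 3
```

Because positive() ignores its argument, the 25 in the question is simply the length of posstring3 built from the real CSV.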
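I'm also a beginner, but I stumbled on a process that might help. After reading in the file, split the text at every space, tab and newline. In your case, I would keep all words lowercase and include punctuation in your split call. Save the result as an array, then parse it with some kind of loop to get the number of instances of each "positive" (or other) word.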
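A minimal sketch of that process, assuming the file's text is already in a string and the target words are in a list (both inputs are made up here). The regular expression splits on any run of whitespace or punctuation, which is one way to "include punctuation in the split call":

```python
import re
import string

# Made-up inputs: the file's text and the user-defined "positive" words.
text = "Happy days!\tI'm excited,\nvery excited, and optimistic."
positive_words = ["excited", "happy", "optimistic"]

# Lowercase, then split on any run of whitespace (space, tab, newline) or punctuation.
separators = r"[\s" + re.escape(string.punctuation) + r"]+"
tokens = [token for token in re.split(separators, text.lower()) if token]

# Loop over the token array and count each positive word.
counts = {word: 0 for word in positive_words}
for token in tokens:
    if token in counts:
        counts[token] += 1

print(counts)                # {'excited': 2, 'happy': 1, 'optimistic': 1}
print(sum(counts.values()))  # 4
```

One side effect of splitting on punctuation: contractions such as "I'm" become two tokens ("i", "m"), which doesn't matter for counting these words but may for others.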

Look at this, in particular the "train" function:

https://github.com/g3kappa/adjustable-markov-chains/blob/master/markovchain.py

Also this link; ignore the JSON stuff at the beginning, the article talks about sentiment analysis:

https://dev.to/rodolfoferro/sentiment-analysis-on-trumpss-tweets-using-python-

The same goes for this link:

http://adilmoujahid.com/posts/2014/07/twitter-analytics/

Good luck!

I looked over your code and ran some samples of my own through it. Based on what I think you want, I have two ideas.

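First assumption: you want a basic sentiment count? Getting to "textresult" is great. Then you did the same with the "positive lexicon" into [positivelist], which I think was the perfect move. But then you converted [positivelist] into one big sentence. Wouldn't you rather just:

1. Pass a "stop_words" list through [textresult]
2. Merge the two dataframes [textresult (minus stop words) and positivelist] on their common words, as in an "inner join"
3. Then basically do your term frequency
4. Aggregating the score is then much easier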
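The four steps above could look roughly like this. This is a sketch under stated assumptions: the tokens, lexicon and stop-word list are made up, and it swaps the DataFrame merge for a plain-Python set lookup so it needs no third-party packages.

```python
from collections import Counter

# Made-up inputs: tokenized text and a positive lexicon read from the CSV.
textresult = ["the", "team", "was", "happy", "and", "excited", "so", "happy"]
positivelist = ["excited", "happy", "optimistic"]
# A tiny illustrative stop-word list; a real one would be much longer.
stop_words = {"the", "was", "and", "so", "a", "an"}

# 1. Remove stop words from the text.
content_words = [w for w in textresult if w not in stop_words]

# 2. "Inner join": keep only words that also appear in the lexicon.
lexicon = set(positivelist)
matched = [w for w in content_words if w in lexicon]

# 3. Term frequency of the matched words.
term_freq = Counter(matched)

# 4. Aggregate into a single score.
positive_score = sum(term_freq.values())

print(term_freq)       # Counter({'happy': 2, 'excited': 1})
print(positive_score)  # 3
```

If you do keep the data in pandas DataFrames, step 2 corresponds to pandas.merge(..., how="inner") on the word column.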

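Second assumption: you are focusing on "excited", "happy" and "optimistic"; are you trying to isolate the topics of the text into these three categories?

1. Again, remove the stop words from [textresult]
2. Download the 'nrc' and/or 'syuzhet' sentiment lexicons. They break sentiment words down into 8 emotion groups, so this works if you only want 3 of the 8 emotion groups (a subset)
3. Process it the same way you got [positivelist]
4. Join again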
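A rough sketch of the subset idea, assuming the lexicon has already been loaded as a mapping from each word to the emotion groups it belongs to. The lexicon entries, the chosen three groups and the tokens below are all made up for illustration; they are not taken from NRC or syuzhet.

```python
from collections import Counter

# Hypothetical miniature lexicon: word -> emotion groups. Entries are invented.
emotion_lexicon = {
    "happy": {"joy", "trust"},
    "excited": {"joy", "anticipation", "surprise"},
    "optimistic": {"anticipation", "joy", "trust"},
    "afraid": {"fear"},
    "angry": {"anger", "disgust"},
}

# Keep only the subset of emotion groups of interest (3 of the 8).
wanted = {"joy", "anticipation", "trust"}
subset = {
    word: emotions & wanted
    for word, emotions in emotion_lexicon.items()
    if emotions & wanted
}

# Stop-word-filtered tokens of the text (hypothetical).
tokens = ["happy", "excited", "afraid", "optimistic", "happy"]

# Join the tokens to the subset lexicon and count hits per emotion group.
per_emotion = Counter()
for token in tokens:
    for emotion in subset.get(token, ()):
        per_emotion[emotion] += 1

print(dict(sorted(per_emotion.items())))  # {'anticipation': 2, 'joy': 4, 'trust': 3}
```

A word can belong to several groups, so the per-group counts can add up to more than the number of matched tokens.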

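Sorry this is a bit up in the air, but if I'm anywhere close to what you're thinking, let me know and we can connect. Second apology: I'm also a new Python user, and I'm adapting what I use to Python above (it isn't subtle either :) )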
