我有一个文件夹,其中包含一组文件,每个文件都包含一个文本字符串、句点和逗号。我想用空格替换句点和逗号,然后打印所有文件。
我使用了Replace,但这个错误出现在我面前:
attributeError: 'list' object has no attribute 'replace'
我该怎么解决?
codes.py:
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer
import os
# 1-stop word processing
stop_words_list = stopwords.words('english')
additional_stopwords = []
with open("C:/Users/Super/Desktop/IR/homework/Lab4/IR Homework/stop words.txt", 'r') as file:
for word in file:
word = word.split('n')
additional_stopwords.append(word[0])
stop_words_list += additional_stopwords
# --------------
# 2-tokenize and stemming
dir_path = 'C:/Users/Super/Desktop/IR/homework/Lab4/corpus/corpus/'
save_dir = "C:/Users/Super/Desktop/IR/homework/Files_Without_SW/"
for document in os.listdir(dir_path):
with open(dir_path + document, "r") as reader:
save_file = open(save_dir + document, 'w')
text = reader.read()
tokens_without_sw = [word for word in text if (word not in stop_words_list)]
cleaned = tokens_without_sw.replace(',', ' ')
cleaned = cleaned.replace('.', ' ')
ps = PorterStemmer()
text_tokens = word_tokenize(cleaned)
save_file.writelines(["%s " % item for item in text_tokens])
# cleaned = (" ").join(tokens_without_sw)
print(document, ':', tokens_without_sw)
with open("../Files/stemmer_words.txt", "a+") as stemFile:
for stemWord in tokens_without_sw:
stemFile.write(stemWord)
stemFile.write(":")
stemFile.write(ps.stem(stemWord))
stemFile.write('n')
您似乎正在尝试使用字符串函数"替换";在列表中。如果你打算在列表的所有成员上使用它,你可以这样做:
cleaned = [item.replace(',', ' ') for item in tokens_without_sw]
cleaned = [item.replace('.', ' ') for item in cleaned]
您甚至可以向前迈进一步,同时进行两次替换,而不是进行两次列表综合。
cleaned = [item.replace(',', ' ').replace('.', ' ') for item in tokens_without_sw]
安德里亚斯在评论中提到了另一种没有清单理解的方式。