Replace periods and commas with spaces in every file in a folder



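I have a folder containing a set of files. Each file holds a text string with periods and commas in it. I want to replace the periods and commas with spaces and then print all the files.

I used replace, but I get this error:

AttributeError: 'list' object has no attribute 'replace'

How can I fix this?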

codes.py:

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer
import os
# 1-stop word processing
stop_words_list = stopwords.words('english')
additional_stopwords = []
with open("C:/Users/Super/Desktop/IR/homework/Lab4/IR Homework/stop words.txt", 'r') as file:
    for word in file:
        word = word.split('\n')
        additional_stopwords.append(word[0])
stop_words_list += additional_stopwords
# --------------
# 2-tokenize and stemming
dir_path = 'C:/Users/Super/Desktop/IR/homework/Lab4/corpus/corpus/'
save_dir = "C:/Users/Super/Desktop/IR/homework/Files_Without_SW/"
for document in os.listdir(dir_path):
    with open(dir_path + document, "r") as reader:
        save_file = open(save_dir + document, 'w')
        text = reader.read()
        tokens_without_sw = [word for word in text if (word not in stop_words_list)]

        # The AttributeError is raised on the next line: tokens_without_sw is a list
        cleaned = tokens_without_sw.replace(',', ' ')
        cleaned = cleaned.replace('.', ' ')
        ps = PorterStemmer()
        text_tokens = word_tokenize(cleaned)
        save_file.writelines(["%s " % item for item in text_tokens])
        # cleaned = (" ").join(tokens_without_sw)
        print(document, ':', tokens_without_sw)
        with open("../Files/stemmer_words.txt", "a+") as stemFile:
            for stemWord in tokens_without_sw:
                stemFile.write(stemWord)
                stemFile.write(":")
                stemFile.write(ps.stem(stemWord))
                stemFile.write('\n')

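It looks like you are calling the string method replace on a list. Lists have no such method, which is what the error says. If you meant to apply it to every member of the list, you can do this: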

cleaned = [item.replace(',', ' ') for item in tokens_without_sw]
cleaned = [item.replace('.', ' ') for item in cleaned]

You can even go one step further and do both replacements in a single pass, instead of using two list comprehensions.

cleaned = [item.replace(',', ' ').replace('.', ' ') for item in tokens_without_sw]
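As a quick check of the one-line version, here is a small self-contained sketch. The sample tokens are made up for illustration and are not from the question's corpus. It also shows str.translate as an equivalent single-pass variant, which is an alternative of mine rather than something proposed in the thread.

```python
# Hypothetical sample tokens, invented for this illustration.
tokens_without_sw = ["hello,", "world.", "a.b,c", "plain"]

# Chained replace on every item, as in the one-liner above.
cleaned = [item.replace(',', ' ').replace('.', ' ') for item in tokens_without_sw]

# Equivalent variant: a translation table maps both characters to a space.
table = str.maketrans({',': ' ', '.': ' '})
cleaned_translate = [item.translate(table) for item in tokens_without_sw]

print(cleaned)  # ['hello ', 'world ', 'a b c', 'plain']
```

Both variants return a new list and leave tokens_without_sw unchanged, since Python strings are immutable.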

Andreas mentioned another way in the comments that does not use a list comprehension.
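That comment is not reproduced here, so the following is only one possible approach without a list comprehension, and not necessarily the one Andreas had in mind. It calls replace on the text string itself, before any list is built. That may also suit the original code better, since word_tokenize expects a string. The helper names replace_punctuation and clean_folder and the sample file names are assumptions for illustration. The demo runs on a temporary folder so that it works anywhere.

```python
import tempfile
from pathlib import Path


def replace_punctuation(text):
    """Return text with every comma and period replaced by a space."""
    return text.replace(',', ' ').replace('.', ' ')


def clean_folder(folder):
    """Map each file name in folder to its cleaned contents."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            results[path.name] = replace_punctuation(path.read_text(encoding="utf-8"))
    return results


# Demo on a throwaway folder with two made-up files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "doc1.txt").write_text("Hello, world.", encoding="utf-8")
    Path(tmp, "doc2.txt").write_text("a.b,c", encoding="utf-8")
    demo = clean_folder(tmp)

for name, cleaned in demo.items():
    print(name, ':', cleaned)
```

To use this on the real data, pass the corpus directory from the question to clean_folder instead of the temporary folder.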
