Replacing words in a text file using a dictionary while ignoring words with punctuation

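I have a text file with multiple lines. I need to replace words in that file with new words taken from a dictionary.

I need to skip words that have punctuation attached (for example, cat! and dance?).

I need to save the result in a new text file and preserve the formatting of the original file.

Below is what I have tried so far in Python.

The result I want is a new text file containing the following three lines:

I love my cat!

I love to drink water

Do I really want to dance?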

#--create my custom dictionary
mydict = {
    "cat": "dog",
    "milk": "water", #--mistakenly had these switched in original post
    "dance": "sit"
}
#--import list of punctuations
from string import punctuation
#--create sample data
f=open("mytextfile.txt", "w+")
f.write("I love my cat!" + "\n" + "I love to drink milk" + "\n" + "Do I really want to dance?" + "\n")
f.close()
#--read in sample data
i = open("mytextfile.txt", "r")
txtfile = i.read()
i.close
#print(txtfile)
#--create an empty list for new textfile
new_txtfile = []
#--for loop through each sentence
for line in txtfile:
    words = line.split()
    for word in words:
        new_word = []
        if word in punctuation:
            new_word = word #--do not replace word if it is punctuated
        else:
            new_word = mydict.get(word) #--replace word using dictionary
    new_line= '\n'.join(new_word)
    new_textfile.append(new_line)

print(new_txtfile)
#--save the new textfile
#--make sure the new lines are preserved
with open("my_new_file.txt", 'w') as file:
    for row in new_textfile:
        s = " ".join(map(str, row))
        file.write(s+'\n')

You don't actually need to handle punctuation at all because, for example, 'cat!' is not equal to 'cat'. So all you need is this:

INFILE = "mytextfile.txt"
OUTFILE = "my_new_file.txt"

mydict = {
    "cat": "dog",
    "water": "milk",
    "dance": "sit"
}

# create the sample data
with open(INFILE, "w") as txt:
    print("I love my cat!", file=txt)
    print("I love to drink milk", file=txt)
    print("Do I really want to dance?", file=txt)

# read each line, replace whole-word matches, write the result
with open(INFILE) as txt, open(OUTFILE, 'w') as newtext:
    for line in map(str.strip, txt):
        words = line.split()
        for i, word in enumerate(words):
            # 'cat!' is not a key, so punctuated words are left alone
            if (replacement := mydict.get(word)) is not None:
                words[i] = replacement
        print(' '.join(words), file=newtext)

Output (file contents):

I love my cat!
I love to drink milk
Do I really want to dance?

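Note:

Of course, given the dictionary as originally shown in the question (with "water": "milk"), this does not modify the file, because none of the keys (words) match a word in the text.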
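
To make the point about punctuation concrete, here is a minimal sketch of the lookup on whitespace-split tokens. It assumes the corrected dictionary from the question ("milk": "water"), and the sample line is made up for illustration. Because split() leaves the punctuation attached, "cat!" is simply not a key, and dict.get falls back to the original token.

```python
# Assumed: the corrected dictionary from the question
mydict = {"cat": "dog", "milk": "water", "dance": "sit"}

# Illustrative line, not taken from the sample file as-is
line = "I love my cat! I love to drink milk"
tokens = line.split()

# get(token, token) returns the replacement if the token is a key,
# otherwise the token itself; "cat!" is not a key, so it is kept
replaced = [mydict.get(token, token) for token in tokens]
result = " ".join(replaced)
print(result)
```

Only the unpunctuated "milk" is a key here, so it is the only token that changes.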

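You forgot the parentheses on the close method: it should be i.close(), not i.close. Best practice is to use a context manager instead of explicit open/close calls.
You are also trying to append to a list that does not exist, new_textfile, instead of new_txtfile.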
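
The effect of the missing parentheses can be checked directly. The sketch below is only an illustration: it writes to a scratch file in a temporary directory (a hypothetical path, not the file from the question) and inspects the file object's closed attribute.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("I love my cat!\n")

handle = open(path)
handle.close                      # attribute access only; nothing is called
still_open = not handle.closed    # the file is still open at this point
handle.close()                    # the call is what actually closes it

# A context manager closes the file when the block ends
with open(path) as managed:
    text = managed.read()
closed_after_with = managed.closed

print(still_open, closed_after_with)
```

With the with statement there is no close call to forget, which is why it is usually preferred.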

Below you will find the solution with comments:

from string import punctuation # [1]

mydict = {
    "cat": "dog",
    "milk": "water", # [2]
    "dance": "sit"
}
with open("mytextfile.txt", "w+") as f: # [3]
    f.write("I love my cat!" + "\n" + "I love to drink milk" + "\n" + "Do I really want to dance?" + "\n")
new_txtfile = []
with open("mytextfile.txt", "r") as file:
    for line in file:
        words = line.replace("\n", "").split()
        for index, word in enumerate(words):
            if mydict.get(word) and word[-1] not in punctuation: # [4]
                words[index] = mydict.get(word)
        new_txtfile.append(" ".join(words))
with open("my_new_file.txt", 'w') as file:
    file.write("\n".join(new_txtfile))

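[1] All imports should be at the top of the file.
[2] I swapped the key and the value here. It makes more sense in this case.
[3] The open/close calls are replaced by a context manager to avoid mistakes (such as the missing parentheses ;)
[4] I check whether the word is in the dictionary. If it is, and its last character is not punctuation, I replace the word at that index in the words list with the value from the dictionary.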

Final file output:

I love my cat!
I love to drink water
Do I really want to dance?
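
The check described in [4] can also be wrapped in a small function so it is easy to try on individual lines. This is only a sketch: the name replace_words is hypothetical, and it uses `word in mapping` instead of the truthiness of `mydict.get(word)`, an assumption made here so that a key mapped to an empty string would still be replaced.

```python
from string import punctuation

def replace_words(line, mapping):
    """Replace whole tokens found in mapping; leave punctuated tokens alone."""
    words = line.split()
    for index, word in enumerate(words):
        # Membership test instead of truthiness, plus the [4] punctuation check
        if word in mapping and word[-1] not in punctuation:
            words[index] = mapping[word]
    return " ".join(words)

mydict = {"cat": "dog", "milk": "water", "dance": "sit"}
lines = [
    "I love my cat!",
    "I love to drink milk",
    "Do I really want to dance?",
]
converted = [replace_words(line, mydict) for line in lines]
print("\n".join(converted))
```

With keys that contain no punctuation, the extra `word[-1]` test never changes the outcome; it only matters if a key itself ends in a punctuation character.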

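You only need to strip the extra character (that is, \n) and then check whether each word matches an entry in the dict; if it does, map it. A word with punctuation attached will not match, so no separate rule is needed.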

#--create my custom dictionary
my_dict = {
    "cat": "dog",
    "water": "milk",
    "dance": "sit"
}
#--import list of punctuations
from string import punctuation
#--create sample data
with open("mytextfile.txt", "w+") as f:
    f.write("I love my cat!" + "\n" + "I love to drink milk" + "\n" +
            "Do I really want to dance?" + "\n")
#--read in sample data
with open("mytextfile.txt", "r") as i:
    txtfile = i.readlines()
#print(txtfile)
#--create an empty list for new textfile
new_txtfile = []
#--for loop through each sentence
for line in txtfile:
    print(line)
    words = line.replace("\n", "").split(" ")
    print(words)
    for index, word in enumerate(words):
        new_word = []
        if word in my_dict.keys():
            print(f'replacing {word} with {my_dict[word]}')
            words[index] = my_dict[word]
        elif word in my_dict.values():
            for key, value in my_dict.items():
                if word == value:
                    words[index] = key
    words[-1] = words[-1]  + "\n"
    new_txtfile.append(" ".join(words))

#--save the new textfile
#--make sure the new lines are preserved
with open("my_new_file.txt", 'w') as file:
    file.writelines(new_txtfile)
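
The question also asks to preserve the original formatting. Splitting on whitespace and joining with a single space collapses tabs and repeated spaces, so one possible alternative is to split with a capturing group, which keeps the whitespace runs as separate pieces. The sketch below is an assumption-laden illustration, not part of any answer above: the function name replace_keep_spacing and the sample string with irregular spacing are made up, and the dictionary uses the corrected "milk": "water" direction.

```python
import re

def replace_keep_spacing(text, mapping):
    """Replace whole tokens while keeping every whitespace run unchanged."""
    # The capturing group makes re.split return the separators too
    pieces = re.split(r"(\s+)", text)
    # Whitespace pieces and punctuated tokens are not keys, so they pass through
    return "".join(mapping.get(piece, piece) for piece in pieces)

my_dict = {"cat": "dog", "milk": "water", "dance": "sit"}

# Made-up sample with a double space and a tab to show they survive
sample = "I love my cat!\nI  love to\tdrink milk\nDo I really want to dance?\n"
result = replace_keep_spacing(sample, my_dict)
print(result, end="")
```

Because the whole text is processed at once, line breaks are carried through unchanged and nothing needs to be re-appended at the end of each line.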
