Ignoring words with punctuation when replacing words in a text file using a dictionary



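I have a text file with multiple lines. I need to use a dictionary to replace words in that text file with new words.

I need to exclude words that carry punctuation (e.g. cat! dance?).

I need to save the result in a new text file and preserve the formatting of the original file.

Below is what I have been trying in Python so far.

My desired result is a new text file containing the following three lines:

I love my cat!

I love to drink water

Do I really want to dance?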

#--create my custom dictionary
mydict = {
    "cat": "dog",
    "milk": "water", #--mistakenly had these switched in original post
    "dance": "sit"
}
#--import list of punctuations
from string import punctuation
#--create sample data
f=open("mytextfile.txt", "w+")
f.write("I love my cat!" + "\n" + "I love to drink milk" + "\n" + "Do I really want to dance?" + "\n")
f.close()
#--read in sample data
i = open("mytextfile.txt", "r")
txtfile = i.read()
i.close
#print(txtfile)
#--create an empty list for new textfile
new_txtfile = []
#--for loop through each sentence
for line in txtfile:
    words = line.split()
    for word in words:
        new_word = []
        if word in punctuation:
            new_word = word #--do not replace word if it is punctuated
        else:
            new_word = mydict.get(word) #--replace word using dictionary
        new_line= '\n'.join(new_word)
    new_textfile.append(new_line)

print(new_txtfile)
#--save the new textfile
#--make sure the new lines are preserved
with open("my_new_file.txt", 'w') as file:
    for row in new_textfile:
        s = " ".join(map(str, row))
        file.write(s+'\n')

You don't actually need to take punctuation into account because, for example, 'cat!' is not equal to 'cat'. So all you need is this:

INFILE = "mytextfile.txt"
OUTFILE = "my_new_file.txt"
mydict = {
    "cat": "dog",
    "water": "milk",
    "dance": "sit"
}
with open(INFILE, "w") as txt:
    print("I love my cat!", file=txt)
    print("I love to drink milk", file=txt)
    print("Do I really want to dance?", file=txt)
with open(INFILE) as txt, open(OUTFILE, 'w') as newtext:
    for line in map(str.strip, txt):
        words = line.split()
        for i, word in enumerate(words):
            if (replacement := mydict.get(word)) is not None:
                words[i] = replacement
        print(' '.join(words), file=newtext)

Output (file contents):

I love my cat!
I love to drink milk
Do I really want to dance?

Note:

Of course, given the dictionary shown in the question, this does not modify the file, because none of the keys (words) match.
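As a small illustration of why no separate punctuation rule is needed, the sketch below looks up each whitespace-separated token with dict.get. The helper name replace_words and the sample mapping (with "milk" as a key, unlike the dictionary in the code above) are assumptions made for this example only.

```python
def replace_words(line, mapping):
    """Replace tokens that exactly match a key in mapping; leave the rest."""
    # split() breaks on whitespace only, so "cat!" stays a single token
    # and does not match the key "cat".
    return " ".join(mapping.get(word, word) for word in line.split())


# Assumed sample mapping for illustration
mapping = {"cat": "dog", "milk": "water", "dance": "sit"}

print(replace_words("I love my cat!", mapping))              # unchanged
print(replace_words("I love to drink milk", mapping))        # milk -> water
print(replace_words("Do I really want to dance?", mapping))  # unchanged
```

Because the lookup is an exact match on the whole token, "cat!" and "dance?" fall through to the default and are written back untouched.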

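You forgot the parentheses on the close method. It should be i.close() rather than i.close. Best practice is to use a context manager instead of calling open/close yourself.
You also try to append to a list that does not exist, new_textfile, instead of new_txtfile.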
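To make the first point concrete, here is a minimal sketch comparing a close without parentheses to a with block. The scratch file path is hypothetical and created in a temporary directory just for this illustration.

```python
import os
import tempfile

# Hypothetical scratch file used only for this illustration
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

f = open(path, "w")
f.write("I love my cat!\n")
f.close                       # no parentheses: names the method, never calls it
still_open = not f.closed     # the file is still open at this point
f.close()                     # this call actually closes (and flushes) the file

with open(path) as g:         # the context manager closes the file on exit
    text = g.read()
closed_after_with = g.closed

print(still_open, closed_after_with, repr(text))
```

The bare f.close is a valid expression, so Python raises no error, which is why this kind of slip is easy to miss.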

Below you will find a solution with comments:

from string import punctuation # [1]

mydict = {
    "cat": "dog",
    "milk": "water", # [2]
    "dance": "sit"
}
with open("mytextfile.txt", "w+") as f: # [3]
    f.write("I love my cat!" + "\n" + "I love to drink milk" + "\n" + "Do I really want to dance?" + "\n")
new_txtfile = []
with open("mytextfile.txt", "r") as file:
    for line in file:
        words = line.replace("\n", "").split()
        for index, word in enumerate(words):
            if mydict.get(word) and word[-1] not in punctuation: # [4]
                words[index] = mydict.get(word)
        new_txtfile.append(" ".join(words))
with open("my_new_file.txt", 'w') as file:
    file.write("\n".join(new_txtfile))
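  1. All imports should be at the top of the file.
  2. I swapped the key and the value here. It makes more sense in this case.
  3. Replaced open/close with a context manager to avoid errors (such as the missing parentheses).
  4. I check whether the word is in the dictionary and whether its last character is not a punctuation mark. If both hold, I use the word's index in the words list to replace it with the value from the dictionary.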

Final file output:

I love my cat!
I love to drink water
Do I really want to dance?
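One caveat about keeping the original formatting: splitting a line and rejoining it with single spaces collapses runs of spaces or tabs. If that matters, a possible variation (not the method used in the answer above) is to split with a capturing group so the whitespace is kept and written back as it was. The function name replace_keep_spacing is hypothetical.

```python
import re


def replace_keep_spacing(line, mapping):
    """Replace exact-match tokens while keeping the original whitespace."""
    # With a capturing group, re.split returns the whitespace runs as well,
    # so joining the parts with "" restores the original spacing.
    parts = re.split(r"(\s+)", line)
    return "".join(mapping.get(part, part) for part in parts)


# Assumed sample mapping for illustration
mapping = {"cat": "dog", "milk": "water", "dance": "sit"}

result = replace_keep_spacing("I love  to\tdrink milk\n", mapping)
print(repr(result))
```

Whitespace parts never match a dictionary key here, so they pass through unchanged, including the trailing newline.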

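You only need to strip the extra characters (i.e. \n) and then check whether each word matches an entry in the dict; if it does, map it. If a word carries punctuation, it will not match, so we do not need a separate rule.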

#--create my custom dictionary
my_dict = {
    "cat": "dog",
    "water": "milk",
    "dance": "sit"
}
#--import list of punctuations
from string import punctuation
#--create sample data
with open("mytextfile.txt", "w+") as f:
    f.write("I love my cat!" + "\n" + "I love to drink milk" + "\n" +
            "Do I really want to dance?" + "\n")
#--read in sample data
with open("mytextfile.txt", "r") as i:
    txtfile = i.readlines()
#print(txtfile)
#--create an empty list for new textfile
new_txtfile = []
#--for loop through each sentence
for line in txtfile:
    print(line)
    words = line.replace("\n", "").split(" ")
    print(words)
    for index, word in enumerate(words):
        new_word = []
        if word in my_dict.keys():
            print(f'replacing {word} with {my_dict[word]}')
            words[index] = my_dict[word]
        elif word in my_dict.values():
            for key, value in my_dict.items():
                if word == value:
                    words[index] = key
    words[-1] = words[-1]  + "\n"
    new_txtfile.append(" ".join(words))

#--save the new textfile
#--make sure the new lines are preserved
with open("my_new_file.txt", 'w') as file:
    file.writelines(new_txtfile)
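The loop above also maps values back to keys by scanning my_dict.items() for every word. A sketch of the same two-way lookup with a precomputed reverse dictionary might look like this, assuming the values are unique enough for that to be meaningful. The names reverse_dict and swap_word are hypothetical.

```python
my_dict = {"cat": "dog", "water": "milk", "dance": "sit"}

# Build the value -> key mapping once instead of scanning items() per word
reverse_dict = {value: key for key, value in my_dict.items()}


def swap_word(word):
    """Map key -> value first, then value -> key, else return the word as is."""
    if word in my_dict:
        return my_dict[word]
    return reverse_dict.get(word, word)


line = "I love to drink milk"
new_line = " ".join(swap_word(word) for word in line.split(" "))
print(new_line)
```

Here "milk" is a value rather than a key, so it is mapped back to "water", matching what the loop above does for that word.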
