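I have a text file containing multiple lines. I need to replace words in that text file with new words, using a dictionary.
I need to exclude words that carry punctuation (e.g. cat! and dance?).
I need to save the result in a new text file and preserve the format of the original file.
Below is what I have been trying to do in Python so far.
My desired result is a new text file containing the following three lines:
I love my cat!
I love to drink water
Do I really want to dance?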
#--create my custom dictionary
mydict = {
    "cat": "dog",
    "milk": "water", #--mistakenly had these switched in original post
    "dance": "sit"
}
#--import list of punctuations
from string import punctuation
#--create sample data
f=open("mytextfile.txt", "w+")
f.write("I love my cat!" + "\n" + "I love to drink milk" + "\n" + "Do I really want to dance?" + "\n")
f.close()
#--read in sample data
i = open("mytextfile.txt", "r")
txtfile = i.read()
i.close
#print(txtfile)
#--create an empty list for new textfile
new_txtfile = []
#--for loop through each sentence
for line in txtfile:
    words = line.split()
    for word in words:
        new_word = []
        if word in punctuation:
            new_word = word #--do not replace word if it is punctuated
        else:
            new_word = mydict.get(word) #--replace word using dictionary
    new_line= '\n'.join(new_word)
    new_textfile.append(new_line)
print(new_txtfile)
#--save the new textfile
#--make sure the new lines are preserved
with open("my_new_file.txt", 'w') as file:
    for row in new_textfile:
        s = " ".join(map(str, row))
        file.write(s+'\n')
You don't actually need to take punctuation into account because, for example, 'cat!' is not equal to "cat". So all you need is this:
INFILE = "mytextfile.txt"
OUTFILE = "my_new_file.txt"
mydict = {
    "cat": "dog",
    "water": "milk",
    "dance": "sit"
}
with open(INFILE, "w") as txt:
    print("I love my cat!", file=txt)
    print("I love to drink milk", file=txt)
    print("Do I really want to dance?", file=txt)
with open(INFILE) as txt, open(OUTFILE, 'w') as newtext:
    for line in map(str.strip, txt):
        words = line.split()
        for i, word in enumerate(words):
            if (replacement := mydict.get(word)) is not None:
                words[i] = replacement
        print(' '.join(words), file=newtext)
Output (file contents):
I love my cat!
I love to drink milk
Do I really want to dance?
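Note:
Of course, given the dictionary as originally shown in the question (with "water" as the key), this does not modify the file, because none of the keys (words) match any word in the text.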
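To illustrate why no separate punctuation rule is needed, here is a minimal sketch of the exact-match lookup. It assumes the dictionary with "milk" as the key, which is the orientation the question's comment says was intended, so one line does change here.

```python
# Assumption: the corrected dictionary ("milk" is the key, not "water").
mydict = {"cat": "dog", "milk": "water", "dance": "sit"}

# A token that carries punctuation is a different string, so it is not a key.
print("cat!" == "cat")      # False
print(mydict.get("cat!"))   # None
print(mydict.get("milk"))   # water

lines = ["I love my cat!", "I love to drink milk", "Do I really want to dance?"]

# dict.get(word, word) returns the word itself when there is no match,
# so punctuated words pass through unchanged.
replaced = [" ".join(mydict.get(word, word) for word in line.split())
            for line in lines]
print(replaced)
```

Only "milk" is replaced, because "cat!" and "dance?" never equal a key.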
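You forgot the parentheses on the close method: it should be i.close() rather than i.close. Best practice is to use a context manager instead of calling open and close yourself.
You also try to append to a list that does not exist, new_textfile, instead of new_txtfile.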
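A small sketch of both points about closing files. The scratch path is hypothetical and created in a temporary directory so that no real file is touched.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

f = open(path, "w")
f.write("I love my cat!\n")
f.close            # no parentheses: this only references the method
print(f.closed)    # False, the file is still open
f.close()          # this actually closes the file
print(f.closed)    # True

# With a context manager the file is closed on leaving the block.
with open(path) as g:
    text = g.read()
print(g.closed)    # True
```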
Below you will find the solution with comments:
from string import punctuation # [1]
mydict = {
    "cat": "dog",
    "milk": "water", # [2]
    "dance": "sit"
}
with open("mytextfile.txt", "w+") as f: # [3]
    f.write("I love my cat!" + "\n" + "I love to drink milk" + "\n" + "Do I really want to dance?" + "\n")
new_txtfile = []
with open("mytextfile.txt", "r") as file:
    for line in file:
        words = line.replace("\n", "").split()
        for index, word in enumerate(words):
            if mydict.get(word) and word[-1] not in punctuation: # [4]
                words[index] = mydict.get(word)
        new_txtfile.append(" ".join(words))
with open("my_new_file.txt", 'w') as file:
    file.write("\n".join(new_txtfile))
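- [1] All imports should be at the top of the file.
- [2] I swapped the key and the value here. It makes more sense in this case.
- [3] Replaced open/close with a context manager to avoid errors (such as the missing parentheses).
- [4] I check whether the word is in the dictionary and whether its last character is not punctuation. If both hold, I use the word's index in the words list to replace it with the value from the dictionary.
Final file output: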
I love my cat!
I love to drink water
Do I really want to dance?
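The same steps can be sketched without touching the disk. This is only an illustration: replace_words is a hypothetical helper, and io.StringIO stands in for the two files.

```python
import io

mydict = {"cat": "dog", "milk": "water", "dance": "sit"}

def replace_words(line, mapping):
    """Replace each whitespace-separated word that is an exact key."""
    return " ".join(mapping.get(word, word) for word in line.split())

# In-memory stand-ins for mytextfile.txt and my_new_file.txt.
source = io.StringIO(
    "I love my cat!\nI love to drink milk\nDo I really want to dance?\n"
)
target = io.StringIO()

for line in source:
    # split() discards the trailing newline, so add one back per line.
    target.write(replace_words(line, mydict) + "\n")

print(target.getvalue(), end="")
```

Writing one newline per line keeps the final newline of the original file, whereas joining the lines with "\n" leaves the last line unterminated.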
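You only need to strip the extra character (i.e. \n) and then check whether each word matches an entry in the dict; if it does, map it. If a word carries punctuation it will not match, so we don't need a separate rule for that.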
#--create my custom dictionary
my_dict = {
    "cat": "dog",
    "water": "milk",
    "dance": "sit"
}
#--import list of punctuations
from string import punctuation
#--create sample data
with open("mytextfile.txt", "w+") as f:
    f.write("I love my cat!" + "\n" + "I love to drink milk" + "\n" +
            "Do I really want to dance?" + "\n")
#--read in sample data
with open("mytextfile.txt", "r") as i:
    txtfile = i.readlines()
#print(txtfile)
#--create an empty list for new textfile
new_txtfile = []
#--for loop through each sentence
for line in txtfile:
    print(line)
    words = line.replace("\n", "").split(" ")
    print(words)
    for index, word in enumerate(words):
        new_word = []
        if word in my_dict.keys():
            print(f'replacing {word} with {my_dict[word]}')
            words[index] = my_dict[word]
        elif word in my_dict.values():
            #--reverse lookup: the word is a value, so swap in its key
            for key, value in my_dict.items():
                if word == value:
                    words[index] = key
    words[-1] = words[-1] + "\n"
    new_txtfile.append(" ".join(words))
#--save the new textfile
#--make sure the new lines are preserved
with open("my_new_file.txt", 'w') as file:
    file.writelines(new_txtfile)
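If "preserve the format" also means keeping indentation and runs of several spaces, splitting and re-joining on single spaces is not quite enough. A possible sketch, not taken from any of the answers, uses re.split with a capturing group so the whitespace itself is kept; replace_preserving_whitespace and the sample text are hypothetical.

```python
import re

def replace_preserving_whitespace(text, mapping):
    """Replace exact-match words while leaving every whitespace run intact."""
    # The capturing group makes re.split return the separators as well.
    pieces = re.split(r"(\s+)", text)
    return "".join(mapping.get(piece, piece) for piece in pieces)

# Assumption: the corrected dictionary with "milk" as the key.
mapping = {"cat": "dog", "milk": "water", "dance": "sit"}

sample = "I love my cat!\n  I love to drink   milk\nDo I really want to dance?\n"
result = replace_preserving_whitespace(sample, mapping)
print(result, end="")
```

Whitespace pieces are never dictionary keys here, so they pass through unchanged along with punctuated words such as "cat!".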