Matching key-value pairs in a Python dictionary produces absurd results



I created a Python dictionary for expanding acronyms. For example, the dictionary has the following entry:

Acronym_dict = {
    "cont": "continued"
}

The code for the dictionary lookup looks like this:

def code_dictionary(text, dict1=Acronym_dict):
    for word in text.split():
        for key in Acronym_dict:
            if key in text:
                text = text.replace(key, Acronym_dict[key], 1)
    return text

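The problem is that the code replaces "cont" with "continued" in every string that contains it as a substring. For example, the "cont" in "continent" gets replaced with "continued" from the dictionary, which is not what I want. I know I could add a space before and after every key in the dictionary, but the dictionary is very long, so that would be time-consuming. Is there another option? Please advise.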
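To make the behavior concrete, here is a minimal reproduction. It keeps the substring logic from the code above (the only change is that it reads from the `dict1` parameter), and the input word "continent" is just an illustrative choice:

```python
Acronym_dict = {"cont": "continued"}

def code_dictionary(text, dict1=Acronym_dict):
    # `key in text` and str.replace both match anywhere in the string,
    # including inside longer words.
    for word in text.split():
        for key in dict1:
            if key in text:
                text = text.replace(key, dict1[key], 1)
    return text

result = code_dictionary("continent")
print(result)  # continuedinent
```

The key is found inside the longer word, so the word itself gets mangled.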

Some solutions:

  1. Use a regular expression with \b (word boundary) to find isolated words:
import re
Acronym_dict = {
    r'\bcont\b': 'continued'
}
def code_dictionary(text, dict1=Acronym_dict):
    for key, value in dict1.items():
        text = re.sub(key, value, text)
    return text
s = 'to be cont in continental'
print(code_dictionary(s))
Output:
to be continued in continental
  2. If you don't want to change the dictionary, build the regular expression in code. Note that re.escape ensures the key doesn't contain anything the regex engine would treat specially:
import re
Acronym_dict = {
    'cont': 'continued'
}
def code_dictionary(text, dict1=Acronym_dict):
    for key, value in dict1.items():
        regex = r'\b' + re.escape(key) + r'\b'
        text = re.sub(regex, value, text)
    return text
s = 'to be cont in continental'
print(code_dictionary(s))
Output:
to be continued in continental
  3. A fancier version that replaces all acronyms in a single call to re.sub:
import re
Acronym_dict = {'a': 'aaa',
                'b': 'bbb',
                'c': 'ccc',
                'd': 'ddd'}

def code_dictionary(text, dict1=Acronym_dict):
    # ORs all the keys together, longest match first.
    # E.g. generates r'\b(abc|ab|b)\b'.
    # Captures the acronym it matches.
    regex = r'\b(' + '|'.join([re.escape(key)
                               for key in
                               sorted(dict1, key=len, reverse=True)]) + r')\b'
    # Replace everything in the text in one regex.
    # Uses a callback to look up the value of the acronym.
    return re.sub(regex, lambda m: dict1[m.group(1)], text)
s = 'a abcd b abcd c abcd d'
print(code_dictionary(s))
Output:
aaa abcd bbb abcd ccc abcd ddd
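One caveat with the \b-based patterns above: \b only matches between a word character and a non-word character, so a key that starts or ends with punctuation may never match. Below is a rough sketch of one way around that, using lookarounds instead of \b. The names `acronyms` and `expand` and the key "w/" are made up for illustration and are not from the question:

```python
import re

# Hypothetical dictionary; "w/" is an illustrative key that ends in punctuation.
acronyms = {"cont": "continued", "w/": "with"}

def expand(text, mapping=acronyms):
    # Longest keys first, so a longer key wins over a shorter one it starts with.
    alternation = "|".join(
        re.escape(key) for key in sorted(mapping, key=len, reverse=True)
    )
    # (?<!\w) and (?!\w) require that no word character touches the match
    # on either side, which also works when the key itself ends in "/".
    pattern = re.compile(r"(?<!\w)(" + alternation + r")(?!\w)")
    return pattern.sub(lambda m: mapping[m.group(1)], text)

result = expand("cont w/ continental")
print(result)  # continued with continental
```

If every key consists only of letters, digits, and underscores, plain \b behaves the same way and is easier to read.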

Try this:

import re
Acronym_dict = {
    "cont": "continued"
}
def code_dictionary(text, dict1=Acronym_dict):
    # for word in text.split():
    for key in dict1:
        # \b anchors the key at word boundaries; re.escape guards
        # against regex metacharacters in the key.
        text = re.sub(r'\b' + re.escape(key) + r'\b', dict1[key], text)
    return text

if __name__ == "__main__":
    text = '''
abcd cont ajflkasdfla cont.
cont continental afakjsklfjakl jfalfj asl cont fjdlaskfjal fjal
cont
'''
    print(text)
    print('--------------------')
    print(code_dictionary(text))
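The sample text above includes "cont." followed by a period. Here is a small check of how a word-boundary pattern treats cases like that. The `samples` list is made up for illustration:

```python
import re

pattern = re.compile(r"\bcont\b")

# Illustrative inputs: trailing punctuation, the bare key,
# a longer word, and the key wrapped in parentheses.
samples = ["cont.", "cont", "continental", "(cont)"]
results = [pattern.sub("continued", s) for s in samples]
print(results)  # ['continued.', 'continued', 'continental', '(continued)']
```

Punctuation counts as a non-word character, so the key is still expanded next to a period or a bracket, while "continental" is left alone.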
