How do I replace compound words in a string using a dictionary?



I have a dictionary whose key: value pairs map compound words to the expressions I want to replace them with in a text. For example:

terms_dict = {'digi conso': 'digi conso', 'digi': 'digi conso', 'digiconso': 'digi conso', '3xcb': '3xcb', '3x cb': '3xcb', 'legal entity identifier': 'legal entity identifier'}

My goal is to write a function replace_terms(text, dict) that takes a text and a dictionary as arguments and returns the text with the compound words replaced.

For example:

test_text = "i want a digi conso loan for digiconso" 
print(replace_terms(test_text, terms_dict))

should return:

"i want a digi conso loan for digi conso"

I tried using .replace(), but for some reason it does not work properly, probably because the terms to replace consist of several words.

I also tried this:

import re

def replace_terms(text, terms_dict):
    if len(terms_dict) > 0:
        words_in = [k for k in terms_dict.keys() if k in text]  # ex: words_in = [digi conso, digi, digiconso]
        if len(words_in) > 0:
            for w in words_in:
                # \b anchors the match at word boundaries
                pattern = r"\b" + w + r"\b"
                text = re.sub(pattern, terms_dict[w], text)
    return text

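But when applied to my text, this function returns "i want a digi conso conso loan for digi conso": the word conso gets doubled, and I can see why (the words_in list is built by iterating over the dictionary keys, and the text does not change as each key is appended to the list, so the key digi is later applied to text that already contains digi conso).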
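To make the doubling visible, the substitutions can be traced one key at a time. This is only a minimal sketch: it hard-codes the three keys that match the test text, and the `steps` list is a name introduced here for illustration.

```python
import re

text = "i want a digi conso loan for digiconso"

# The three keys that end up in words_in, in dictionary order,
# paired with their replacement values.
steps = [
    ("digi conso", "digi conso"),
    ("digi", "digi conso"),
    ("digiconso", "digi conso"),
]

for key, value in steps:
    # Same substitution as in the function: whole-word match on the key.
    text = re.sub(r"\b" + key + r"\b", value, text)
    print(f"after {key!r}: {text}")
```

The second step is where "digi" inside the existing "digi conso" is matched again and expanded.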

Is there an efficient way to do this?

Thanks a lot!

A quick and dirty approach:

from typing import Dict, List, Tuple

def replace_terms(text: str, terms: Dict[str, str]) -> str:
    # (position of first occurrence, term) for each term we decide to replace
    replacement_list: List[Tuple[int, str]] = []
    check = True
    for term in terms:
        if term in text:
            # Iterate over a copy, since entries may be removed inside the loop
            for replacement in list(replacement_list):
                if replacement[0] == text.index(term):
                    # Two terms start at the same position: keep the longer one
                    if len(term) > len(replacement[1]):
                        replacement_list.remove(replacement)
                    else:
                        check = False
            if check:
                replacement_list.append((text.index(term), term))
            else:
                check = True
    for replacement in replacement_list:
        # Replace only the first occurrence of each selected term
        text = text.replace(replacement[1], terms[replacement[1]], 1)
    return text

Usage:

terms_dict = {
"digi conso": "digi conso",
"digi": "digi conso",
"digiconso": "digi conso",
"3xcb": "3xcb",
"3x cb": "3xcb",
"legal entity identifier": "legal entity identifier"
}
test_text = "i want a digi conso loan for digiconso"
print(replace_terms(test_text, terms_dict))
Result:

i want a digi conso loan for digi conso

This should work.


terms_dict = {'digiconso': 'digi conso', '3xcb': '3xcb', '3x cb': '3xcb', 'legal entity identifier': 'legal entity identifier'}
test_text = "i want a digi conso loan for digiconso"

def replace_terms(txt, dct):
    dct = tuple(dct.items())
    for x, y in dct:
        txt = txt.replace(x, y, 1)
    return txt

print(replace_terms(test_text, terms_dict))

First I take the dictionary's pairs and put them in a simpler form (a tuple). Then I just replace!

Output:

i want a digi conso loan for digi conso

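You had a lot of extra replacement entries in the dictionary that you didn't need. I also made it replace only the first occurrence of each key, but you can change that.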
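If the overlapping keys such as "digi" and "digi conso" have to stay in the dictionary, one possible alternative is to do all replacements in a single regex pass, so that replaced text is never scanned again. The following is a sketch under that assumption, not a tested production solution; the function name `replace_terms_single_pass` is made up for this example. It uses only `re.escape`, `re.compile` and `Pattern.sub` from the standard library.

```python
import re
from typing import Dict

def replace_terms_single_pass(text: str, terms: Dict[str, str]) -> str:
    if not terms:
        return text
    # Longest keys first, so "digi conso" is tried before "digi"
    # when both could match at the same position.
    keys = sorted(terms, key=len, reverse=True)
    # One alternation of all escaped keys, anchored at word boundaries.
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b")
    # Each match is looked up in the dictionary and replaced once;
    # the replacement text itself is not rescanned.
    return pattern.sub(lambda m: terms[m.group(0)], text)

terms_dict = {
    "digi conso": "digi conso",
    "digi": "digi conso",
    "digiconso": "digi conso",
    "3xcb": "3xcb",
    "3x cb": "3xcb",
    "legal entity identifier": "legal entity identifier",
}
print(replace_terms_single_pass("i want a digi conso loan for digiconso", terms_dict))
```

Unlike the count of 1 above, this replaces every occurrence, and the word boundaries keep "digi" from matching inside "digiconso".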