Pythonic way to replace substrings from one tuple with the matching entries from another tuple



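I am looking for a fast solution where the code loops over a very long list of sentences (line by line) and replaces the substrings given in one tuple (or list) with the matching entries of another tuple. The (pseudo)code should look something like this: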

# an example of one line sentence:
a = "I was thinking to begin this journey."
# tuples: targets and replacements
verbs = ("to begin", "I begin", "you begin", "we begin")
verbs_fixed = ("toXXbegin", "IXXbegin", "youXXbegin", "weXXbegin")
with open(<INPUT FILE NAME>) as inf:
    for line in inf:
        line = ????

Since the list of sentences is very long, I am hoping to find the fastest solution.

I was thinking of re.compile followed by some list comprehension. Is there a better way?

If you zip the two tuples together, all that is left is a simple replace:

for original_value, target_value in zip(verbs, verbs_fixed):
    line = line.replace(original_value, target_value)

Using a regular expression

def regex_mapping(sentence):
    """Do the replacements based on the mapping of verbs to verbs_fixed."""
    return regex_pattern.sub(lambda m: mapping[m.group(0)], sentence)

# Setup code
import re

verbs = ("to begin", "I begin", "you begin", "we begin")
verbs_fixed = ("toXXbegin", "IXXbegin", "youXXbegin", "weXXbegin")
# Dictionary mapping each verb to its fixed form
mapping = dict(zip(verbs, verbs_fixed))
# Regex pattern (pre-compiled for speed); re.escape guards against
# regex metacharacters appearing in the keywords
regex_pattern = re.compile('|'.join(re.escape(v) for v in verbs))

Usage

a = "I was thinking to begin this journey."
print(regex_mapping(a))
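
To tie this back to the loop in the question, here is a minimal sketch of applying the same idea line by line. The names fix_lines and sample are made up for illustration, and io.StringIO only stands in for the real input file. Sorting the keywords longest first is an extra precaution (not needed for these four verbs) so that a shorter keyword cannot shadow a longer one that starts the same way.

```python
import io
import re

verbs = ("to begin", "I begin", "you begin", "we begin")
verbs_fixed = ("toXXbegin", "IXXbegin", "youXXbegin", "weXXbegin")
mapping = dict(zip(verbs, verbs_fixed))

# Longest alternatives first, each escaped, compiled once up front
pattern = re.compile(
    "|".join(re.escape(v) for v in sorted(verbs, key=len, reverse=True))
)

def fix_lines(lines):
    """Yield each line with every matched verb replaced in a single pass."""
    for line in lines:
        yield pattern.sub(lambda m: mapping[m.group(0)], line)

# Hypothetical sample data; with a real file, pass the open file object instead
sample = io.StringIO(
    "I was thinking to begin this journey.\n"
    "So we begin, and you begin too.\n"
)
fixed = list(fix_lines(sample))
print("".join(fixed), end="")
```

Because fix_lines is a generator, it never holds the whole file in memory, which matters when the list of sentences is very long.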

Addendum

If your keyword list runs to hundreds of entries, you should look into a solution based on building a Trie dictionary.
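
What a Trie-based approach might look like, as a rough sketch only: the keywords are stored in a nested-dict trie, and the trie is then turned into a single regex in which common prefixes are shared, so the regex engine has fewer alternatives to try at each position. The helpers build_trie and trie_to_regex are hypothetical names written for this illustration, not part of any library, and this is not necessarily the implementation the addendum had in mind.

```python
import re

def build_trie(words):
    """Build a nested-dict trie; the key "" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return root

def trie_to_regex(node):
    """Turn a trie node into a regex fragment that shares common prefixes."""
    is_end = "" in node
    branches = [
        re.escape(ch) + trie_to_regex(child)
        for ch, child in sorted(node.items())
        if ch != ""
    ]
    if not branches:
        return ""
    if len(branches) == 1 and not is_end:
        return branches[0]
    body = "(?:" + "|".join(branches) + ")"
    # A word may end here, so whatever follows is optional
    return body + "?" if is_end else body

verbs = ("to begin", "I begin", "you begin", "we begin")
verbs_fixed = ("toXXbegin", "IXXbegin", "youXXbegin", "weXXbegin")
mapping = dict(zip(verbs, verbs_fixed))

trie_pattern = re.compile(trie_to_regex(build_trie(verbs)))

a = "I was thinking to begin this journey."
result = trie_pattern.sub(lambda m: mapping[m.group(0)], a)
print(result)
```

With only four keywords this gains nothing over the plain alternation; the benefit, if any, should show up with hundreds of keywords that share prefixes, and is worth measuring on your own data.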
