Efficient way to do multiple replacements in a large string

I want to iteratively join several pairs of characters in a string. Example:

mystr = 'T h i s _ i s _ a _ s e n t e n c e'
joins = [('e', 'n'), ('en', 't'), ('i', 's'), ('h', 'is')]
# apply each join in order
for bigram in joins:
    mystr = mystr.replace(' '.join(bigram), ''.join(bigram))
print(mystr)
# -> T his _ is _ a _ s ent en c e

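In the first iteration it joins e n into en, then joins en t into ent, and so on. The order of the joins matters, because joining ('en', 't') has no effect unless ('e', 'n') has already been joined.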
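To see why the order matters, here is a small sketch that runs the question's own loop forward and then in reverse. The helper name `apply_joins` is hypothetical, introduced only for this illustration.

```python
def apply_joins(text, joins):
    """Apply each join in order, exactly as the question's loop does."""
    for bigram in joins:
        text = text.replace(' '.join(bigram), ''.join(bigram))
    return text

mystr = 'T h i s _ i s _ a _ s e n t e n c e'
joins = [('e', 'n'), ('en', 't'), ('i', 's'), ('h', 'is')]

# Forward order: ('en', 't') can match because 'en' already exists.
print(apply_joins(mystr, joins))        # T his _ is _ a _ s ent en c e
# Reversed order: ('en', 't') and ('h', 'is') run too early and match nothing.
print(apply_joins(mystr, joins[::-1]))  # T h is _ is _ a _ s en t en c e
```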

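For a 20MB string with 10k joins this takes quite a while, since every join is another full pass over the string. I'd like to optimize it, but I don't know how. Some things I ruled out: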

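I didn't use regex as in this question, because I don't know how to do a re.sub where the replacement is the match itself, just joined together.
I also didn't use str.translate as in this question, because as far as I know translate can only translate single characters, and my joins involve more than one character.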
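For what it's worth, `re.sub` can do exactly this: put each half of the pair in a capturing group and use the backreferences `\1\2` as the replacement, which re-emits the match without the space. The sketch below assumes tokens are separated by single spaces; the helper `join_pair` is hypothetical, and the `(?<!\S)` / `(?!\S)` lookarounds are an addition not in the question, so that a pair only matches whole tokens. It still makes one pass over the string per join, so on its own it is not expected to be faster than `str.replace`.

```python
import re

def join_pair(text, first, second):
    """Merge every standalone 'first second' token pair into 'firstsecond'."""
    # Groups 1 and 2 capture the two halves; (?<!\S) and (?!\S) require the
    # pair to start and end on a token boundary (whitespace or string edge).
    pattern = r'(?<!\S)(' + re.escape(first) + r') (' + re.escape(second) + r')(?!\S)'
    # The replacement is the match itself with the space dropped: \1\2.
    return re.sub(pattern, r'\1\2', text)

mystr = 'T h i s _ i s _ a _ s e n t e n c e'
joins = [('e', 'n'), ('en', 't'), ('i', 's'), ('h', 'is')]
for first, second in joins:
    mystr = join_pair(mystr, first, second)
print(mystr)  # T his _ is _ a _ s ent en c e

# Unlike a plain str.replace('e n', 'en'), the boundaries stop the 'e' at the
# end of the token 'the' from being merged with the following token 'n'.
print(join_pair('the n', 'e', 'n'))  # the n
print('the n'.replace('e n', 'en'))  # then
```

If you prefer, the replacement can also be a function, e.g. `lambda m: m.group(1) + m.group(2)`, which does the same thing.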

Is there any algorithm, string method, regex function, or any other function that lets me do this? Thanks a lot.
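One direction worth trying, not taken from the original thread: if the `_` tokens in the example really are word separators and no join ever involves `_`, every merge stays inside a single word, so each distinct word only needs to be processed once and repeated words can be rebuilt from the result. A minimal sketch under that assumption (the names `merge_pair` and `apply_joins_by_word` are hypothetical):

```python
def merge_pair(tokens, first, second):
    """Return a new token list with every adjacent (first, second) merged."""
    out = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == first and tokens[i + 1] == second:
            out.append(first + second)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def apply_joins_by_word(text, joins, sep=' _ '):
    """Apply the joins once per distinct word, then rebuild the text.

    Assumes `sep` separates words and no join involves the separator, so a
    merge can never cross a word boundary (an assumption; check your data).
    """
    words = text.split(sep)
    # Distinct words only: a word that occurs 10,000 times is merged once.
    merged = {w: w.split(' ') for w in set(words)}
    for first, second in joins:
        for w, tokens in merged.items():
            if first in tokens:  # cheap pre-filter before the full merge
                merged[w] = merge_pair(tokens, first, second)
    # Rebuild the original word sequence from the merged vocabulary.
    return sep.join(' '.join(merged[w]) for w in words)

mystr = 'T h i s _ i s _ a _ s e n t e n c e'
joins = [('e', 'n'), ('en', 't'), ('i', 's'), ('h', 'is')]
print(apply_joins_by_word(mystr, joins))  # T his _ is _ a _ s ent en c e
```

How much this helps depends on how many distinct words the 20MB text contains. It also merges exact tokens, so unlike `str.replace` it will not join the end of one token to the start of the next.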

The simple way:

mystr = 'T h i s _ i s _ a _ s e n t e n c e'
bigrams = [('e', 'n'), ('en', 't'), ('i', 's'), ('h', 'is')]
for first_part, second_part in bigrams:
    mystr = mystr.replace(first_part + ' ' + second_part, first_part + second_part)
print(mystr)

Prints:

T his _ is _ a _ s ent en c e

The second way:

mystr = 'T h i s _ i s _ a _ s e n t e n c e'
bigrams = [('e', 'n'), ('en', 't'), ('i', 's'), ('h', 'is')]
for bigram in bigrams:
    mystr = mystr.replace(' '.join(bigram), ''.join(bigram))
print(mystr)

You'll have to benchmark both approaches on your own data.
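A minimal `timeit` sketch for that comparison, assuming an input built by repeating the example sentence (substitute your real 20MB string and 10k joins to get meaningful numbers). The function names are hypothetical:

```python
import timeit

mystr = 'T h i s _ i s _ a _ s e n t e n c e'
bigrams = [('e', 'n'), ('en', 't'), ('i', 's'), ('h', 'is')]

def concat_version(text):
    # First way: build the search and replacement strings with +.
    for first_part, second_part in bigrams:
        text = text.replace(first_part + ' ' + second_part, first_part + second_part)
    return text

def join_version(text):
    # Second way: build them with str.join.
    for bigram in bigrams:
        text = text.replace(' '.join(bigram), ''.join(bigram))
    return text

# Hypothetical larger input: the example sentence repeated 50,000 times.
big = ' _ '.join([mystr] * 50_000)

# Sanity check: both versions must produce identical output.
assert concat_version(big) == join_version(big)

for fn in (concat_version, join_version):
    seconds = timeit.timeit(lambda: fn(big), number=5)
    print(f'{fn.__name__}: {seconds:.3f} s for 5 runs')
```

Both versions make one full `replace` pass over the string per bigram, so I would not expect a large difference between them; the per-join rescans are the real cost.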
