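I have hundreds of thousands of raw sentences and a lookup table in the form of a dictionary. I need to find every key in all of the sentences and replace it with the corresponding value.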
For example, the original sentences and the lookup table are:
sentences = ['Seoul is a beautiful place', 'I want to visit Paris', 'New York New York',
'Between Paris and New York']
lookup = {'Paris': 'France', 'New York': 'United States', 'Seoul': 'Korea'}
The desired result is:
['Korea is a beautiful place', 'I want to visit France', 'United States United States',
'Between France and United States']
This is what I tried:
for i in range(len(sentences)):
    sentence1 = sentences[i]
    # Apply every key -> value replacement to this sentence in turn
    for key in lookup.keys():
        sentence1 = sentence1.replace(key, lookup[key])
    sentences[i] = sentence1
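I'm worried that the double loop may take too long. Is this the best approach? Is there a faster or more elegant way to do this?
You can use re.sub with a callback function. Build a regex that alternates over the city keys, then do the lookup in the callback.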
import re

sentences = ['Seoul is a beautiful place', 'I want to visit Paris', 'New York New York', 'Between Paris and New York']
lookup = {'Paris': 'France', 'New York': 'United States', 'Seoul': 'Korea'}
# One alternation of all escaped keys, anchored on word boundaries (\b)
regex = r'\b(?:' + r'|'.join([re.escape(x) for x in lookup.keys()]) + r')\b'
# For each match, the callback returns the replacement from the lookup table
output = [re.sub(regex, lambda m: lookup[m.group()], x) for x in sentences]
print(output)
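A possible refinement of the same idea, sketched under a couple of assumptions: the pattern is compiled once up front, and the keys are sorted longest-first. Python's regex alternation takes the first alternative that matches, so if one key were a prefix of another (the 'New York City' key below is a hypothetical example, not part of the question's data), the longer key should come first. The helper name build_replacer is made up for illustration. Note that re already caches compiled patterns, so the speedup from compiling explicitly is usually modest; the longest-first ordering is the more important change. The \b anchors also assume every key starts and ends with a word character.

```python
import re


def build_replacer(lookup):
    """Hypothetical helper: return a function that replaces every key of
    `lookup` in a string, in a single regex pass."""
    # Longest keys first, so a longer key wins over a key that is its prefix
    keys = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, keys)) + r')\b')
    # pattern.sub calls the inner lambda for each match and uses its return value
    return lambda text: pattern.sub(lambda m: lookup[m.group(0)], text)


lookup = {'Paris': 'France', 'New York': 'United States', 'Seoul': 'Korea'}
replace_all = build_replacer(lookup)

sentences = ['Seoul is a beautiful place', 'I want to visit Paris',
             'New York New York', 'Between Paris and New York']
output = [replace_all(s) for s in sentences]
print(output)

# Prefix case with hypothetical keys: the longer key is matched first
print(build_replacer({'New York': 'NY', 'New York City': 'NYC'})('New York City'))  # NYC
```

With hundreds of thousands of sentences, building the replacer once and reusing it inside the list comprehension keeps the per-sentence work down to a single scan.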
This prints:
['Korea is a beautiful place',
'I want to visit France',
'United States United States',
'Between France and United States']
You just need to iterate over all the sentences and replace each element:
sentences_corrected = []
for sentence in sentences:
    for key, substitution in lookup.items():
        sentence = sentence.replace(key, substitution)
    sentences_corrected.append(sentence)
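One behavioral difference between the chained str.replace loop and the single re.sub pass may matter beyond speed. The sketch below compares them on two inputs; the function names chained_replace and single_pass_replace and the swap dictionary are hypothetical, chosen only for illustration. str.replace has no notion of word boundaries, so it also rewrites keys inside longer words, and because the replacements run one after another, a value that contains another key can be rewritten a second time. A single regex pass replaces each original occurrence exactly once.

```python
import re

lookup = {'Paris': 'France', 'New York': 'United States', 'Seoul': 'Korea'}


def chained_replace(text, lookup):
    # Each str.replace runs in dict order and sees the output of the previous one
    for key, substitution in lookup.items():
        text = text.replace(key, substitution)
    return text


def single_pass_replace(text, lookup):
    # One regex scan: each original key occurrence is replaced exactly once
    pattern = r'\b(?:' + '|'.join(map(re.escape, lookup)) + r')\b'
    return re.sub(pattern, lambda m: lookup[m.group(0)], text)


# 1) Substring matches: str.replace also hits 'Paris' inside 'Parisian'
print(chained_replace('A Parisian cafe', lookup))      # A Franceian cafe
print(single_pass_replace('A Parisian cafe', lookup))  # A Parisian cafe

# 2) Cascading: 'cat' becomes 'dog', then every 'dog' becomes 'cat'
swap = {'cat': 'dog', 'dog': 'cat'}
print(chained_replace('cat and dog', swap))      # cat and cat
print(single_pass_replace('cat and dog', swap))  # dog and cat
```

For the question's data neither issue occurs, so both approaches give the same result there; the difference only shows up when keys can appear inside other words or values overlap with keys.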