Using a dictionary to find strings in a list of sentences and replace them with other strings



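I have several hundred thousand raw sentences and a lookup table in the form of a dictionary. I need to find every occurrence of every key in all of the sentences and replace it with that key's value.

For example, given these sentences and this lookup table: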

sentences = ['Seoul is a beautiful place', 'I want to visit Paris', 'New York New York',
'Between Paris and New York'] 
lookup = {'Paris': 'France', 'New York': 'United States', 'Seoul': 'Korea'} 

The expected result is:

['Korea is a beautiful place', 'I want to visit France', 'United States United States', 
'Between France and United States']

This is the code I tried:

for i in range(len(sentences)):
    sentence1 = sentences[i]
    for key in lookup.keys():
        sentence1 = sentence1.replace(key, lookup[key])
    sentences[i] = sentence1

I am worried that the nested loop may take too long. Is this the best approach? Is there a faster or more elegant way to do this?

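You can use re.sub with a callback function. Build a regular expression that is an alternation of the city keys, then do the dictionary lookup in the callback.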

import re

sentences = ['Seoul is a beautiful place', 'I want to visit Paris', 'New York New York', 'Between Paris and New York']
lookup = {'Paris': 'France', 'New York': 'United States', 'Seoul': 'Korea'}
# Alternation of the escaped keys, wrapped in word boundaries (\b)
regex = r'\b(?:' + r'|'.join([re.escape(x) for x in lookup.keys()]) + r')\b'
# The callback returns the replacement for whichever key matched
output = [re.sub(regex, lambda m: lookup[m.group()], x) for x in sentences]
print(output)

This prints:

['Korea is a beautiful place',
'I want to visit France',
'United States United States',
'Between France and United States']
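
For a list of several hundred thousand sentences, the same idea can be sketched with the pattern compiled once and reused. The helper name `build_replacer` is hypothetical, and sorting the keys longest-first is an added assumption that only matters when one key is a prefix of another (for example an extra, made-up key such as 'New York City'):

```python
import re

def build_replacer(lookup):
    """Return a function that replaces every key of `lookup` in one pass."""
    # Longest keys first, so a longer key wins over a shorter key it starts with.
    keys = sorted(lookup, key=len, reverse=True)
    # Compile once; the same pattern object is reused for every sentence.
    pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in keys) + r')\b')

    def replace(sentence):
        # The callback looks up the replacement for the matched key.
        return pattern.sub(lambda m: lookup[m.group()], sentence)

    return replace

sentences = ['Seoul is a beautiful place', 'I want to visit Paris',
             'New York New York', 'Between Paris and New York']
lookup = {'Paris': 'France', 'New York': 'United States', 'Seoul': 'Korea'}

replace = build_replacer(lookup)
output = [replace(s) for s in sentences]
print(output)
```

Each sentence is scanned once regardless of how many keys the dictionary holds, which is the main reason this tends to scale better than one `str.replace` call per key. Actual timings depend on the data, so it is worth measuring on a sample.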

You only need to iterate over the sentences and apply every replacement to each one:

sentences_corrected = []
for sentence in sentences:
    for key, substitution in lookup.items():
        sentence = sentence.replace(key, substitution)
    sentences_corrected.append(sentence)
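
One behavioral difference between the chained `str.replace` loop and a single-pass regex is worth keeping in mind. The sketch below uses a hypothetical lookup (not the one from the question) in which a replacement value is itself a key. The chained loop rewrites text it has already replaced, while the single regex pass does not:

```python
import re

# Hypothetical lookup where the value 'France' is also a key.
lookup2 = {'Paris': 'France', 'France': 'Europe'}
sentence = 'Paris is in France'

# Chained replace: 'Paris' becomes 'France', which the next
# iteration then turns into 'Europe'.
chained = sentence
for key, substitution in lookup2.items():
    chained = chained.replace(key, substitution)

# Single pass: each match in the original text is replaced exactly once.
pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in lookup2) + r')\b')
single_pass = pattern.sub(lambda m: lookup2[m.group()], sentence)

print(chained)
print(single_pass)
```

With the lookup from the question no value is also a key, so both approaches give the same result there. Note also that `str.replace` matches substrings without regard to word boundaries.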
