I have the following sentence and dictionary:
sentence = "I love Obama and David Card, two great people. I live in a boat"
dico = {
    'dict1': ['is', 'the', 'boat', 'tree'],
    'dict2': ['apple', 'blue', 'red'],
    'dict3': ['why', 'Obama', 'Card', 'two'],
}
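I want to count how many elements of each list in the dictionary appear in the sentence. A heavy-handed way to do it is the following: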
from collections import Counter

classe_sentence = []
text_splited = sentence.split(" ")
dic_keys = dico.keys()
for key_dics in dic_keys:
    for values in dico[key_dics]:
        # record the dictionary key once for every word found in the sentence
        if values in text_splited:
            classe_sentence.append(key_dics)

Counter(classe_sentence)
The output is:
Counter({'dict1': 1, 'dict3': 2})
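However, this is not efficient at all: it uses two nested loops and a naive membership test against a list. I would like to know whether there is a faster way to do this, perhaps with itertools objects. Any ideas?
Thanks in advance!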
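You can use the set data type for all the comparisons and the set.intersection method to get the number of matches.
This makes the algorithm more efficient, but it counts each word only once, even if it appears several times in the sentence.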
sentence = set("I love Obama and David Card, two great people. I live in a boat".split())
dico = {
    'dict1': {'is', 'the', 'boat', 'tree'},
    'dict2': {'apple', 'blue', 'red'},
    'dict3': {'why', 'Obama', 'Card', 'two'}
}
results = {}
for key, words in dico.items():
    results[key] = len(words.intersection(sentence))
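Note that splitting on whitespace alone leaves tokens such as "Card," and "people." with their punctuation attached, so 'Card' is not matched. Here is a minimal sketch of the same set-based idea that also strips surrounding punctuation. The function name count_unique_matches is my own and not from the question, and I am assuming that stripping punctuation is what you want.

```python
import string

def count_unique_matches(sentence, dico):
    # Split on whitespace and strip leading/trailing punctuation from each token.
    words = {token.strip(string.punctuation) for token in sentence.split()}
    # For each key, count how many of its words occur at least once in the sentence.
    return {key: len(set(values) & words) for key, values in dico.items()}

sentence = "I love Obama and David Card, two great people. I live in a boat"
dico = {
    'dict1': ['is', 'the', 'boat', 'tree'],
    'dict2': ['apple', 'blue', 'red'],
    'dict3': ['why', 'Obama', 'Card', 'two'],
}
results = count_unique_matches(sentence, dico)
print(results)  # {'dict1': 1, 'dict2': 0, 'dict3': 3}
```

With punctuation stripped, dict3 gets 3 matches instead of the 2 shown in the question, because 'Card' is now found.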
Assuming you want case-sensitive matching:
from collections import defaultdict

# `sentence` is the original string and `dico` the dictionary from the question
sentence_words = defaultdict(lambda: 0)
for word in sentence.split(' '):
    # strip off any trailing or leading punctuation
    word = word.strip('\'";.,!?')
    sentence_words[word] += 1
for name, words in dico.items():
    count = 0
    for x in words:
        # .get() avoids inserting missing words into the defaultdict
        count += sentence_words.get(x, 0)
    print('Dictionary [%s] has [%d] matches!' % (name, count,))
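The same word-frequency idea can probably be written more compactly with collections.Counter. This is only a sketch: the function name count_all_matches is hypothetical, and I strip everything in string.punctuation rather than the shorter character list above.

```python
from collections import Counter
import string

def count_all_matches(sentence, dico):
    # Count every token once per occurrence, after stripping surrounding punctuation.
    word_counts = Counter(token.strip(string.punctuation) for token in sentence.split())
    # A Counter returns 0 for missing keys, so no explicit default is needed.
    return {name: sum(word_counts[w] for w in words) for name, words in dico.items()}

dico = {
    'dict1': ['is', 'the', 'boat', 'tree'],
    'dict2': ['apple', 'blue', 'red'],
    'dict3': ['why', 'Obama', 'Card', 'two'],
}
repeated = count_all_matches("the boat is near the tree", dico)
print(repeated)  # {'dict1': 5, 'dict2': 0, 'dict3': 0}
```

Unlike the set intersection, a word that appears twice ("the" here) is counted twice. The sentence is scanned once, and each dictionary word then costs a single hash lookup.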