upload = ['wish you happy birthday','take care','good night']
string = 'happy birthday take'
Expected result: {'happy birthday': 0, 'take': 1}  ## 0 and 1 are indices into the list
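Brute-force approach:
- First generate a list of combinations with n*(n-1) elements and store it in a 2D array.
- Then search the previously generated 2D array with substring matching (or whole-string matching) to get the required strings.
- The index of the generated string unit that matches the required string gives the expected result.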
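The brute-force steps above could be sketched roughly as follows. This is only one reading of the description: it assumes the "combinations" are contiguous runs of words from the string, and the name `brute_force_match` is made up for illustration. Matching uses plain substring containment, as in the question.

```python
def brute_force_match(upload, string):
    """Map the longest contiguous word runs of `string` to the index of the
    first entry of `upload` that contains them (hypothetical helper)."""
    words = string.split()
    n = len(words)
    # Step 1: generate every contiguous run of words as (start, end, phrase).
    candidates = [(i, j, " ".join(words[i:j]))
                  for i in range(n) for j in range(i + 1, n + 1)]
    # Step 2: keep the candidates that occur in some upload entry.
    matches = []
    for start, end, phrase in candidates:
        for idx, entry in enumerate(upload):
            if phrase in entry:  # substring match
                matches.append((start, end, phrase, idx))
                break
    # Step 3: prefer longer phrases and skip any that reuse a word position.
    matches.sort(key=lambda m: m[1] - m[0], reverse=True)
    used = set()
    result = {}
    for start, end, phrase, idx in matches:
        span = set(range(start, end))
        if used.isdisjoint(span):
            used |= span
            result[phrase] = idx
    return result


upload = ['wish you happy birthday', 'take care', 'good night']
print(brute_force_match(upload, 'happy birthday take'))
```

For the example in the question this gives 'happy birthday' mapped to 0 and 'take' mapped to 1.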
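If I understand correctly, the key here is to find the longest possible match for your word combinations. So I would use a loop that extends the candidate phrase one word at a time to check how many consecutive words can be found in upload:
upload = ['wish you happy birthday','take care','good night']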
search_string = 'foo happy birthday take care bar good night good'
words = search_string.split()
len_word = len(words)
result = dict()

def findWordInUpload(search_str, upld):
    # Return the index of the first entry containing search_str, or -1.
    for j, u in enumerate(upld):
        if u.find(search_str) > -1:
            return j
    return -1

next_idx = 0
for i, w in enumerate(words):
    # Skip words already consumed by an earlier, longer match.
    if i < next_idx:
        continue
    var_len = 1
    search_substr_new = w
    temp_res = -1
    # Extend the phrase word by word while it is still found in upload.
    while findWordInUpload(search_substr_new, upload) > -1:
        search_substr = search_substr_new
        temp_res = findWordInUpload(search_substr, upload)
        if i + var_len < len_word:
            var_len += 1
            search_substr_new = " ".join(words[i:i + var_len])
        else:
            # End of the word list: still step past the match so that
            # next_idx below skips every word of a multi-word match.
            var_len += 1
            break
    if temp_res > -1:
        result[search_substr] = temp_res
        next_idx = i + var_len - 1
print(result)
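With the search string above, it outputs {'happy birthday': 0, 'take care': 1, 'good night': 2, 'good': 2}
Is that what you are looking for?
Edit: simplified the whole var_len handling (it was redundant in the first version).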
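One thing to watch with `str.find` is that it matches substrings, so a search word like 'are' would be found inside 'take care'. Below is a minimal sketch of the same greedy longest-match idea wrapped in a function and comparing whole words instead. The names `contains_phrase` and `longest_matches` are invented for this example and are not part of the answer above.

```python
def contains_phrase(entry_words, phrase_words):
    """True if phrase_words occurs as a contiguous run of whole words."""
    k = len(phrase_words)
    return any(entry_words[s:s + k] == phrase_words
               for s in range(len(entry_words) - k + 1))


def longest_matches(upload, search_string):
    """Greedy longest match of consecutive words against the upload entries."""
    entries = [u.split() for u in upload]
    words = search_string.split()
    result = {}
    i = 0
    while i < len(words):
        best_len, best_idx = 0, -1
        length = 1
        # Extend the phrase one word at a time while some entry contains it.
        while i + length <= len(words):
            phrase = words[i:i + length]
            idx = next((j for j, e in enumerate(entries)
                        if contains_phrase(e, phrase)), -1)
            if idx == -1:
                break
            best_len, best_idx = length, idx
            length += 1
        if best_len:
            result[" ".join(words[i:i + best_len])] = best_idx
            i += best_len  # skip the words consumed by this match
        else:
            i += 1
    return result


upload = ['wish you happy birthday', 'take care', 'good night']
print(longest_matches(upload, 'foo happy birthday take care bar good night good'))
```

On the search string used above this should give the same dictionary as the loop version, while a stray 'are' would no longer match 'take care'.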
You can use itertools:
import itertools

upload = ['wish you happy birthday','take care','good night']
string = 'happy birthday take'.split(' ')
combine = [] # combinations
for u in range(1, len(string)):
    for i in itertools.permutations(string, u):
        combine += [" ".join(list(i))]
print(combine)

result = {}
for count, words in enumerate(upload):
    for strings in combine:
        if strings in words:
            result[strings] = count

remove = []
# remove duplicate combinations
# e.g. if 'happy birthday' exist, remove 'happy' and 'birthday'
for items in result:
    if len(items.split(" ")) > 1:
        for u in items.split(" "):
            if u in result and u not in remove:
                remove += [u]
print(remove)

for items in remove:
    del result[items]
print(result)
Output:
['happy', 'birthday', 'take', 'happy birthday', 'happy take', 'birthday happy', 'birthday take', 'take happy', 'take birthday']
['happy', 'birthday']
{'happy birthday': 0, 'take': 1}
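A side note on cost: the first printed list has 9 entries because the loop builds every ordered arrangement of 1 to n-1 words, and that count grows quickly with the number of words. The rough sketch below compares it with the number of contiguous word runs of the same lengths. Both function names are made up for this illustration, and `math.perm` requires Python 3.8 or later.

```python
import math


def permutation_candidates(n):
    """Strings built by the loop above: permutations of length 1..n-1."""
    return sum(math.perm(n, k) for k in range(1, n))


def contiguous_candidates(n):
    """Contiguous word runs of length 1..n-1 in an n-word string."""
    return sum(n - k + 1 for k in range(1, n))


# Print word count, permutation count, contiguous-run count.
for n in (3, 5, 8):
    print(n, permutation_candidates(n), contiguous_candidates(n))
```

For the three-word example this is 9 permutations against 5 contiguous runs, so for longer strings it may be worth generating only the runs that keep the original word order.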