How can I detect some parts of a string in one element of a list and other parts of the string in another element of the list?


upload = ['wish you happy birthday','take care','good night']
string = 'happy birthday take'

Expected output: {'happy birthday':0,'take':1} ## 0 and 1 are indexes into the list

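Brute-force approach:

  1. First generate the n*(n-1) combinations of elements and store them in a 2-D array.

  2. Then search the previously generated 2-D array for substring (or whole-string) matches to obtain the required strings.

  3. The index of the generated string unit that matches the required string gives the expected result.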
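One possible reading of the three steps above, as a minimal sketch. It assumes that "combinations" means contiguous runs of words taken from the string, and that a match is a plain substring test against each element of upload. The names brute_force_match and spans are hypothetical, introduced only for illustration.

```python
def brute_force_match(string, upload):
    words = string.split()
    n = len(words)

    # Step 1: 2-D table where spans[i][j] is the phrase words[i..j] (inclusive).
    spans = [[" ".join(words[i:j + 1]) if j >= i else None for j in range(n)]
             for i in range(n)]

    # Step 2: look every phrase up in every element of upload (substring test).
    found = {}
    for i in range(n):
        for j in range(i, n):
            for idx, text in enumerate(upload):
                if spans[i][j] in text:
                    found[(i, j)] = idx
                    break  # keep the first element that contains the phrase

    # Step 3: keep only maximal matches, i.e. drop any span that lies
    # inside a longer span that also matched.
    result = {}
    for (i, j), idx in found.items():
        covered = any(a <= i and j <= b and (a, b) != (i, j) for (a, b) in found)
        if not covered:
            result[spans[i][j]] = idx
    return result


upload = ['wish you happy birthday', 'take care', 'good night']
print(brute_force_match('happy birthday take', upload))
# {'happy birthday': 0, 'take': 1}
```

The table costs O(n^2) phrases for n words, which is fine for short strings but is the reason the answers below look for something cheaper.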

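If I understand correctly, the key here is to find the longest possible match for each run of words. So I would extend the phrase one word at a time and check how many consecutive words can be found in an element of upload:
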
upload = ['wish you happy birthday','take care','good night']
search_string = 'foo happy birthday take care bar good night good'
words = search_string.split()
len_word = len(words)
result = dict()

def findWordInUpload(search_str, upld):
    # Return the index of the first element containing search_str, or -1.
    for j, u in enumerate(upld):
        if u.find(search_str) > -1:
            return j
    return -1

next_idx = 0
for i, w in enumerate(words):
    # Skip words already consumed by the previous match.
    if i < next_idx:
        continue
    var_len = 1
    search_substr_new = w
    temp_res = -1
    # Extend the phrase one word at a time while it is still found.
    while findWordInUpload(search_substr_new, upload) > -1:
        search_substr = search_substr_new
        temp_res = findWordInUpload(search_substr, upload)
        if i + var_len < len_word:
            var_len += 1
            search_substr_new = " ".join(words[i:i + var_len])
        else:
            # End of input reached: bump var_len so that next_idx below
            # also skips the last word of a multi-word match.
            var_len += 1
            break

    if temp_res > -1:
        result[search_substr] = temp_res
    next_idx = i + var_len - 1
print(result)

With the search string above, it outputs {'happy birthday': 0, 'take care': 1, 'good night': 2, 'good': 2}. Is that what you are looking for?

Edit: simplified the whole var_len handling (it was redundant in the first version).
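Note that u.find(search_str) is a substring test, so a fragment such as 'day' would match inside 'birthday'. If whole-word matching is wanted, a helper along these lines could stand in for findWordInUpload. This is a sketch under that assumption, not part of the original answer, and the name find_phrase_in_upload is hypothetical.

```python
def find_phrase_in_upload(phrase, upld):
    """Return the index of the first element whose words contain the
    words of `phrase` as a contiguous run, or -1 if none does."""
    target = phrase.split()
    if not target:
        return -1
    k = len(target)
    for j, u in enumerate(upld):
        u_words = u.split()
        # Slide a window of k words over the element and compare word lists.
        for start in range(len(u_words) - k + 1):
            if u_words[start:start + k] == target:
                return j
    return -1


upload = ['wish you happy birthday', 'take care', 'good night']
print(find_phrase_in_upload('happy birthday', upload))  # 0
print(find_phrase_in_upload('day', upload))             # -1
print('wish you happy birthday'.find('day') > -1)       # True
```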

You can use itertools:

import itertools
upload = ['wish you happy birthday','take care','good night']
string = 'happy birthday take'.split(' ')
combine = []    # combinations
# Build every ordering of 1 .. len(string)-1 words.
for u in range(1, len(string)):
    for i in itertools.permutations(string, u):
        combine += [" ".join(list(i))]
print(combine)
result = {}
# Record the index of each upload element that contains a candidate phrase.
for count, words in enumerate(upload):
    for strings in combine:
        if strings in words:
            result[strings] = count
remove = []
# remove duplicate combinations
# e.g. if 'happy birthday' exists, remove 'happy' and 'birthday'
for items in result:
    if len(items.split(" ")) > 1:
        for u in items.split(" "):
            if u in result and u not in remove:
                remove += [u]

print(remove)
for items in remove:
    del result[items]
print(result)

Output:

['happy', 'birthday', 'take', 'happy birthday', 'happy take', 'birthday happy', 'birthday take', 'take happy', 'take birthday']
['happy', 'birthday']
{'happy birthday': 0, 'take': 1}
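itertools.permutations also produces reordered phrases such as 'take happy', which can never occur in the original string. If only phrases that appear in the string in their original order matter, the candidate list could be built from contiguous windows instead. A minimal sketch under that assumption: contiguous_phrases is a hypothetical helper, and unlike the loop above it also includes the full-length phrase.

```python
def contiguous_phrases(words):
    """All contiguous runs of words, shortest first."""
    n = len(words)
    return [" ".join(words[i:i + size])
            for size in range(1, n + 1)
            for i in range(n - size + 1)]


words = 'happy birthday take'.split(' ')
combine = contiguous_phrases(words)
print(combine)
# ['happy', 'birthday', 'take', 'happy birthday', 'birthday take', 'happy birthday take']
```

For n words this yields n(n+1)/2 candidates, and the matching and de-duplication steps above can consume combine unchanged.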
