How to find the longest common substring of words in a list



I have a list of words:

list1 = ['technology','technician','technical','technicality']

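I want to find which part is repeated in every word. In this case, it is "tech". I tried converting all the characters to ASCII values, but I got stuck there because I couldn't come up with any logic. Can someone help me with this?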

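This is commonly known as the longest common substring problem (not to be confused with the longest common subsequence problem, where the matching characters need not be contiguous).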
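For just two strings, the textbook approach to this problem is a dynamic-programming table over common suffix lengths. Below is a minimal sketch of that idea, not code from the question; the function name `longest_common_substring_pair` is made up for illustration, and it only handles a pair of strings rather than a whole list.

```python
def longest_common_substring_pair(a, b):
    """Longest common substring of two strings via dynamic programming."""
    best_len = 0
    best_end = 0  # end index (exclusive) of the best match inside `a`
    # prev[j] holds the length of the longest common suffix of
    # a[:i-1] and b[:j]; only the previous row needs to be kept.
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended one character earlier.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


print(longest_common_substring_pair('technology', 'technician'))  # prints: techn
```

This runs in time proportional to `len(a) * len(b)`. For a short list of short words like the one in the question, a brute-force search is usually simpler.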


A very basic (but slow) strategy:

def longest_common_substring(words):
    # Take candidate substrings from the shortest word.
    shortest_word = min(words, key=len)
    longest_substring = ""
    # Loop over every start position in that word.
    for start_idx in range(len(shortest_word)):
        # Grow a substring from that position, one character at a time.
        for length in range(1, len(shortest_word) - start_idx + 1):
            curr_substring = shortest_word[start_idx:start_idx + length]
            # Stop growing as soon as some word lacks the substring.
            if not all(curr_substring in word for word in words):
                break
            # Otherwise keep it if it beats the best one found so far.
            if len(curr_substring) > len(longest_substring):
                longest_substring = curr_substring
    return longest_substring

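Iterate over the first word, increasing the prefix length for as long as the set of prefixes taken from all the words contains only one element. When the prefixes start to differ, return the last result.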

list1 = ['technology', 'technician', 'technical', 'technicality']

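def common_prefix(li):
    word = li[0]
    prefix = ""
    # Try prefixes of the first word, one character longer each time.
    for i in range(1, len(word) + 1):
        # Collect the prefix of length i from every word.
        s = {w[:i] for w in li}
        # More than one distinct prefix means the words diverge here.
        if len(s) > 1:
            break
        prefix = word[:i]
    return prefix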

print(common_prefix(list1))

Output: techn
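As a side note, if only a shared prefix is needed, the standard library's `os.path.commonprefix` may already be enough: despite living in `os.path`, it compares the strings character by character rather than by path component. A short sketch using the list from the question:

```python
import os.path

list1 = ['technology', 'technician', 'technical', 'technicality']

# commonprefix works on any list of strings, not only file paths.
prefix = os.path.commonprefix(list1)
print(prefix)  # prints: techn
```

This only finds a common prefix. It returns an empty string when the shared part sits in the middle or at the end of the words.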

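Find the length of the shortest word, since no common substring can be longer than that. Then iterate over progressively smaller chunks of the first word, starting with chunks of that length, and check whether each chunk is contained in all the other strings. If it is, return that substring.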

list1 = ['technology', 'technician', 'technical', 'technicality']
def shortest_common_substring(lst):
    shortest_len = min(map(len, lst))
    first = lst[0]

    # Try chunk lengths from longest to shortest.
    for i in range(shortest_len, 0, -1):
        # Slide a window of length i across the first word.
        for j in range(len(first) - i + 1):
            substr = first[j:j + i]

            if all(substr in w for w in lst[1:]):
                return substr
    # No character is shared by all the words.
    return None

For fun, let's replace this loop with a generator expression and take the first thing it gives us (or None).

def shortest_common_substring(lst):
    shortest_len = min(map(len, lst))
    first = lst[0]

    # Candidates are generated longest first, so next() returns the
    # longest match, or None if the words share nothing.
    return next((first[j:j + i] for i in range(shortest_len, 0, -1)
                                for j in range(len(first) - i + 1)
                                if all(first[j:j + i] in w for w in lst[1:])),
                None)
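The words in the question happen to share their common part at the very start, so a prefix search and a substring search give the same answer. A quick way to see the difference is to try words whose shared part is in the middle or at the end. The sketch below is self-contained; the helper name `longest_common_chunk` and the sample words are invented for illustration.

```python
def longest_common_chunk(words):
    """Brute force: try chunks of the shortest word, longest first."""
    source = min(words, key=len)
    for size in range(len(source), 0, -1):
        for start in range(len(source) - size + 1):
            chunk = source[start:start + size]
            if all(chunk in w for w in words):
                return chunk
    return ''  # the words share no character at all


# Hypothetical sample data: the shared part is not a prefix.
words = ['biotech', 'technology', 'nanotechnical']
print(longest_common_chunk(words))  # prints: tech
```

When several chunks of the same length qualify, this returns the one that starts earliest in the shortest word.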
