用于查找共同组成目标列表的最小列表的算法



让我们假设我有以下内容:

{
'a': [1, 2, 3],
'b': [1, 5],
'c': [3, 4, 5],
'd': [1, 3, 5],
'e': [4]
}

期望的结果是['a', 'c']因为我想找出哪些数组合并在一起(并删除重复项(形成[1 , 2, 3, 4, 5]

除了将哪些数组合并在一起形成所需结果之外,我还想找到要合并以获得所需结果的最小数组(因为例如['a', 'd', 'e']也给出了所需结果,但['a', 'c']是更好的解决方案(

PS。上面的字典只是一个例子,原来的字典有很多键,每个键都有数百个值。

沿着这条线应该可以工作。你需要根据你是否希望它排序等进行一些调整。这个解决方案假设顺序无关紧要。

打印从最小到最大的解决方案:

import itertools
input = {
'a': [1, 2, 3],
'b': [1, 5],
'c': [3, 4, 5],
'd': [1, 3, 5],
'e': [4]
}
solution = [1,2,3,4,5]
for i in range(1,len(input.keys())):
for combination in itertools.combinations(input, i):
pot = list(set(itertools.chain.from_iterable(input[k] for k in combination)))
if pot == solution:
print("This is a solution:", combination)

这可能不是很优雅,但它将比通过所有组合强制执行更有效。它应该在二次时间内执行,而不是根据条目的数量和数据的传播而变成指数的组合。

以下函数可找到一个简短的解决方案。很可能(但不一定(是最短的。

from collections import Counter
def findMerge(data,target):
# identify candidate items (i.e. subsets of the target list)
target     = set(target)
candidates = {c:group for c,group in data.items() if target.issuperset(group)}

# compute the overlap between candidates and check coverage
counts = Counter( n for group in candidates.values() for n in group )
if any(t not in counts for t in target): return [] 
# identify candidates that are mandatory and the base set they form
# (i.e. candidates that are the only ones with a given value)
mandatory = { c:group for c,group in candidates.items()
if any(counts[n]==1 for n in group) }
baseSet   = set().union(*mandatory.values())
remaining = target - baseSet
if not remaining: return list(mandatory)

# identify potentially redundant candidates for remaining values
redundant = [ (c,remaining.intersection(group))
for c,group in candidates.items() if c not in mandatory ]
# remove redundant candidates (smallest first)
# note: using combinations only on redundant keys may be affordable here
#       and could be used to return all solutions or ensure shortest
redundant = sorted(redundant,key=lambda cg:len(cg[1]))
for r,rGroup in redundant:
if all(counts[n]>1 for n in rGroup):
counts.subtract(rGroup)
del candidates[r]

return list(candidates)

小样本输出:

data = {
'a': [1, 2, 3],
'b': [1, 5],
'c': [3, 4, 5],
'd': [1, 3, 5],
'e': [4]
}
print(findMerge(data,[1,2,3,4,5])) # ['a', 'c']

对于较大的样本,与组合相比的时间差异将是显著的:

data = {
'a': [1, 2, 3],
'b': [1, 5, 0],
'c': [3, 2, 5],
'd': [1, 3, 5],
'e': [4],
'f': [1, 2, 5],
'g': [1, 5, 8],
'h': [3, 4, 7],
'i': [1, 6, 5],
'j': [4],
'k': [9],
'l': [1, 3, 5],
'm': [4],
'n': [1, 2, 5],
'o': [1, 5, 8],
'p': [3, 4, 7],
'q': [1, 5, 8],
'r': [3, 4, 7],
's': [1, 6, 5],
't': [4],
'u': [9],
}
target = [0,1,2,3,4,5,6,7,8,9]
print(findMerge(data,target)) # ['b', 'c', 'q', 'r', 's', 'u']

最新更新