让我们假设我有以下内容:
{
'a': [1, 2, 3],
'b': [1, 5],
'c': [3, 4, 5],
'd': [1, 3, 5],
'e': [4]
}
期望的结果是['a', 'c']
因为我想找出哪些数组合并在一起(并删除重复项(形成[1 , 2, 3, 4, 5]
除了将哪些数组合并在一起形成所需结果之外,我还想找到要合并以获得所需结果的最小数组(因为例如['a', 'd', 'e']
也给出了所需结果,但['a', 'c']
是更好的解决方案(
PS。上面的字典只是一个例子,原来的字典有很多键,每个键都有数百个值。
沿着这条线应该可以工作。你需要根据你是否希望它排序等进行一些调整。这个解决方案假设顺序无关紧要。
打印从最小到最大的解决方案:
import itertools
input = {
'a': [1, 2, 3],
'b': [1, 5],
'c': [3, 4, 5],
'd': [1, 3, 5],
'e': [4]
}
solution = [1,2,3,4,5]
for i in range(1,len(input.keys())):
for combination in itertools.combinations(input, i):
pot = list(set(itertools.chain.from_iterable(input[k] for k in combination)))
if pot == solution:
print("This is a solution:", combination)
这可能不是很优雅,但它将比通过所有组合强制执行更有效。它应该在二次时间内执行,而不是根据条目的数量和数据的传播而变成指数的组合。
以下函数可找到一个简短的解决方案。很可能(但不一定(是最短的。
from collections import Counter
def findMerge(data,target):
# identify candidate items (i.e. subsets of the target list)
target = set(target)
candidates = {c:group for c,group in data.items() if target.issuperset(group)}
# compute the overlap between candidates and check coverage
counts = Counter( n for group in candidates.values() for n in group )
if any(t not in counts for t in target): return []
# identify candidates that are mandatory and the base set they form
# (i.e. candidates that are the only ones with a given value)
mandatory = { c:group for c,group in candidates.items()
if any(counts[n]==1 for n in group) }
baseSet = set().union(*mandatory.values())
remaining = target - baseSet
if not remaining: return list(mandatory)
# identify potentially redundant candidates for remaining values
redundant = [ (c,remaining.intersection(group))
for c,group in candidates.items() if c not in mandatory ]
# remove redundant candidates (smallest first)
# note: using combinations only on redundant keys may be affordable here
# and could be used to return all solutions or ensure shortest
redundant = sorted(redundant,key=lambda cg:len(cg[1]))
for r,rGroup in redundant:
if all(counts[n]>1 for n in rGroup):
counts.subtract(rGroup)
del candidates[r]
return list(candidates)
小样本输出:
data = {
'a': [1, 2, 3],
'b': [1, 5],
'c': [3, 4, 5],
'd': [1, 3, 5],
'e': [4]
}
print(findMerge(data,[1,2,3,4,5])) # ['a', 'c']
对于较大的样本,与组合相比的时间差异将是显著的:
data = {
'a': [1, 2, 3],
'b': [1, 5, 0],
'c': [3, 2, 5],
'd': [1, 3, 5],
'e': [4],
'f': [1, 2, 5],
'g': [1, 5, 8],
'h': [3, 4, 7],
'i': [1, 6, 5],
'j': [4],
'k': [9],
'l': [1, 3, 5],
'm': [4],
'n': [1, 2, 5],
'o': [1, 5, 8],
'p': [3, 4, 7],
'q': [1, 5, 8],
'r': [3, 4, 7],
's': [1, 6, 5],
't': [4],
'u': [9],
}
target = [0,1,2,3,4,5,6,7,8,9]
print(findMerge(data,target)) # ['b', 'c', 'q', 'r', 's', 'u']