Python combinations of elements in a dict



I have a bunch of dictionaries like the one below (some can be quite large):

V = {
0: [823, 832, 1151, 1752, 2548, 3036],
823: [832, 1151, 1752, 2548, 3036, 3551],
832: [1151, 1752, 2548, 3036, 3551],
1151: [1752, 2548, 3036, 3551],
1752: [2548, 3036, 3551, 4622],
2548: [3036, 3551, 4622],
3036: [3551, 4622, 5936, 6440],
3551: [4622, 5936, 6440],
4622: [5936, 6440, 9001],
5936: [6440, 9001],
6440: [9001],
9001: []
}

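The dictionary encodes the basic rules for deriving all possible paths (they are routes). A path is a sequence of the integers above.

Every value in the dictionary's value lists is also a key.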

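How can I determine all possible paths, knowing that, for example:

[3036, 4622, 9001] is a valid path,

but [3036, 9001] is not, because 3036 must be followed by an element of V[3036]. Every combination must form a compatible sequence, and every sequence must end with 9001; that is, to reach 9001 you must pass through 6440, 5936 or 4622.

Every sequence must also start from one of the points in V[0].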
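The rules above can be sketched as a small validity check. This is only an illustration: the function name `is_valid_path` and its `start`/`end` defaults are assumptions and not part of the original question. Paths are written without the leading 0, as in the two examples above.

```python
V = {
    0: [823, 832, 1151, 1752, 2548, 3036],
    823: [832, 1151, 1752, 2548, 3036, 3551],
    832: [1151, 1752, 2548, 3036, 3551],
    1151: [1752, 2548, 3036, 3551],
    1752: [2548, 3036, 3551, 4622],
    2548: [3036, 3551, 4622],
    3036: [3551, 4622, 5936, 6440],
    3551: [4622, 5936, 6440],
    4622: [5936, 6440, 9001],
    5936: [6440, 9001],
    6440: [9001],
    9001: [],
}


def is_valid_path(graph, path, start=0, end=9001):
    """Return True if path obeys the routing rules encoded in graph.

    Hypothetical helper: name and defaults are assumptions.
    """
    # A path must be non-empty and finish at the end point.
    if not path or path[-1] != end:
        return False
    # The first point must be one of the successors of the start key.
    if path[0] not in graph[start]:
        return False
    # Every point must be followed by one of its listed successors.
    return all(b in graph[a] for a, b in zip(path, path[1:]))


print(is_valid_path(V, [3036, 4622, 9001]))  # True
print(is_valid_path(V, [3036, 9001]))        # False
```

A check like this only validates a candidate path; it does not enumerate paths.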

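I tried two things:

  1. I first used itertools.product to derive all paths and then filtered out the invalid ones, but for most of the dictionaries the number of itertools.product combinations is too large.
  2. A Monte Carlo simulation, but the number of iterations runs into the millions and there is no guarantee of capturing all paths.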

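Looks like a simple DFS. Since the graph appears to be directed and acyclic (every node's successors are greater than the node itself), you don't even need to guard against cycles.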

>>> def dfs(graph, start, end):
...     if start == end:
...         return [[end]]
...     return [[start] + result for s in graph[start] for result in dfs(graph, s, end)]
...
>>> dfs(V, 0, 9001)
[[0, 823, 832, 1151, 1752, 2548, 3036, 3551, 4622, 5936, 6440, 9001], [0, 823, 832, 1151, 1752, 2548, 3036, 3551, 4622, 5936, 9001], [0, 823, 832, 1151, 1752, 2548, 3036, 3551, 4622, 6440, 9001], [0, 823, 832, 1151, 1752, 2548, 3036, 3551, 4622, 9001], [0, 823, 832, 1151, 1752, 2548, 3036, 3551, 5936, 6440, 9001], [0, 823, 832, 1151, 1752, 2548, 3036, 3551, 5936, 9001], [0, 823, 832, 1151, 1752, 2548, 3036, 3551, 6440, 9001], [0, 823, 832, 1151, 1752, 2548, 3036, 4622, 5936, 6440, 9001], [0, 823, 832, 1151, 1752, 2548, 3036, 4622, 5936, 9001], [0, 823, 832, 1151, 1752, 2548, 3036, 4622, 6440, 9001], [0, 823, 832, 1151, 1752, 2548, 3036, 4622, 9001], [0, 823, 832, 1151, 1752, 2548, 3036, 5936, 6440, 9001], [0, 823, 832, 1151, 1752, 2548, 3036, 5936, 9001], [0, 823, 832, 1151, 1752, 2548, 3036, 6440, 9001], [0, 823, 832, 1151, 1752, 2548, 3551, 4622, 5936, 6440, 9001], [0, 823, 832, 1151, 1752, 2548, 3551, 4622, 5936, 9001], [0, 823, 832, 1151, 1752, 2548, 3551, 4622, 6440, 9001], [0, 823, 832, 1151, 1752, 2548, 3551, 4622, 9001], [0, 823, 832, 1151, 1752, 2548, 3551, 5936, 6440, 9001], [0, 823, 832, 1151, 1752, 2548, 3551, 5936, 9001], [0, 823, 832, 1151, 1752, 2548, 3551, 6440, 9001], [0, 823, 832, 1151, 1752, 2548, 4622, 5936, 6440, 9001], [0, 823, 832, 1151, 1752, 2548, 4622, 5936, 9001], [0, 823, 832, 1151, 1752, 2548, 4622, 6440, 9001], [0, 823, 832, 1151, 1752, 2548, 4622, 9001], [0, 823, 832, 1151, 1752, 3036, 3551, 4622, 5936, 6440, 9001], [0, 823, 832, 1151, 1752, 3036, 3551, 4622, 5936, 9001], [0, 823, 832, 1151, 1752, 3036, 3551, 4622, 6440, 9001], [0, 823, 832, 1151, 1752, 3036, 3551, 4622, 9001], [0, 823, 832, 1151, 1752, 3036, 3551, 5936, 6440, 9001], [0, 823, 832, 1151, 1752, 3036, 3551, 5936, 9001], [0, 823, 832, 1151, 1752, 3036, 3551, 6440, 9001], [0, 823, 832, 1151, 1752, 3036, 4622, 5936, 6440, 9001], [0, 823, 832, 1151, 1752, 3036, 4622, 5936, 9001], [0, 823, 832, 1151, 1752, 3036, 4622, 6440, 9001], [0, 823, 832, 1151, 1752, 3036, 4622, 9001], 
[0, 823, 832, 1151, 1752, 3036, 5936, 6440, 9001], [0, 823, 832, 1151, 1752, 3036, 5936, 9001], [0, 823, 832, 1151, 1752, 3036, 6440, 9001], [0, 823, 832, 1151, 1752, 3551, 4622, 5936, 6440, 9001], [0, 823, 832, 1151, 1752, 3551, 4622, 5936, 9001], [0, 823, 832, 1151, 1752, 3551, 4622, 6440, 9001], [0, 823, 832, 1151, 1752, 3551, 4622, 9001], [0, 823, 832, 1151, 1752, 3551, 5936, 6440, 9001], [0, 823, 832, 1151, 1752, 3551, 5936, 9001], [0, 823, 832, 1151, 1752, 3551, 6440, 9001], [0, 823, 832, 1151, 1752, 4622, 5936, 6440, 9001], [0, 823, 832, 1151, 1752, 4622, 5936, 9001], [0, 823, 832, 1151, 1752, 4622, 6440, 9001], [0, 823, 832, 1151, 1752, 4622, 9001], [0, 823, 832, 1151, 2548, 3036, 3551, 4622, 5936, 6440, 9001], [0, 823, 832, 1151, 2548, 3036, 3551, 4622, 5936, 9001], [0, 823, 832, 1151, 2548, 3036, 3551, 4622, 6440, 9001], [0, 823, 832, 1151, 2548, 3036, 3551, 4622, 9001], [0, 823, 832, 1151, 2548, 3036, 3551, 5936, 6440, 9001], [0, 823, 832, 1151, 2548, 3036, 3551, 5936, 9001], [0, 823, 832, 1151, 2548, 3036, 3551, 6440, 9001], [0, 823, 832, 1151, 2548, 3036, 4622, 5936, 6440, 9001], [0, 823, 832, 1151, 2548, 3036, 4622, 5936, 9001], [0, 823, 832, 1151, 2548, 3036, 4622, 6440, 9001], [0, 823, 832, 1151, 2548, 3036, 4622, 9001], [0, 823, 832, 1151, 2548, 3036, 5936, 6440, 9001], [0, 823, 832, 1151, 2548, 3036, 5936, 9001], [0, 823, 832, 1151, 2548, 3036, 6440, 9001], [0, 823, 832, 1151, 2548, 3551, 4622, 5936, 6440, 9001], [0, 823, 832, 1151, 2548, 3551, 4622, 5936, 9001], ...]

If the function above spins forever on one of your dictionaries, it's time to revisit the assumption that the graph has no cycles.
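One way to test that assumption before running the recursive DFS is to check the property noted earlier: every successor is strictly greater than its node. A minimal sketch, assuming integer node labels; the helper name `successors_increase` and both example graphs are made up for illustration. The condition is sufficient but not necessary, so a graph can fail it and still be acyclic.

```python
def successors_increase(graph):
    """Return True if every successor is strictly greater than its node.

    If this holds, no path can ever return to an earlier node,
    so the graph cannot contain a cycle.
    """
    return all(s > node for node, succs in graph.items() for s in succs)


# Hypothetical examples, not taken from the question
acyclic_example = {0: [1, 2], 1: [2], 2: []}
cyclic_example = {0: [1], 1: [2], 2: [0]}

print(successors_increase(acyclic_example))  # True
print(successors_increase(cyclic_example))   # False
```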

You can treat the dictionary as an adjacency list. You could use plain Python (as in Samwise's answer), but their answer won't work if the graph has cycles.

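networkx exposes a function that finds the desired paths, so we can use it. The function returns a generator, which means it doesn't load all the paths into memory at once (you can still do that with list() if you want, but you may run out of memory if the graph is large):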

import networkx as nx

# Build a directed graph from the adjacency dictionary
graph = nx.DiGraph(V)
for path in nx.all_simple_paths(graph, 0, 9001):
    print(path)

First three and last three lines of the output:

[0, 823, 832, 1151, 1752, 2548, 3036, 3551, 4622, 5936, 6440, 9001]
[0, 823, 832, 1151, 1752, 2548, 3036, 3551, 4622, 5936, 9001]
[0, 823, 832, 1151, 1752, 2548, 3036, 3551, 4622, 6440, 9001]
... [755 more lines]
[0, 3036, 5936, 6440, 9001]
[0, 3036, 5936, 9001]
[0, 3036, 6440, 9001]

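Non-recursive depth-first search generator function

  • This solution is faster than the other two solutions (i.e., networkx, dfs)
  • Updated with KellyBundy's observation in the comments, which makes the code slightly faster.

Code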

def dfs_stack(graph, start, goal):
    '''
    Depth First Search for all paths from start to goal
    '''
    # Init stack to path with just starting vertex
    stack = [[start]]

    while stack:
        # Expand path at end of stack
        path = stack.pop()

        if path[-1] == goal:
            yield path                # reached goal
        else:
            # Push one extended path per successor of the last vertex
            for successor in graph[path[-1]]:
                stack.append(path + [successor])

Usage

# Use list on generator to obtain all paths
paths = list(dfs_stack(V, 0, 9001))
print(paths[:3])     # First 3 paths
# Output: [[0, 3036, 6440, 9001], [0, 3036, 5936, 9001], [0, 3036, 5936, 6440, 9001]]
print(paths[-3:])    # Last 3 paths
# Output: [[0, 823, 832, 1151, 1752, 2548, 3036, 3551, 4622, 6440, 9001], [0, 823, 832, 1151, 1752, 2548, 3036, 3551, 4622, 5936, 9001], [0, 823, 832, 1151, 1752, 2548, 3036, 3551, 4622, 5936, 6440, 9001]]

Timing comparison

On the OP's data, the current approach is more than twice as fast as the other two posted solutions.

Current approach
%timeit list(dfs_stack(V, 0, 9001))
Result: 874 µs ± 42.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

DFS function from Samwise's answer
%timeit dfs(V, 0, 9001)
Result: 2.1 ms ± 91.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

networkx solution from BrokenBenchmark's answer
%%timeit
graph = nx.DiGraph(V)
list(nx.all_simple_paths(graph, 0, 9001))
Result: 4.83 ms ± 113 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Timing from the OP (see comments): this solution produces 12 million paths in less than 20 s, while networkx takes in excess of 48 s.

Modification to avoid cycles

Although the current graph has no cycles, a simple modification makes the search safe on graphs that do.

def dfs_stack_no_cycles(graph, start, goal):
    '''
    Depth First Search for all simple paths from start to goal
    '''
    # Convert successor lists to sets so visited vertices can be subtracted
    graph = {k: set(v) for k, v in graph.items()}
    # Init stack to path with just starting vertex
    stack = [[start]]

    while stack:
        # Expand path at end of stack
        path = stack.pop()

        if path[-1] == goal:
            yield path                # reached goal
        else:
            # Only extend with successors not already on this path
            for successor in graph[path[-1]] - set(path):
                stack.append(path + [successor])
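
To see the effect of the modification, here is a small self-contained sketch that applies the same idea to a graph that does contain a cycle. The graph `cyclic` and the function name `dfs_no_cycles` are hypothetical and chosen only for this illustration.

```python
def dfs_no_cycles(graph, start, goal):
    """Yield every simple path from start to goal (same idea as above)."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == goal:
            yield path
        else:
            # Skip successors that are already on the current path
            for successor in set(graph[path[-1]]) - set(path):
                stack.append(path + [successor])


# Hypothetical graph with a cycle: 1 -> 2 -> 1
cyclic = {0: [1], 1: [2, 3], 2: [1, 3], 3: []}

# Sorted because iterating over a set gives no guaranteed order
paths = sorted(dfs_no_cycles(cyclic, 0, 3))
print(paths)  # [[0, 1, 2, 3], [0, 1, 3]]
```

Because successors are taken from a set, the order in which paths are produced may differ from the list-based version; the set of paths is the same.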
