计算是否存在巨型组件



我有一组整数,它们通过连接该集的整数来创建一个图,其中二进制表示只相差一个位置,例如:

set={0, 3, 16} --> their binary representation are  {00000, 00011, 10000}

这将是一个图,其中两个节点连接(0和16(,而3不连接。现在我想计算,如果集合创建了一个完全连通的图。换句话说,如果图的巨大组件包含所有节点。目前,我只使用networkx解决了这个问题,首先在networkx中创建一个图,然后使用nx.is_connected(G)

G = nx.Graph()
for key in set:
G.add_node(key)
for n1 in G.nodes(data=True):
for n2 in G.nodes(data=True):
if bin(n1[0]^n2[0]).count("1") == 1:  #compare if binary rep. differs at one position
G.add_edge(n1[0], n2[0], weight=1)
if nx.is_connected(G):

这是有问题的,因为它很慢,我更喜欢而不是使用networkx。你能帮我吗?谢谢

这可以在basepython中使用字典来构造图,然后广度优先搜索来确定图是否完全连接。这是一个没有networkx的实现。

def giantComponentExists(nums):
# Construct a graph as a dictionary
graph = {n:[] for n in nums}
# Add edges between nodes
for n1 in nums:
for n2 in nums:
if bin(n1^n2).count("1") == 1:  #compare if binary rep. differs at one position
graph[n1].append(n2)
# BFS search to determine if graph is fully connected
fringe = [nums.pop()]
visited = set()
while fringe:
for edge in graph[fringe[0]]:
if edge not in visited:
fringe += [edge]
visited.add(fringe[0])
fringe.pop(0)
return len(visited) == len(graph.keys())
example1 = {0,1,16}
example2 = {0,3,16}
print(giantComponentExists(example1))
print(giantComponentExists(example2))

如果你关心速度,那么C++就是最好的选择。

要确定一个图是否完全连通,只需确定最大集团的数量恰好为一就足够了。

这是用于查找最大集团的伪代码

LOOP
CONSTRUCT empty current set
SELECT V arbitrary vertex
add V to current set
remove V from graph
LOOP      // while set is growing
added_to_set = false
LOOP V over vertices in graph
LOOP Vset over current set
IF Vset connected to V
add V to current set
remove V from graph
added_to_set = true
break;
IF added_to_set == false
break;    // the set is maximal
ADD current set to list of sets
IF graph has no remaining vertices
OUTPUT sets found
STOP

有关此的C++实现,请参阅https://github.com/JamesBremner/PathFinder2/blob/dbd6ff06edabd6a6d35d5eb10ed7972dc2d779a6/src/cPathFinder.cpp#L483

此代码可以在不到一秒钟的时间内处理数千个节点的图形。networkx的性能如何?

您的Graph实例化有点低效。Graph基本上是一部字典中的字典。如果图形足够大,则逐个添加边可能导致子分区的复制。如果Graph对象是用预先计算的所有边实例化的,那么这个问题就会消失。通过一些小的更改,绝大多数执行时间都花在";"边缘检测";,即此处:

bin(n1[0]^n2[0]).count("1") == 1

代码

假设脚本binary_graph.py包含以下两个版本的代码:

#!/usr/bin/env python
import networkx as nx
from itertools import combinations
@profile
def v1(nodes):
G = nx.Graph()
for key in nodes:
G.add_node(key)
for n1 in G.nodes(data=True):
for n2 in G.nodes(data=True):
if bin(n1[0]^n2[0]).count("1") == 1:  #compare if binary rep. differs at one position
G.add_edge(n1[0], n2[0], weight=1)
return nx.is_connected(G)
@profile
def v2(nodes):
edges = [(n1, n2) for (n1, n2) in combinations(nodes, 2) if bin(n1 ^ n2).count("1") == 1]
G = nx.Graph(edges)
return nx.is_connected(G)
if __name__ == '__main__':
nodes = range(1000)
print(v1(nodes))
print(v2(nodes))

分析

通过运行来评测脚本

kernprof -l binary_graph.py
python -m line_profiler binary_graph.py.lprof

这会产生以下分析信息:

Timer unit: 1e-06 s
Total time: 0.888128 s
File: binary_graph.py
Function: v1 at line 5
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
5                                           @profile
6                                           def v1(nodes):
7         1          9.0      9.0      0.0      G = nx.Graph()
8      1001        326.0      0.3      0.0      for key in nodes:
9      1000       1457.0      1.5      0.2          G.add_node(key)
10      1001        348.0      0.3      0.0      for n1 in G.nodes(data=True):
11   1001000     312677.0      0.3     35.2          for n2 in G.nodes(data=True):
12   1000000     548470.0      0.5     61.8              if bin(n1[0]^n2[0]).count("1") == 1:  #compare if binary rep. differs at one position
13      9864      22631.0      2.3      2.5                  G.add_edge(n1[0], n2[0], weight=1)
14         1       2210.0   2210.0      0.2      return nx.is_connected(G)
Total time: 0.175228 s
File: binary_graph.py
Function: v2 at line 16
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
16                                           @profile
17                                           def v2(nodes):
18         1     160153.0 160153.0     91.4      edges = [(n1, n2) for (n1, n2) in combinations(nodes, 2) if bin(n1 ^ n2).count("1") == 1]
19         1      12890.0  12890.0      7.4      G = nx.Graph(edges)
20         1       2185.0   2185.0      1.2      return nx.is_connected(G)

换句话说,通过更优化的networkxGraph实例化,可以清楚地看到,您的绝大多数执行时间都与networkx无关。

最新更新