Get all direct, intermediate, and ultimate parents of a child node in a pandas DataFrame



I have a DataFrame of parent-child relationships, as shown below:

**child                Parent              relationship**
A1x2                 bc11                direct_parent
bc11                 Aw00                direct_parent
bc11                 Aw00                ultimate_parent
Aee1                 Aee0                direct_parent
Aee1                 Aee0                ultimate_parent

I want to get all ancestors of every child node in a new DataFrame. The result would look like this:

node                   ancestry_tree
A1x2                   [A1x2, bc11, Aw00]
Aee1                   [Aee1, Aee0]

Note: in the real dataset there may be many intermediate direct parents between a child and its ultimate parent.
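For anyone reproducing the answers below, the sample frame can be rebuilt as follows. This is a minimal sketch that assumes the column names are exactly `child`, `Parent`, and `relationship`, as in the table above.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "child": ["A1x2", "bc11", "bc11", "Aee1", "Aee1"],
    "Parent": ["bc11", "Aw00", "Aw00", "Aee0", "Aee0"],
    "relationship": [
        "direct_parent", "direct_parent", "ultimate_parent",
        "direct_parent", "ultimate_parent",
    ],
})
print(df)
```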

Another approach, using from_pandas_edgelist and ancestors from the networkx package:

import networkx as nx
import pandas as pd

# Create the directed graph, with edges pointing from parent to child
G = nx.from_pandas_edgelist(df,
                            source='Parent',
                            target='child',
                            create_using=nx.DiGraph())
# Create dict of nodes and ancestors (each value is a set, so it is unordered)
ancestors = {n: {n} | nx.ancestors(G, n) for n in df['child'].unique()}
# Convert dict back to DataFrame if necessary
df_ancestors = pd.DataFrame([(k, list(v)) for k, v in ancestors.items()],
                            columns=['node', 'ancestry_tree'])
print(df_ancestors)

[out]

   node       ancestry_tree
0  A1x2  [A1x2, Aw00, bc11]
1  bc11        [bc11, Aw00]
2  Aee1        [Aee1, Aee0]
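Because `nx.ancestors` returns a set, the order inside each list is arbitrary (note `Aw00` appearing before `bc11` in the first row). If the order from the child up to the ultimate parent matters, one possible sketch is to sort the ancestors by their distance from the child. The helper name `ordered_ancestors` is hypothetical, and the sample frame is rebuilt here only so the snippet runs on its own.

```python
import networkx as nx
import pandas as pd

# Sample edges from the question (relationship column omitted for brevity)
df = pd.DataFrame({
    "child": ["A1x2", "bc11", "bc11", "Aee1", "Aee1"],
    "Parent": ["bc11", "Aw00", "Aw00", "Aee0", "Aee0"],
})
G = nx.from_pandas_edgelist(df, source="Parent", target="child",
                            create_using=nx.DiGraph())

def ordered_ancestors(graph, node):
    """Return `node` followed by its ancestors, nearest first (hypothetical helper)."""
    # Walking the reversed graph from `node` gives the hop count to each ancestor
    dist = nx.single_source_shortest_path_length(graph.reverse(), node)
    # Sort by distance; ties are broken by node name to keep the result stable
    return sorted(dist, key=lambda n: (dist[n], n))

df_ordered = pd.DataFrame(
    [(n, ordered_ancestors(G, n)) for n in df["child"].unique()],
    columns=["node", "ancestry_tree"],
)
print(df_ordered)
```

If a node has several parents at the same distance, they are ordered alphabetically here, which is an arbitrary choice for this sketch.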

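To filter the "intermediate children" out of the output table, keep only the last children by using the out_degree method: a last child has no children of its own, so its out_degree == 0.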

last_children = [n for n, d in G.out_degree() if d == 0]
ancestors = {n: {n} | nx.ancestors(G, n) for n in last_children}
df_ancestors = pd.DataFrame([(k, list(v)) for k, v in ancestors.items()],
                            columns=['node', 'ancestry_tree'])

[out]

   node       ancestry_tree
0  A1x2  [A1x2, Aw00, bc11]
1  Aee1        [Aee1, Aee0]
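
  • Create a dictionary of relationships (child to parent)
  • Step through every child that is not itself a parent
  • Track the path of ancestors as well as the set of nodes already visited
    • This is important because we want to terminate the while loop if we hit a node we have already seen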

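import pandas as pd

# Map each child to its parent (assumes one parent per child)
relate = dict(zip(df.child, df.Parent))
paths = {}
nodes = {}
# Start only from children that never appear as a parent
for child in relate.keys() - {*relate.values()}:
    paths[child] = [child]
    nodes[child] = {child}
    parent = relate[child]
    # Climb until a node has no parent or has already been visited
    while parent in relate and parent not in nodes[child]:
        paths[child].append(parent)
        nodes[child].add(parent)
        parent = relate[parent]
    # Append the last parent reached (the ultimate ancestor when there is no cycle)
    paths[child].append(parent)
pd.Series(paths).rename_axis('node').reset_index(name='ancestry_tree')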

   node       ancestry_tree
0  Aee1        [Aee1, Aee0]
1  A1x2  [A1x2, bc11, Aw00]
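To illustrate why the visited set matters, here is a small variation on the walk above, wrapped in a function and run on a deliberately cyclic input. This is only a sketch: the function name `ancestry_paths`, its parameters, and the `cyclic` frame are hypothetical and not part of the answer. Unlike the loop above, it stops before re-adding a node that is already on the path.

```python
import pandas as pd

def ancestry_paths(frame, child_col="child", parent_col="Parent"):
    """Map each leaf child to its list of ancestors (hypothetical helper)."""
    # Assumes each child has a single parent, as in the dict-based walk
    relate = dict(zip(frame[child_col], frame[parent_col]))
    paths = {}
    # Start only from children that never appear as a parent
    for child in relate.keys() - set(relate.values()):
        path, seen = [child], {child}
        parent = relate[child]
        # Stop as soon as a node repeats, which signals a cycle
        while parent not in seen:
            path.append(parent)
            seen.add(parent)
            if parent not in relate:
                break  # reached a node with no parent of its own
            parent = relate[parent]
        paths[child] = path
    return paths

# b -> c -> b forms a cycle; without the `seen` check the loop would never end
cyclic = pd.DataFrame({"child": ["a", "b", "c"], "Parent": ["b", "c", "b"]})
print(ancestry_paths(cyclic))
```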
