Parent-child relationship hierarchy



Get all descendants of a parent from a pandas DataFrame parent-child table

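I want to do something similar to the question above, but I want the output to keep more of the hierarchy instead of being grouped by parent. So each child_id in turn becomes a parent_id, unless it has no children; in that case, go back to the last parent_id.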

Current Output:
    parent_id  child_id
0        1000      2010
1        1000      2100
2        1000      2110
3        1000      3000
4        1000      3011
5        1000      3033
6        1000      3102
7        1000      3111
Preferred Output:
    parent_id  child_id
0        1000      2010
1        2010      3011
2        3011      3050
3        2010      3102
4        2010      4001
5        1000      3000
6        3000      3011
7        3011      3050
8        3000      3033
9        1000      3102
10       1000      3111
etc. etc.
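
For reference, the preferred ordering above reads like a depth-first walk of the tree: each child is expanded in full before its next sibling is listed. The following is a minimal sketch of that idea, not the method from the linked question. The function name expand_hierarchy and the sample edges are assumptions made for illustration; the edges are the distinct parent-child pairs visible in the preferred output.

```python
import pandas as pd


def expand_hierarchy(edges: pd.DataFrame, root) -> pd.DataFrame:
    """Return parent/child rows in depth-first order, starting from `root`."""
    # Map each parent to its direct children, preserving input order.
    children = {}
    for parent, child in zip(edges["parent_id"], edges["child_id"]):
        children.setdefault(parent, []).append(child)

    rows = []

    def walk(parent, path):
        for child in children.get(parent, []):
            if child in path:
                # Skip edges that would loop back onto the current branch.
                continue
            rows.append((parent, child))
            walk(child, path | {child})

    walk(root, {root})
    return pd.DataFrame(rows, columns=["parent_id", "child_id"])


# Hypothetical sample: distinct edges taken from the preferred output.
edges = pd.DataFrame({
    "parent_id": [1000, 2010, 3011, 2010, 2010, 1000, 3000, 3000, 1000, 1000],
    "child_id":  [2010, 3011, 3050, 3102, 4001, 3000, 3011, 3033, 3102, 3111],
})
result = expand_hierarchy(edges, 1000)
print(result)
```

Sibling order follows the order of rows in the input table. The recursion is fine for shallow part trees; a very deep hierarchy would need an explicit stack instead.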

I came up with something. I don't know whether it is the best, fastest, or most efficient approach, but it works.

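The first step is to use the script above to create the parent-child relationships (if they don't already exist) and add a column named level that describes how deep the part sits in the tree, with 0 being the top level. Then: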

import pandas as pd

# Assumes df has the columns parent_id, child_id and level; the Node lookup further down is simplest when IDs are strings.
# This part of the script creates one row for each branch of the tree.
dfsort = df[df['level'] == 0][['parent_id', 'child_id']].rename(columns={'parent_id': 'level 0', 'child_id': 'level 1'})

for i in sorted(df['level'].unique())[1:]:
    df1 = df[df['level'] == i][['parent_id', 'child_id']].rename(columns={'parent_id': f'level {i}', 'child_id': f'level {i+1}'})
    # Merge the next level's rows onto the branches built so far.
    dfsort = pd.merge(dfsort, df1, how='left', on=[f'level {i}'])
# Order the level columns numerically so that 'level 10' comes after 'level 9'.
level_cols = sorted(dfsort.columns, key=lambda c: int(c.split()[-1]))
dfsort = dfsort[level_cols]
# Create a Node column (the branch as a list of IDs, with missing levels dropped) to drop duplicates on,
# in case the same parent-child relation is used under several higher-level parts.
dfsort['Node'] = dfsort[level_cols].apply(lambda row: [str(v) for v in row if pd.notna(v)], axis=1)
# Now that you have this relationship, you can break it out in the correct order using another for loop.
pairs = []
# Collect the parent-child pairs from the table above, one row and one pair of adjacent levels at a time.
for i in range(len(dfsort)):
    for l in range(len(level_cols) - 1):
        pair = dfsort.iloc[[i]][[f'level {l}', f'level {l+1}', 'Node']].rename(columns={f'level {l}': 'parent_id', f'level {l+1}': 'child_id'})
        pair['level'] = l
        pairs.append(pair)
dfsort2 = pd.concat(pairs)
dfsort2 = dfsort2[dfsort2['parent_id'].notna() & dfsort2['child_id'].notna()].copy()
# Position of the child within its branch, then the path from the root down to that child.
dfsort2['order node index'] = dfsort2.apply(lambda x: x['Node'].index(str(x['child_id'])), axis=1)
dfsort2['Query Node'] = dfsort2.apply(lambda x: ",".join(x['Node'][:x['order node index'] + 1]), axis=1)
# Node holds lists, which cannot be hashed, so drop it before removing duplicates.
dfsort2 = dfsort2.drop(columns=['order node index', 'Node']).drop_duplicates()
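
The script above relies on a level column that the answer does not show being built. One way it might be derived is sketched below, under the assumption that level means the depth of parent_id below a root, with roots at level 0. The function name add_level and the sample edges are hypothetical, and a part that appears at several depths gets one row per depth.

```python
import pandas as pd


def add_level(edges: pd.DataFrame) -> pd.DataFrame:
    """Tag each parent/child row with the depth of its parent (roots are level 0)."""
    children = {}
    for parent, child in zip(edges["parent_id"], edges["child_id"]):
        children.setdefault(parent, []).append(child)

    # Roots are parents that never appear as a child.
    all_children = set(edges["child_id"])
    frontier = [p for p in children if p not in all_children]

    rows = []
    level = 0
    # A branch of an acyclic table has at most len(edges) edges, so this
    # bound also stops the loop if the input accidentally contains a cycle.
    while frontier and level < len(edges):
        next_frontier = []
        for parent in frontier:
            for child in children.get(parent, []):
                rows.append((parent, child, level))
                next_frontier.append(child)
        # A part reached through several parents is expanded once per level.
        frontier = list(dict.fromkeys(next_frontier))
        level += 1

    out = pd.DataFrame(rows, columns=["parent_id", "child_id", "level"])
    return out.drop_duplicates(ignore_index=True)


# Hypothetical sample edges, matching the pairs in the question's preferred output.
edges = pd.DataFrame({
    "parent_id": [1000, 2010, 3011, 2010, 2010, 1000, 3000, 3000, 1000, 1000],
    "child_id":  [2010, 3011, 3050, 3102, 4001, 3000, 3011, 3033, 3102, 3111],
})
df = add_level(edges)
print(df)
```

If the IDs are converted to strings first (for example with astype(str) on both ID columns), the resulting df has the three columns the script above expects.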
