具有多个节点的Pandas多索引合并



我有三个网络节点。每个都有一个数据帧。在这些数据帧中,其中一个具有两个ID。另外两个各有一个。示例如下:

df(node0(:

| id1 | id2 | col1| col2|
| :-: | :-: | :-: | :-: |
| 1   | A   |  h  |  i  |
| 2   | B   |  j  |  k  |
| 3   | C   |  l  |  r  |
| 4   | D   |  v  |  d  |
| 5   | E   |  x  |  y  |
| 6   | F   |  w  |  p  |

df1(node1(:

| id1 | col3| col4|
| :-: | :-: | :-: |
| 1   | k   |  h  |
| 2   | d   |  j  |
| 3   | e   |  l  |

df2(node2(:

| id2 | col5| col6|
| :-: | :-: | :-: |
| B   | k   |  h  |
| C   | d   |  j  |
| D   | d   |  j  |
| E   | e   |  l  |

*请注意,所有col值都无关紧要,可以是任何值。仅关注ID

现在,我想以一种方式合并这三个数据帧:

  1. 将df与df1和df2相交,分别具有ID。即df:id1-df1:id1,df:id2-df2:id2

  2. 取决于输入、步骤1的交集或并集结果。以生成每个节点的最终结果。在这种情况下:

交叉

node0:

| id1 | id2 | col1| col2|
| :-: | :-: | :-: | :-: |
| 2   | B   |  j  |  k  |
| 3   | C   |  l  |  r  |

节点1:

| id1 | col3| col4|
| :-: | :-: | :-: |
| 2   | d   |  j  |
| 3   | e   |  l  |

节点2:

| id2 | col5| col6|
| :-: | :-: | :-: |
| B   | k   |  h  |
| C   | d   |  j  |

UNION

node0:

| id1 | id2 | col1| col2|
| :-: | :-: | :-: | :-: |
| 1   | A   |  h  |  i  |
| 2   | B   |  j  |  k  |
| 3   | C   |  l  |  r  |
| 4   | D   |  v  |  d  |
| 5   | E   |  x  |  y  |

节点1:

| id1 | col3| col4|
| :-: | :-: | :-: |
| 1   | k   |  h  |
| 2   | d   |  j  |
| 3   | e   |  l  |
| 4   |     |     |
| 5   |     |     |

节点2:

| id2 | col5| col6|
| :-: | :-: | :-: |
| A   |     |     |
| B   | k   |  h  |
| C   | d   |  j  |
| D   | d   |  j  |
| E   | e   |  l  |

现在,UNION的结果可能看起来很奇怪。但请记住,始终先将基df和其他两个相交。然后将结果进行交集或并集。

最后,在联合中为这些ID填充None是在消耗,但不是在原始的相应数据帧中。

在某些情况下,基本df(node0(的id可能少于df1或df2。其中的想法是相同的,您在基本df中填充None。

重要提示:无论相交或并集,三个输出总是有相同的行,但可以是不同的列。秩序不容破坏

我尝试了几种没有水果的方法。有时秩序混乱。

示例代码:

import pandas as pd
# INTERSECT or UNION 
inputs = "UNION"
if inputs == "INTERSECT":
merge_method = 'inner'
elif inputs == "UNION":
merge_method = 'outer'
else:
raise ValueError("inputs must be "INTERSECT" or "UNION" ")
d = {'id1': [1, 2, 3, 4, 5, 6], 
'id2': ['A', 'B', 'C', 'D', 'E', 'F'],
'col1': ['h', 'j', 'l', 'v', 'x', 'w'],
'col2': ['i', 'k', 'r', 'd', 'y', 'p']}
'''
# Case where df has less ids than df1 or df2
d = {'id1': [1, 2], 
'id2': ['A', 'B'],
'col1': ['h', 'j'],
'col2': ['i', 'k']}
'''
d1 = {'id1': [1, 2, 3],
'col3': ['k', 'd', 'e'],
'col4': ['h', 'j', 'l']}
d2 = {'id2': ['B', 'C', 'D', 'E'],
'col5': ['k', 'd', 'd', 'e'],
'col6': ['h', 'j', 'j', 'l']}
df = pd.DataFrame(d)
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)
# Expend base df if it has less ids than df1 or df2
df0 = pd.merge(df, df1, how='outer', on='id1')[df.columns.tolist()]
df0 = pd.merge(df0, df2, how='outer', on='id2')[df.columns.tolist()]
# Intersect df with df1 and df2 with there respective ID. i.e. df: id1 - df1: id1, df: id2 - df2: id2
int_df1 = pd.merge(df0, df1, how='inner', on='id1')[['id1', 'id2']]
int_df2 = pd.merge(df0, df2, how='inner', on='id2')[['id1', 'id2']]
print("df and df1 intersection: n{}ndf and df2 intersection: n{}n".format(int_df1, int_df2))
# Depend on the input, Intersect or Union results from step 1. to produce final results for EACH NODES
merge_df = pd.merge(int_df1, int_df2, how=merge_method, on=['id1', 'id2'])
df_out = pd.merge(merge_df, df0, how='inner', on=['id1', 'id2'])
df1_out = pd.merge(merge_df['id1'], df1, how=merge_method, on='id1')
df2_out = pd.merge(merge_df['id2'], df2, how=merge_method, on='id2')
# Filling None in the union for these IDs are expending but not in original respective dataframe
mask = (~df_out['id1'].isin(df['id1'])) | (~df_out['id2'].isin(df['id2']))
df_out.loc[mask] = df_out.loc[mask].fillna("None")
mask = ~df1_out['id1'].isin(df1['id1'])
df1_out.loc[mask] = df1_out.loc[mask].fillna("None")
mask = ~df2_out['id2'].isin(df2['id2'])
df2_out.loc[mask] = df2_out.loc[mask].fillna("None")
print("merge_df: n{}nnode0: n{}nnode1: n{}nnode2: n{}n".format(merge_df, df_out, df1_out, df2_out))

INTERSECT的输出:

df and df1 intersection: 
id1 id2
0    1   A
1    2   B
2    3   C
df and df2 intersection:
id1 id2
0    2   B
1    3   C
2    4   D
3    5   E
merge_df: 
id1 id2
0    2   B
1    3   C
node0:
id1 id2 col1 col2
0    2   B    j    k
1    3   C    l    r
node1:
id1 col3 col4
0    2    d    j
1    3    e    l
node2:
id2 col5 col6
0   B    k    h
1   C    d    j

UNION的输出:

df and df1 intersection: 
id1 id2
0    1   A
1    2   B
2    3   C
df and df2 intersection:
id1 id2
0    2   B
1    3   C
2    4   D
3    5   E
merge_df: 
id1 id2
0    1   A
1    2   B
2    3   C
3    4   D
4    5   E
node0:
id1 id2 col1 col2
0    1   A    h    i
1    2   B    j    k
2    3   C    l    r
3    4   D    v    d
4    5   E    x    y
node1:
id1  col3  col4
0    1     k     h
1    2     d     j
2    3     e     l
3    4  None  None
4    5  None  None
node2:
id2  col5  col6
0   A  None  None
1   B     k     h
2   C     d     j
3   D     d     j
4   E     e     l

最新更新