Python: Joining datasets on a composite primary key



I need to merge three DataFrames on a composite primary key while keeping the null values. My DataFrames look like this:

df1
col1 col2 cola colb
123  abc  JNT  80
123  abc  MTN  82
124  de   KTU  90
df2
col1 col2 colc  cold
123  abc  testa vala
124  de   test2 val2
124  de   test3 val3
df3
col1 col2 cole
123  abc  data1

This is the logic I tried:

import pandas as pd
df1=pd.DataFrame(data={'col1':[123,123,124],'col2':['abc','abc','de'],'cola':['JNT','MTN','KTU'],'colb':[80,82,90]})
df2=pd.DataFrame(data={'col1':[123,124,124],'col2':['abc','de','de'],'colc':['testa','test2','test3'],'cold':['vala','val2','val3']})
df3=pd.DataFrame(data={'col1':[123],'col2':['abc'],'cole':['data1']})
cols=['col1','col2']
df=[df1,df2,df3]
df_final=pd.DataFrame()
for f in range(len(df)):
    df[f]=df[f].set_index(cols)
df_final=pd.concat([df1,df2,df3],axis=1)
print(df_final)
   col1 col2 cola  colb  col1 col2   colc  cold   col1 col2   cole
0   123  abc  JNT    80   123  abc  testa  vala  123.0  abc  data1
1   123  abc  MTN    82   124   de  test2  val2    NaN  NaN    NaN
2   124   de  KTU    90   124   de  test3  val3    NaN  NaN    NaN

But I need the final output to look like this:

col1  col2  cola  colb  colc  cold  cole
123   abc   JNT   80    testa vala  data1
123   abc   MTN   82
124   de                test2  val2
124   de                test3  val3

Can anyone help me? Thanks in advance!

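As I said, you need cumcount to create a helper key. It numbers the rows within each (col1, col2) group, so repeated key pairs can be aligned row by row: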

cols = ['col1','col2']
df1['help'] = df1.groupby(cols).cumcount()
df2['help'] = df2.groupby(cols).cumcount()
df3['help'] = df3.groupby(cols).cumcount()
out = pd.concat([x.set_index(cols + ['help']) for x in [df1, df2, df3]], axis=1).reset_index(level=[0, 1])
out
Out[13]: 
      col1 col2 cola  colb   colc  cold   cole
help                                          
0      123  abc  JNT  80.0  testa  vala  data1
1      123  abc  MTN  82.0    NaN   NaN    NaN
0      124   de  KTU  90.0  test2  val2    NaN
1      124   de  NaN   NaN  test3  val3    NaN
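
For comparison, the same helper-key idea can probably also be expressed with an outer merge instead of concat. The following is only a sketch under assumptions: the function name add_helper and the variable names frames and merged are made up for illustration and do not come from the answer above. It rebuilds the three frames from the question, adds the cumcount helper column without modifying the originals, outer-merges on col1, col2 and help, and then drops the helper.

```python
from functools import reduce

import pandas as pd

# Sample frames copied from the question.
df1 = pd.DataFrame({'col1': [123, 123, 124], 'col2': ['abc', 'abc', 'de'],
                    'cola': ['JNT', 'MTN', 'KTU'], 'colb': [80, 82, 90]})
df2 = pd.DataFrame({'col1': [123, 124, 124], 'col2': ['abc', 'de', 'de'],
                    'colc': ['testa', 'test2', 'test3'],
                    'cold': ['vala', 'val2', 'val3']})
df3 = pd.DataFrame({'col1': [123], 'col2': ['abc'], 'cole': ['data1']})

cols = ['col1', 'col2']


def add_helper(frame, keys, name='help'):
    """Hypothetical helper: return a copy with a per-group row counter."""
    return frame.assign(**{name: frame.groupby(keys).cumcount()})


# Number the rows inside each (col1, col2) group in every frame.
frames = [add_helper(f, cols) for f in (df1, df2, df3)]

# Outer-merge pairwise on the key columns plus the helper counter,
# so rows without a partner are kept and filled with NaN.
merged = reduce(
    lambda left, right: left.merge(right, on=cols + ['help'], how='outer'),
    frames,
)

# Restore a stable row order, then discard the helper column.
merged = (merged.sort_values(cols + ['help'])
                .drop(columns='help')
                .reset_index(drop=True))
print(merged)
```

The merge variant keeps col1 and col2 as ordinary columns throughout, so no reset_index on a MultiIndex is needed. Which of the two reads better is mostly a matter of taste.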
