I need to merge three dataframes on their key columns while keeping the null values. My dataframes look like this:
df1
col1 col2 cola colb
123 abc JNT 80
123 abc MTN 82
124 de KTU 90
df2
col1 col2 colc cold
123 abc testa vala
124 de test2 val2
124 de test3 val3
df3
col1 col2 cole
123 abc data1
Here is the logic I tried:
import pandas as pd
df1=pd.DataFrame(data={'col1':[123,123,124],'col2':['abc','abc','de'],'cola':['JNT','MTN','KTU'],'colb':[80,82,90]})
df2=pd.DataFrame(data={'col1':[123,124,124],'col2':['abc','de','de'],'colc':['testa','test2','test3'],'cold':['vala','val2','val3']})
df3=pd.DataFrame(data={'col1':[123],'col2':['abc'],'cole':['data1']})
cols=['col1','col2']
df=[df1,df2,df3]
df_final=pd.DataFrame()
for f in range(len(df)):
    df[f]=df[f].set_index(cols)
df_final=pd.concat([df1,df2,df3],axis=1)
print(df_final)
col1 col2 cola colb col1 col2 colc cold col1 col2 cole
0 123 abc JNT 80 123 abc testa vala 123.0 abc data1
1 123 abc MTN 82 124 de test2 val2 NaN NaN NaN
2 124 de KTU 90 124 de test3 val3 NaN NaN NaN
But I need the final output to look like this:
col1 col2 cola colb colc cold cole
123 abc JNT 80 testa vala data1
123 abc MTN 82
124 de test2 val2
124 de test3 val3
Can anyone help me? Thanks in advance!
As I said, you need cumcount to create a helper key:
cols = ['col1','col2']
df1['help'] = df1.groupby(cols).cumcount()
df2['help'] = df2.groupby(cols).cumcount()
df3['help'] = df3.groupby(cols).cumcount()
out = pd.concat([x.set_index(cols + ['help']) for x in [df1, df2, df3]], axis=1).reset_index(level=[0, 1])
out
Out[13]:
col1 col2 cola colb colc cold cole
help
0 123 abc JNT 80.0 testa vala data1
1 123 abc MTN 82.0 NaN NaN NaN
0 124 de KTU 90.0 test2 val2 NaN
1 124 de NaN NaN test3 val3 NaN
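If you want this as a reusable step, here is a minimal sketch of the same cumcount idea wrapped in a function. The function name merge_on_keys and the helper parameter are my own names for illustration, not anything from pandas. It assumes the key pair (col1, col2) can repeat within a frame, so the running count per key pair is what makes each index entry unique before concat aligns the frames. Unlike the snippet above, it does not add a help column to the original frames, it sorts the result by key, and it drops the helper at the end.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [123, 123, 124], 'col2': ['abc', 'abc', 'de'],
                    'cola': ['JNT', 'MTN', 'KTU'], 'colb': [80, 82, 90]})
df2 = pd.DataFrame({'col1': [123, 124, 124], 'col2': ['abc', 'de', 'de'],
                    'colc': ['testa', 'test2', 'test3'],
                    'cold': ['vala', 'val2', 'val3']})
df3 = pd.DataFrame({'col1': [123], 'col2': ['abc'], 'cole': ['data1']})


def merge_on_keys(frames, keys, helper='help'):
    """Align frames on keys plus a per-key running counter (hypothetical helper)."""
    indexed = []
    for frame in frames:
        # assign() returns a copy, so the caller's frames are left untouched
        numbered = frame.assign(**{helper: frame.groupby(keys).cumcount()})
        indexed.append(numbered.set_index(keys + [helper]))
    # axis=1 concat does an outer join on the (now unique) index
    merged = pd.concat(indexed, axis=1)
    # sort by key and counter, then turn the keys back into columns
    return merged.sort_index().reset_index().drop(columns=helper)


out = merge_on_keys([df1, df2, df3], ['col1', 'col2'])
print(out)
```

Rows that have no partner in another frame come back as NaN, and colb becomes float because of those NaN values. If you want blanks for display, out.fillna('') would do that, at the cost of turning the numeric column into object dtype.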