在某些列上交错 2 个数据帧



>我有 2 个数据帧

df1:
StartLocation,StartDevice,StartPort,EndLocation,EndDevice,EndPort,LinkType,Speed 
DD1,Switch1,P1,AD1,Switch2,P2,MTP,1000
DD2,Switch2,P3,AD2,Switch3,P2,MTP,1000
DD3,Switch3,P5,AD3,Switch4,P6,MTP,1000
df2:
StartLocation,StartDevice,StartPort,EndLocation,EndDevice,EndPort
AB11,RU15,P1,AJ11,RU25,P2
AB12,RU18,P2,AB11,RU35,P2
AB13,RU19,P3,AB11,RU40,P4

我想交错两个数据帧,我已经尝试了一些选项,但似乎无法让它工作。我接近以下代码的功能,但它没有加入适当的列

import pandas as pd
from toolz import interleave
df3 = pd.DataFrame(interleave([df1.values, df2.values]), columns=df1)

预期输出如下所示

StartLocation,StartDevice,StartPort,EndLocation,EndDevice,EndPort,LinkType,Speed 
DD1,Switch1,P1,AD1,Switch2,P2,MTP,1000
AB11,RU15,P1,AJ11,RU25,P2,nan,nan
DD2,Switch2,P3,AD2,Switch3,P2,MTP,1000
AB12,RU18,P2,AB11,RU35,P2,nan,nan
DD3,Switch3,P5,AD3,Switch4,P6,MTP,1000
AB13,RU19,P3,AB11,RU40,P4,nan,nan

我认为它应该很简单,但我找不到合适的语法。谁能给出任何想法?

提前感谢您的帮助!

如果列名相同,唯一的区别是可以使用其中一个数据帧中的一些新列名:

df3 = pd.DataFrame(interleave([df1.values, df2.values]), columns=df1.columns)
print (df3)
StartLocation StartDevice StartPort EndLocation EndDevice EndPort LinkType  
0           DD1     Switch1        P1         AD1   Switch2      P2      MTP   
1          AB11        RU15        P1        AJ11      RU25      P2     None   
2           DD2     Switch2        P3         AD2   Switch3      P2      MTP   
3          AB12        RU18        P2        AB11      RU35      P2     None   
4           DD3     Switch3        P5         AD3   Switch4      P6      MTP   
5          AB13        RU19        P3        AB11      RU40      P4     None   
Speed  
0  1000.0  
1     NaN  
2  1000.0  
3     NaN  
4  1000.0  
5     NaN  

适用于任何列名称的更通用的解决方案是使用之前DataFrame.align来防止每个数据帧的正确对齐列:

print (df1)
EndDevice EndLocation EndPort LinkType  Speed StartDevice StartLocation  
0   Switch2         AD1      P2      MTP   1000     Switch1           DD1   
1   Switch3         AD2      P2      MTP   1000     Switch2           DD2   
2   Switch4         AD3      P6      MTP   1000     Switch3           DD3   
StartPort  
0        P1  
1        P3  
2        P5  
print (df2)
EndDevice EndLocation EndPort  LinkType  Speed StartDevice StartLocation  
0      RU25        AJ11      P2       NaN    NaN        RU15          AB11   
1      RU35        AB11      P2       NaN    NaN        RU18          AB12   
2      RU40        AB11      P4       NaN    NaN        RU19          AB13   
StartPort  
0        P1  
1        P2  
2        P3  

df3 = pd.DataFrame(interleave([df1.values, df2.values]), columns=df1.columns)
print (df3)
EndDevice EndLocation EndPort LinkType   Speed StartDevice StartLocation  
0   Switch2         AD1      P2      MTP  1000.0     Switch1           DD1   
1      RU25        AJ11      P2      NaN     NaN        RU15          AB11   
2   Switch3         AD2      P2      MTP  1000.0     Switch2           DD2   
3      RU35        AB11      P2      NaN     NaN        RU18          AB12   
4   Switch4         AD3      P6      MTP  1000.0     Switch3           DD3   
5      RU40        AB11      P4      NaN     NaN        RU19          AB13   
StartPort  
0        P1  
1        P1  
2        P3  
3        P2  
4        P5  
5        P3  

Index.unionDataFrame.reindex的另一个想法:

cols = df1.columns.union(df2.columns, sort=False)
df1 = df1.reindex(cols, axis=1)
df2 = df2.reindex(cols, axis=1)
print (df1)
StartLocation StartDevice StartPort EndLocation EndDevice EndPort LinkType  
0           DD1     Switch1        P1         AD1   Switch2      P2      MTP   
1           DD2     Switch2        P3         AD2   Switch3      P2      MTP   
2           DD3     Switch3        P5         AD3   Switch4      P6      MTP   
Speed  
0   1000  
1   1000  
2   1000  
print (df2)
StartLocation StartDevice StartPort EndLocation EndDevice EndPort  LinkType  
0          AB11        RU15        P1        AJ11      RU25      P2       NaN   
1          AB12        RU18        P2        AB11      RU35      P2       NaN   
2          AB13        RU19        P3        AB11      RU40      P4       NaN   
Speed  
0    NaN  
1    NaN  
2    NaN  

df3 = pd.DataFrame(interleave([df1.values, df2.values]), columns=cols)
print (df3)
StartLocation StartDevice StartPort EndLocation EndDevice EndPort LinkType  
0           DD1     Switch1        P1         AD1   Switch2      P2      MTP   
1          AB11        RU15        P1        AJ11      RU25      P2      NaN   
2           DD2     Switch2        P3         AD2   Switch3      P2      MTP   
3          AB12        RU18        P2        AB11      RU35      P2      NaN   
4           DD3     Switch3        P5         AD3   Switch4      P6      MTP   
5          AB13        RU19        P3        AB11      RU40      P4      NaN   
Speed  
0  1000.0  
1     NaN  
2  1000.0  
3     NaN  
4  1000.0  
5     NaN  

最新更新