Combining pandas DataFrames with overlapping columns



I have two DataFrames, which can be created with the following code:

import yfinance as yf
symbols = ['QQQ', 'GBTC']
df1 = yf.download(symbols, start="2019-01-01", end="2019-01-07")
symbols = ['GBTC', 'TLT']
df2 = yf.download(symbols, start="2019-01-01", end="2019-01-07")

The contents of df1 and df2 are as follows:

> df1
           Adj Close              Close              High               Low
                GBTC         QQQ   GBTC         QQQ  GBTC         QQQ  GBTC
Date
2018-12-31     3.965  152.132996  3.965  154.259995  4.15  154.979996  3.95
2019-01-02     4.620  152.744461  4.620  154.880005  4.65  155.750000  4.13
2019-01-03     4.520  147.754257  4.520  149.820007  4.62  153.259995  4.32
2019-01-04     4.530  154.075851  4.530  156.229996  4.65  157.000000  4.41

                         Open               Volume
                   QQQ   GBTC         QQQ     GBTC       QQQ
Date
2018-12-31  152.710007  4.140  154.470001  3829000  53015300
2019-01-02  150.880005  4.155  150.990005  2948200  58576700
2019-01-03  149.490005  4.325  152.600006  1503000  74820200
2019-01-04  151.740005  4.585  152.339996  2020700  74709300
> df2
           Adj Close              Close              High               Low
                GBTC         TLT   GBTC         TLT  GBTC         TLT  GBTC
Date
2018-12-31     3.965  116.845848  3.965  121.510002  4.15  121.559998  3.95
2019-01-02     4.620  117.461304  4.620  122.150002  4.65  122.160004  4.13
2019-01-03     4.520  118.797966  4.520  123.540001  4.62  123.860001  4.32
2019-01-04     4.530  117.422844  4.530  122.110001  4.65  122.559998  4.41

                         Open               Volume
                   TLT   GBTC         TLT     GBTC       TLT
Date
2018-12-31  120.459999  4.140  120.650002  3829000  17409000
2019-01-02  121.339996  4.155  121.660004  2948200  19841500
2019-01-03  122.230003  4.325  122.290001  1503000  21187000
2019-01-04  121.650002  4.585  122.339996  2020700  12970200

Both df1 and df2 contain the GBTC columns.

How can I combine df1 and df2 into a new DataFrame with the following contents?

> df3
           Adj Close                          Close
                GBTC         QQQ         TLT   GBTC         QQQ         TLT
Date
2018-12-31     3.965  152.132996  116.845848  3.965  154.259995  121.510002
2019-01-02     4.620  152.744461  117.461304  4.620  154.880005  122.150002
2019-01-03     4.520  147.754257  118.797966  4.520  149.820007  123.540001
2019-01-04     4.530  154.075851  117.422844  4.530  156.229996  122.110001

            High                           Low                           Open
            GBTC         QQQ         TLT  GBTC         QQQ         TLT   GBTC
Date
2018-12-31  4.15  154.979996  121.559998  3.95  152.710007  120.459999  4.140
2019-01-02  4.65  155.750000  122.160004  4.13  150.880005  121.339996  4.155
2019-01-03  4.62  153.259995  123.860001  4.32  149.490005  122.230003  4.325
2019-01-04  4.65  157.000000  122.559998  4.41  151.740005  121.650002  4.585

                                     Volume
                   QQQ         TLT     GBTC       QQQ       TLT
Date
2018-12-31  154.470001  120.650002  3829000  53015300  17409000
2019-01-02  150.990005  121.660004  2948200  58576700  19841500
2019-01-03  152.600006  122.290001  1503000  74820200  21187000
2019-01-04  152.339996  122.339996  2020700  74709300  12970200

There may be more than one overlapping column.

It seems that pandas.DataFrame.merge cannot do what I want.

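  • unstack() both frames so that you have two long-format data frames to merge()
  • where both frames have a value, prefer the newer one from df2
  • pivot() to reshape the result back to wide format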
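import pandas as pd

# Long format, outer-merged on both column levels plus Date (level names assume unnamed column levels)
dfm = pd.merge(df1.unstack().to_frame().reset_index(), df2.unstack().to_frame().reset_index(), on=["level_0","level_1","Date"], how="outer")
# Prefer df2's value ("0_y"), fall back to df1's ("0_x"), then pivot back to wide format
(dfm.assign(**{"0_y":dfm["0_y"].fillna(dfm["0_x"])})
.drop(columns="0_x")
.rename(columns={"0_y":0})
.pivot(index=["level_0","level_1"], columns="Date", values=0).T
)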
Output: a frame indexed by Date (rows for 2019-01-02, 2019-01-03 and 2019-01-04 are shown) with one column per (field, ticker) pair: (Adj Close, GBTC), (Adj Close, QQQ), (Adj Close, TLT), (Close, GBTC), (Close, QQQ), (Close, TLT), (High, GBTC), (High, QQQ), (High, TLT), (Low, GBTC), (Low, QQQ), (Low, TLT), (Open, GBTC), (Open, QQQ), (Open, TLT), (Volume, GBTC), (Volume, QQQ), (Volume, TLT).
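As a possible alternative to the unstack/merge/pivot route, the two frames can be concatenated column-wise and the repeated column labels dropped. This is only a minimal sketch, not the method from the answer above. It uses small made-up frames instead of yfinance downloads so that it runs offline: `make_frame`, `BASE` and all prices are hypothetical stand-ins, and only two fields (Close, Open) are modelled.

```python
import pandas as pd

# Made-up base prices per ticker (hypothetical stand-ins for downloaded data)
BASE = {"GBTC": 4.0, "QQQ": 150.0, "TLT": 120.0}


def make_frame(tickers, dates):
    """Build a frame with (field, ticker) MultiIndex columns, similar in shape to the frames above."""
    cols = pd.MultiIndex.from_product([["Close", "Open"], sorted(tickers)])
    values = [
        [BASE[t] + i + (0.5 if f == "Open" else 0.0) for f, t in cols]
        for i in range(len(dates))
    ]
    return pd.DataFrame(values, index=pd.Index(dates, name="Date"), columns=cols)


dates = pd.to_datetime(["2019-01-02", "2019-01-03", "2019-01-04"])
df1 = make_frame(["QQQ", "GBTC"], dates)
df2 = make_frame(["GBTC", "TLT"], dates)

# Put the frames side by side; the GBTC columns now appear twice
combined = pd.concat([df1, df2], axis=1)

# Keep the first occurrence of each (field, ticker) label, then sort the columns
df3 = combined.loc[:, ~combined.columns.duplicated()].sort_index(axis=1)
print(df3)
```

This keeps df1's copy of each overlapping column. Passing `keep="last"` to `duplicated()` would keep df2's copy instead, which is closer to the preference stated in the answer. The sketch assumes the overlapping columns carry the same data over the same dates. If one frame has gaps that the other should fill, `DataFrame.combine_first` may be a better fit.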
