I have two DataFrames, which can be created with the following code:
import yfinance as yf
symbols = ['QQQ', 'GBTC']
df1 = yf.download(symbols, start="2019-01-01", end="2019-01-07")
symbols = ['GBTC', 'TLT']
df2 = yf.download(symbols, start="2019-01-01", end="2019-01-07")
The contents of df1 and df2 are as follows:
> df1
            Adj Close              Close               High               Low
                 GBTC         QQQ   GBTC         QQQ   GBTC         QQQ  GBTC
Date
2018-12-31      3.965  152.132996  3.965  154.259995   4.15  154.979996  3.95
2019-01-02      4.620  152.744461  4.620  154.880005   4.65  155.750000  4.13
2019-01-03      4.520  147.754257  4.520  149.820007   4.62  153.259995  4.32
2019-01-04      4.530  154.075851  4.530  156.229996   4.65  157.000000  4.41

                         Open               Volume
                   QQQ   GBTC         QQQ     GBTC       QQQ
Date
2018-12-31  152.710007  4.140  154.470001  3829000  53015300
2019-01-02  150.880005  4.155  150.990005  2948200  58576700
2019-01-03  149.490005  4.325  152.600006  1503000  74820200
2019-01-04  151.740005  4.585  152.339996  2020700  74709300
> df2
            Adj Close              Close               High               Low
                 GBTC         TLT   GBTC         TLT   GBTC         TLT  GBTC
Date
2018-12-31      3.965  116.845848  3.965  121.510002   4.15  121.559998  3.95
2019-01-02      4.620  117.461304  4.620  122.150002   4.65  122.160004  4.13
2019-01-03      4.520  118.797966  4.520  123.540001   4.62  123.860001  4.32
2019-01-04      4.530  117.422844  4.530  122.110001   4.65  122.559998  4.41

                         Open               Volume
                   TLT   GBTC         TLT     GBTC       TLT
Date
2018-12-31  120.459999  4.140  120.650002  3829000  17409000
2019-01-02  121.339996  4.155  121.660004  2948200  19841500
2019-01-03  122.230003  4.325  122.290001  1503000  21187000
2019-01-04  121.650002  4.585  122.339996  2020700  12970200
Both df1 and df2 contain the GBTC columns.
How can I merge df1 and df2 into a new DataFrame with the following contents?
> df3
            Adj Close                          Close
                 GBTC         QQQ         TLT   GBTC         QQQ         TLT
Date
2018-12-31      3.965  152.132996  116.845848  3.965  154.259995  121.510002
2019-01-02      4.620  152.744461  117.461304  4.620  154.880005  122.150002
2019-01-03      4.520  147.754257  118.797966  4.520  149.820007  123.540001
2019-01-04      4.530  154.075851  117.422844  4.530  156.229996  122.110001

             High                           Low                           Open
             GBTC         QQQ         TLT  GBTC         QQQ         TLT   GBTC
Date
2018-12-31   4.15  154.979996  121.559998  3.95  152.710007  120.459999  4.140
2019-01-02   4.65  155.750000  122.160004  4.13  150.880005  121.339996  4.155
2019-01-03   4.62  153.259995  123.860001  4.32  149.490005  122.230003  4.325
2019-01-04   4.65  157.000000  122.559998  4.41  151.740005  121.650002  4.585

                                     Volume
                   QQQ         TLT     GBTC       QQQ       TLT
Date
2018-12-31  154.470001  120.650002  3829000  53015300  17409000
2019-01-02  150.990005  121.660004  2948200  58576700  19841500
2019-01-03  152.600006  122.290001  1503000  74820200  21187000
2019-01-04  152.339996  122.339996  2020700  74709300  12970200
There may be several overlapping columns, not just GBTC.
It seems that pandas.DataFrame.merge cannot do what I want.
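For illustration, the overlapping columns can be listed directly from the two column indexes. This is only a minimal sketch: cols1 and cols2 are hypothetical stand-ins for df1.columns and df2.columns, built by hand so that nothing has to be downloaded, and only two of the six price fields are included.

```python
import pandas as pd

# Hypothetical stand-ins for df1.columns and df2.columns (two-level: field, ticker)
cols1 = pd.MultiIndex.from_product([["Close", "Volume"], ["GBTC", "QQQ"]])
cols2 = pd.MultiIndex.from_product([["Close", "Volume"], ["GBTC", "TLT"]])

# Column labels present in both frames
overlap = cols1.intersection(cols2)

# Column labels that only the second frame contributes
only_in_2 = cols2.difference(cols1)

print(overlap.tolist())
print(only_in_2.tolist())
```

Every (field, GBTC) pair shows up in the overlap, which is why a plain column-wise join would produce duplicated GBTC columns.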
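Use unstack() to reshape both frames to long format, so that you have two DataFrames you can merge():
- take the value from df2 as the preferred one
- reshape back to wide format with pivot()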
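As a sketch of the first step, the snippet here shows what unstack().to_frame().reset_index() produces on a small frame, which is where the key names level_0, level_1 and Date used in the merge come from. The frame and its values are made up for illustration, and the sketch assumes the column levels are unnamed, as in the printouts in the question; if the levels carry names, reset_index() uses those names instead.

```python
import pandas as pd

# Hypothetical toy frame with the same layout as df1: (field, ticker) columns, Date index
idx = pd.DatetimeIndex(["2019-01-02", "2019-01-03"], name="Date")
cols = pd.MultiIndex.from_product([["Close"], ["GBTC", "QQQ"]])
df = pd.DataFrame([[4.62, 154.88], [4.52, 149.82]], index=idx, columns=cols)

# unstack() turns the wide frame into a Series indexed by (field, ticker, Date);
# to_frame() names the single value column 0, and reset_index() turns the
# index levels into ordinary columns.
long_df = df.unstack().to_frame().reset_index()

print(long_df.columns.tolist())  # ['level_0', 'level_1', 'Date', 0]
print(len(long_df))              # 4 rows: 2 columns x 2 dates
```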
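import pandas as pd

# Reshape both frames to long format, then outer-merge on field, ticker and date
dfm = pd.merge(df1.unstack().to_frame().reset_index(),
               df2.unstack().to_frame().reset_index(),
               on=["level_0", "level_1", "Date"], how="outer")
# Prefer df2's value ("0_y"), fall back to df1's ("0_x"), then pivot back to wide
(dfm.assign(**{"0_y": dfm["0_y"].fillna(dfm["0_x"])})
 .drop(columns="0_x")
 .rename(columns={"0_y": 0})
 .pivot(index=["level_0", "level_1"], columns="Date", values=0).T
)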
            Adj Close                    Close
                 GBTC      QQQ      TLT   GBTC     QQQ     TLT
Date
2019-01-02       4.62  152.744  117.461   4.62  154.88  122.15
2019-01-03       4.52  147.754  118.798   4.52  149.82  123.54
2019-01-04       4.53  154.076  117.423   4.53  156.23  122.11

            High                   Low
            GBTC     QQQ     TLT  GBTC     QQQ     TLT
Date
2019-01-02  4.65  155.75  122.16  4.13  150.88  121.34
2019-01-03  4.62  153.26  123.86  4.32  149.49  122.23
2019-01-04  4.65     157  122.56  4.41  151.74  121.65

             Open                      Volume
             GBTC     QQQ     TLT        GBTC          QQQ          TLT
Date
2019-01-02  4.155  150.99  121.66  2.9482e+06  5.85767e+07  1.98415e+07
2019-01-03  4.325   152.6  122.29   1.503e+06  7.48202e+07   2.1187e+07
2019-01-04  4.585  152.34  122.34  2.0207e+06  7.47093e+07  1.29702e+07
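The same steps can be traced end to end on small made-up frames, without downloading anything. This is a sketch under stated assumptions rather than the code above verbatim: the toy values are invented, the names v1, v2, value and keys are illustrative, and the column levels are assumed to be unnamed so that reset_index() yields level_0 and level_1. The df2 frame deliberately holds a different GBTC value on one date and a missing one on the other, to show both the preference for df2 and the fallback to df1.

```python
import numpy as np
import pandas as pd

idx = pd.DatetimeIndex(["2019-01-02", "2019-01-03"], name="Date")

# Hypothetical toy frames; the values are made up for illustration.
df1 = pd.DataFrame(
    [[4.62, 154.88], [4.52, 149.82]],
    index=idx,
    columns=pd.MultiIndex.from_product([["Close"], ["GBTC", "QQQ"]]),
)
df2 = pd.DataFrame(
    [[4.70, 122.15], [np.nan, 123.54]],
    index=idx,
    columns=pd.MultiIndex.from_product([["Close"], ["GBTC", "TLT"]]),
)

# Long format: one row per (field, ticker, date), with explicitly named value columns.
long1 = df1.unstack().rename("v1").reset_index()
long2 = df2.unstack().rename("v2").reset_index()

# Outer merge keeps rows that exist in only one of the two frames.
keys = ["level_0", "level_1", "Date"]
dfm = pd.merge(long1, long2, on=keys, how="outer")

# Prefer df2's value; fall back to df1's where df2 has none.
dfm["value"] = dfm["v2"].fillna(dfm["v1"])

# Back to wide format: Date as the index, (field, ticker) as the columns.
df3 = dfm.pivot(index=["level_0", "level_1"], columns="Date", values="value").T
print(df3)
```

Naming the value columns up front avoids relying on the "0_x" and "0_y" suffixes that merge() generates when both inputs carry a column called 0.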