我正在尝试使用存储在单独数据框中的价格来更新仓位数据帧。我想将最新价格放在"last_price"列中,将该价格的数据放在"last_date"列中。
import datetime
from scipy import stats
import numpy as np
import pandas as pd
df_portfolio = pd.DataFrame({ 'amount' : np.random.randint(low=0, high=10, size=(4)),
'timestamp' : pd.Timestamp('20130102'),
'exch' : pd.Categorical(["e1","e1","e2","e2"]),
'token' : pd.Categorical(["BTC","ETH","ETH","LTC"])
})
df_ETH_price = pd.DataFrame({
'date': ('2018-08-11','2018-08-12','2018-08-13'),
'price' : (322.11,319.57,286.50)
})
df_portfolio['last_price'] = np.nan
df_portfolio['last_date'] = "?"
print(df_portfolio)
print (df_ETH_price)
预期成果如下:
amount exch timestamp token last_price last_date
0 7 e1 2013-01-02 BTC NaN ?
1 4 e1 2013-01-02 ETH 286.50 2018-08-13
2 2 e2 2013-01-02 ETH 286.50 2018-08-13
3 9 e2 2013-01-02 LTC NaN ?
首先从df_ETH_price
中的最新行创建一个数据帧:
df_ETH_price['date'] = pd.to_datetime(df_ETH_price['date'])
latest = df_ETH_price.assign(token='ETH').sort_values('date', ascending=False).head(1)
print(latest)
date price token
2 2018-08-13 286.5 ETH
然后与df_portfolio
合并:
res = pd.merge(df_portfolio, latest, how='left')
print(res)
amount exch timestamp token date price
0 1 e1 2013-01-02 BTC NaT NaN
1 3 e1 2013-01-02 ETH 2018-08-13 286.5
2 6 e2 2013-01-02 ETH 2018-08-13 286.5
3 0 e2 2013-01-02 LTC NaT NaN
我会在您的价格跟踪数据框中创建一个标志,以便很明显正在谈论哪种硬币:
m = df_ETH_price.assign(token='ETH').tail(1)
然后合并:
df_portfolio.merge(
m, how='outer'
).rename(columns={'date': 'last_date', 'price': 'last_price'})
amount timestamp exch token last_date last_price
0 2 2013-01-02 e1 BTC NaN NaN
1 0 2013-01-02 e1 ETH 2018-08-13 286.5
2 3 2013-01-02 e2 ETH 2018-08-13 286.5
3 7 2013-01-02 e2 LTC NaN NaN
.combine_first
是更新DataFrame
中的值的好方法
import pandas as pd
# Make datetime
df_ETH_price['date'] = pd.to_datetime(df_ETH_price.date)
# Find the last valid row + tidy up naming for the join.
df_last = (df_ETH_price.loc[[df_ETH_price.date.idxmax()]]
.add_prefix('last_')
.assign(token='ETH')
.set_index('token'))
df_portfolio = df_last.combine_first(df_portfolio.set_index('token')).reset_index()
输出:
token amount timestamp exch last_date last_price
0 BTC 3 2013-01-02 e1 NaT NaN
1 ETH 6 2013-01-02 e1 2018-08-13 286.5
2 ETH 4 2013-01-02 e2 2018-08-13 286.5
3 LTC 8 2013-01-02 e2 NaT NaN
如果你有多个数据帧需要这样做,我会考虑这样的事情:
def update_price(df_port, df, token):
df_last = (df.loc[[df.date.idxmax()]]
.add_prefix('last_')
.assign(token=token)
.set_index('token'))
return df_last.combine_first(df_port.set_index('token')).reset_index()
然后,您可以简单地执行以下操作:
df_LTC_price = pd.DataFrame({
'date': ('2018-08-11','2018-08-12','2018-08-13'),
'price' : (322.11,319.57,280.50)
})
df_portfolio = update_price(df_portfolio, df_ETH_price, 'ETH')
df_portfolio = update_price(df_portfolio, df_LTC_price, 'LTC')
输出:
token amount exch last_date last_price timestamp
0 BTC 6.0 e1 NaT NaN 2013-01-02
1 ETH 1.0 e1 2018-08-13 286.5 2013-01-02
2 ETH 3.0 e2 2018-08-13 286.5 2013-01-02
3 LTC 3.0 e2 2018-08-13 280.5 2013-01-02
由于.combine_first
优先考虑df_last
因此,如果您明天收到更多数据,您甚至可以使用它来更新df_portfolio
,例如,如果您明天收到更多数据,则可以使用相同的token
假设df_ETH_price
继续按时间顺序排列:
latest_ETH = df_ETH_price.iloc[-1]
df_portfolio.loc[df_portfolio['token'] == 'ETH', 'last_price'] = latest_ETH.price
df_portfolio.loc[df_portfolio['token'] == 'ETH', 'last_date'] = latest_ETH.date