使用单独数据帧中的最新数据更新一个数据帧



我正在尝试使用存储在单独数据框中的价格来更新仓位数据帧。我想将最新价格放在"last_price"列中,将该价格的数据放在"last_date"列中。

import datetime
from scipy import stats
import numpy as np
import pandas as pd

df_portfolio = pd.DataFrame({ 'amount' : np.random.randint(low=0, high=10,     size=(4)),
'timestamp' : pd.Timestamp('20130102'),
'exch' : pd.Categorical(["e1","e1","e2","e2"]),
'token' : pd.Categorical(["BTC","ETH","ETH","LTC"])
})
df_ETH_price = pd.DataFrame({
'date': ('2018-08-11','2018-08-12','2018-08-13'),
'price' : (322.11,319.57,286.50)    
})
df_portfolio['last_price'] = np.nan
df_portfolio['last_date'] = "?"
print(df_portfolio)
print (df_ETH_price)

预期成果如下:

amount exch  timestamp token  last_price last_date
0       7   e1 2013-01-02   BTC         NaN         ?
1       4   e1 2013-01-02   ETH         286.50      2018-08-13
2       2   e2 2013-01-02   ETH         286.50      2018-08-13
3       9   e2 2013-01-02   LTC         NaN         ?

首先从df_ETH_price中的最新行创建一个数据帧:

df_ETH_price['date'] = pd.to_datetime(df_ETH_price['date'])
latest = df_ETH_price.assign(token='ETH').sort_values('date', ascending=False).head(1)
print(latest)
date  price token
2 2018-08-13  286.5   ETH

然后与df_portfolio合并:

res = pd.merge(df_portfolio, latest, how='left')
print(res)
amount exch  timestamp token       date  price
0       1   e1 2013-01-02   BTC        NaT    NaN
1       3   e1 2013-01-02   ETH 2018-08-13  286.5
2       6   e2 2013-01-02   ETH 2018-08-13  286.5
3       0   e2 2013-01-02   LTC        NaT    NaN

我会在您的价格跟踪数据框中创建一个标志,以便很明显正在谈论哪种硬币:

m = df_ETH_price.assign(token='ETH').tail(1)

然后合并:

df_portfolio.merge(
m, how='outer'
).rename(columns={'date': 'last_date', 'price': 'last_price'})

amount  timestamp exch token   last_date  last_price
0       2 2013-01-02   e1   BTC         NaN         NaN
1       0 2013-01-02   e1   ETH  2018-08-13       286.5
2       3 2013-01-02   e2   ETH  2018-08-13       286.5
3       7 2013-01-02   e2   LTC         NaN         NaN

.combine_first是更新DataFrame中的值的好方法

import pandas as pd
# Make datetime
df_ETH_price['date'] = pd.to_datetime(df_ETH_price.date)
# Find the last valid row + tidy up naming for the join.
df_last = (df_ETH_price.loc[[df_ETH_price.date.idxmax()]]
.add_prefix('last_')
.assign(token='ETH')
.set_index('token'))
df_portfolio = df_last.combine_first(df_portfolio.set_index('token')).reset_index()

输出:

token  amount  timestamp exch  last_date  last_price
0   BTC       3 2013-01-02   e1        NaT         NaN
1   ETH       6 2013-01-02   e1 2018-08-13       286.5
2   ETH       4 2013-01-02   e2 2018-08-13       286.5
3   LTC       8 2013-01-02   e2        NaT         NaN

如果你有多个数据帧需要这样做,我会考虑这样的事情:

def update_price(df_port, df, token):
df_last = (df.loc[[df.date.idxmax()]]
.add_prefix('last_')
.assign(token=token)
.set_index('token'))
return df_last.combine_first(df_port.set_index('token')).reset_index()

然后,您可以简单地执行以下操作:

df_LTC_price = pd.DataFrame({
'date': ('2018-08-11','2018-08-12','2018-08-13'),
'price' : (322.11,319.57,280.50)    
})
df_portfolio = update_price(df_portfolio, df_ETH_price, 'ETH')
df_portfolio = update_price(df_portfolio, df_LTC_price, 'LTC')

输出:

token  amount exch  last_date  last_price  timestamp
0   BTC     6.0   e1        NaT         NaN 2013-01-02
1   ETH     1.0   e1 2018-08-13       286.5 2013-01-02
2   ETH     3.0   e2 2018-08-13       286.5 2013-01-02
3   LTC     3.0   e2 2018-08-13       280.5 2013-01-02

由于.combine_first优先考虑df_last因此,如果您明天收到更多数据,您甚至可以使用它来更新df_portfolio,例如,如果您明天收到更多数据,则可以使用相同的token

进行更新。

假设df_ETH_price继续按时间顺序排列:

latest_ETH = df_ETH_price.iloc[-1]
df_portfolio.loc[df_portfolio['token'] == 'ETH', 'last_price'] = latest_ETH.price
df_portfolio.loc[df_portfolio['token'] == 'ETH', 'last_date'] = latest_ETH.date

最新更新