I have a DataFrame containing currency transactions:
import pandas as pd
data = [[1653663281618, -583.8686, 'USD'],
[1653741652125, -84.0381, 'USD'],
[1653776860252, -33.8723, 'CHF'],
[1653845294504, -465.4614, 'CHF'],
[1653847155140, 22.285, 'USD'],
[1653993629537, -358.04640000000006, 'USD']]
df = pd.DataFrame(data = data, columns = ['time', 'qty', 'currency_1'])
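I need to add a new column "balance" that holds, for each row, the sum of the "qty" column over all earlier transactions in the same currency. I have a simple function: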
def balance(row):
    table = df[(df['time'] < row['time']) & (df['currency_1'] == row['currency_1'])]
    return table['qty'].sum()

df['balance'] = df.apply(balance, axis = 1)
But my real DataFrame is very large, and the .apply method runs very slowly. Is there a way to avoid apply in this case? Something like np.where?
You can use pandas cumsum here:
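Edit, after the condition was added: I don't know how transform performs compared with apply, so I'd say try it on your real data. I can't think of a simpler solution at the moment.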
df['balance'] = df.groupby('currency_1')['qty'].transform(lambda x: x.shift().cumsum())
print(df)
time qty currency_1 balance
0 1653663281618 -583.8686 USD NaN
1 1653741652125 -84.0381 USD -583.8686
2 1653776860252 -33.8723 CHF NaN
3 1653845294504 -465.4614 CHF -33.8723
4 1653847155140 22.2850 USD -667.9067
5 1653993629537 -358.0464 USD -645.6217
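A possible variant that avoids the lambda, as a sketch only: take the running total per currency and subtract the current row's qty. This assumes the timestamps are unique, so that "earlier row" and "time < current time" mean the same thing. The sort step is my own addition, in case the real data is not already ordered by time. Unlike shift().cumsum(), it gives 0 rather than NaN for the first row of each currency, which is what the original apply function returns.

```python
import pandas as pd

data = [[1653663281618, -583.8686, 'USD'],
        [1653741652125, -84.0381, 'USD'],
        [1653776860252, -33.8723, 'CHF'],
        [1653845294504, -465.4614, 'CHF'],
        [1653847155140, 22.285, 'USD'],
        [1653993629537, -358.04640000000006, 'USD']]
df = pd.DataFrame(data=data, columns=['time', 'qty', 'currency_1'])

# Assumption: rows may arrive unordered, so sort by time first.
df = df.sort_values('time', kind='stable')

# Inclusive running total per currency, minus the current qty,
# leaves the sum of the strictly earlier rows in that currency.
df['balance'] = df.groupby('currency_1')['qty'].cumsum() - df['qty']

print(df)
```

Both groupby(...).cumsum() and the subtraction are vectorized, so this may be faster than the lambda version, but it is worth timing on the real data. Small floating-point differences from the apply result are possible.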
Old answer:
df['Balance'] = df['qty'].shift(fill_value=0).cumsum()
print(df)
time qty currency_1 Balance
0 1653663281618 -583.8686 USD 0.0000
1 1653741652125 -84.0381 USD -583.8686
2 1653776860252 -33.8723 USD -667.9067
3 1653845294504 -465.4614 USD -701.7790
4 1653847155140 22.2850 USD -1167.2404
5 1653993629537 -358.0464 USD -1144.9554