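OK, so I want to run a backtest on trade data. I want to loop through the trades and resample/aggregate them into an OHLCV frame as I go. If the length of the OHLCV frame is greater than the maximum value of my parameters, I want to use it to calculate the indicator values. But when I resample/agg my data, I always get only one row (even later in the loop). I also suspect this is not the best way to loop through a DataFrame?
So on every iteration I add one new row to the historical trades, and from that historical-trades DataFrame I want the resampled OHLCV as output.
The CSV data looks like this: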
index,timestamp,trade_id,price,amount,taker_side_sell
10,1609688256174,359832187,33456.54,0.091,True
9,1609688256179,359832188,33460.03,0.003,False
8,1609688256179,359832189,33460.04,0.05,False
7,1609688256179,359832190,33460.66,0.029,False
6,1609688256251,359832191,33458.9,0.007,True
5,1609688256251,359832192,33458.8,0.007,True
4,1609688256251,359832193,33458.17,0.009,True
3,1609688256365,359832194,33461.89,0.001,True
2,1609688256399,359832195,33460.68,0.007,False
1,1609688256416,359832196,33460.67,0.223,True
0,1609688256458,359832197,33460.68,0.024,False
My code:
import pandas as pd
histTrades = pd.read_csv('data/binanceFutures/btc-usdt.csv')
histTrades = histTrades.set_index('timestamp')
histTrades.index = pd.to_datetime(histTrades.index, unit='ms')
ohlcv = pd.DataFrame()
for i in range(len(histTrades.index)):
    df = histTrades.iloc[:i]
    ohlcv['open'] = df.resample('1T')['price'].agg('first')
    ohlcv['high'] = df.resample('1T')['price'].agg('max')
    ohlcv['low'] = df.resample('1T')['price'].agg('min')
    ohlcv['close'] = df.resample('1T')['price'].agg('last')
    ohlcv['volume'] = df.resample('1T')['amount'].agg('sum')
    if len(ohlcv) > calcMinKlines(params):
        pass
        # Calculate the indicator values
Is this what you want?
import pandas as pd
histTrades = pd.read_csv('example.csv')
histTrades = histTrades.set_index('timestamp')
histTrades.index = pd.to_datetime(histTrades.index, unit='ms')
ohlcv_data = []
for i in range(len(histTrades.index)):
    df = histTrades.iloc[:i]
    # Each resample result is a Series with one value per 1-minute bar;
    # keep the value of the most recent bar, or None while df is still empty.
    ohlcv_data.append([(float(x.iloc[-1]) if len(x) > 0 else None) for x in [
        df.resample('1T')['price'].agg('first'),
        df.resample('1T')['price'].agg('max'),
        df.resample('1T')['price'].agg('min'),
        df.resample('1T')['price'].agg('last'),
        df.resample('1T')['amount'].agg('sum'),
    ]])
    #if len(ohlcv) > calcMinKlines(params):
    #    pass
    # Calculate the indicator values
ohlcv = pd.DataFrame(ohlcv_data,columns=['open','high','low','close','volume'])
open high low close volume
0 NaN NaN NaN NaN NaN
1 33456.54 33456.54 33456.54 33456.54 0.091
2 33456.54 33460.03 33456.54 33460.03 0.094
3 33456.54 33460.04 33456.54 33460.04 0.144
4 33456.54 33460.66 33456.54 33460.66 0.173
5 33456.54 33460.66 33456.54 33458.90 0.180
6 33456.54 33460.66 33456.54 33458.80 0.187
7 33456.54 33460.66 33456.54 33458.17 0.196
8 33456.54 33461.89 33456.54 33461.89 0.197
9 33456.54 33461.89 33456.54 33460.68 0.204
10 33456.54 33461.89 33456.54 33460.67 0.427
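For the closed one-minute bars themselves, the whole trade history can be resampled in a single pass instead of once per iteration. The sketch below is only an illustration, not the code from the question: it inlines the sample rows so that it runs on its own, and the names `CSV`, `trades` and `bars` are made up for the example. Note that every sample trade falls inside the same minute (15:37 UTC on 2021-01-03), so a one-minute resample of this sample can only ever produce one row.

```python
import io
import pandas as pd

# Sample rows from the question, inlined so the sketch is self-contained.
CSV = """index,timestamp,trade_id,price,amount,taker_side_sell
10,1609688256174,359832187,33456.54,0.091,True
9,1609688256179,359832188,33460.03,0.003,False
8,1609688256179,359832189,33460.04,0.05,False
7,1609688256179,359832190,33460.66,0.029,False
6,1609688256251,359832191,33458.9,0.007,True
5,1609688256251,359832192,33458.8,0.007,True
4,1609688256251,359832193,33458.17,0.009,True
3,1609688256365,359832194,33461.89,0.001,True
2,1609688256399,359832195,33460.68,0.007,False
1,1609688256416,359832196,33460.67,0.223,True
0,1609688256458,359832197,33460.68,0.024,False
"""

trades = pd.read_csv(io.StringIO(CSV), index_col="index")
trades["timestamp"] = pd.to_datetime(trades["timestamp"], unit="ms")
trades = trades.set_index("timestamp")

# One resample pass: ohlc() gives open/high/low/close per 1-minute bar,
# and the traded amount is summed per bar for the volume.
bars = trades["price"].resample("1min").ohlc()
bars["volume"] = trades["amount"].resample("1min").sum()

print(bars)
```

With a longer history spanning several minutes, `bars` would hold one row per minute, and `len(bars)` could then be compared against the minimum number of bars the indicators need.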
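Reading between the lines here, it seems the OP is asking how to get running (cumulative) values of all the data within groups of a given frequency. In that running view, each row reports the "open" (the first value in the group), the "high" (the maximum so far in the group), and so on.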
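A more efficient approach than an explicit loop is to use groupby to produce these cumulative values. The result has the same number of rows as the original df: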
grb = df.groupby(pd.Grouper(freq='1T'))
hlv = grb.agg({
    'price': ['cummax', 'cummin'],
    'amount': ['cumsum', 'cumcount'],
})
hlv.columns = ['high', 'low', 'volume', 'count']
ohlcv = pd.concat([
    grb['price'].transform('first').to_frame('open'),
    hlv,
    df['price'].to_frame('latest'),
], axis=1)[['open', 'high', 'low', 'latest', 'volume', 'count']]
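(Note that latest is really just the price itself, not grb['price'].transform('last'); the latter would be the last price of the whole group and would therefore introduce look-ahead, i.e. peeking at future data.)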
On the original data provided as an example, we get this ohlcv:
open high low latest volume count
date
2020-12-17 00:17:13.378 89.973 89.973 89.973 89.973 10.5 0
2020-12-17 00:17:13.451 89.973 89.974 89.973 89.974 21.1 1
2020-12-17 00:17:14.088 89.973 89.974 89.936 89.936 22.2 2
2020-12-17 00:17:14.549 89.973 90.016 89.936 90.016 32.5 3
2020-12-17 00:17:14.564 89.973 90.016 89.936 90.016 44.4 4
2020-12-24 12:50:17.243 75.166 75.166 75.166 75.166 3.6 0
2020-12-24 12:50:18.224 75.166 75.171 75.166 75.171 3.8 1
2020-12-24 12:50:18.527 75.166 75.172 75.166 75.172 4.0 2
2020-12-24 12:50:18.527 75.166 75.174 75.166 75.174 4.1 3
2020-12-24 12:50:18.527 75.166 75.190 75.166 75.190 5.6 4
From this, you can compute the indicators over the whole history or over part of it (possibly in one pass as well).
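As a self-contained illustration of the running-values idea on the sample trades from the question, here is a minimal sketch. It is a variant, not the exact code above: it groups on the timestamp floored to the minute (instead of `pd.Grouper`) and calls the cumulative groupby methods directly. The names `CSV`, `trades`, `g` and `running` are assumptions made for the example.

```python
import io
import pandas as pd

# Sample rows from the question, inlined so the sketch is self-contained.
CSV = """index,timestamp,trade_id,price,amount,taker_side_sell
10,1609688256174,359832187,33456.54,0.091,True
9,1609688256179,359832188,33460.03,0.003,False
8,1609688256179,359832189,33460.04,0.05,False
7,1609688256179,359832190,33460.66,0.029,False
6,1609688256251,359832191,33458.9,0.007,True
5,1609688256251,359832192,33458.8,0.007,True
4,1609688256251,359832193,33458.17,0.009,True
3,1609688256365,359832194,33461.89,0.001,True
2,1609688256399,359832195,33460.68,0.007,False
1,1609688256416,359832196,33460.67,0.223,True
0,1609688256458,359832197,33460.68,0.024,False
"""

trades = pd.read_csv(io.StringIO(CSV), index_col="index")
trades["timestamp"] = pd.to_datetime(trades["timestamp"], unit="ms")
trades = trades.set_index("timestamp")

# Group trades by the minute they fall in (assumed stand-in for pd.Grouper).
g = trades.groupby(trades.index.floor("1min"))

# One output row per trade: the state of the current bar after that trade.
running = trades[["price"]].rename(columns={"price": "latest"}).assign(
    open=g["price"].transform("first"),  # first price of the bar
    high=g["price"].cummax(),            # highest price so far in the bar
    low=g["price"].cummin(),             # lowest price so far in the bar
    volume=g["amount"].cumsum(),         # traded amount so far in the bar
    count=g.cumcount(),                  # 0-based position within the bar
)[["open", "high", "low", "latest", "volume", "count"]]

print(running)
```

Each row uses only the trades seen up to that point within its bar, so no future data leaks into the values, which is the property a backtest needs.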