Resampling part of a DataFrame of trades into OHLCV (pandas)



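OK, so I want to run a backtest on trade data. I want to loop through the trades and resample/aggregate them into an OHLCV frame as I go. Once the OHLCV frame is longer than the largest lookback among my parameters, I want to use it to compute indicator values. But when I resample/aggregate my data, I always get just one row (in later iterations of the loop as well). I also suspect this is not the best way to loop through a DataFrame?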

So on each iteration I add one new row to the historical trades, and from that historical-trades DataFrame I want the resampled OHLCV as output.

The CSV data looks like this:

index,timestamp,trade_id,price,amount,taker_side_sell
10,1609688256174,359832187,33456.54,0.091,True
9,1609688256179,359832188,33460.03,0.003,False
8,1609688256179,359832189,33460.04,0.05,False
7,1609688256179,359832190,33460.66,0.029,False
6,1609688256251,359832191,33458.9,0.007,True
5,1609688256251,359832192,33458.8,0.007,True
4,1609688256251,359832193,33458.17,0.009,True
3,1609688256365,359832194,33461.89,0.001,True
2,1609688256399,359832195,33460.68,0.007,False
1,1609688256416,359832196,33460.67,0.223,True
0,1609688256458,359832197,33460.68,0.024,False

My code:

import pandas as pd

histTrades = pd.read_csv('data/binanceFutures/btc-usdt.csv')
histTrades = histTrades.set_index('timestamp')
histTrades.index = pd.to_datetime(histTrades.index, unit='ms')
ohlcv = pd.DataFrame()
for i in range(len(histTrades.index)):
    df = histTrades.iloc[:i]
    ohlcv['open'] = df.resample('1T')['price'].agg('first')
    ohlcv['high'] = df.resample('1T')['price'].agg('max')
    ohlcv['low'] = df.resample('1T')['price'].agg('min')
    ohlcv['close'] = df.resample('1T')['price'].agg('last')
    ohlcv['volume'] = df.resample('1T')['amount'].agg('sum')
    if len(ohlcv) > calcMinKlines(params):
        pass
        # Calculate the indicator values

Is this what you want?

import pandas as pd

histTrades = pd.read_csv('example.csv')
histTrades = histTrades.set_index('timestamp')
histTrades.index = pd.to_datetime(histTrades.index, unit='ms')
ohlcv_data = []
for i in range(len(histTrades.index)):
    df = histTrades.iloc[:i]  # all trades before row i
    # Keep the most recent bar of each aggregate (None while df is still empty).
    # x.iloc[-1] is used instead of float(x), which fails once df spans more than one minute.
    ohlcv_data.append([(x.iloc[-1] if len(x) > 0 else None) for x in [
        df.resample('1T')['price'].agg('first'),
        df.resample('1T')['price'].agg('max'),
        df.resample('1T')['price'].agg('min'),
        df.resample('1T')['price'].agg('last'),
        df.resample('1T')['amount'].agg('sum'),
    ]])
    #if len(ohlcv) > calcMinKlines(params):
    #    pass
    # Calculate the indicator values
ohlcv = pd.DataFrame(ohlcv_data, columns=['open', 'high', 'low', 'close', 'volume'])

Output:

        open      high       low     close  volume
0        NaN       NaN       NaN       NaN     NaN
1   33456.54  33456.54  33456.54  33456.54   0.091
2   33456.54  33460.03  33456.54  33460.03   0.094
3   33456.54  33460.04  33456.54  33460.04   0.144
4   33456.54  33460.66  33456.54  33460.66   0.173
5   33456.54  33460.66  33456.54  33458.90   0.180
6   33456.54  33460.66  33456.54  33458.80   0.187
7   33456.54  33460.66  33456.54  33458.17   0.196
8   33456.54  33461.89  33456.54  33461.89   0.197
9   33456.54  33461.89  33456.54  33460.68   0.204
10  33456.54  33461.89  33456.54  33460.67   0.427
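
If only the finished bars are needed (one row per minute rather than one row per trade), a single resample call may be enough. Note that every trade in the question's sample falls within the same minute, which by itself would explain getting only one row. The sketch below assumes the same column layout as the question's CSV; the last two trades are made-up values placed one minute later so that a second bar appears, `trades_to_ohlcv` is a hypothetical helper name, and `'1min'` is the newer spelling of the `'1T'` alias.

```python
import io
import pandas as pd

# Three trades from the question plus two made-up trades one minute later.
csv_text = """timestamp,trade_id,price,amount,taker_side_sell
1609688256174,359832187,33456.54,0.091,True
1609688256179,359832188,33460.03,0.003,False
1609688256251,359832191,33458.90,0.007,True
1609688316000,900000001,33470.00,0.010,False
1609688317000,900000002,33465.00,0.020,True
"""

trades = pd.read_csv(io.StringIO(csv_text))
trades = trades.set_index('timestamp')
trades.index = pd.to_datetime(trades.index, unit='ms')


def trades_to_ohlcv(trades, freq='1min'):
    """Aggregate trades into one OHLCV bar per `freq` interval (hypothetical helper)."""
    bars = trades['price'].resample(freq).ohlc()          # open/high/low/close columns
    bars['volume'] = trades['amount'].resample(freq).sum()
    return bars


ohlcv = trades_to_ohlcv(trades)
print(ohlcv)
```

With this input the result has two rows, one per minute that contains trades.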

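Reading between the lines here, it seems the OP is asking how to get a running (cumulative) aggregate of all the data, grouped at a given frequency. In that running aggregate, each row reports the "open" (the first value in the group), the "high" (the maximum so far), and so on.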

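A more efficient approach than an explicit loop is to use groupby to produce these running aggregates. The result has the same number of rows as the original df: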

grb = df.groupby(pd.Grouper(freq='1T'))
hlv = grb.agg({
    'price': ['cummax', 'cummin'],
    'amount': ['cumsum', 'cumcount'],
})
hlv.columns = ['high', 'low', 'volume', 'count']
ohlcv = pd.concat([
    grb['price'].transform('first').to_frame('open'),
    hlv,
    df['price'].to_frame('latest'),
], axis=1)[['open', 'high', 'low', 'latest', 'volume', 'count']]

(Note that latest is simply the price itself, not grb['price'].transform('last'), which would be the last value of the whole group and would therefore introduce look-ahead.)
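
The same running aggregates can also be written with the explicit cumulative groupby methods, which some readers may find easier to follow. This is a minimal sketch on made-up trades spanning two one-minute buckets; the data, the `running` name, and the `'1min'` alias (the newer spelling of `'1T'`) are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Made-up trades: four in the 15:37 bucket, two in the 15:38 bucket.
df = pd.DataFrame(
    {
        'price': [100.0, 101.0, 99.5, 102.0, 50.0, 51.0],
        'amount': [1.0, 2.0, 0.5, 1.5, 3.0, 1.0],
    },
    index=pd.to_datetime([
        '2021-01-03 15:37:36.174', '2021-01-03 15:37:36.179',
        '2021-01-03 15:37:40.000', '2021-01-03 15:37:59.999',
        '2021-01-03 15:38:01.000', '2021-01-03 15:38:30.000',
    ]),
)

grb = df.groupby(pd.Grouper(freq='1min'))

# Each piece has one row per trade and only uses information up to that trade.
running = pd.concat([
    grb['price'].transform('first').rename('open'),  # first price of the bucket
    grb['price'].cummax().rename('high'),            # highest price so far
    grb['price'].cummin().rename('low'),             # lowest price so far
    df['price'].rename('latest'),                    # price of the current trade
    grb['amount'].cumsum().rename('volume'),         # volume traded so far
    grb.cumcount().rename('count'),                  # 0-based position in the bucket
], axis=1)
print(running)
```

All running values restart at the first trade of each new minute, so no row depends on trades that come after it.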

On raw data provided as an example, we get this ohlcv:

                           open    high     low  latest  volume  count
date
2020-12-17 00:17:13.378  89.973  89.973  89.973  89.973    10.5      0
2020-12-17 00:17:13.451  89.973  89.974  89.973  89.974    21.1      1
2020-12-17 00:17:14.088  89.973  89.974  89.936  89.936    22.2      2
2020-12-17 00:17:14.549  89.973  90.016  89.936  90.016    32.5      3
2020-12-17 00:17:14.564  89.973  90.016  89.936  90.016    44.4      4
2020-12-24 12:50:17.243  75.166  75.166  75.166  75.166     3.6      0
2020-12-24 12:50:18.224  75.166  75.171  75.166  75.171     3.8      1
2020-12-24 12:50:18.527  75.166  75.172  75.166  75.172     4.0      2
2020-12-24 12:50:18.527  75.166  75.174  75.166  75.174     4.1      3
2020-12-24 12:50:18.527  75.166  75.190  75.166  75.190     5.6      4

From this, you can compute indicators over the whole history or over part of it (possibly in a single vectorized pass as well).
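
As an illustration of computing an indicator over the whole history at once, here is a small sketch of a simple moving average. The closing prices are made up, and `window` is a hypothetical indicator length that stands in for whatever `calcMinKlines(params)` returns in the question.

```python
import pandas as pd

# Hypothetical closing prices of completed one-minute bars (made-up numbers).
close = pd.Series(
    [10.0, 11.0, 12.0, 13.0, 14.0],
    index=pd.date_range('2021-01-03 15:37', periods=5, freq='1min'),
    name='close',
)

window = 3  # hypothetical lookback, standing in for calcMinKlines(params)

# Simple moving average for every bar in one vectorized call. The first
# window - 1 rows are NaN because there are not yet enough bars.
sma = close.rolling(window).mean()
print(sma)
```

The leading NaN rows play the role of the `len(ohlcv) > calcMinKlines(params)` check: the indicator only has a value once enough bars exist.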
