Specifying test rows for backtesting and machine learning



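I want to use machine learning to predict an asset's price movement. So far I have the data and results, and now I want to backtest the model. The premise is very simple: buy and hold whenever the predicted value is 1. I want to apply the prediction model, iterate over the test rows from the bottom up to a specified number, check whether each predicted output matches the corresponding label (the labels here are -1 and 1), and then do some calculations.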

Here is the code:

def backtest():
    x = df[['open', 'high', 'low', 'close', 'vol']]
    y = df['label']
    z = np.array(df['log_ret'].values)
    test_size = 366
    rf = RandomForestClassifier(n_estimators=100)
    rf.fit(x[:-test_size], y[:-test_size])
    invest_amount = 1000
    trade_qty = 0
    correct_count = 0
    for i in range(1, test_size):
        if rf.predict(x[-i])[0] == y[-i]:
            correct_count += 1
        if rf.predict(x[-i])[0] == 1:
            invest_return = invest_amount + (invest_amount * (z[-i]/100))
            trade_qty += 1

    print('accuracy:', (correct_count/test_size)*100)
    print('total trades:', trade_qty)
    print('profits:', invest_return)

backtest()

So far, this is where I am stuck:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2645             try:
-> 2646                 return self._engine.get_loc(key)
   2647             except KeyError:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: -1

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-29-feab89792f26> in <module>
     22 
     23 for i in range(1, test_size):
---> 24     if rf.predict(x[-i])[0] == y[-i]:
     25         correct_count += 1
     26 

~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2798             if self.columns.nlevels > 1:
   2799                 return self._getitem_multilevel(key)
-> 2800             indexer = self.columns.get_loc(key)
   2801             if is_integer(indexer):
   2802                 indexer = [indexer]

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2646                 return self._engine.get_loc(key)
   2647             except KeyError:
-> 2648                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2649         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2650         if indexer.ndim > 1 or indexer.size > 1:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: -1

The code below resolves the error with a few modifications:

def backtest():
    x = df[['open', 'high', 'low', 'close', 'vol']]
    y = df['label']
    z = np.array(df['log_ret'].values)
    test_size = 366
    rf = RandomForestClassifier(n_estimators=100)
    rf.fit(x[:-test_size], y[:-test_size])
    invest_amount = 1000
    invest_return = invest_amount  # defined even if no trade is taken
    trade_qty = 0
    correct_count = 0
    # Iterate over the index labels test_size-1 down to 1
    for i in range(1, test_size)[::-1]:
        # Select the row by index label; the result stays 2-D, as predict() expects
        pred = rf.predict(x[x.index == i])[0]
        if pred == y[i]:
            correct_count += 1
        if pred == 1:
            invest_return = invest_amount + (invest_amount * (z[i]/100))
            trade_qty += 1
    print('accuracy:', (correct_count/test_size)*100)
    print('total trades:', trade_qty)
    print('profits:', invest_return)

backtest()

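Explanation of the modifications:

  1. DataFrame rows are accessed by filtering on the index, x[x.index == i];
  2. The negative indices are replaced with a reversed range, range(1, test_size)[::-1], so that the rest of the loop needs fewer adjustments.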
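To illustrate why the original loop failed, here is a minimal sketch (the small frame `toy` is made up for illustration and is not part of the original post). Indexing a DataFrame with a bare integer such as `x[-1]` is treated as a column lookup, which is what produces `KeyError: -1`, whereas `.iloc` selects by position and accepts negative offsets.

```python
import pandas as pd

# Hypothetical toy frame, used only for illustration.
toy = pd.DataFrame({"open": [1.0, 2.0, 3.0], "close": [1.5, 2.5, 3.5]})

# toy[-1] looks for a *column* named -1, which does not exist.
try:
    toy[-1]
    raised = False
except KeyError:
    raised = True

# Positional access: the double brackets keep a one-row, 2-D DataFrame,
# which is the shape scikit-learn's predict() expects.
last_row = toy.iloc[[-1]]

# Label-based filtering, as in the answer, picks the same row here
# because the default index is 0, 1, 2.
same_row = toy[toy.index == 2]

print(raised)
print(last_row.shape)
print(last_row.equals(same_row))
```

Positional `.iloc` access keeps the "count from the bottom" intent of the question, while the index filter in the answer selects by label; the two agree only when the labels happen to line up with the positions being asked for.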

Generating a test case:

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
data = {'open': np.random.rand(1000),
        'high': np.random.rand(1000),
        'low': np.random.rand(1000),
        'close': np.random.rand(1000),
        'vol': np.random.rand(1000),
        'log_ret': np.random.rand(1000),
        'label': np.random.choice([-1, 1], 1000)}
df = pd.DataFrame(data)

This produces the following result:

>> backtest()
accuracy: 99.72677595628416
total trades: 181
profits: 1006.8351193358026
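
One caveat about these numbers: 99.72677595628416 is exactly 365/366 * 100, and with the 1000-row test frame the index labels 1 to 365 all fall inside the slice `x[:-test_size]` that the model was fitted on. The loop therefore appears to be scoring rows the model has already seen, not the held-out ones. Below is a minimal sketch of a variant that evaluates the last `test_size` rows by position with `.iloc`. The function name `backtest_holdout`, the `seed` parameter, the `FEATURES` constant and the returned dictionary are assumptions made for illustration, not part of the original answer, and the profit calculation is left out.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["open", "high", "low", "close", "vol"]  # illustrative constant


def backtest_holdout(df, test_size=366, seed=0):
    """Fit on all but the last `test_size` rows, score on those last rows."""
    x, y = df[FEATURES], df["label"]
    # Positional split: training rows and test rows do not overlap.
    x_train, y_train = x.iloc[:-test_size], y.iloc[:-test_size]
    x_test, y_test = x.iloc[-test_size:], y.iloc[-test_size:]

    rf = RandomForestClassifier(n_estimators=100, random_state=seed)
    rf.fit(x_train, y_train)

    # One vectorised call instead of predicting row by row.
    pred = rf.predict(x_test)
    return {
        "accuracy": float((pred == y_test.to_numpy()).mean() * 100),
        "total_trades": int((pred == 1).sum()),
        "n_test": len(x_test),
    }


# Same kind of random frame as the test case, seeded here for repeatability.
rng = np.random.default_rng(0)
df = pd.DataFrame({c: rng.random(1000) for c in FEATURES + ["log_ret"]})
df["label"] = rng.choice([-1, 1], 1000)

result = backtest_holdout(df)
print(result)
```

With purely random labels like these, one would expect the held-out accuracy to sit near 50% and not near 100%, which is a quick sanity check that the test rows are really unseen.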
