Train/test data from a DataFrame with multiple columns



I have a CSV file:

Date,Open,High,Low,Close,Adj Close,Volume,Cash EPS,Book Value,Div/share,Net profit/share,NPM,ROE,ROCE,ROA,DEBT/EQ,ATR,CR
2004-04-26,82.924217,82.924217,82.924217,82.924217,60.026066,0,221.24,488.21,129.5,186.6,26.11,38.22,38.22,24.2,0,92.67,1.65
2004-04-27,82.778122,82.778122,79.765625,80.24453,58.086323,28616000,221.24,488.21,129.5,186.6,26.11,38.22,38.22,24.2,0,92.67,1.65

Only two rows are shown here for brevity. I have created a DataFrame:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Load the CSV and drop rows with missing values; copy() keeps later column assignments safe
dataframe1 = pd.read_csv('test.csv')
df = dataframe1.dropna().copy()

# Scale the Close column on its own (assumes df1 was meant to hold the Close values)
scaler = MinMaxScaler(feature_range=(0, 1))
df1 = scaler.fit_transform(np.array(df["Close"]).reshape(-1, 1))

# Scale the listed numeric columns to the range [0, 1]
cols = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Book Value", "Div/share", "Net profit/share", "NPM", "ROE", "ROCE", "ROA", "DEBT/EQ", "ATR", "CR"]
min_max_scaler = MinMaxScaler()
df[cols] = min_max_scaler.fit_transform(df[cols])

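To train on this dataset, I need the Date and the prediction target, i.e. the Close column. However, the Close value depends on multiple columns (namely, all the columns present in this CSV).

How can I train on the Date and Close columns, but based on all the other columns, so that I can predict future Close values?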

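If I understand the question correctly, you are looking for a multivariate time series model. In other words, each time step takes several variables as input in order to make a forward-looking prediction. Here is a link with some examples: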

https://www.relataly.com/stock-market-prediction-with-multivariate-time-series-in-python/1815/
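
As a rough illustration of what "multiple variables per time step" means in practice, here is a minimal sketch of turning a table of scaled features into sliding windows, where each sample holds the last few rows of all columns and the target is the next Close value. The helper name `make_windows`, the `lookback` parameter, and the synthetic data are assumptions made for this example; they do not come from the question or from the linked article. The resulting arrays could then be fed to whatever sequence model you choose.

```python
import numpy as np

def make_windows(features, target, lookback):
    """Build inputs of shape (samples, lookback, n_features) and next-step targets.

    Hypothetical helper for illustration only.
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    X, y = [], []
    for end in range(lookback, len(features)):
        X.append(features[end - lookback:end])  # rows end-lookback .. end-1, all columns
        y.append(target[end])                   # value to predict at time step `end`
    return np.array(X), np.array(y)

# Synthetic stand-in for the scaled DataFrame: 10 time steps, 3 feature columns.
rng = np.random.default_rng(0)
data = rng.random((10, 3))
close = data[:, 0]  # assumption: column 0 plays the role of Close

X, y = make_windows(data, close, lookback=4)

# Chronological split (no shuffling), so the test samples lie strictly after the training ones.
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

print(X.shape, y.shape)  # (6, 4, 3) (6,)
```

With the real data, `data` would be something like `df[cols].to_numpy()` sorted by Date. To avoid leaking information from the future, it is usually safer to fit the scaler on the training portion only and then apply it to the test portion.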

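In addition, I suggest looking at the Kaggle stock market prediction competitions; they contain hundreds of examples showing how people have approached this problem.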

https://www.kaggle.com/c/two-sigma-financial-news
