LSTM stock price prediction model gives results significantly higher than expected



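I built an LSTM model to estimate the next day's stock price, using TensorFlow and Keras.

However, I don't understand why my model's predicted price is almost always a factor of 2 or 3 higher than the current stock price. Does anyone know what I'm doing wrong?

The code is shown below: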

# Imports inferred from the names used in the function
import math

import numpy as np
import pandas_datareader as web
from keras.layers import Dense, LSTM
from keras.models import Sequential
from sklearn.preprocessing import MinMaxScaler


def StockPredictor(stock, startdate, enddate, pricetype):
    # Get the stock quote
    df = web.DataReader(stock, data_source='yahoo', start=startdate, end=enddate)
    # df = pd.read_csv('StockData/TATA.csv')

    # Create a new dataframe with only the price type chosen
    data = df.filter([pricetype])
    dataset = data.values  # convert dataset into a numpy array
    # Use 80% of the dataset to train the LSTM model (rounded up with math.ceil)
    training_data_len = math.ceil(len(dataset) * 0.80)

    # Scale the data (normalizing input data helps the model)
    scaler = MinMaxScaler(feature_range=(0, 1))  # all scaled values lie between 0 and 1
    # Computes min and max values for scaling and transforms the data based on them
    scaled_data = scaler.fit_transform(dataset)

    # Create the scaled training data set
    train_data = scaled_data[0:training_data_len, :]
    # Split data into x_train (independent feature) and y_train (dependent value)
    x_train, y_train = [], []
    for i in range(60, len(train_data)):
        x_train.append(train_data[i-60:i, 0])  # values of the 60 previous periods
        y_train.append(train_data[i, 0])       # 61st value, which the model should predict

    # Convert x_train and y_train to numpy arrays
    x_train, y_train = np.array(x_train), np.array(y_train)

    # Reshape data: LSTM expects 3D input (samples, timesteps, features); x_train is 2D here.
    # x_train.shape[0] = number of rows, x_train.shape[1] = number of columns of the 2D x_train
    x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))

    # Build the LSTM model
    model = Sequential()
    model.add(LSTM(50, return_sequences=True, input_shape=(x_train.shape[1], 1)))
    model.add(LSTM(50, return_sequences=False))
    model.add(Dense(25))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mean_squared_error')  # loss function and optimizer

    # Train the model; an epoch is one full pass of the training data through the network
    model.fit(x_train, y_train, batch_size=1, epochs=1)

    # Create the testing data set from the scaled values,
    # starting 60 rows before the end of the training data
    test_data = scaled_data[training_data_len - 60:, :]
    # Create datasets x_test and y_test
    x_test = []
    y_test = dataset[training_data_len:, :]
    for i in range(60, len(test_data)):
        x_test.append(test_data[i-60:i, 0])

    # Convert data into a numpy array
    x_test = np.array(x_test)

    # Reshape data (same explanation as for x_train above)
    x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))

    # Get the model's predicted price values;
    # predictions based on x_test should end up close to y_test
    predictions = model.predict(x_test)
    predictions = scaler.inverse_transform(predictions)  # unscale the values

    # Get the RMSE (to test the model): square the errors before averaging
    rmse = np.sqrt(np.mean((predictions - y_test) ** 2))

    # Plot the data
    train = data[:training_data_len]
    valid = data[training_data_len:].copy()
    valid['Predictions'] = predictions

    print('The RMSE for the training model =', rmse)

    new_df = df.filter([pricetype])
    # Get the last 60 days
    last_60_days = new_df[-60:].values
    last_60_days_scaled = scaler.transform(last_60_days)
    # Create empty list
    X_test = []
    # Append past 60 days to list
    X_test.append(last_60_days)
    # Convert X_test to numpy array
    X_test = np.array(X_test)
    # Reshape to 3D
    X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
    # Get predicted scaled price
    pred_price = model.predict(X_test)
    # Undo scaling
    pred_price = scaler.inverse_transform(pred_price)
    print('Predicted price for the next day is :', pred_price)

    return pred_price


allprices = []
for i in range(10):
    pred_price = StockPredictor()
    allprices.append(pred_price)

average_pred_price = sum(allprices) / len(allprices)

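You are using a min-max scaler between 0 and 1, where the defined high is the historical high. Your LSTM model will predict a new high, and when you inverse_transform the prediction, it can land outside the minimum and maximum the scaler was fitted on.

So the scaler may be the culprit behind your predictions being 2x too high. Using a standard scaler might help, or not scaling at all.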
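To make the scaling argument concrete, here is a minimal sketch in plain Python. It hand-rolls the same linear formula that a min-max scaler with range (0, 1) applies, rather than calling scikit-learn, and the price list is made up for illustration. It shows that nothing clamps the inverse transform to the historical range, and how far apart a raw price and its scaled counterpart are.

```python
# Hand-rolled min-max scaling, mirroring the linear formula of a
# min-max scaler with feature_range=(0, 1). Function names are illustrative.

def fit_min_max(values):
    """Return the (min, max) pair a min-max scaler would learn."""
    return min(values), max(values)

def transform(value, lo, hi):
    """Map a raw price onto the fitted [0, 1] range."""
    return (value - lo) / (hi - lo)

def inverse_transform(scaled, lo, hi):
    """Map a scaled value back to a raw price."""
    return scaled * (hi - lo) + lo

# Hypothetical closing prices, not real market data.
history = [100.0, 120.0, 150.0, 130.0, 200.0]
lo, hi = fit_min_max(history)

# A network output inside [0, 1] maps back inside the historical range.
inside = inverse_transform(0.5, lo, hi)

# A linear output layer is not clamped, so it can emit values above 1.0,
# which map back to a price above the historical high.
above = inverse_transform(1.5, lo, hi)

# The same price as the network should see it (scaled) versus raw.
scaled_input = transform(history[-1], lo, hi)
raw_input = history[-1]

print(inside, above, scaled_input, raw_input)
```

One thing worth checking against this, offered as an observation rather than a confirmed diagnosis: the question's code computes `last_60_days_scaled` but appends the unscaled `last_60_days` to `X_test`. A network trained on inputs in [0, 1] would then receive raw prices, and that mismatch alone could inflate the final prediction.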

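Side note

Predicting stock prices with an LSTM from price data alone does not work.

What will probably happen is that your LSTM model predicts the price with a T+1 lag, i.e. it reproduces the price with a one-day delay.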
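A rough way to check for this lag is to compare the model's error against a naive "tomorrow equals today" forecast. The sketch below computes only the baseline side, in plain Python with made-up prices; `persistence_forecast` and `rmse` are illustrative helper names, not part of the question's code.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equally long sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def persistence_forecast(prices):
    """Predict each day's price as the previous day's price."""
    return prices[:-1]

# Hypothetical closing prices, not real market data.
prices = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0]

actual = prices[1:]                      # days 2..n
baseline = persistence_forecast(prices)  # days 1..n-1 used as the forecast
baseline_rmse = rmse(baseline, actual)

print('Persistence baseline RMSE:', baseline_rmse)
```

If the LSTM's RMSE on the same test days is not clearly below this baseline, the model is probably doing little more than echoing the previous day's price.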

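Price data is inherently noisy. The noise comes from retail traders, especially with today's social-sentiment-driven trading. The LSTM may overfit the historical noise, which is not representative of future "noise".

For more on the noise problem, see this link: https://www.investopedia.com/articles/trading/06/marketnoise.asp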
