Given non-text sequential data, what should the y_train shape be for multi-class classification with an LSTM?



Problem description

I have a dataset (features=175, n_time_steps=954, number of sequences=737). Columns 1-174 are the features and the last column is the target, which contains 3 different classes. I want to use an LSTM for multi-class classification, predicting only the last time step, i.e. using steps 1-953 and the features to predict the class at step 954. I am struggling with the structure of the y_train input, and I would appreciate any ideas on how to reshape y_train correctly for this problem.

Data

I have 737 products, each with 954 days of sales. The target classes are: 0 when the product does not exist, 1 when the product is type A, and 2 when the product is type B. I need to use the 953 days and 174 features to predict each product's class on the last day (954) of the sequence. The test set has 100 products, the training set 637.
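Schematically, the reshaping might look like this (a minimal sketch; the raw array and variable names are hypothetical, assuming one array of 737 products x 954 days x 175 columns whose last column is the target):

import numpy as np

# Hypothetical raw array: 737 products x 954 days x 175 columns;
# random floats stand in for real sales data and class labels here.
raw = np.random.rand(737, 954, 175)

X = raw[:, :953, :]                 # days 1-953, all 175 columns -> (737, 953, 175)
y = raw[:, 953, -1].reshape(-1, 1)  # class on day 954            -> (737, 1)

# First 637 products for training, remaining 100 for testing.
X_train, X_test = X[:637], X[637:]
y_train, y_test = y[:637], y[637:]
print(X_train.shape, y_train.shape)  # (637, 953, 175) (637, 1)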

After reshaping, X_train has shape (637, 953, 175) and y_train has shape (637, 1). When I run to_categorical on y_train, the shape becomes (637, 2). Both y_train shapes raise errors when fitting the LSTM model.

When I fit with y_train of shape (637, 1), the error is:

ValueError: You are passing a target array of shape (637, 1) while using as loss `categorical_crossentropy`. `categorical_crossentropy` expects targets to be binary matrices (1s and 0s) of shape (samples, classes). If your targets are integer classes, you can convert them to the expected format via:
from keras.utils import to_categorical
y_binary = to_categorical(y_int)

Alternatively, you can use the loss function `sparse_categorical_crossentropy` instead, which does expect integer targets.

When I fit with to_categorical(y_train) of shape (637, 2), the error is:

ValueError: Error when checking target: expected dense_45 to have shape (1,) but got array with shape (2,)

When I change the loss to 'sparse_categorical_crossentropy' and fit with y_train of shape (637, 1), the error is:

InvalidArgumentError: Received a label value of 1 which is outside the valid range of [0, 1).  Label values: 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 1 1 0 1 0 1 1 0 1 0 1 0 0 1 0 0 1 0 0 0 0 1 1 1 1 0 0 0 1 0 0 1 0 0 0 1 1 0 0 1 1 0 0 1 1 1 1 1 0 0 1 0 0 0 0 0 0 0 0 0
[[{{node loss_13/dense_48_loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]] 

Here is my model:

from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout
from keras.constraints import maxnorm
from keras.optimizers import Adamax

model = Sequential([
    LSTM(units=1024,
         input_shape=(periods_to_train, features), kernel_initializer='he_uniform',
         activation='linear', kernel_constraint=maxnorm(3), return_sequences=False),
    Dropout(rate=0.5),
    Dense(units=1024, kernel_initializer='he_uniform',
          activation='linear', kernel_constraint=maxnorm(3)),
    Dropout(rate=0.5),
    Dense(units=1024, kernel_initializer='he_uniform',
          activation='linear', kernel_constraint=maxnorm(3)),
    Dropout(rate=0.5),
    Dense(units=periods_to_predict, kernel_initializer='he_uniform', activation='softmax')])

# Compile model
optimizer = Adamax(lr=0.001, decay=0.1)
model.compile(loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
configure(gpu_ind=True)  # user-defined GPU setup helper
model.fit(X_train, y_train, validation_split=0.1, batch_size=100, epochs=8, shuffle=True)

Your understanding of the network seems correct, so I recreated a minimal working example that generates data and trains in the same way you do. Two things matter here. First, with sparse_categorical_crossentropy the valid label range is [0, units) of the final softmax layer, so for 3 classes the last Dense layer needs 3 units; the "outside the valid range of [0, 1)" message suggests that periods_to_predict was 1 in your run, and it should instead be the number of classes, 3. Second, when I set the number of time steps (periods_to_train) to 953 as you did, I also ran into some strange errors. Several studies suggest keeping LSTM time dependencies to no more than roughly 200 to 500 steps, because beyond that the model output starts to "forget" earlier information.
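For a 3-class problem, the two valid target formats look like this (a minimal sketch with made-up labels; nothing here depends on your data):

import numpy as np
from tensorflow.keras.utils import to_categorical

# Integer labels in {0, 1, 2}, shape (637, 1):
# use with loss='sparse_categorical_crossentropy' and a final Dense(3, activation='softmax').
y_int = np.random.randint(0, 3, 637).reshape(-1, 1)

# One-hot labels, shape (637, 3):
# use with loss='categorical_crossentropy' and the same Dense(3) output layer.
y_onehot = to_categorical(y_int, num_classes=3)

print(y_int.shape, y_onehot.shape)  # (637, 1) (637, 3)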

Below is a minimal working example of what you are trying to do, using only 100 time steps. In my case it runs without errors (TensorFlow version 1.14.0):

import tensorflow as tf
import tensorflow.keras.backend as K
import numpy as np

data_size = 637
periods_to_train = 100
features = 175
periods_to_predict = 3  # number of classes, so the softmax layer has 3 units

# Random stand-in data: X is (samples, timesteps, features),
# y holds integer class labels of shape (samples, 1).
X_train = np.random.rand(data_size, periods_to_train, features)
y_train = np.random.randint(0, 3, data_size).reshape(-1, 1)

K.clear_session()
model = tf.keras.models.Sequential([
    tf.keras.layers.LSTM(
        units=1024, input_shape=(periods_to_train, features), kernel_initializer='he_uniform',
        activation='linear', kernel_constraint=tf.keras.constraints.max_norm(3.), return_sequences=False),
    tf.keras.layers.Dropout(rate=0.5),
    tf.keras.layers.Dense(
        units=1024, kernel_initializer='he_uniform',
        activation='linear', kernel_constraint=tf.keras.constraints.max_norm(3)),
    tf.keras.layers.Dropout(rate=0.5),
    tf.keras.layers.Dense(
        units=1024, kernel_initializer='he_uniform',
        activation='linear', kernel_constraint=tf.keras.constraints.max_norm(3)),
    tf.keras.layers.Dropout(rate=0.5),
    tf.keras.layers.Dense(
        units=periods_to_predict, kernel_initializer='he_uniform',
        activation='softmax')])

optimizer = tf.keras.optimizers.Adamax(lr=0.001, decay=0.1)
model.compile(loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
model.fit(X_train, y_train, validation_split=0.1, batch_size=64, epochs=1, shuffle=True)
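After training, per-product classes can be read off the softmax output with argmax, e.g. (a sketch; X_test here is a hypothetical array shaped like X_train):

X_test = np.random.rand(100, periods_to_train, features)  # hypothetical test data
probs = model.predict(X_test)            # softmax probabilities, shape (100, 3)
pred_classes = np.argmax(probs, axis=1)  # predicted class per product, shape (100,)
print(probs.shape, pred_classes.shape)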
