Input layer incompatible with TensorFlow 2D CNN

I'm trying to train a CNN model for a speech emotion recognition task, using spectrograms as input. I've reshaped the spectrograms so that each one has the shape (num_frequency_bins, num_time_frames, 1), which I thought would be sufficient, but when trying to fit the model to the dataset, stored in a TensorFlow dataset, I get the following error:

Input 0 of layer "sequential_12" is incompatible with the layer: expected shape=(None, 257, 1001, 1), found shape=(257, 1001, 1)
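
The difference between the two shapes is the leading batch dimension: a Keras model consumes batches, so each (257, 1001, 1) spectrogram has to arrive as part of a batch of shape (None, 257, 1001, 1). A minimal sketch (with placeholder data, not my actual pipeline) showing how Dataset.batch adds that dimension:

import numpy as np
import tensorflow as tf

fake_specgram = np.zeros((257, 1001, 1), dtype=np.float32)  # placeholder spectrogram
ds = tf.data.Dataset.from_tensor_slices([fake_specgram])    # elements have shape (257, 1001, 1)
print(ds.element_spec.shape)            # (257, 1001, 1)       -> the "found" shape in the error
print(ds.batch(32).element_spec.shape)  # (None, 257, 1001, 1) -> the "expected" shape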

I tried reshaping the spectrograms to have the shape (1, num_frequency_bins, num_time_frames, 1) instead, but that produced an error when creating the Sequential model:

ValueError: Exception encountered when calling layer "resizing_14" (type Resizing).
'images' must have either 3 or 4 dimensions.
Call arguments received:
• inputs=tf.Tensor(shape=(None, 1, 257, 1001, 1), dtype=float32)
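
From what I can tell, the Resizing layer only accepts 3D (unbatched) or 4D (batched) images, so the extra leading 1 makes each batched element 5D. A quick standalone check (shapes copied from my data, otherwise unrelated to the model above):

import tensorflow as tf

resize = tf.keras.layers.Resizing(32, 128)
print(resize(tf.zeros((257, 1001, 1))).shape)     # (32, 128, 1)    - 3D input is fine
print(resize(tf.zeros((8, 257, 1001, 1))).shape)  # (8, 32, 128, 1) - 4D input is fine
# resize(tf.zeros((8, 1, 257, 1001, 1)))          # 5D -> "'images' must have either 3 or 4 dimensions"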

So when creating the model I passed the shape as (num_frequency_bins, num_time_frames, 1), and then fitted the model to training data with 4 dimensions, but that produced this error:

InvalidArgumentError: slice index 0 of dimension 0 out of bounds. [Op:StridedSlice] name: strided_slice/
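
As far as I understand, this message just means some op tried to take index 0 along a dimension that turned out to have size 0. A contrived reproduction, unrelated to my model, purely to show where the message comes from:

import tensorflow as tf

empty = tf.zeros((0, 257, 1001, 1))  # a tensor whose leading dimension is empty
try:
    empty[0]                         # strided_slice at index 0 of a size-0 dimension
except tf.errors.InvalidArgumentError as e:
    print(e)                         # slice index 0 of dimension 0 out of bounds.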

So I'm at a bit of a loss now. I genuinely don't know what to do or how to go about fixing this. I've read around and haven't found anything useful. Any help would be greatly appreciated.

Here's some of the code for context.

dataset = [[specgram_files[i], labels[i]] for i in range(len(specgram_files))]
specgram_files_and_labels_dataset = tf.data.Dataset.from_tensor_slices((specgram_files, labels))

def read_npy_file(data):
    # 'data' stores the file name of the numpy binary file storing the features of a particular sound file
    # item() returns numpy array of size 1 as a suitable python scalar.
    # data.item() then returns the bytes string stored in the numpy array.
    # decode() is then called on the bytes string to decode it from a bytes string to a regular string
    # so that it can be passed as a parameter in np.load()
    data = np.load(data.item().decode())
    # Shape of data is now (1, rows, columns)
    # Needs to be reshaped to (rows, columns, 1):
    data = np.reshape(data, (data.shape[0], data.shape[1], 1))
    return data.astype(np.float32)

specgram_dataset = specgram_files_and_labels_dataset.map(
    lambda file, label: tuple([tf.numpy_function(read_npy_file, [file], [tf.float32]), label]),
    num_parallel_calls=tf.data.AUTOTUNE)

num_files = len(train_df)
num_train = int(0.8 * num_files)
num_val = int(0.1 * num_files)
num_test = int(0.1 * num_files)

specgram_dataset.shuffle(buffer_size=1000)
specgram_train_ds = specgram_dataset.take(num_train)
specgram_test_ds = specgram_dataset.skip(num_train)
specgram_val_ds = specgram_test_ds.take(num_val)
specgram_test_ds = specgram_test_ds.skip(num_val)

batch_size = 32
specgram_train_ds.batch(batch_size)
specgram_val_ds.batch(batch_size)

specgram_train_ds = specgram_train_ds.cache().prefetch(tf.data.AUTOTUNE)
specgram_val_ds = specgram_val_ds.cache().prefetch(tf.data.AUTOTUNE)

for specgram, label in specgram_train_ds.take(1):
    input_shape = specgram.shape

num_emotions = len(train_df["emotion"].unique())

model = models.Sequential([
    layers.Input(shape=input_shape),
    # downsampling the input.
    layers.Resizing(32, 128),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="softmax"),
    layers.Dense(num_emotions)
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(0.01),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=["accuracy"]
)

EPOCHS = 10
model.fit(
    specgram_train_ds,
    validation_data=specgram_val_ds,
    epochs=EPOCHS,
    callbacks=tf.keras.callbacks.EarlyStopping(verbose=1, patience=2)
)

Assuming you know your input_shape, I would recommend first hard-coding it into your model:

model = models.Sequential([
    layers.Input(shape=(257, 1001, 1)),
    # downsampling the input.
    layers.Resizing(32, 128),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="softmax"),
    layers.Dense(num_emotions)
])
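
If it helps, you can confirm what the model now expects before touching the data; the leading None is the batch dimension that Dataset.batch will provide:

print(model.input_shape)  # (None, 257, 1001, 1)
model.summary()           # first listed layer should report output shape (None, 32, 128, 1)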

Also, when using tf.data.Dataset.batch, you need to assign the Dataset output to a variable:

batch_size = 32
specgram_train_ds = specgram_train_ds.batch(batch_size)
specgram_val_ds = specgram_val_ds.batch(batch_size)
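
The same goes for every other tf.data transformation; they all return new datasets rather than modifying anything in place, so the shuffle call in your code is currently a no-op as well and needs the same treatment:

specgram_dataset = specgram_dataset.shuffle(buffer_size=1000)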

Afterwards, make sure that specgram_train_ds really does have the correct shape:

specgrams, _ = next(iter(specgram_train_ds.take(1)))
assert specgrams.shape == (32, 257, 1001, 1)
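
One more thing worth knowing: tf.numpy_function discards static shape information, so even with batching fixed the dataset's element_spec may still report an unknown shape (which is part of why hard-coding the Input shape helps). If you want Keras to see the shapes statically, a possible tweak (shapes taken from your error message) is to pin them inside the map function, for example with tf.ensure_shape:

def parse(file, label):
    specgram = tf.numpy_function(read_npy_file, [file], tf.float32)
    # tf.numpy_function returns a tensor with unknown static shape; declare it explicitly
    specgram = tf.ensure_shape(specgram, (257, 1001, 1))
    return specgram, label

specgram_dataset = specgram_files_and_labels_dataset.map(
    parse, num_parallel_calls=tf.data.AUTOTUNE)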

Latest Update