Why does a model based on Dense layers give better results than one based on Conv2D?



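In TensorFlow, a model based on Dense layers trains to a better result than a model based on equivalent Conv2D layers.

Results:

  1. Using Dense: loss: 16.1930 - mae: 2.5369 - mse: 16.1930
  2. Using Conv2D: loss: 83.7851 - mae: 6.5585 - mse: 83.7851

Is this expected, or are we doing something wrong?

The code we are using is as follows (adapted from here):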

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import pandas as pd
import sys
model_type = int(sys.argv[1]) # 0: Dense, Else: Conv2D
verbose = 0
# load data & normalize
(train_features, train_labels), (test_features, test_labels) = keras.datasets.boston_housing.load_data()
train_mean = np.mean(train_features, axis=0)
train_std = np.std(train_features, axis=0)
train_features_norm = (train_features - train_mean) / train_std
test_features_norm = (test_features - train_mean) / train_std
train_labels_norm = train_labels
test_labels_norm = test_labels
input_height = train_features_norm.shape[1]
# model
if model_type == 0:
    model = keras.Sequential([
        layers.InputLayer(input_shape=(input_height,)),  # trailing comma makes this a tuple
        layers.Dense(20, activation='relu'),
        layers.Dense(1)])
else:
    train_features_norm = np.reshape(train_features_norm, (-1, input_height, 1))
    test_features_norm = np.reshape(test_features_norm, (-1, input_height, 1))

    model = keras.Sequential([
        layers.InputLayer(input_shape=(input_height, 1, 1)),
        layers.Conv2D(20, (input_height, 1), activation='relu'),
        layers.Conv2D(1, (1, 1))])  # replacing this layer with Dense(1) gives the same results

model.compile(
    optimizer=tf.optimizers.Adam(),
    loss='mse',
    metrics=['mae', 'mse'])
model.summary()
# training
early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=50)
history = model.fit(
    train_features_norm,
    train_labels_norm,
    epochs=1000,
    verbose=verbose,
    validation_split=0.1)
# results
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
print(hist)
rmse_final = np.sqrt(float(hist['val_mse'].iloc[-1]))  # last epoch's validation MSE
print('Final Root Mean Square Error on validation set: {}'.format(round(rmse_final, 3)))
# compare how the model performs on the test dataset
mse, _, _ = model.evaluate(test_features_norm, test_labels_norm)
rmse = np.sqrt(mse)
print('Root Mean Square Error on test set: {}'.format(round(rmse, 3)))

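Note: model_type selects either the model based on Dense layers (= 0) or the model based on Conv2D (any other value).

Background: we have a system that does not support Dense layers (BeagleBone AI using TIDL). It does, however, support Conv2D layers, and as far as we know a Conv2D layer can be configured to be equivalent to a Dense layer.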

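For example, in a Dense layer with two units/outputs, no bias, and two inputs, the outputs are:

  • o1 = w11 * i1 + w12 * i2
  • o2 = w21 * i1 + w22 * i2

o - output, i - input, w - weight

Similarly, in a Conv2D layer with two 1x1 output channels, no bias, one 1x2 input channel, and a 1x2 kernel, the outputs are:

  • o1 = k11 * i11 + k12 * i12
  • o2 = k21 * i11 + k22 * i12

o - output channel, i - input channel, k - kernel weight

This means the two are mathematically equivalent. Yet training works better when the Dense layers are used.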
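The equivalence claimed above can be checked numerically. The following is a minimal NumPy sketch, not TensorFlow code: it assumes the Keras `channels_last` kernel layout `(kernel_height, kernel_width, in_channels, out_channels)` and stores the Dense weights as `(inputs, units)`, so `w[k, j]` here plays the role of w_jk in the formulas above. The variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_units = 2, 2

x = rng.normal(size=(n_inputs,))           # inputs i1, i2
w = rng.normal(size=(n_inputs, n_units))   # Dense kernel, no bias

# Dense layer: o_j = sum_k w[k, j] * i_k
dense_out = x @ w

# Conv2D view: one 1x2 input channel and a 1x2 kernel per output channel.
x_img = x.reshape(1, n_inputs, 1)             # (height, width, in_channels)
kernel = w.reshape(1, n_inputs, 1, n_units)   # (kh, kw, in_channels, out_channels)

# With 'valid' padding and a kernel that covers the whole input there is a
# single output position, so the convolution reduces to one weighted sum
# per output channel.
conv_out = np.einsum('hwc,hwco->o', x_img, kernel)

print(np.allclose(dense_out, conv_out))
```

Since both layers compute the same weighted sums, a large gap in training results points at something outside the layers themselves, such as how the data is shaped when it reaches the loss.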

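I got it! You have to reshape the output tensor so that it has only two dimensions, (batch_size, 1).
I get this evaluation on the test data: loss: 17.9552 - mae: 2.7125 - mse: 17.9552
That is slightly higher than your result with the Dense layers, but at least it looks comparable. Here is my model: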

filters = 20
model = keras.Sequential([
    layers.InputLayer(input_shape=(input_height,)),
    # first Conv layer
    layers.Reshape((input_height, 1, 1)),
    layers.Conv2D(filters, (input_height, 1), data_format='channels_last', padding='valid'),
    layers.Activation('relu'),
    # second conv layer
    layers.Reshape((filters, 1, 1)),
    layers.Conv2D(1, (filters, 1)),
    # reshape the final result !!!
    layers.Reshape((1,)),
])
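To see why the final Reshape is needed, it can help to trace the tensor shapes through this model by hand. The sketch below is plain Python and does not call Keras; `conv2d_valid_shape` is a hypothetical helper that applies the usual stride-1, 'valid'-padding rule (output size = input size - kernel size + 1), and `input_height = 13` assumes the 13 features of the Boston housing data.

```python
def conv2d_valid_shape(input_shape, kernel_size, filters):
    """Per-sample output shape of a stride-1 'valid' Conv2D (channels_last)."""
    height, width, _ = input_shape
    k_h, k_w = kernel_size
    return (height - k_h + 1, width - k_w + 1, filters)

input_height, filters = 13, 20

# Reshape((input_height, 1, 1)) followed by the first Conv2D
first_conv = conv2d_valid_shape((input_height, 1, 1), (input_height, 1), filters)

# Reshape((filters, 1, 1)) moves the 20 channel values onto the height axis,
# then the second Conv2D collapses them into a single value
second_conv = conv2d_valid_shape((filters, 1, 1), (filters, 1), 1)

print(first_conv)   # per-sample shape after the first conv layer
print(second_conv)  # per-sample shape before the final Reshape((1,))
```

Without the last Reshape the model would emit a 4-D tensor per batch, while the labels carry far fewer dimensions.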

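There are two problems:

  1. The shape of the features, (None, input_height, 1), does not match the shape of the model input, (None, input_height, 1, 1).
  2. The shape of the labels, (None, 1), does not match the shape of the model output, (None, 1, 1, 1).

Both of these affect the performance of the model, and both have to be fixed to reach the performance level of the model based on Dense layers.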
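One plausible way such a size-1 mismatch can degrade a loss without raising an error is array broadcasting. The NumPy sketch below only illustrates that effect; whether Keras computes the loss in exactly this way internally is an assumption here, not something confirmed in this thread. The arrays are made-up toy values.

```python
import numpy as np

# Toy labels with shape (batch, 1) and "perfect" predictions with the
# 4-D shape a Conv2D-only model would produce: (batch, 1, 1, 1).
y_true = np.array([[1.0], [2.0], [3.0]])
y_pred = y_true.reshape(-1, 1, 1, 1)

# Mismatched ranks: broadcasting pairs every prediction with every label.
diff_mismatch = y_pred - y_true
mse_mismatch = np.mean(diff_mismatch ** 2)

# Matching shapes: each prediction is compared with its own label only.
diff_aligned = y_pred - y_true.reshape(-1, 1, 1, 1)
mse_aligned = np.mean(diff_aligned ** 2)

print(diff_mismatch.shape, mse_mismatch)
print(diff_aligned.shape, mse_aligned)
```

In the mismatched case the error is nonzero even though every prediction equals its own label, because each prediction is also compared against the other samples' labels. A loss of that form rewards predicting something close to the batch mean.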

The fix (add an extra dimension to the features, reshape the labels):

if model_type == 0:
    ...
else:
    train_features_norm = np.reshape(train_features_norm, (-1, input_height, 1, 1))
    test_features_norm = np.reshape(test_features_norm, (-1, input_height, 1, 1))
    train_labels_norm = np.reshape(train_labels_norm, (-1, 1, 1, 1))
    test_labels_norm = np.reshape(test_labels_norm, (-1, 1, 1, 1))

    ...

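Is this expected, or are we doing something wrong?

No, it is not expected. I am not sure the original code can be considered wrong. Because TensorFlow did not complain about a shape mismatch as it usually does, my expectation was that the "missing" dimensions, being of size 1, would not matter. Well, they do.

Thanks @elbe. Your answer was key to my realizing the problems above.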
