I'm trying to train a model (by merging VGG16 into the encoder side of an autoencoder), and the decoder network expects input of shape (7, 7, 512). My data is grayscale while VGG16 needs 3 color channels, so I repeated the data array three times along a new axis; that part is not the problem. The problem is here: when I try to reshape the array, it fails with an error. Code: X_train and Y_train are lists containing the training dataset of 5k images, each with dim = (224, 224), all grayscale. After that I do ->
train_X=np.array(X_train)
train_Y=np.array(Y_train)
train_X=train_X/255.0
train_Y=train_Y/255.0
print(train_Y.shape)
train_Y = np.repeat(train_Y[..., np.newaxis], 3, -1)
print(train_Y.shape)
train_X = np.repeat(train_X[..., np.newaxis], 3, -1)  # same for train_X
print(train_Y.shape)
print(train_X.shape)
output -> (5000, 224, 224, 3) & (5000, 224, 224, 3)
trainx = train_X.reshape((7,7,512))
error: ValueError: cannot reshape array of size 50176 into shape (7,7,512)
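A reshape can only rearrange existing elements; it can never change how many there are, so raw (224, 224, 3) pixels can never be reshaped into (7, 7, 512). The (7, 7, 512) tensor has to be computed by running the images through the VGG16 encoder (e.g. via `predict`), not produced by reshaping. A minimal NumPy sketch of why the reshape must fail (the batch of 2 zero-filled images is a stand-in for the real data):

```python
import numpy as np

# Stand-in batch with the question's per-image shape (2 images
# instead of 5000, just to keep the demo small).
train_X = np.zeros((2, 224, 224, 3), dtype="float32")

# reshape can only rearrange elements, never change their count:
print(224 * 224 * 3)  # 150528 elements per image
print(7 * 7 * 512)    # 25088 elements the decoder input expects

try:
    train_X.reshape((7, 7, 512))
except ValueError as err:
    print("reshape fails:", err)
```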
The network I'm trying to train:
from tensorflow.keras.layers import Input, Conv2D, UpSampling2D
from tensorflow.keras.models import Model

#encoder
encoder_input = Input(shape=(7,7,512))
#Decoder
decoder_output = Conv2D(256, (3,3), activation='relu', padding='same')(encoder_input)
decoder_output = Conv2D(128, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(64, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(32, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(16, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(2, (3, 3), activation='tanh', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
model = Model(inputs=encoder_input, outputs=decoder_output)
The encoder is VGG16. Model summary:
Metal device set to: Apple M1 Pro
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0
block2_conv1 (Conv2D) (None, 112, 112, 128) 73856
block2_conv2 (Conv2D) (None, 112, 112, 128) 147584
block2_pool (MaxPooling2D) (None, 56, 56, 128) 0
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080
block3_pool (MaxPooling2D) (None, 28, 28, 256) 0
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808
block4_pool (MaxPooling2D) (None, 14, 14, 512) 0
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________
I've tried several approaches, but none of them solved the problem.
You are trying to use the pretrained VGG16 model as the encoder network. If I understand correctly, you are using an encoder and a decoder to denoise images: the encoder is the pretrained VGG16 model, the decoder is a separate model (give them different names, e.g. model_encoder and model_decoder), and you are training with train_Y as the ground truth for the denoised images.
First, feed the images through the encoder network:
trainx_encoded = model_encoder.predict(train_X)
Then check the output shape of model_decoder and make sure it matches the shape of train_Y:
for layer in model_decoder.layers:
print(layer.output_shape)
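The same result can be checked with a bit of shape arithmetic, without building the model: Conv2D with padding='same' preserves height and width, and each UpSampling2D((2, 2)) doubles them, so with five upsampling layers the spatial size grows from 7 to 7 * 2**5 = 224:

```python
# Conv2D(padding='same') keeps H and W unchanged; each UpSampling2D((2, 2))
# doubles both. The decoder above contains five UpSampling2D layers.
h = w = 7
for _ in range(5):
    h, w = h * 2, w * 2
print((h, w))  # (224, 224); the final Conv2D(2, ...) then gives (224, 224, 2)
```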
Running this myself shows that the output layer has shape (224, 224, 2). You have two options:
- Change the decoder network's output shape to (224, 224, 3) by updating the last Conv2D layer to have 3 channels:
decoder_output = Conv2D(3, (3, 3), activation='tanh', padding='same')(decoder_output)
- Or keep the train_Y data as single-channel grayscale (skip the np.repeat step for it) and update the layer above to one channel:
decoder_output = Conv2D(1, (3, 3), activation='tanh', padding='same')(decoder_output)
Then use the encoded data as the training input for the second model:
model_decoder.fit(trainx_encoded, train_Y)
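Before calling fit, it's worth asserting that the shapes actually line up; a mismatch there is exactly what produces cryptic errors later. A small NumPy sketch (the arrays are zero-filled stand-ins, and it assumes you took the 3-channel option above):

```python
import numpy as np

# Stand-in arrays with the shapes the pipeline should produce
# (zero-filled placeholders, not real data):
trainx_encoded = np.zeros((8, 7, 7, 512), dtype="float32")    # encoder output
train_Y        = np.zeros((8, 224, 224, 3), dtype="float32")  # targets, 3-channel option

# Sanity checks before model_decoder.fit(trainx_encoded, train_Y):
assert trainx_encoded.shape[1:] == (7, 7, 512)      # matches the decoder's Input
assert train_Y.shape[1:] == (224, 224, 3)           # matches the decoder's output
assert trainx_encoded.shape[0] == train_Y.shape[0]  # same number of samples
print("shapes line up")
```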