Transfer learning using only some layers of a pretrained network

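I am building a model that segments a human face into skin and non-skin regions. As a starting point I use the model/approach shown here, and I add a dense layer with sigmoid activation at the end. The model works well for my purposes and gives a good Dice score. It uses 2 pretrained layers from ResNet50 as the backbone for feature detection. I have read several articles, books and code, but I could not find any information on how to decide which layers to choose for feature extraction. I compared the ResNet50 architecture with Xception, picked two similar layers, replaced the layers in the original network (here), and ran training. I got similar results, neither better nor worse. I have the following questions: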

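How do I determine which layers are responsible for low-level/high-level features?
In terms of training time and number of trainable parameters, is using only some pretrained layers better than using the full pretrained network?
Where can I find more information about using only some layers of a pretrained network?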

Here is the code for a quick look:

from tensorflow import keras
from tensorflow.keras import layers

# convolution_block and DilatedSpatialPyramidPooling are helper functions
# from the referenced example; they are not shown here.
def DeeplabV3Plus(image_size, num_classes):
    model_input = keras.Input(shape=(image_size, image_size, 3))
    resnet50 = keras.applications.ResNet50(
        weights="imagenet", include_top=False, input_tensor=model_input)
    # Deep feature map, fed to the dilated spatial pyramid pooling module
    x = resnet50.get_layer("conv4_block6_2_relu").output
    x = DilatedSpatialPyramidPooling(x)
    input_a = layers.UpSampling2D(size=(image_size // 4 // x.shape[1], image_size // 4 // x.shape[2]), interpolation="bilinear")(x)
    # Shallow feature map, used as the skip connection
    input_b = resnet50.get_layer("conv2_block3_2_relu").output
    input_b = convolution_block(input_b, num_filters=48, kernel_size=1)
    x = layers.Concatenate(axis=-1)([input_a, input_b])
    x = convolution_block(x)
    x = convolution_block(x)
    x = layers.UpSampling2D(size=(image_size // x.shape[1], image_size // x.shape[2]), interpolation="bilinear")(x)
    model_output = layers.Conv2D(num_classes, kernel_size=(1, 1), padding="same")(x)
    return keras.Model(inputs=model_input, outputs=model_output)

Here is my modified code, using Xception layers as the backbone:

def DeeplabV3Plus(image_size, num_classes):
    model_input = keras.Input(shape=(image_size, image_size, 3))

    Xception_model = keras.applications.Xception(
        weights="imagenet", include_top=False, input_tensor=model_input)
    xception_x1 = Xception_model.get_layer("block9_sepconv3_act").output
    x = DilatedSpatialPyramidPooling(xception_x1)
    input_a = layers.UpSampling2D(size=(image_size // 4 // x.shape[1], image_size // 4 // x.shape[2]), interpolation="bilinear")(x)
    input_a = layers.AveragePooling2D(pool_size=(2, 2))(input_a)
    xception_x2 = Xception_model.get_layer("block4_sepconv1_act").output
    input_b = convolution_block(xception_x2, num_filters=256, kernel_size=1)
    x = layers.Concatenate(axis=-1)([input_a, input_b])
    x = convolution_block(x)
    x = convolution_block(x)
    x = layers.UpSampling2D(size=(image_size // x.shape[1], image_size // x.shape[2]), interpolation="bilinear")(x)
    x = layers.Conv2D(num_classes, kernel_size=(1, 1), padding="same")(x)
    model_output = layers.Dense(x.shape[2], activation='sigmoid')(x)
    return keras.Model(inputs=model_input, outputs=model_output)

Thanks in advance!

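In general, the first layers (the ones closer to the input) learn low-level, generic features such as edges and textures, whereas the last layers are more dataset/task-specific. That is why, in transfer learning, you usually remove only the last few layers and replace them with others that can handle your specific problem.
It depends. Transferring the whole network without removing or adding any layers basically means the network will not learn anything new (unless you do not freeze the layers, in which case you are fine-tuning). On the other hand, if you remove some layers, freeze the rest and add a few new ones, the number of trainable parameters depends only on the layers you just added.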
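One way to make both points concrete is to inspect the candidate layers directly. The sketch below is only an illustration, not part of the original answer: `build_backbone` and `describe_tap` are hypothetical helper names, and it assumes the Keras ResNet50 layer names used in the question. For a given layer it reports the output stride (how much the spatial resolution has shrunk, which grows with depth), the channel count, and the number of parameters in the sub-network that ends at that layer.

```python
from tensorflow import keras


def build_backbone(image_size=512, weights="imagenet"):
    """Build ResNet50 without its classification head (hypothetical helper)."""
    model_input = keras.Input(shape=(image_size, image_size, 3))
    return keras.applications.ResNet50(
        weights=weights, include_top=False, input_tensor=model_input)


def describe_tap(backbone, layer_name):
    """Return (output_stride, channels, params) for a candidate tap layer."""
    tap = backbone.get_layer(layer_name).output
    input_size = backbone.input.shape[1]
    # Output stride: input resolution divided by the feature map resolution.
    output_stride = input_size // tap.shape[1]
    # Sub-model ending at the tap; layers after it are not included,
    # so they add neither parameters nor compute.
    truncated = keras.Model(backbone.input, tap)
    return output_stride, tap.shape[-1], truncated.count_params()
```

Calling `describe_tap(backbone, "conv2_block3_2_relu")` and `describe_tap(backbone, "conv4_block6_2_relu")` shows that the shallow layer keeps more spatial detail while the deep layer is coarser and has more channels, which is roughly why a DeepLabV3+-style decoder pairs one of each. Comparing the returned parameter count with `backbone.count_params()` gives a rough idea of how much smaller the truncated backbone is.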

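What I suggest you do is:

Remove a few layers from the pretrained network, freeze the layers you keep, then add a few new layers (even just one)
Train the new network with a given learning rate (usually not a very low one)
Fine-tune: unfreeze all the layers, lower the learning rate and retrain the whole network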
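The three steps above could be sketched as follows. This is a minimal illustration under stated assumptions rather than a definitive implementation: `build_model` and `compile_stage` are hypothetical names, the single 1x1 convolution stands in for the real decoder, and the learning rates mentioned afterwards are placeholder values, not recommendations from the answer.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_model(image_size=256, weights="imagenet"):
    """Step 1: truncated pretrained backbone plus a small new head."""
    model_input = keras.Input(shape=(image_size, image_size, 3))
    backbone = keras.applications.ResNet50(
        weights=weights, include_top=False, input_tensor=model_input)
    # Tap the backbone at a chosen layer; later layers are dropped.
    x = backbone.get_layer("conv4_block6_2_relu").output
    # New layers get a "head_" prefix so they can be told apart later.
    x = layers.Conv2D(1, kernel_size=1, padding="same",
                      activation="sigmoid", name="head_conv")(x)
    factor = image_size // x.shape[1]
    x = layers.UpSampling2D(size=(factor, factor),
                            interpolation="bilinear", name="head_upsample")(x)
    return keras.Model(inputs=model_input, outputs=x)


def compile_stage(model, learning_rate, train_backbone):
    """Freeze or unfreeze the pretrained layers, then (re)compile."""
    for layer in model.layers:
        if not layer.name.startswith("head_"):
            layer.trainable = train_backbone
    # Changes to `trainable` take effect for training after compiling again.
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss="binary_crossentropy")
    return model
```

A typical use would be `compile_stage(model, 1e-3, train_backbone=False)` followed by `model.fit(...)` for step 2, then `compile_stage(model, 1e-5, train_backbone=True)` and another `model.fit(...)` for step 3. Batch normalization layers often need extra care when unfreezing; this sketch does not handle that.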
