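I am trying to upgrade my model to TensorFlow 2.4, but after the upgrade the network's accuracy is lower. I noticed that the loss on a single batch differs between the two versions, even though:
- in both versions I load the same file with
model = keras.models.load_model('path/to/model.h5')
(this file was created with TF 1.12)
- I checked that the weights match
- I checked that the batch used is the same
I reproduced this problem both on a proprietary dataset and on keras.datasets.mnist. I expect that if I manage to get the same loss in both versions, I will also get the same accuracy after training.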
Requirements for the TF 1.12 version:
# python version == 3.6
tensorflow_gpu==1.12
keras==2.2.4
h5py==2.10.0
opencv-python==4.2.0.34
Requirements for the TF 2.4.1 version:
# python version == 3.8
tensorflow==2.4.1
h5py==2.10.0
opencv-python==4.5.3.56
Model definition (identical in both versions):
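# TF 2.4 imports; with TF 1.12 + Keras 2.2.4 the same names come from keras.* instead
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import Add, Dense, GlobalAveragePooling2D, GlobalMaxPooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

def mobile_net(no_classes):
    # ImageNet-pretrained backbone without the classification head
    base = MobileNetV2(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
    # freeze every backbone layer, BatchNormalization layers included
    for layer in base.layers:
        layer.trainable = False
    # branch 1: global average pooling
    x = GlobalAveragePooling2D()(base.output)
    x = Dense(32, activation='relu')(x)
    x = Dense(128, activation='relu')(x)
    # branch 2: global max pooling
    y = GlobalMaxPooling2D()(base.output)
    y = Dense(32, activation='relu')(y)
    y = Dense(128, activation='relu')(y)
    # merge both branches and classify
    conc = Add()([x, y])
    conc = Dense(32, activation='relu')(conc)
    prediction = Dense(no_classes, activation='softmax')(conc)
    model = Model(inputs=base.input, outputs=prediction)
    optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
    model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
    return model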
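Training procedure (almost identical in both versions):
import keras  # only in tf 1.12 (standalone Keras 2.2.4)
import tensorflow  # only in tf 2.4
keras.backend.set_image_dim_ordering('tf') # only in tf 1.12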
# load data
x_train, y_train = ...
x_train, y_train = x_train[:4], y_train[:4] # select just one batch for testing purposes
model = keras.models.load_model('path/to/model.h5') # in tf 1.12
model = tensorflow.keras.models.load_model('path/to/model.h5') # in tf 2.4
print(f'check that the values are the same: {x_train.sum() + y_train.argmax(axis=1).sum()}')
weights = model.get_weights()
print(f'check that weights are the same: {[weight.sum() for weight in weights]}')
model.fit(x_train, y_train, batch_size=4, verbose=2)
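Comparing per-array sums, as the script above does, can miss real differences: two arrays with the same sum can hold different values. A stricter check is sketched below with NumPy, assuming that model.get_weights() returns the arrays in the same order in both versions. The helper names save_weight_snapshot and compare_weight_snapshot and the file name weights_tf112.npz are hypothetical; the idea is to save a snapshot in the TF 1.12 environment and compare element-wise in the TF 2.4 environment.

```python
import numpy as np


def save_weight_snapshot(weights, path):
    """Hypothetical helper: store each weight array under a positional key in a .npz file."""
    np.savez(path, **{f"w{i}": w for i, w in enumerate(weights)})


def compare_weight_snapshot(weights, path, atol=1e-6):
    """Hypothetical helper: return indices of arrays that differ from the saved snapshot.

    A missing array or a shape mismatch also counts as a difference.
    """
    with np.load(path) as saved:
        saved_arrays = [saved[f"w{i}"] for i in range(len(saved.files))]
    mismatches = []
    for i in range(max(len(weights), len(saved_arrays))):
        if i >= len(weights) or i >= len(saved_arrays):
            mismatches.append(i)  # one side has more arrays than the other
        elif weights[i].shape != saved_arrays[i].shape or not np.allclose(
            weights[i], saved_arrays[i], atol=atol
        ):
            mismatches.append(i)  # different shape or different values
    return mismatches
```

Usage: in TF 1.12 call save_weight_snapshot(model.get_weights(), 'weights_tf112.npz'); in TF 2.4, compare_weight_snapshot(model.get_weights(), 'weights_tf112.npz') should return an empty list if the weights really match.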
TF 1.12 output:
check that the values are the same: 18266047
check that weights are the same: [-4.311309, 37.386337, 26.299068, …, -10.376889, 0.0, -13.127711, 0.0, 4.9316425, 0.0]
Epoch 1/1
 - 18s - loss: 2.6805 - acc: 0.2500
TF 2.4 output:
check that the values are the same: 18266047
check that weights are the same: [-4.311309, 37.386337, 26.299068, …]
1/1 - 6s - loss: 2.8985 - accuracy: 0.2500
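Both runs report the same accuracy (0.25, one of four samples correct) but different losses. The NumPy sketch below, which roughly follows the Keras formula for softmax outputs (renormalize each row, clip to [eps, 1 - eps], average -log p of the true class over the batch), shows how that happens when the forward pass changes the predicted probabilities but not their arg-max. The labels and both sets of predictions are made up for illustration.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Batch-mean categorical cross-entropy for probability (softmax) outputs."""
    y_pred = y_pred / y_pred.sum(axis=1, keepdims=True)  # renormalize rows
    y_pred = np.clip(y_pred, eps, 1.0 - eps)              # avoid log(0)
    per_sample = -np.sum(y_true * np.log(y_pred), axis=1)
    return float(per_sample.mean())


def accuracy(y_true, y_pred):
    """Fraction of samples whose arg-max prediction equals the true class."""
    return float(np.mean(y_true.argmax(axis=1) == y_pred.argmax(axis=1)))


# Hypothetical batch of 4 samples with 3 classes; true classes are 0, 1, 2, 0
y_true = np.eye(3)[[0, 1, 2, 0]]
# Two hypothetical forward passes with the same arg-max in every row
pred_a = np.array([[0.6, 0.3, 0.1], [0.5, 0.3, 0.2], [0.5, 0.3, 0.2], [0.2, 0.5, 0.3]])
pred_b = np.array([[0.4, 0.35, 0.25], [0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.6, 0.3]])

print(accuracy(y_true, pred_a), accuracy(y_true, pred_b))  # 0.25 0.25
print(round(categorical_crossentropy(y_true, pred_a), 4))  # 1.2334
print(round(categorical_crossentropy(y_true, pred_b), 4))  # 1.7827
```

Because the loss printed by fit comes from the forward pass on that batch, anything that changes the activations (while weights and inputs stay identical) shows up in the loss first, even when the accuracy is unchanged.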
Where does the difference in loss come from?
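The difference comes from the fact that MobileNet contains BatchNormalization layers, and their behavior changed in TensorFlow 2.x. You can read more here. In TF 2.x, setting trainable = False on a BatchNormalization layer also makes it run in inference mode, so it normalizes with its stored moving mean and variance; in TF 1.x, a frozen BatchNormalization layer was not switched to inference mode and still normalized with the statistics of the current batch during fit. To recreate the TensorFlow 1.x behavior, I added the following snippet to the model-creation code: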
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import (Add, BatchNormalization, Dense,
                                     GlobalAveragePooling2D, GlobalMaxPooling2D)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

def mobile_net(no_classes):
    base = MobileNetV2(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
    ### changed in Tensorflow 2.4.1
    for layer in base.layers:
        # keep BatchNormalization layers trainable so that during fit they
        # normalize with batch statistics, like the frozen BN layers in TF 1.12
        if isinstance(layer, BatchNormalization):
            layer.trainable = True
        else:
            layer.trainable = False
    ### end of change
    x = GlobalAveragePooling2D()(base.output)
    x = Dense(32, activation='relu')(x)
    x = Dense(128, activation='relu')(x)
    y = GlobalMaxPooling2D()(base.output)
    y = Dense(32, activation='relu')(y)
    y = Dense(128, activation='relu')(y)
    conc = Add()([x, y])
    conc = Dense(32, activation='relu')(conc)
    prediction = Dense(no_classes, activation='softmax')(conc)
    model = Model(inputs=base.input, outputs=prediction)
    optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
    model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
    return model
The model now returns the same loss in TF 1.12 and TF 2.4.1.
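To make the mode difference concrete, here is a minimal NumPy sketch of a BatchNormalization forward pass on a (batch, features) input. It is not TensorFlow's implementation; it assumes epsilon = 1e-3 (the Keras default) and uses made-up gamma, beta, moving statistics and batch values. The weights are the same in every call and only the statistics used for normalization change, mirroring the question, where the weights were verified to be identical.

```python
import numpy as np


def batch_norm(x, gamma, beta, moving_mean, moving_var, use_batch_stats, eps=1e-3):
    """Normalize x of shape (batch, features).

    use_batch_stats=True  -> statistics of the current batch (training mode)
    use_batch_stats=False -> stored moving statistics (inference mode)
    """
    if use_batch_stats:
        mean = x.mean(axis=0)
        var = x.var(axis=0)
    else:
        mean, var = moving_mean, moving_var
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


# Hypothetical frozen layer: these values stand in for statistics learned on
# ImageNet, which need not match the statistics of the new data.
gamma, beta = np.ones(2), np.zeros(2)
moving_mean, moving_var = np.zeros(2), np.ones(2)
batch = np.array([[4.0, 10.0], [6.0, 14.0], [5.0, 12.0], [7.0, 16.0]])

# TF 1.x, frozen BN during fit: still uses batch statistics
out_tf1 = batch_norm(batch, gamma, beta, moving_mean, moving_var, use_batch_stats=True)
# TF 2.x, frozen BN (trainable = False): inference mode, moving statistics
out_tf2_frozen = batch_norm(batch, gamma, beta, moving_mean, moving_var, use_batch_stats=False)
# TF 2.x, trainable BN during fit (the fix above): batch statistics again
out_tf2_trainable = batch_norm(batch, gamma, beta, moving_mean, moving_var, use_batch_stats=True)

print(np.allclose(out_tf1, out_tf2_trainable))  # True
print(np.allclose(out_tf1, out_tf2_frozen))     # False
```

One side effect of the fix is worth keeping in mind: with trainable = True, fit in TF 2.x also trains gamma and beta of those layers (which stayed frozen in the TF 1.12 setup) and updates their moving statistics, so an identical first-batch loss does not by itself guarantee identical results after longer training.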