Why does a lighter Keras model run at the same inference speed as the larger original model?



I trained a Keras model with the following architecture:

from tensorflow import keras
from tensorflow.keras import layers


def make_model(input_shape, num_classes):
    inputs = keras.Input(shape=input_shape)
    # Image augmentation block
    x = inputs

    # Entry block
    x = layers.experimental.preprocessing.Rescaling(1.0 / 255)(x)
    x = layers.Conv2D(32, 3, strides=2, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)

    x = layers.Conv2D(64, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)

    previous_block_activation = x  # Set aside residual

    for size in [128, 256, 512, 728]:
        x = layers.Activation("relu")(x)
        x = layers.SeparableConv2D(size, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)

        x = layers.Activation("relu")(x)
        x = layers.SeparableConv2D(size, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)

        x = layers.MaxPooling2D(3, strides=2, padding="same")(x)

        # Project residual
        residual = layers.Conv2D(size, 1, strides=2, padding="same")(
            previous_block_activation
        )
        x = layers.add([x, residual])  # Add back residual
        previous_block_activation = x  # Set aside next residual

    x = layers.SeparableConv2D(1024, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)

    x = layers.GlobalAveragePooling2D()(x)
    if num_classes == 2:
        activation = "sigmoid"
        units = 1
    else:
        activation = "softmax"
        units = num_classes

    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(units, activation=activation)(x)
    return keras.Model(inputs, outputs)

That model has over 2 million trainable parameters.

Then I trained a much lighter model with about 300K trainable parameters:

def make_model(input_shape, num_classes):
    inputs = keras.Input(shape=input_shape)
    # Image augmentation block
    x = inputs

    # Entry block
    x = layers.experimental.preprocessing.Rescaling(1.0 / 255)(x)
    x = layers.Conv2D(64, kernel_size=(7, 7), activation=tf.keras.layers.LeakyReLU(alpha=0.01), padding="same", input_shape=image_size + (3,))(x)
    x = layers.MaxPooling2D(pool_size=(2, 2))(x)
    x = layers.Conv2D(192, kernel_size=(3, 3), activation=tf.keras.layers.LeakyReLU(alpha=0.01), padding="same", input_shape=image_size + (3,))(x)
    x = layers.Conv2D(128, kernel_size=(1, 1), activation=tf.keras.layers.LeakyReLU(alpha=0.01), padding="same", input_shape=image_size + (3,))(x)
    x = layers.MaxPooling2D(pool_size=(2, 2))(x)
    x = layers.Conv2D(128, kernel_size=(3, 3), activation=tf.keras.layers.LeakyReLU(alpha=0.01), padding="same", input_shape=image_size + (3,))(x)
    x = layers.MaxPooling2D(pool_size=(2, 2))(x)
    x = layers.Dropout(0.5)(x)

    x = layers.GlobalAveragePooling2D()(x)
    if num_classes == 2:
        activation = "sigmoid"
        units = 1
    else:
        activation = "softmax"
        units = num_classes

    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(units, activation=activation)(x)

    return keras.Model(inputs, outputs)

However, the second model, even though it is much lighter and even accepts a smaller input size, seems to run at the same speed, classifying only about 2 images per second. Since it is a smaller model, shouldn't there be a difference in speed? Looking at the code, is there any obvious reason why that is not the case?

In both cases I use the same inference code:

import time

import tensorflow as tf
from tensorflow import keras

image_size = (180, 180)
batch_size = 32

model = keras.models.load_model('model_13.h5')
t_end = time.time() + 10
iters = 0
while time.time() < t_end:
    img = keras.preprocessing.image.load_img(
        "test2.jpg", target_size=image_size
    )

    img_array = keras.preprocessing.image.img_to_array(img)
    # print(img_array.shape)
    img_array = tf.expand_dims(img_array, 0)  # Create batch axis

    predictions = model.predict(img_array)
    score = predictions[0]
    print(score)
    iters += 1
    if score < 0.5:
        print('Fire')
    else:
        print('No Fire')

print('TOTAL: ', iters)

The number of parameters is at best an indication of how fast a model will train or run inference. Speed can depend on many other factors.

Here are some examples of what can influence the throughput of a model:

  1. Activation function: ReLU is faster to compute than ELU or GELU, which contain exponential terms. Computing an exponential is not only slower than a linear function; its gradient is also much more complex to compute, whereas for ReLU the slope of the activation is simply a constant (e.g. 1).
  2. Bit precision used for your data. Some hardware accelerators can compute faster in float16 than in float32, and reading fewer bits reduces latency.
  3. Some layers have no parameters but still perform a fixed computation. Even though they add nothing to the network's weights, that computation is still executed.
  4. The architecture of the hardware you train on. Certain filter sizes and batch sizes can be computed more efficiently than others.
  5. Sometimes the speed of the compute hardware is not the bottleneck at all, but rather the input pipeline that loads and preprocesses the data (see the sketch right after this list).
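
To illustrate point 5, here is a minimal sketch of an input pipeline that overlaps image loading and preprocessing with inference using tf.data. The file pattern, image size, and batch size are placeholders, and tf.data.AUTOTUNE assumes TF 2.4+ (older versions use tf.data.experimental.AUTOTUNE):

import tensorflow as tf

# Hypothetical file pattern; replace with your own images.
files = tf.data.Dataset.list_files("images/*.jpg")

def load_and_preprocess(path):
    # Decode and resize on the CPU so the model only has to run inference.
    raw = tf.io.read_file(path)
    img = tf.io.decode_jpeg(raw, channels=3)
    img = tf.image.resize(img, (180, 180))
    return img

dataset = (
    files
    .map(load_and_preprocess, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)  # prepare the next batch while the model runs
)

# predictions = model.predict(dataset)  # consumes the dataset batch by batch

With prefetch, the next batch is prepared on the CPU while the accelerator is still busy with the current one, so loading no longer hides the difference between the two models.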

It is hard to tell without testing, but in your particular example I would guess that the following could slow down your inference:

  1. The large receptive field of the 7x7 convolution
  2. leaky_relu is slightly slower than relu
  3. Probably your data input pipeline is the bottleneck rather than the inference itself. If inference is much faster than data preparation, both models can appear to run at the same speed while the hardware is actually idle and waiting for data (a quick way to check this is sketched right after this list).
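
To check guess 3, you can time the model on a tensor that is already in memory, so that JPEG loading, decoding, and resizing are excluded from the measurement. A rough sketch, reusing the 180x180 input size and model file name from your snippet:

import time

import numpy as np
from tensorflow import keras

model = keras.models.load_model('model_13.h5')

# Random array standing in for one preprocessed image (batch of 1, 180x180 RGB).
x = np.random.rand(1, 180, 180, 3).astype("float32")

t_end = time.time() + 10
iters = 0
while time.time() < t_end:
    # Calling the model directly avoids some per-call predict() overhead and,
    # more importantly, excludes image loading from the measurement.
    _ = model(x, training=False)
    iters += 1

print('pure inference iterations in 10 s:', iters)

If this number is much higher than what your original loop reaches, the bottleneck is the pipeline rather than the network.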

To find out what is going on, you can either change some of these parameters and measure the speed again, or profile your input pipeline by tracing the hardware with TensorBoard. Here is a small guide: https://www.tensorflow.org/tensorboard/tensorboard_profiling_keras
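
If you go the profiler route, a minimal sketch of capturing a trace around a few inference iterations (the log directory name is arbitrary; assumes TF 2.2+ for the programmatic profiler API):

import tensorflow as tf

tf.profiler.experimental.start("logdir")   # start collecting a trace
# ... run a handful of inference iterations here, e.g. model.predict(img_array) ...
tf.profiler.experimental.stop()            # write the trace to "logdir"

Afterwards, open the Profile tab in TensorBoard (tensorboard --logdir logdir) to see how much time is spent in the input pipeline versus on the device.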

Best, Sascha
