TFLite quantized model still outputs float values

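I already have a CNN, but now I need to deploy it on some specific hardware. I was told to quantize the model for this, because the hardware can only perform integer operations.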

I found a good solution here: How to make sure that TFLite Interpreter is only using int8 operations?

I wrote the following code to make it work:

```python
import tensorflow as tf

# representative_dataset and read_image are defined elsewhere in my code (not shown)
model_file = "models/my_cnn.h5"
# load the model
model = tf.keras.models.load_model(model_file, custom_objects={'tf': tf}, compile=False)
# convert
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint16  # or tf.uint8
converter.inference_output_type = tf.uint16  # or tf.uint8
qmodel = converter.convert()
with open('thales.tflite', 'wb') as f:
    f.write(qmodel)
interpreter = tf.lite.Interpreter(model_content=qmodel)
interpreter.allocate_tensors()
# predict
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
print(input_details)
print(output_details)
image = read_image("test.png")
interpreter.set_tensor(input_details[0]['index'], image)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)
```
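The conversion depends on a `representative_dataset` function that the question does not show. A minimal sketch of what such a generator typically looks like, assuming the calibration images are already preprocessed into float32 arrays of shape (160, 160, 3) to match `input_1`; `calibration_images` and its random contents are hypothetical placeholders for real training images:

```python
import numpy as np

# Hypothetical calibration set: replace with ~100 real, preprocessed training images.
# Per-image shape matches the model input reported in input_details: (160, 160, 3).
calibration_images = np.random.rand(100, 160, 160, 3).astype(np.float32)

def representative_dataset():
    """Yield calibration samples one at a time for the TFLite converter."""
    for image in calibration_images:
        # Each yielded item is a list with one array per model input,
        # including a leading batch dimension of 1.
        yield [image[np.newaxis, ...]]
```

The converter feeds these samples through the float model to record activation ranges, from which it derives each integer tensor's scale and zero point. Random data would produce meaningless ranges, so real images should be used in practice.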

Looking at the printed output, first the tensor details:

input_details

[{'name': 'input_1', 'index': 87, 'shape': array([  1, 160, 160,   3], dtype=int32), 'shape_signature': array([  1, 160, 160,   3], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]

output_details

[{'name': 'Identity', 'index': 88, 'shape': array([  1, 160, 160,   1], dtype=int32), 'shape_signature': array([  1, 160, 160,   1], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]
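Two different dtypes appear in these dictionaries. The `dtype=int32` inside `'shape': array([...], dtype=int32)` is only the NumPy dtype of the shape vector itself; the element type of the tensor is the separate `'dtype'` field, which here is `numpy.float32`. A `'quantization'` value of `(0.0, 0)` means the tensor carries no quantization parameters, so the model's interface is still float. A small helper that makes this explicit (the name `describe_io` is hypothetical; the dictionary is copied from the `input_details` printout, with some fields omitted):

```python
import numpy as np

def describe_io(detail):
    """Summarize one entry of interpreter.get_input_details()/get_output_details()."""
    scale, zero_point = detail["quantization"]
    return {
        "name": detail["name"],
        # Element type of the tensor data, i.e. what set_tensor/get_tensor use.
        "tensor_dtype": np.dtype(detail["dtype"]).name,
        # dtype of the shape vector only; it says nothing about the tensor data.
        "shape_array_dtype": detail["shape"].dtype.name,
        # A scale of 0.0 means the tensor has no quantization parameters.
        "quantized": scale != 0.0,
        "scale": scale,
        "zero_point": zero_point,
    }

# Fields copied from the input_details printout (other fields omitted).
input_detail = {
    "name": "input_1",
    "index": 87,
    "shape": np.array([1, 160, 160, 3], dtype=np.int32),
    "dtype": np.float32,
    "quantization": (0.0, 0),
}
print(describe_io(input_detail))
```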

The output of the quantized model is:

...
[[0.        ]
[0.        ]
[0.        ]
...
[0.00390625]
[0.00390625]
[0.00390625]]
[[0.        ]
[0.        ]
[0.        ]
...
[0.00390625]
[0.00390625]
[0.00390625]]]]
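TFLite relates integer and real values through the affine mapping `real = scale * (q - zero_point)`. The nonzero values above are exactly 0.00390625 = 1/256, which would be consistent with an internal 8-bit tensor using scale 1/256 and zero point 0 being converted back to float at the float32 output. That scale is an assumption for illustration, not something read from the model. A NumPy sketch of the mapping (the `dequantize` helper is hypothetical):

```python
import numpy as np

def dequantize(q, scale, zero_point):
    """Map integer tensor values back to real values: scale * (q - zero_point)."""
    return scale * (q.astype(np.float32) - zero_point)

# Assumed parameters: scale 1/256 and zero point 0 for a uint8 tensor.
q = np.array([0, 1, 255], dtype=np.uint8)
print(dequantize(q, 1 / 256, 0))
```

With an integer output type, `get_tensor` would instead return the raw integers, and this conversion would use the scale and zero point found in `output_details[0]['quantization']`.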

So I have a few questions:

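In the input/output details, the input/output layers appear as int32, but I specified uint16 in the code.
Also, the input/output details show "float32" as the dtype, and I don't understand why.
Finally, the biggest problem: the output contains floating-point numbers, which should not happen. It looks as if the model was never really converted to integers.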

How can I actually quantize my CNN, and why doesn't this code work?

converter.inference_input_type and converter.inference_output_type only support tf.int8 or tf.uint8, not tf.uint16.
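Once both settings are `tf.uint8` (or `tf.int8`), the model's input tensor is integer, so a float image has to be quantized with that tensor's own parameters before `set_tensor`. Those parameters are available as `scale, zero_point = input_details[0]['quantization']`. A NumPy sketch of this step, the inverse of `real = scale * (q - zero_point)`; the helper name and the example scale and zero point are assumptions:

```python
import numpy as np

def quantize(x, scale, zero_point, dtype=np.uint8):
    """Map real values to integers: round(x / scale) + zero_point, clipped to dtype's range."""
    info = np.iinfo(dtype)
    q = np.round(x / scale) + zero_point
    return np.clip(q, info.min, info.max).astype(dtype)

# Hypothetical parameters; in practice read them from the interpreter:
#   scale, zero_point = input_details[0]['quantization']
scale, zero_point = 1 / 255, 0
image = np.array([[0.0, 0.25, 1.0]], dtype=np.float32)
print(quantize(image, scale, zero_point))
```

Clipping matters because values outside the calibrated range would otherwise wrap around when cast to an 8-bit type.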
