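How to use a saved text classification model to make predictions on a new text dataset

I trained a text classifier by following this guide: https://developers.google.com/machine-learning/guides/text-classification/step-4

and saved the model with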

model.save('~./output/model.h5')

Given that, how can I use this model to classify the text in another, new dataset?

Thanks

import tensorflow as tf
# Recreate the exact same model, including its weights and the optimizer
new_model = tf.keras.models.load_model('~./output/model.h5')
# Show the model architecture
new_model.summary()
# Preprocess the new data exactly as the training data was preprocessed.
# Let's say the preprocessed inputs are stored in test_data and their labels in test_labels.
# Check the model's accuracy on the unseen/new dataset (this step needs labels).
loss, acc = new_model.evaluate(test_data,  test_labels, verbose=2)
print('Restored model, accuracy: {:5.2f}%'.format(100*acc))
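
If the new dataset has no labels, evaluate() cannot be used, and predict() is the usual way to get class scores instead. The helper below is only a sketch: the name predict_labels and the 0.5 threshold are my own assumptions, and it assumes the model ends in either a single sigmoid unit (binary) or a softmax layer (multi-class), which is what the guide appears to build.

```python
def predict_labels(model, features, threshold=0.5):
    """Return one integer class id per row of `features`.

    `model` is any object with a Keras-style predict() method, for example
    the object returned by tf.keras.models.load_model(...).
    `predict_labels` and `threshold` are hypothetical names for this sketch.
    """
    scores = model.predict(features)
    labels = []
    for row in scores:
        row = list(row)
        if len(row) == 1:
            # Single sigmoid unit (binary classifier): threshold the probability.
            labels.append(int(row[0] >= threshold))
        else:
            # Softmax output (multi-class): pick the highest-scoring class.
            labels.append(max(range(len(row)), key=row.__getitem__))
    return labels

# Usage with the restored model; new_data must be vectorized exactly as in training:
# predicted = predict_labels(new_model, new_data)
```

The returned ids follow whatever label encoding was used during training, so map them back to class names with the same mapping.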

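You can use TensorFlow's text tokenization utility class (Tokenizer) to handle unknown words in the test data.

num_words is the vocabulary size (it keeps the most frequent words).
Set oov_token="some string"; it is used for every token/word outside the vocabulary (in other words, new words in the test data are treated as the oov_token string).
Fit on the training data, then generate token sequences for both the training and the test data.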

tf.keras.preprocessing.text.Tokenizer(num_words=None, filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n', lower=True, split=' ', char_level=False, oov_token=None, document_count=0, **kwargs)
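
To make the num_words / oov_token idea concrete, here is a plain-Python illustration of the mechanics. It is a simplified sketch, not the Keras implementation: fit_vocab and texts_to_sequences are hypothetical helpers, the "<OOV>" string is an arbitrary choice, and splitting is on whitespace only. The point is that the vocabulary is built from the training texts alone and then reused unchanged on the test texts.

```python
from collections import Counter

def fit_vocab(train_texts, num_words, oov_token="<OOV>"):
    """Build a word -> index map from the training texts only.

    Index 0 is left free (commonly used for padding), index 1 is reserved
    for the OOV token, and the most frequent words get indices 2..num_words-1.
    """
    counts = Counter(w for text in train_texts for w in text.lower().split())
    vocab = {oov_token: 1}
    for word, _ in counts.most_common(num_words - 2):
        vocab[word] = len(vocab) + 1
    return vocab

def texts_to_sequences(texts, vocab, oov_token="<OOV>"):
    """Map each word to its index; unknown words map to the OOV index."""
    oov_index = vocab[oov_token]
    return [[vocab.get(w, oov_index) for w in text.lower().split()]
            for text in texts]

train_texts = ["the movie was great", "the movie was bad", "great acting"]
test_texts = ["the acting was terrible"]

vocab = fit_vocab(train_texts, num_words=6)            # fit on training data only
train_seqs = texts_to_sequences(train_texts, vocab)
test_seqs = texts_to_sequences(test_texts, vocab)      # reuse the same vocabulary
```

Here "terrible" never appeared in training and "acting" fell outside the num_words limit, so both end up with the OOV index in test_seqs. With the real Tokenizer the equivalent steps are fit_on_texts on the training texts followed by texts_to_sequences on both sets.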
