I trained a text classifier following this guide: https://developers.google.com/machine-learning/guides/text-classification/step-4
and saved the model with
model.save('~./output/model.h5')
How can I use this saved model to classify text from a new dataset?
Thanks
import tensorflow as tf
# Recreate the exact same model, including its weights and the optimizer
new_model = tf.keras.models.load_model('~./output/model.h5')
# Show the model architecture
new_model.summary()
# Apply the same data preparation that was used when training the model.
# Let's say the preprocessed data is stored in test_data and its labels in test_labels.
# Check the model's accuracy on the unseen/new dataset.
loss, acc = new_model.evaluate(test_data, test_labels, verbose=2)
print('Restored model, accuracy: {:5.2f}%'.format(100*acc))
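If the new dataset has no labels, evaluate() is not applicable and you would call new_model.predict(...) instead, then turn the returned probabilities into class labels. Below is a minimal sketch of that last step only. It assumes the model ends in either a single sigmoid unit (binary case) or a softmax layer (multi-class case); probabilities_to_labels and the sample probabilities are hypothetical stand-ins, not part of the guide.

```python
def probabilities_to_labels(probs, threshold=0.5):
    """Convert per-sample model outputs into integer class labels.

    probs: list of rows, e.g. new_model.predict(new_data).tolist()
           (new_data is hypothetical and must be vectorized exactly
           as the training data was).
    """
    labels = []
    for row in probs:
        if len(row) == 1:
            # Assumed binary case: one sigmoid output per sample.
            labels.append(int(row[0] >= threshold))
        else:
            # Assumed multi-class case: pick the most probable class.
            labels.append(max(range(len(row)), key=row.__getitem__))
    return labels


# Stand-in values in place of real predict() output.
binary_probs = [[0.91], [0.12], [0.50]]
multi_probs = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]

binary_labels = probabilities_to_labels(binary_probs)
multi_labels = probabilities_to_labels(multi_probs)
```

The important part is upstream of this function: the new texts have to go through the same fitted vectorizer or tokenizer as the training texts, otherwise the predictions are meaningless.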
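You can use TensorFlow's text tokenization utility class (Tokenizer) to handle unknown words in the test data:
- num_words is the vocabulary size (it keeps the most frequent words).
- Set oov_token='some string'; it is used for every token/word outside the vocabulary, so new words in the test data are handled as the oov_token string.
- Fit the tokenizer on the training data, then generate token sequences for both the training and the test data.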
tf.keras.preprocessing.text.Tokenizer(num_words=None, filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n', lower=True, split=' ', char_level=False, oov_token=None, document_count=0, **kwargs)
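To make the out-of-vocabulary idea concrete, here is a simplified pure-Python stand-in for what the tokenizer steps above do. It is a sketch for illustration, not the Keras implementation: fit_vocab and to_sequence are hypothetical helpers, punctuation filtering is skipped, and the exact way Keras counts num_words may differ.

```python
from collections import Counter


def fit_vocab(train_texts, num_words, oov_token="<OOV>"):
    """Build a word -> id mapping from training texts only.

    Id 1 is reserved for the OOV token; the num_words most frequent
    training words get ids 2, 3, ... in order of frequency.
    """
    counts = Counter(
        word for text in train_texts for word in text.lower().split()
    )
    vocab = {oov_token: 1}
    for word, _ in counts.most_common(num_words):
        vocab[word] = len(vocab) + 1
    return vocab


def to_sequence(text, vocab, oov_token="<OOV>"):
    """Map each word to its id, falling back to the OOV id."""
    oov_id = vocab[oov_token]
    return [vocab.get(word, oov_id) for word in text.lower().split()]


train_texts = ["the movie was great", "the movie was bad", "the acting"]

# Fit on the training data only, then reuse the same vocab for test data.
vocab = fit_vocab(train_texts, num_words=3)
train_seqs = [to_sequence(t, vocab) for t in train_texts]
test_seq = to_sequence("the plot was great", vocab)  # "plot" is unseen
```

With the real Tokenizer the equivalent calls are fit_on_texts on the training texts followed by texts_to_sequences on both training and test texts. The point is the same: the vocabulary is fixed at training time and unseen words collapse to one shared id.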