是否可以通过TFLite Model Maker创建多类文本分类器Tensorflow Lite模型?



我尝试构建一个Android应用程序,使用Tensorflow Lite Model Maker提供的AverageWordVecModelSpec来预测文本分类。

我正在使用图书内容来测试我的应用是否正常工作。我为这个实验提供了3本书。代码如下:

!pip install git+https://github.com/tensorflow/examples.git#egg=tensorflow-examples[model_maker]
import numpy as np
import os
import tensorflow as tf
assert tf.__version__.startswith('2')
from tensorflow_examples.lite.model_maker.core.data_util.text_dataloader import TextClassifierDataLoader
from tensorflow_examples.lite.model_maker.core.task.model_spec import AverageWordVecModelSpec
from tensorflow_examples.lite.model_maker.core.task import text_classifier
data_path = '/content/drive/My Drive/datasetps'
model_spec = AverageWordVecModelSpec()
train_data = TextClassifierDataLoader.from_folder(os.path.join(data_path, 'train'), model_spec=model_spec, class_labels=['categorya', 'categoryb'])
test_data = TextClassifierDataLoader.from_folder(os.path.join(data_path, 'test'), model_spec=model_spec, is_training=False, shuffle=False)
model = text_classifier.create(train_data, model_spec=model_spec)
loss, acc = model.evaluate(test_data)
model.export(export_dir='.')

当我只使用 2 个类/书籍时,它有效(与 tensorflow 团队提供的示例相同(: 即使它有很小的卷曲,它也能正常工作——因为我实际上只将每本书 20 个示例页面作为数据集

你可以看到我这里有合理的损失值, 但是当我尝试添加第 3 类时我遇到了问题:

train_data = TextClassifierDataLoader.from_folder(os.path.join(data_path, 'train'), model_spec=model_spec, class_labels=['categorya', 'categoryb', 'categoryc'])
test_data = TextClassifierDataLoader.from_folder(os.path.join(data_path, 'test'), model_spec=model_spec, is_training=False, shuffle=False)

以下是涉及三等舱的训练结果: 在此处输入图像描述

您可以看到,损失值大于 1 是不合理的。 我试图找到我应该更改哪一行代码(来自 Tensorflow Model Maker(来解决它,并最终在这个论坛上提出了这个问题。

那么是否有可能使用文本分类器的多类模型 AverageWordVecModelSpec TFlite model maker?

这是可能的。我建议先对标签进行编码,然后按照工作流程进行操作:

from tflite_model_maker import model_spec
from tflite_model_maker import text_classifier
from tflite_model_maker import TextClassifierDataLoader
from tflite_model_maker import ExportFormat
from sklearn.model_selection import train_test_split
import pandas as pd 
df = pd.read_excel('data_set.xls')
col = ['sentence', 'your_label']
df = df[col]
# Encoding happens here
df.your_label = pd.Categorical(df.your_label)
df['label'] = df.book_label.cat.codes

train, test = train_test_split(df, test_size=0.2)
train.to_csv('train.csv', index=False)
test.to_csv('test.csv', index=False)

spec = model_spec.get('average_word_vec')
train_data = TextClassifierDataLoader.from_csv(
filename='train.csv',
text_column='sentence',
label_column='label',
model_spec=spec,
delimiter=',',
is_training=True)
test_data = TextClassifierDataLoader.from_csv(
filename='test.csv',
text_column='sentence',
label_column='label',
model_spec=spec,
delimiter=',',
is_training=False)
model = text_classifier.create(train_data, model_spec=spec, batch_size=5, epochs=4)
config = configs.QuantizationConfig.create_dynamic_range_quantization(optimizations=[tf.lite.Optimize.OPTIMIZE_FOR_LATENCY])
model.export(export_dir='average_word_vec/', export_format=[ExportFormat.LABEL, ExportFormat.VOCAB])

相关内容

最新更新