How to handle symbols, numbers, and letters in TensorFlow

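I'm working on my first TensorFlow model, and when I train on my dataset, my accuracy drops to about 25%, down from around 60% with scikit-learn. A friend told me this might be related to some of the data, for example "781C376B-E380-C052-448B-B4AB6F3D". When running the model, how should I handle the symbols (here, dashes), numbers, and letters in the data?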

Right now I'm looking into text vectorization so that the model can read my data more easily.

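You can use tf.strings.unicode_decode() to convert an encoded string scalar into a vector of code points. It gives you one number per character in the string (its Unicode code point), so the same character always maps to the same number.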

For example:

import tensorflow as tf

# A batch of Unicode strings, each represented as a UTF-8-encoded string.
batch_utf8 = [s.encode('UTF-8') for s in
              ['781C376B-E380-C052-448B-B4AB6F3D']]
# Decode every string into a ragged row of Unicode code points.
batch_chars_ragged = tf.strings.unicode_decode(batch_utf8,
                                               input_encoding='UTF-8')
for sentence_chars in batch_chars_ragged.to_list():
    print(sentence_chars)

Output:
[55, 56, 49, 67, 51, 55, 54, 66, 45, 69, 51, 56, 48, 45, 67, 48, 53, 50, 45, 52, 52, 56, 66, 45, 66, 52, 65, 66, 54, 70, 51, 68]
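If it helps to see what those numbers are, here is a small plain-Python sketch (no TensorFlow needed). It first reproduces the code points with the built-in ord(), then shows one possible next step: mapping each character to a small index and padding to a fixed length, which is the kind of input an embedding or one-hot layer usually expects. The alphabet (hex digits plus the dash), the reserved indices 0 and 1, the helper name encode_id, and max_len=40 are all my assumptions for illustration, not something taken from your data or from the TensorFlow API.

```python
SAMPLE = "781C376B-E380-C052-448B-B4AB6F3D"

# ord() returns the Unicode code point of a character, i.e. the same
# numbers that tf.strings.unicode_decode printed above.
code_points = [ord(ch) for ch in SAMPLE]

# Assumed alphabet: hex digits plus the dash.
# Index 0 is reserved for padding, index 1 for unknown characters.
ALPHABET = "0123456789ABCDEF-"
CHAR_TO_INDEX = {ch: i + 2 for i, ch in enumerate(ALPHABET)}


def encode_id(text, max_len=40):
    """Map each character to a small integer index and pad to max_len.

    Hypothetical helper: truncates long inputs, upper-cases letters,
    and sends anything outside ALPHABET to index 1.
    """
    indices = [CHAR_TO_INDEX.get(ch, 1) for ch in text.upper()[:max_len]]
    return indices + [0] * (max_len - len(indices))


encoded = encode_id(SAMPLE)
print(code_points[:9])  # [55, 56, 49, 67, 51, 55, 54, 66, 45]
print(encoded[:9])      # [9, 10, 3, 14, 5, 9, 8, 13, 18]
print(len(encoded))     # 40
```

Whether a compact index like this works better than raw code points depends on the model, so treat it as something to try rather than a guaranteed fix for the accuracy drop.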

For more details, see this document. Thanks.
