How to adapt a TextVectorization layer on a tf.data.Dataset

I load my dataset like this:

self.train_ds = tf.data.experimental.make_csv_dataset(
    self.config["input_paths"]["data"]["train"],
    batch_size=self.params["batch_size"],
    shuffle=False,
    label_name="tags",
    num_epochs=1,
)

My TextVectorization layer looks like this:

vectorizer = tf.keras.layers.TextVectorization(
    standardize=code_standaridization,
    split="whitespace",
    output_mode="int",
    output_sequence_length=params["input_dim"],
    max_tokens=100_000,
)

I thought this would be enough:

vectorizer.adapt(data_provider.train_ds)

But it isn't; I get this error:

TypeError: Expected string, but got Tensor("IteratorGetNext:0", shape=(None, None), dtype=string) of type 'Tensor'.

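Can I somehow adapt my vectorizer on a TensorFlow dataset?

The most likely problem is that train_ds is batched (you pass batch_size when creating it) and you call adapt on it without .unbatch() first.

You have to do this: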

vectorizer.adapt(train_ds.unbatch().map(lambda x, y: x).batch(BATCH_SIZE))

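.unbatch() resolves the error you are currently seeing, and .map() is needed because the TextVectorization layer operates on batches of strings, so you have to extract them from the (features, label) elements of the dataset.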
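As a rough illustration of the unbatch/map/batch pattern above, here is a minimal sketch on a toy in-memory dataset. The sample strings, labels, BATCH_SIZE value and layer settings are made up for the example and are not taken from the question. Note also that with make_csv_dataset the features element is a dictionary keyed by column name, so the map step would probably need to pick out the text column instead, for example lambda x, y: x["code"], where "code" is a hypothetical column name.

```python
import tensorflow as tf

# Toy (text, label) pairs standing in for the CSV data (made-up values).
texts = ["def foo ( ) :", "return foo", "import foo"]
labels = [0, 1, 0]
BATCH_SIZE = 2  # hypothetical batch size

# A batched dataset of (text, label) tuples, similar in shape to train_ds.
train_ds = tf.data.Dataset.from_tensor_slices((texts, labels)).batch(BATCH_SIZE)

vectorizer = tf.keras.layers.TextVectorization(
    standardize=None,  # no custom standardization in this sketch
    split="whitespace",
    output_mode="int",
    output_sequence_length=4,
    max_tokens=100,
)

# Drop the labels so that adapt() only sees string tensors, then re-batch.
text_only_ds = train_ds.unbatch().map(lambda x, y: x).batch(BATCH_SIZE)
vectorizer.adapt(text_only_ds)

# Index 0 is the padding token and index 1 the out-of-vocabulary token;
# the remaining entries are ordered by frequency in the adapted data.
vocab = vectorizer.get_vocabulary()
encoded = vectorizer(tf.constant(["return foo"]))
print(vocab[:3])
print(encoded.numpy())
```

After adapt, the layer can be called on string tensors directly or applied inside train_ds.map to turn the text into integer sequences.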
