Feeding individual tensors to model.fit()



I have a dataset made up of tensors. A sample element looks like this:

(<tf.Tensor: shape=(1,), dtype=string, numpy=
array([b"Some text"],
dtype=object)>, <tf.Tensor: shape=(), dtype=int64, numpy=0>)

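Rather than passing the whole dataset as input, I want to fetch the tensors one at a time and feed them to the model.

I tried the following:

for element in dataset:
    model.fit(x=element)

but I get:

IndexError: list index out of range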

What is the best way to achieve this?

Thanks in advance!

Here is my model:

import pandas as pd
import tensorflow as tf

df = pd.read_csv('labeled_tweets_processed.csv')
labels = df.pop('class')
dataset = tf.data.Dataset.from_tensor_slices((df, labels))

VOCAB_SIZE = 1000
encoder = tf.keras.layers.TextVectorization(
    max_tokens=VOCAB_SIZE)
encoder.adapt(dataset.map(lambda text, label: text))

BUFFER_SIZE = 2
BATCH_SIZE = 1
train_dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)

model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Embedding(
        input_dim=len(encoder.get_vocabulary()),
        output_dim=64,
        # Use masking to handle the variable sequence lengths
        mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)
])
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])

And a few elements from my dataset:

(<tf.Tensor: shape=(1,), dtype=string, numpy=
array([b'text1'],
dtype=object)>, <tf.Tensor: shape=(), dtype=int64, numpy=1>)
(<tf.Tensor: shape=(1,), dtype=string, numpy=
array([b"text2"],
dtype=object)>, <tf.Tensor: shape=(), dtype=int64, numpy=0>)
(<tf.Tensor: shape=(1,), dtype=string, numpy=
array([b"text3"],
dtype=object)>, <tf.Tensor: shape=(), dtype=int64, numpy=0>)
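
Each element printed above is a (features, label) tuple, so passing the whole element as x hands the label to the model as if it were a second input. Below is a minimal sketch of how the element structure can be inspected and unpacked. The in-memory texts and labels are hypothetical stand-ins for the CSV, chosen only to reproduce the shapes shown above.

```python
import tensorflow as tf

# Hypothetical stand-in for the CSV: one text column and one integer label.
texts = tf.constant([["text1"], ["text2"], ["text3"]])
labels = tf.constant([1, 0, 0], dtype=tf.int64)
dataset = tf.data.Dataset.from_tensor_slices((texts, labels))

# element_spec describes a single element: a (features, label) tuple.
features_spec, label_spec = dataset.element_spec
print(features_spec)  # spec of the (1,)-shaped string features
print(label_spec)     # spec of the scalar int64 label

# Unpack the tuple instead of passing the whole element as x.
first_text, first_label = next(iter(dataset))
```

Unpacking in the loop header (for x, y in dataset) gives the same two tensors per step.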

I'm not sure why you would want to call model.fit in a loop, but you can try the following:

import pandas as pd
import tensorflow as tf

# Small in-memory stand-in for the CSV file
df = pd.DataFrame(data={'texts': ['Some text', 'Some text', 'Some text', 'Some text', 'Some text'],
                        'class': [0, 0, 1, 1, 1]})
labels = df.pop('class')
dataset = tf.data.Dataset.from_tensor_slices((df, labels))

VOCAB_SIZE = 1000
encoder = tf.keras.layers.TextVectorization(
    max_tokens=VOCAB_SIZE)
# Build the vocabulary from the text features only
encoder.adapt(dataset.map(lambda text, label: text))

BUFFER_SIZE = 2
BATCH_SIZE = 1
train_dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)

model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Embedding(
        input_dim=len(encoder.get_vocabulary()),
        output_dim=64,
        # Use masking to handle the variable sequence lengths
        mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)
])
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])

# Unpack each batch into features and labels and pass them separately
for x, y in train_dataset:
    model.fit(x, y, epochs=2)
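
Calling model.fit once per batch starts a separate training run each time, with its own progress output. If the goal is just one gradient update per batch, train_on_batch may be a lighter option. Below is a minimal sketch under that assumption. The toy texts, the smaller layer sizes, and the losses list are illustrative choices, not part of the original model.

```python
import tensorflow as tf

# Hypothetical toy data shaped like the question's: (1,) string features, scalar labels.
texts = tf.constant([["good day"], ["bad day"], ["good news"], ["bad news"]])
labels = tf.constant([1, 0, 1, 0], dtype=tf.int64)
dataset = tf.data.Dataset.from_tensor_slices((texts, labels))

encoder = tf.keras.layers.TextVectorization(max_tokens=1000)
encoder.adapt(dataset.map(lambda text, label: text))

# Same layout as the model above, with smaller layers to keep the sketch quick.
model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Embedding(
        input_dim=len(encoder.get_vocabulary()),
        output_dim=16,
        mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(16)),
    tf.keras.layers.Dense(1)
])
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])

losses = []
for epoch in range(2):
    # Reshuffle and re-batch on every pass over the data.
    for x, y in dataset.shuffle(4).batch(1):
        # One gradient update on this single batch.
        logs = model.train_on_batch(x, y, return_dict=True)
        losses.append(float(logs['loss']))
```

If per-element control is not actually needed, passing the batched dataset straight to model.fit(train_dataset, epochs=2) is usually simpler, since fit already iterates over the batches.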
