Simulating streaming learning with TensorFlow's built-in fit() and evaluate() methods



What I am trying to achieve is to simulate a streaming-learning approach using TensorFlow's fit() and evaluate() methods.

After some help from the community, I now have a script like this:

import pandas as pd
import tensorflow as tf

df = pd.read_csv('labeled_tweets_processed.csv')
labels = df.pop('class')
dataset = tf.data.Dataset.from_tensor_slices((df, labels))

VOCAB_SIZE = 1000
encoder = tf.keras.layers.TextVectorization(max_tokens=VOCAB_SIZE)
encoder.adapt(dataset.map(lambda text, label: text))

BUFFER_SIZE = 2
BATCH_SIZE = 1
train_dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)

model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Embedding(
        input_dim=len(encoder.get_vocabulary()),
        output_dim=64,
        # Use masking to handle the variable sequence lengths
        mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)
])

model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])
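
For reference, with BATCH_SIZE = 1 every element of train_dataset is already a single-example batch, which is what makes a per-entry simulation possible later on. A quick way to sanity-check the shapes (a sketch; the shapes assume the single-column DataFrame above):

for x, y in train_dataset.take(1):
    print(x.shape, y.shape)  # e.g. a (1, 1) text batch and a (1,) label batch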

and I train the model with:

history = model.fit(train_dataset, epochs=1)

What I actually want to do is simulate a streaming environment in which data flows through a Predict -> Fit pipeline: each incoming example is first used to evaluate the model and then to update it (sometimes called prequential or test-then-train evaluation).

I thought this could be done with something like:

for x, y in enumerate(train_dataset):
    test_loss, test_acc = model.evaluate([x, y])
    model.fit(y)

but it does not seem to work this way (for one thing, enumerate() yields (index, element) pairs, so x here is actually the batch index).

What is the correct way to simulate the environment described above?

And what is the best way to iterate over every entry of the dataset and feed it into the methods I need?

Many thanks in advance!

Update 1:

This is what I have now, but it leads to very low model accuracy. I am not sure whether the metrics are being updated the right way.

for idx, (x, y) in enumerate(train_dataset):
    pred = model.predict_on_batch(x)
    # Note: pred (the model's own output) is passed as the target here,
    # so test_on_batch compares the predictions against themselves rather than against y.
    print(model.test_on_batch(x, pred, reset_metrics=False, return_dict=True))
    model.train_on_batch(x, y, reset_metrics=False)
    print(f"After {idx} entries")

You could try the following:

for idx, (x, y) in enumerate(train_dataset):
    test_loss, test_acc = model.evaluate(x, y)
    model.fit(x, y, epochs=1)
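
If calling evaluate() and fit() on every batch turns out to be slow (each call carries per-invocation overhead), the same test-then-train loop can be written with the lighter *_on_batch methods. A minimal sketch, assuming the compiled model from above and passing the true labels y as the targets:

for idx, (x, y) in enumerate(train_dataset):
    # Score the incoming batch before the model has trained on it.
    results = model.test_on_batch(x, y, reset_metrics=False, return_dict=True)
    # Then update the weights on that same batch.
    model.train_on_batch(x, y, reset_metrics=False)
    print(f"After {idx + 1} batches: {results}")

With reset_metrics=False the reported accuracy accumulates over the whole stream rather than being reset after each batch.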

Update 1: Maybe try using a custom training loop:

import pandas as pd
import tensorflow as tf

df = pd.DataFrame(data={'texts': ['Some text ssss', 'Some text', 'Some text', 'Some text', 'Some text'],
                        'class': [0, 0, 1, 1, 1]})
labels = df.pop('class')
dataset = tf.data.Dataset.from_tensor_slices((df, labels))

VOCAB_SIZE = 1000
encoder = tf.keras.layers.TextVectorization(max_tokens=VOCAB_SIZE)
encoder.adapt(dataset.map(lambda text, label: text))

BUFFER_SIZE = 2
BATCH_SIZE = 3
train_dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)

model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Embedding(
        input_dim=len(encoder.get_vocabulary()),
        output_dim=64,
        # Use masking to handle the variable sequence lengths
        mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

opt = tf.keras.optimizers.Adam(1e-4)
loss_fn = tf.keras.losses.BinaryCrossentropy()
train_acc_metric = tf.keras.metrics.BinaryAccuracy()
test_acc_metric = tf.keras.metrics.BinaryAccuracy()

epochs = 2
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch + 1,))

    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        # Evaluate on the incoming batch before training on it.
        pred = model(x_batch_train)
        test_acc_metric.update_state(y_batch_train, pred)
        print("Current test acc: %.4f" % (float(test_acc_metric.result()),))

        # Then train on that same batch.
        with tf.GradientTape() as tape:
            logits = model(x_batch_train, training=True)
            loss_value = loss_fn(y_batch_train, logits)
        grads = tape.gradient(loss_value, model.trainable_weights)
        opt.apply_gradients(zip(grads, model.trainable_weights))
        train_acc_metric.update_state(y_batch_train, logits)
        print("Current train acc: %.4f" % (float(train_acc_metric.result()),))

    test_acc = test_acc_metric.result()
    print("Total test acc over epoch: %.4f" % (float(test_acc),))
    test_acc_metric.reset_states()

    train_acc = train_acc_metric.result()
    print("Total train acc over epoch: %.4f" % (float(train_acc),))
    train_acc_metric.reset_states()

Running this prints:
Start of epoch 1
Current test acc: 0.6922
Current train acc: 0.6922
Current test acc: 0.6936
Current train acc: 0.6936
Current test acc: 0.6928
Current train acc: 0.6928
Current test acc: 0.6934
Current train acc: 0.6934
Current test acc: 0.6938
Current train acc: 0.6938
Total test acc over epoch: 0.6938
Total train acc over epoch: 0.6938
Start of epoch 2
Current test acc: 0.6914
Current train acc: 0.6914
Current test acc: 0.6914
Current train acc: 0.6914
Current test acc: 0.6926
Current train acc: 0.6926
Current test acc: 0.6932
Current train acc: 0.6932
Current test acc: 0.6936
Current train acc: 0.6936
Total test acc over epoch: 0.6936
Total train acc over epoch: 0.6936
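
A side note on the loop above: it runs a forward pass twice per batch (once for pred, once under the tape). Because the weights only change after apply_gradients, a single forward pass can feed both the test and the train metric, and the step can be wrapped in tf.function for speed. A sketch under those assumptions, reusing model, opt, loss_fn and the two metrics from above (stream_step is just a name chosen here; it also assumes no layers that behave differently in training mode, which holds for this model):

@tf.function
def stream_step(x, y):
    # One forward pass with the pre-update weights doubles as the "test" prediction...
    with tf.GradientTape() as tape:
        logits = model(x, training=True)
        loss_value = loss_fn(y, logits)
    test_acc_metric.update_state(y, logits)
    # ...and its gradients update the model afterwards.
    grads = tape.gradient(loss_value, model.trainable_weights)
    opt.apply_gradients(zip(grads, model.trainable_weights))
    train_acc_metric.update_state(y, logits)
    return loss_value

for step, (x, y) in enumerate(train_dataset):
    stream_step(x, y)
    print("Current test acc: %.4f" % float(test_acc_metric.result()))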
