For the same network architecture, two different styles of TensorFlow implementation lead to two different results and behaviors


System information:

  • OS platform: Linux CentOS 7.6
  • CPU: Intel Xeon Gold 6152 (22x 3.70 GHz)
  • GPU model: NVIDIA Tesla V100 32 GB
  • Nodes / CPUs / cores / GPUs: 26 / 52 / 1144 / 104
  • TensorFlow installed from (source or binary): official web page
  • TensorFlow version (obtained with the command below): 2.1.0
  • Python version: 3.6.8
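
(Assuming the version command referred to above is the usual one from the TensorFlow issue template:)

python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"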

Problem description:

When I implemented my proposed method using the second style (see below), I noticed that the algorithm behaved very strangely: as the number of epochs increases, the accuracy decreases and the loss value increases.

So I narrowed the problem down and, in the end, decided to modify some code from the official TensorFlow pages to check what was going on. As explained on the official TF v2 pages, there are two implementation styles, and I adapted both as follows.

  • First, I modified the "Get started with TF v2" beginner code from the link below:

    TensorFlow 2 quickstart for beginners

as follows:

import tensorflow as tf
from sklearn.preprocessing import OneHotEncoder
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
learning_rate = 1e-4
batch_size = 100
n_classes = 2
n_units = 80

# Generate synthetic data / load data sets
x_in, y_in = make_classification(n_samples=1000, n_features=10, n_informative=4, n_redundant=2,
                                 n_repeated=2, n_classes=2, n_clusters_per_class=2, weights=[0.5, 0.5],
                                 flip_y=0.01, class_sep=1.0, hypercube=True, shift=0.0, scale=1.0,
                                 shuffle=True, random_state=42)
x_in = x_in.astype('float32')
y_in = y_in.astype('float32').reshape(-1, 1)
one_hot_encoder = OneHotEncoder(sparse=False)
y_in = one_hot_encoder.fit_transform(y_in)
y_in = y_in.astype('float32')
x_train, x_test, y_train, y_test = train_test_split(x_in, y_in, test_size=0.4, random_state=42, shuffle=True)
x_test, x_val, y_test, y_val = train_test_split(x_test, y_test, test_size=0.5, random_state=42, shuffle=True)
print("shapes:", x_train.shape, y_train.shape, x_test.shape, y_test.shape, x_val.shape, y_val.shape)
V = x_train.shape[1]
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(n_units, activation='relu', input_shape=(V,)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(n_classes)
])
loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=True)
model.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test, verbose=2)
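
Note that the validation split (x_val, y_val) is created but never used above; if desired, it could be monitored during training by passing it to fit:

model.fit(x_train, y_train, epochs=5, validation_data=(x_val, y_val))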

The output is as expected, as shown below:

600/600 [==============================] - 0s 419us/sample - loss: 0.7114 - accuracy: 0.5350
Epoch 2/5
600/600 [==============================] - 0s 42us/sample - loss: 0.6149 - accuracy: 0.6050
Epoch 3/5
600/600 [==============================] - 0s 39us/sample - loss: 0.5450 - accuracy: 0.6925
Epoch 4/5
600/600 [==============================] - 0s 46us/sample - loss: 0.4895 - accuracy: 0.7425
Epoch 5/5
600/600 [==============================] - 0s 40us/sample - loss: 0.4579 - accuracy: 0.7825
test: 200/200 - 0s - loss: 0.4110 - accuracy: 0.8350

More precisely, as the number of epochs increases, the training accuracy increases and the loss value decreases (which is expected and normal).

However, the following code block, adapted from this link:

    TensorFlow 2 quickstart for experts

looks as follows:

import tensorflow as tf
from sklearn.preprocessing import OneHotEncoder
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
learning_rate = 1e-4
batch_size = 100
n_classes = 2
n_units = 80
# Generate synthetic data / load data sets
x_in, y_in = make_classification(n_samples=1000, n_features=10, n_informative=4, n_redundant=2,
                                 n_repeated=2, n_classes=2, n_clusters_per_class=2, weights=[0.5, 0.5],
                                 flip_y=0.01, class_sep=1.0, hypercube=True, shift=0.0, scale=1.0,
                                 shuffle=True, random_state=42)
x_in = x_in.astype('float32')
y_in = y_in.astype('float32').reshape(-1, 1)
one_hot_encoder = OneHotEncoder(sparse=False)
y_in = one_hot_encoder.fit_transform(y_in)
y_in = y_in.astype('float32')
x_train, x_test, y_train, y_test = train_test_split(x_in, y_in, test_size=0.4, random_state=42, shuffle=True)
x_test, x_val, y_test, y_val = train_test_split(x_test, y_test, test_size=0.5, random_state=42, shuffle=True)
print("shapes:", x_train.shape, y_train.shape, x_test.shape, y_test.shape, x_val.shape, y_val.shape)
training_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(batch_size)
valid_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val)).batch(batch_size)
testing_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(batch_size)
V = x_train.shape[1]

class MyModel(tf.keras.models.Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.d1 = tf.keras.layers.Dense(n_units, activation='relu', input_shape=(V,))
        self.d2 = tf.keras.layers.Dropout(0.2)
        self.d3 = tf.keras.layers.Dense(n_classes)

    def call(self, x):
        x = self.d1(x)
        x = self.d2(x)
        return self.d3(x)

# Create an instance of the model
model = MyModel()

loss_object = tf.keras.losses.BinaryCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.BinaryCrossentropy(name='train_accuracy')
test_loss = tf.keras.metrics.Mean(name='test_loss')
test_accuracy = tf.keras.metrics.BinaryCrossentropy(name='test_accuracy')

@tf.function
def train_step(images, labels):
    with tf.GradientTape() as tape:
        # training=True is only needed if there are layers with different
        # behavior during training versus inference (e.g. Dropout).
        predictions = model(images)  # training=True
        loss = loss_object(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    train_loss(loss)
    train_accuracy(labels, predictions)

@tf.function
def test_step(images, labels):
    # training=False is only needed if there are layers with different
    # behavior during training versus inference (e.g. Dropout).
    predictions = model(images)  # training=False
    t_loss = loss_object(labels, predictions)
    test_loss(t_loss)
    test_accuracy(labels, predictions)

EPOCHS = 5
for epoch in range(EPOCHS):
    # Reset the metrics at the start of the next epoch
    train_loss.reset_states()
    train_accuracy.reset_states()
    test_loss.reset_states()
    test_accuracy.reset_states()

    for images, labels in training_dataset:
        train_step(images, labels)

    for test_images, test_labels in testing_dataset:
        test_step(test_images, test_labels)

    template = 'Epoch {}, Loss: {}, Accuracy: {}, Test Loss: {}, Test Accuracy: {}'
    print(template.format(epoch + 1, train_loss.result(), train_accuracy.result(),
                          test_loss.result(), test_accuracy.result()))

The behavior is really strange. Here is the output of this code:

Epoch 1, Loss: 0.7299721837043762, Accuracy: 3.8341376781463623, Test Loss: 0.7290592193603516, Test Accuracy: 3.6925911903381348
Epoch 2, Loss: 0.6725851893424988, Accuracy: 3.1141700744628906, Test Loss: 0.6695905923843384, Test Accuracy: 3.2315549850463867
Epoch 3, Loss: 0.6256862878799438, Accuracy: 2.75959849357605, Test Loss: 0.6216427087783813, Test Accuracy: 2.920461416244507
Epoch 4, Loss: 0.5873140096664429, Accuracy: 2.4249706268310547, Test Loss: 0.5828182101249695, Test Accuracy: 2.575272560119629
Epoch 5, Loss: 0.555053174495697, Accuracy: 2.2128372192382812, Test Loss: 0.5501811504364014, Test Accuracy: 2.264410972595215

As one can see, the accuracy values are not only strange; instead of increasing, they actually decrease as the number of epochs grows.

Can you explain what is going on here?

As pointed out in the comments, I made a mistake with the evaluation metric: tf.keras.metrics.BinaryCrossentropy computes the cross-entropy (a loss), not an accuracy, which is why the reported values exceed 1 and decrease together with the loss. I should use tf.keras.metrics.BinaryAccuracy instead.
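
For completeness, a minimal sketch of the corrected metric definitions (threshold=0.0 is an assumption that follows from the model outputting logits, for which a logit of 0 corresponds to a probability of 0.5):

# Accuracy must be tracked with an accuracy metric, not a loss metric.
# threshold=0.0 because the model outputs logits, not probabilities.
train_accuracy = tf.keras.metrics.BinaryAccuracy(name='train_accuracy', threshold=0.0)
test_accuracy = tf.keras.metrics.BinaryAccuracy(name='test_accuracy', threshold=0.0)

The compiled beginner version did not run into this problem because Keras resolves the metric string 'accuracy' to an accuracy metric that matches the loss automatically.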

In addition, in the expert version it is better to edit the call method as follows:

def call(self, x, training=False):
    x = self.d1(x)
    if training:
        x = self.d2(x, training=training)
    return self.d3(x)
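
With this signature, the training flag should also be passed explicitly in the custom training and test steps, as the comments in the expert tutorial suggest. A minimal sketch based on the code above (assuming the metrics have already been fixed as described):

@tf.function
def train_step(images, labels):
    with tf.GradientTape() as tape:
        predictions = model(images, training=True)   # enables Dropout
        loss = loss_object(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    train_loss(loss)
    train_accuracy(labels, predictions)

@tf.function
def test_step(images, labels):
    predictions = model(images, training=False)      # disables Dropout
    t_loss = loss_object(labels, predictions)
    test_loss(t_loss)
    test_accuracy(labels, predictions)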
