I ran into some strange behavior when shuffling numpy arrays in TensorFlow (using Google Colab):
from matplotlib import pyplot as plt
import tensorflow as tf
import numpy as np
seed = int(np.random.randint(0, 2 ** 16))
(train_x, train_y), (test_x, test_y) = tf.keras.datasets.cifar10.load_data()
train_x = train_x / 255.0 # this line
train_x = tf.random.shuffle(train_x, seed=seed)
train_y = tf.random.shuffle(train_y, seed=seed)
train_dataset = tf.data.Dataset.from_tensor_slices((train_x, train_y))
for i in train_dataset.take(10):
    print(f"Label: {i[1].numpy()[0]}", end=', ')
    plt.figure()
    plt.imshow(i[0])
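After shuffling train_x and train_y this way (both are numpy arrays), I visually confirmed that the correspondence between indices is preserved, i.e. it seems that each shuffle call resets the RNG and both calls produce the same permutation. However, when I comment out the normalization step (marked 'this line'), the shuffle breaks the correspondence between the indices.
I don't understand this behavior and would like to find out why it happens. Thanks for your help.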
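For me, on Google Colab, your code did not reproduce the same permutation, with or without the normalization line.
What did produce the same permutation was setting the global seed with tf.random.set_seed, rather than passing the seed as an argument to the function, as shown below: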
import tensorflow as tf
seed = 11030
tf.random.set_seed(seed)
(train_x, train_y), (test_x, test_y) = tf.keras.datasets.cifar10.load_data()
train_x = train_x / 255.0 # this line
train_x = tf.random.shuffle(train_x)
train_y = tf.random.shuffle(train_y)
train_dataset = tf.data.Dataset.from_tensor_slices((train_x, train_y))
# ...visualize output or print results of arrays to confirm...
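If you want the images and labels to stay aligned without relying on how two separate shuffle calls treat their seeds, one option is to draw a single permutation of indices and apply it to both arrays. The following is only a minimal sketch of that idea, not something from the code above: `shuffle_together` is a hypothetical helper name, and the small synthetic arrays are stand-ins for the CIFAR-10 data so that the alignment can be checked numerically rather than by eye.

```python
import numpy as np


def shuffle_together(x, y, seed=None):
    """Shuffle x and y along axis 0 with one shared permutation.

    Hypothetical helper for illustration; not part of TensorFlow or NumPy.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(x))  # one permutation, reused for both arrays
    return x[perm], y[perm]


# Synthetic stand-in for CIFAR-10 (assumption): every "image" is filled
# with the value of its own label, so alignment can be verified directly.
labels = np.arange(8, dtype=np.uint8).reshape(-1, 1)                 # shape (8, 1)
images = np.broadcast_to(labels[:, :, None, None], (8, 4, 4, 3)).copy()

shuffled_x, shuffled_y = shuffle_together(images, labels, seed=11030)

# Each shuffled image should still carry the value of its shuffled label.
aligned = bool(np.all(shuffled_x[:, 0, 0, 0] == shuffled_y[:, 0]))
print("aligned:", aligned)
```

The shuffled arrays can then be passed to tf.data.Dataset.from_tensor_slices as before. Another option is to build the dataset from the unshuffled arrays and call its shuffle method, which shuffles each (image, label) pair as a unit.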