Using .take() on a tf.keras.preprocessing.image_dataset_from_directory object

I created a dataset like this:

import tensorflow as tf

# Create test_dataset (test_dir is the path to the test image directory)
test_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    directory=test_dir,
    labels='inferred',
    label_mode='int',
    class_names=None,
    seed=42,
)
# Explore the first batch
for images, labels in test_dataset.take(1):
    print(labels)

It returns:

tf.Tensor([5 3 8 3 8 5 7 6 3 8 4 2 4 5 5 4 0 1 0 5 5 2 6 0 7 9 9 0 4 9 6 4], shape=(32,), dtype=int32)

If I rerun the last part:

for images, labels in test_dataset.take(1):
    print(labels)

It returns something different from the first time:

tf.Tensor([0 6 2 5 5 7 5 2 7 4 0 5 0 4 6 5 8 7 7 3 5 1 1 9 5 2 6 6 6 6 2 0], shape=(32,), dtype=int32)

If I recreate test_dataset and explore it as follows:

# Create test_dataset
test_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    directory=test_dir,
    labels='inferred',
    label_mode='int',
    class_names=None,
    seed=42,
)
# Explore the first batch
for images, labels in test_dataset.take(1):
    print(labels)

It returns the same result as the first time:

tf.Tensor([5 3 8 3 8 5 7 6 3 8 4 2 4 5 5 4 0 1 0 5 5 2 6 0 7 9 9 0 4 9 6 4], shape=(32,), dtype=int32)

So I concluded that when I use the take method, the batch is popped and lost, and can no longer be used for modeling, validation, and so on.

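My questions are:

Am I right? If I run test_dataset.take(1), is the first batch lost?
If the answer to the above is yes, is there a way to explore the batches of a tf.keras.preprocessing.image_dataset_from_directory object without losing batches?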

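This is not about losing batches. The function tf.keras.preprocessing.image_dataset_from_directory has a shuffle parameter whose default value is True, so the dataset is shuffled again on every iteration.

If we dig into the source code: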

if shuffle:
    # Shuffle locally at each iteration
    dataset = dataset.shuffle(buffer_size=batch_size * 8, seed=seed)
dataset = dataset.batch(batch_size)

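As you can see, under the hood it builds a tf.data object and calls its shuffle method. By default, shuffle has the parameter reshuffle_each_iteration=True. The second take call iterates over the dataset again, which causes the dataset to be shuffled again. No batch is removed from the dataset; you are simply seeing a different first batch of a freshly shuffled pass.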
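The behaviour described above can be reproduced without any image files. The following is a minimal sketch, assuming a toy dataset of the integers 0 to 9 in place of the images; the helper names make_dataset and all_elements are hypothetical and exist only for this illustration.

```python
import tensorflow as tf


def make_dataset(reshuffle):
    # Toy stand-in for the image dataset (an assumption for illustration).
    ds = tf.data.Dataset.range(10)
    ds = ds.shuffle(buffer_size=10, seed=42, reshuffle_each_iteration=reshuffle)
    return ds.batch(5)


def all_elements(ds):
    # Iterate over the whole dataset once and return its elements, sorted.
    return sorted(int(x) for batch in ds for x in batch.numpy())


# Default behaviour: every new iteration reshuffles the data.
reshuffled = make_dataset(reshuffle=True)
first = next(iter(reshuffled.take(1))).numpy().tolist()
second = next(iter(reshuffled.take(1))).numpy().tolist()
print(first, second)  # the two first batches will usually differ

# take(1) did not consume anything: a full pass still yields all ten elements.
elements_after_take = all_elements(reshuffled)
print(elements_after_take)

# With reshuffle_each_iteration=False the shuffled order is fixed across iterations.
fixed = make_dataset(reshuffle=False)
fixed_first = next(iter(fixed.take(1))).numpy().tolist()
fixed_second = next(iter(fixed.take(1))).numpy().tolist()
print(fixed_first, fixed_second)
```

The full pass after the two take(1) calls still contains every element, which is the point of the answer: take returns a view of a new iteration and does not pop batches from the underlying dataset.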

If you set shuffle=False when creating the dataset, the data is sorted in alphanumeric order, and that order does not change between iterations.
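As a rough check of the shuffle=False case, the sketch below builds a small throwaway image directory and reads its first batch twice. The class folder names ("cat", "dog"), the blank 8x8 images, and the temporary directory are all assumptions made up for this example, not part of the original question.

```python
import os
import tempfile

import tensorflow as tf

# Build a tiny, hypothetical directory tree: two classes with four blank PNGs each.
root = tempfile.mkdtemp()
for class_name in ["cat", "dog"]:
    os.makedirs(os.path.join(root, class_name))
    for i in range(4):
        image = tf.zeros([8, 8, 3], dtype=tf.uint8)
        path = os.path.join(root, class_name, f"{i}.png")
        tf.io.write_file(path, tf.io.encode_png(image))

# Same call as in the question, but with shuffling disabled.
dataset = tf.keras.preprocessing.image_dataset_from_directory(
    directory=root,
    labels="inferred",
    label_mode="int",
    image_size=(8, 8),
    batch_size=4,
    shuffle=False,
)

# Read the first batch twice; each loop starts a new iteration over the dataset.
first_labels = [labels.numpy().tolist() for _, labels in dataset.take(1)][0]
second_labels = [labels.numpy().tolist() for _, labels in dataset.take(1)][0]
print(first_labels, second_labels)
```

With shuffling disabled, both reads should give the same labels, so exploring a batch with take(1) is repeatable. For a test set this is usually what you want anyway, since predictions then line up with a stable file order.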
