I am training a model with TensorFlow's Keras API that detects which characters of the Kannada script are present in an image. Kannada is a South Indian language, and there can be 657 or more classification classes, because characters are combinations of consonants and vowels. For more clarity, please refer to this Wikipedia article.
The dataset for this model is a single directory containing multiple subdirectories, one per class: [directory structure]
Alternatively, you can see the structure more clearly if you visit the public Kaggle link here.
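To give a rough idea, the layout looks something like this (the class and file names below are made up for illustration):
Images_with_noise/
    class_1/
        image_001.png
        image_002.png
        ...
    class_2/
        image_001.png
        ...
    ...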
Here are the imports I am using:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dense, Flatten, BatchNormalization, Conv2D, MaxPool2D
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import categorical_crossentropy
from tensorflow.keras.preprocessing.image import ImageDataGenerator
I load the data with ImageDataGenerator because it lets me easily split the dataset into separate training and validation sets. Here is the code I use to build those two sets:
# Creating training and validation data generators
datagen=ImageDataGenerator(validation_split=0.01)
train_generator=datagen.flow_from_directory(
    directory="../input/kannada-images-with-noise/Images_with_noise",
    subset="training",
    batch_size=256,
    shuffle=True,
    classes=image_classes,
    color_mode='grayscale',
    target_size=(75,75))
valid_generator=datagen.flow_from_directory(
    directory="../input/kannada-images-with-noise/Images_with_noise",
    subset="validation",
    batch_size=256,
    shuffle=True,
    classes=image_classes,
    color_mode='grayscale',
    target_size=(75,75))
# Creating step sizes
STEP_SIZE_TRAIN=train_generator.n//train_generator.batch_size
STEP_SIZE_VALID=valid_generator.n//valid_generator.batch_size
Then I pass these generators to the model.fit() function, like so:
# Training our model
model.fit(
    x=train_generator,
    steps_per_epoch=STEP_SIZE_TRAIN,
    validation_data=valid_generator,
    validation_steps=STEP_SIZE_VALID,
    epochs=25,
    verbose=1
)
So far I have stuck with this approach because it is simple and straightforward. However, if I want to take advantage of the TPUs available on Kaggle, I will have to change the way I load data and use tf.data.Dataset instead, since ImageDataGenerator cannot fetch the data through the GCS link of the Kaggle dataset.
How do I load the data using tf.data.Dataset? I would appreciate it if you could link any examples or tutorials I can follow. If it would be better for me to restructure the directory, please tell me how I would have to do that.
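For reference, this is roughly the kind of pipeline I imagine replacing the generators with; the path, the .png extension and the fixed 75x75 grayscale size are just my assumptions based on my current setup (and on a TPU the path would presumably have to point at the dataset's GCS bucket instead of ../input), so please correct me if this is the wrong direction:
import tensorflow as tf

DATA_DIR = "../input/kannada-images-with-noise/Images_with_noise"
IMG_SIZE = (75, 75)
BATCH_SIZE = 256

# Collect every image path and derive an integer label from its parent folder.
file_paths = sorted(tf.io.gfile.glob(DATA_DIR + "/*/*.png"))
class_names = sorted({p.split("/")[-2] for p in file_paths})
labels = [class_names.index(p.split("/")[-2]) for p in file_paths]

def load_image(path, label):
    # Read, decode and resize one grayscale image; one-hot encode its label.
    image = tf.io.read_file(path)
    image = tf.image.decode_png(image, channels=1)
    image = tf.image.resize(image, IMG_SIZE)
    return image, tf.one_hot(label, len(class_names))

dataset = (
    tf.data.Dataset.from_tensor_slices((file_paths, labels))
    .shuffle(len(file_paths))
    .map(load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    .batch(BATCH_SIZE, drop_remainder=True)  # static batch size for the TPU
    .prefetch(tf.data.experimental.AUTOTUNE)
)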
I know of two approaches. Note that the tpu argument to TPUClusterResolver is a special address just for Colab. If you are running on Google Compute Engine (GCE), you should instead pass in the name of your Cloud TPU.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
tf.config.experimental_connect_to_cluster(resolver)
# This is the TPU initialization code that has to be at the beginning.
tf.tpu.experimental.initialize_tpu_system(resolver)
print("All devices: ", tf.config.list_logical_devices('TPU'))
INFO:tensorflow:Initializing the TPU system: grpc://10.240.1.74:8470
INFO:tensorflow:Clearing out eager caches
INFO:tensorflow:Finished initializing TPU system.
All devices: [LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:7', device_type
Manual device placement: after the TPU is initialized, you can use manual device placement to place the computation on a single TPU device.
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
with tf.device('/TPU:0'):
    c = tf.matmul(a, b)
print("c device: ", c.device)
print(c)
c device: /job:worker/replica:0/task:0/device:TPU:0
tf.Tensor(
[[22. 28.]
[49. 64.]], shape=(2, 2), dtype=float32)
Most of the time, users want to run models on multiple TPUs in a data-parallel way. A distribution strategy is an abstraction that can be used to drive models on CPUs, GPUs or TPUs. Simply swap out the distribution strategy and the model will run on the given device.
strategy = tf.distribute.TPUStrategy(resolver)
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)
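To actually train a Keras model on all the cores, you build and compile it inside strategy.scope() and then call fit with a tf.data.Dataset. Here is a minimal sketch: the tiny model below is only a placeholder, not the one from your question, and `dataset` is assumed to be a batched tf.data.Dataset of (image, one-hot label) pairs built with drop_remainder=True so the batch size is static.
with strategy.scope():
    # Variables created inside the scope are replicated across the TPU cores.
    # This Sequential model is only a stand-in for illustration.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(75, 75, 1)),
        tf.keras.layers.MaxPool2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(657, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

# `dataset` is assumed to be a tf.data.Dataset yielding batches of
# (image, one-hot label) pairs, e.g. built like the pipeline sketched above.
model.fit(dataset, epochs=25)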
To replicate a computation so it can run in all TPU cores, you can simply pass it to the strategy.run API. Below is an example in which all cores get the same inputs (a, b) and perform the matmul independently on each core. The outputs will be the values from all the replicas.
@tf.function
def matmul_fn(x, y):
    z = tf.matmul(x, y)
    return z
z = strategy.run(matmul_fn, args=(a, b))
print(z)
PerReplica:{
0: tf.Tensor(
[[22. 28.]
[49. 64.]], shape=(2, 2), dtype=float32),
1: tf.Tensor(
[[22. 28.]
[49. 64.]], shape=(2, 2), dtype=float32),
2: tf.Tensor(
[[22. 28.]
[49. 64.]], shape=(2, 2), dtype=float32),
3: tf.Tensor(
[[22. 28.]
[49. 64.]], shape=(2, 2), dtype=float32),
4: tf.Tensor(