Kaggle TPU: failed to connect to all addresses

I'm running into a problem when trying to fit my model on a TPU on Kaggle.

The TPU is initialized as follows:

import tensorflow as tf

# Detect the TPU and initialize it; fall back to the default strategy if no TPU is available
try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
    print(f'Running on TPU {tpu.master()}')
except ValueError:
    tpu = None

if tpu:
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.experimental.TPUStrategy(tpu)
else:
    strategy = tf.distribute.get_strategy()

AUTO = tf.data.experimental.AUTOTUNE
REPLICAS = strategy.num_replicas_in_sync
print(f'REPLICAS: {REPLICAS}')
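
For reference, AUTO and REPLICAS are normally consumed by the input pipeline, with the global batch size scaled by the number of replicas. The sketch below illustrates this under stated assumptions; load_records() and BATCH_PER_REPLICA are hypothetical placeholders and not part of the original question.

# Minimal sketch (assumptions): build a tf.data pipeline whose global batch size
# is scaled by the number of TPU replicas. load_records() is a hypothetical
# function returning a tf.data.Dataset of (features, labels) pairs.
BATCH_PER_REPLICA = 32                                      # assumed per-replica batch size
GLOBAL_BATCH_SIZE = BATCH_PER_REPLICA * REPLICAS

def make_dataset():
    ds = load_records()                                     # hypothetical data source
    ds = ds.shuffle(2048)
    ds = ds.batch(GLOBAL_BATCH_SIZE, drop_remainder=True)   # TPUs prefer fixed batch shapes
    ds = ds.prefetch(AUTO)
    return ds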

But when I try to fit my model, the following error occurs:

{{function_node __inference_train_function_64094}} failed to connect to all addresses
GRPC error information: {"created":"@1609444822.190891136","description":"Failed to pick
subchannel","file":"third_party/grpc/src/core/ext/filters/client_channel/client_channel.cc",
"file_line":3959,"referenced_errors":[{"created":"@1609444822.190889693",
"description":"failed to connect to all addresses", […]
[[{{node MultiDeviceIteratorGetNextFromShard}}]] [[RemoteCall]] [[IteratorGetNextAsOptional]]

You must create the model and optimizer inside the strategy scope:

with strategy.scope():
    model = create_model()
    model.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['sparse_categorical_accuracy'])
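
Once the model is compiled inside strategy.scope(), training is launched with a normal fit() call. A minimal sketch is below; train_dataset, the epoch count, and steps_per_epoch are assumed values for illustration, not taken from the original question.

# Minimal sketch (assumptions): train_dataset is a tf.data.Dataset such as the
# one produced by make_dataset() above; fit() then distributes the steps
# across the TPU replicas provided by the strategy.
train_dataset = make_dataset()
model.fit(train_dataset,
          epochs=10,              # assumed epoch count
          steps_per_epoch=100)    # assumed steps per epoch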
