
Running distributed TensorFlow on Google Cloud ML Engine ClusterSpec

Updated: 2023-09-13

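I am trying to run a large distributed TensorFlow model on Google Cloud ML Engine, and I can't work out what should go into tf.train.ClusterSpec.

When you run a job on Google Cloud, you choose a scale tier from BASIC, STANDARD_1, PREMIUM_1, BASIC_GPU, or CUSTOM, and each tier gives you access to a different kind of cluster. However, I can't find the names or addresses of the machines in these clusters.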

Please see the documentation and sample here. You should build the cluster spec from the TF_CONFIG environment variable; for example:

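import json
import os

import tensorflow as tf

# Fragment from inside a function body; `run` is a helper defined elsewhere in the sample.
tf_config = os.environ.get('TF_CONFIG')
# If TF_CONFIG is not available, run locally
if not tf_config:
    return run('', True, *args, **kwargs)
tf_config_json = json.loads(tf_config)
cluster = tf_config_json.get('cluster')
...
cluster_spec = tf.train.ClusterSpec(cluster)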
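
To make the answer to "where are the machine addresses" more concrete, here is a minimal sketch of pulling the cluster layout and the current task out of TF_CONFIG. The helper name parse_tf_config and the host names in the example value are hypothetical and used only for illustration; the real addresses are whatever the service puts into the variable for your job. The sketch assumes TF_CONFIG is a JSON object with a "cluster" map (job name to list of host:port strings) and a "task" entry with "type" and "index".

```python
import json
import os


def parse_tf_config(environ=os.environ):
    """Return (cluster, job_name, task_index) read from TF_CONFIG.

    Returns (None, None, None) when TF_CONFIG is unset or empty,
    which the caller can treat as "run locally".
    """
    raw = environ.get('TF_CONFIG')
    if not raw:
        return None, None, None
    config = json.loads(raw)
    cluster = config.get('cluster')
    task = config.get('task', {})
    return cluster, task.get('type'), task.get('index')


# Hypothetical TF_CONFIG value; the host names below are made up.
example_env = {
    'TF_CONFIG': json.dumps({
        'cluster': {
            'master': ['master-0.example:2222'],
            'ps': ['ps-0.example:2222'],
            'worker': ['worker-0.example:2222', 'worker-1.example:2222'],
        },
        'task': {'type': 'worker', 'index': 1},
    })
}

cluster, job_name, task_index = parse_tf_config(example_env)
print(cluster['worker'])     # addresses of the worker tasks
print(job_name, task_index)  # which task this process is
```

The cluster dictionary returned here is the same kind of mapping that the snippet above passes to tf.train.ClusterSpec, and job_name and task_index are the values you would typically hand to tf.train.Server alongside it, so you never need to hard-code machine names yourself.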
