Cannot connect to a TPU over SSH on GCP



I have been working through https://cloud.google.com/tpu/docs/how-to.

I created a TPU instance and tried to connect to it with the gcloud compute ssh command. That produced the following error.

AppData\Local\Google\Cloud SDK>gcloud compute ssh node-1 --zone=asia-east1-c
ERROR: (gcloud.compute.ssh) Could not fetch resource:
- The resource 'projects/project-masker/zones/asia-east1-c/instances/node-1' was not found

While trying to resolve this error, I found that the TPUs are not listed in the execution groups.

AppData\Local\Google\Cloud SDK>gcloud compute tpus list
NAME    ZONE          ACCELERATOR_TYPE  NETWORK  RANGE             STATUS
node-2  asia-east1-c  v2-8              default  10.75.202.248/29  READY
node-1  asia-east1-c  v2-8              default  10.82.81.168/29   READY

AppData\Local\Google\Cloud SDK>gcloud compute tpus execution-groups list
Listed 0 items.

This is the output I got when I tried restarting the TPU.

Request issued for: [node-1]
Waiting for operation [projects/project-masker/locations/asia-east1-c/operations/operation-1625299249870-5c633787137b9-e14800b7-d997be6b] to complete...done.
done: true
metadata:
  '@type': type.googleapis.com/google.cloud.common.OperationMetadata
  apiVersion: v1
  cancelRequested: false
  createTime: '2021-07-03T08:00:49.884674545Z'
  endTime: '2021-07-03T08:01:31.161199334Z'
  target: projects/project-masker/locations/asia-east1-c/nodes/node-1
  verb: update
name: projects/project-masker/locations/asia-east1-c/operations/operation-1625299249870-5c633787137b9-e14800b7-d997be6b
response:
  '@type': type.googleapis.com/google.cloud.tpu.v1.Node
  acceleratorType: v2-8
  apiVersion: V1
  cidrBlock: 10.82.81.168/29
  createTime: '2021-07-03T07:27:41.148997156Z'
  health: HEALTHY
  ipAddress: 10.82.81.170
  name: projects/project-masker/locations/asia-east1-c/nodes/node-1
  network: global/networks/default
  networkEndpoints:
  - ipAddress: 10.82.81.170
    port: 8470
  port: '8470'
  schedulingConfig: {}
  serviceAccount: service-...@cloud-tpu.iam.gserviceaccount.com
  state: READY
  tensorflowVersion: pytorch-1.9

I searched Google for related articles but could not find any. How can I solve this problem?

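You cannot SSH directly into a TPU Node, so gcloud compute ssh {tpu_name} is not expected to work. That command looks up a Compute Engine VM instance with the given name, which is why it reports that instances/node-1 was not found.

You can, however, SSH directly into a TPU VM, as described in the Cloud TPU VM documentation. If you are already using a TPU VM, then the problem is that you are running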

gcloud compute ssh

instead of

gcloud alpha compute tpus tpu-vm ssh ...
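
To make the distinction concrete, here is a minimal sketch of a shell helper that prints the SSH command matching each TPU architecture. The function name tpu_ssh_cmd and its "tpu-vm" / "tpu-node" labels are hypothetical, introduced only for illustration. For the "tpu-node" case it assumes the name you pass is a separate Compute Engine VM that talks to the TPU over the network, not the TPU itself.

```shell
# Hypothetical helper: print the SSH command for a given TPU architecture.
# Usage: tpu_ssh_cmd <tpu-vm|tpu-node> <name> <zone>
tpu_ssh_cmd() {
  case "$1" in
    tpu-vm)
      # TPU VM architecture: SSH goes straight to the TPU host.
      echo "gcloud alpha compute tpus tpu-vm ssh $2 --zone=$3"
      ;;
    tpu-node)
      # TPU Node architecture: the TPU itself accepts no SSH.
      # Assumption: $2 names a Compute Engine VM, not the TPU node.
      echo "gcloud compute ssh $2 --zone=$3"
      ;;
    *)
      echo "unknown TPU kind: $1" >&2
      return 1
      ;;
  esac
}

# Print the command for a TPU VM named node-1 (the name from the question).
tpu_ssh_cmd tpu-vm node-1 asia-east1-c
```

The helper only prints the command rather than running it, so you can check which form applies before executing anything against your project.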
