How do I access Dataproc cluster metadata?



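After creating a cluster, I am trying to retrieve the URLs of its other components (without using the GCP dashboard). I am using the Dataproc Python API, specifically the get_cluster() function.

The function returns a lot of data, but I cannot find the Jupyter gateway URL or the other metadata in it.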

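from google.cloud import dataproc_v1

project_id, cluster_name = '', ''  # fill in your project ID and cluster name
region = 'europe-west4'

# Point the client at the regional Dataproc endpoint.
client = dataproc_v1.ClusterControllerClient(
    client_options={
        'api_endpoint': '{}-dataproc.googleapis.com:443'.format(region)
    }
)

# Keyword arguments are used because newer library releases expect them.
response = client.get_cluster(
    project_id=project_id, region=region, cluster_name=cluster_name
)
print(response)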

Can anyone help with this?

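If you set up Jupyter access by enabling Component Gateway as described in this documentation, you can reach the web interfaces as described here. The trick is that the endpoint URLs are only included in the response of the v1beta2 version of the API.

Very little needs to change in the code (nothing is required beyond the google-cloud-dataproc library). Just replace dataproc_v1 with dataproc_v1beta2 and read the endpoints from response.config.endpoint_config: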

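from google.cloud import dataproc_v1beta2

project_id, cluster_name = '', ''  # fill in your project ID and cluster name
region = 'europe-west4'

# Same regional endpoint as before, but with the v1beta2 client.
client = dataproc_v1beta2.ClusterControllerClient(
    client_options={
        'api_endpoint': '{}-dataproc.googleapis.com:443'.format(region)
    }
)

# Keyword arguments are used because newer library releases expect them.
response = client.get_cluster(
    project_id=project_id, region=region, cluster_name=cluster_name
)
# endpoint_config holds the Component Gateway URLs.
print(response.config.endpoint_config)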

In my case, I get:

http_ports {
  key: "HDFS NameNode"
  value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/hdfs/dfshealth.html"
}
http_ports {
  key: "Jupyter"
  value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/jupyter/"
}
http_ports {
  key: "JupyterLab"
  value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/jupyter/lab/"
}
http_ports {
  key: "MapReduce Job History"
  value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/jobhistory/"
}
http_ports {
  key: "Spark History Server"
  value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/sparkhistory/"
}
http_ports {
  key: "Tez"
  value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/apphistory/tez-ui/"
}
http_ports {
  key: "YARN Application Timeline"
  value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/apphistory/"
}
http_ports {
  key: "YARN ResourceManager"
  value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/yarn/"
}
enable_http_port_access: true
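
To pull out a single URL rather than print the whole block, a small lookup over the http_ports map may help. This is only a sketch: find_component_url is a hypothetical helper, not part of the Dataproc library, and the sample dictionary with placeholder URLs stands in for response.config.endpoint_config.http_ports so the snippet runs without a cluster.

```python
from typing import Mapping, Optional


def find_component_url(http_ports: Mapping[str, str], name: str) -> Optional[str]:
    """Return the URL whose key matches `name` (case-insensitive), or None."""
    wanted = name.strip().lower()
    for key, url in http_ports.items():
        if key.lower() == wanted:
            return url
    return None


# Stand-in for response.config.endpoint_config.http_ports (placeholder URLs).
sample_ports = {
    "Jupyter": "https://example.invalid/jupyter/",
    "YARN ResourceManager": "https://example.invalid/yarn/",
}

print(find_component_url(sample_ports, "jupyter"))   # matches the "Jupyter" key
print(find_component_url(sample_ports, "Zeppelin"))  # not in the map
```

With a real response, the call would presumably be find_component_url(response.config.endpoint_config.http_ports, "Jupyter"), assuming the map field behaves like a regular Python mapping.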

You need v1beta2.

Enable the Component Gateway in the cluster config with:

'endpoint_config': {
    'enable_http_port_access': True
},

Then the answer above will work:

client.get_cluster(project_id=project_id, region=region, cluster_name=cluster_name)
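
For context, here is a rough sketch of where the endpoint_config fragment could sit in a cluster definition. build_cluster_spec is a hypothetical helper and the project and cluster names are placeholders. The outer layout (project_id, cluster_name, config) is assumed to follow the Dataproc Cluster resource, and everything else a real cluster needs (machine types, optional components such as Jupyter) is left out.

```python
from typing import Any, Dict


def build_cluster_spec(project_id: str, cluster_name: str) -> Dict[str, Any]:
    """Build a minimal cluster definition with Component Gateway enabled."""
    return {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": {
            # The fragment from the answer: exposes the component web UIs.
            "endpoint_config": {"enable_http_port_access": True},
        },
    }


# Placeholder names, for illustration only.
spec = build_cluster_spec("my-project", "my-cluster")
print(spec["config"]["endpoint_config"])
```

The resulting dictionary would be passed as the cluster argument when creating the cluster. That call is not shown here because its exact form depends on the library version.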
