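After creating a cluster, I am trying to retrieve the URLs of its other components without using the GCP dashboard. I am using the Dataproc Python API, more specifically the get_cluster() function.
The function returns a lot of data, but I cannot find the Jupyter gateway URL or other metadata in it.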
from google.cloud import dataproc_v1
project_id, cluster_name = '', ''
region = 'europe-west4'
client = dataproc_v1.ClusterControllerClient(
client_options={
'api_endpoint': '{}-dataproc.googleapis.com:443'.format(region)
}
)
response = client.get_cluster(project_id=project_id, region=region, cluster_name=cluster_name)
print(response)
Can anyone help with this?
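If you have set up Jupyter access by enabling Component Gateway as described in this documentation, you can access the web interfaces as described here. The trick is that the endpoints are included in the API response of the v1beta2 version.
The code needs very few changes (there are no requirements other than the google-cloud-dataproc library). Just replace dataproc_v1 with dataproc_v1beta2 and access the endpoints via response.config.endpoint_config: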
from google.cloud import dataproc_v1beta2
project_id, cluster_name = '', ''
region = 'europe-west4'
client = dataproc_v1beta2.ClusterControllerClient(
client_options={
'api_endpoint': '{}-dataproc.googleapis.com:443'.format(region)
}
)
response = client.get_cluster(project_id=project_id, region=region, cluster_name=cluster_name)
print(response.config.endpoint_config)
In my case, I get:
http_ports {
key: "HDFS NameNode"
value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/hdfs/dfshealth.html"
}
http_ports {
key: "Jupyter"
value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/jupyter/"
}
http_ports {
key: "JupyterLab"
value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/jupyter/lab/"
}
http_ports {
key: "MapReduce Job History"
value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/jobhistory/"
}
http_ports {
key: "Spark History Server"
value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/sparkhistory/"
}
http_ports {
key: "Tez"
value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/apphistory/tez-ui/"
}
http_ports {
key: "YARN Application Timeline"
value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/apphistory/"
}
http_ports {
key: "YARN ResourceManager"
value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/yarn/"
}
enable_http_port_access: true
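To pull out a single URL instead of printing the whole message, the http_ports field can be treated as a mapping from component name to URL. Below is a minimal sketch, assuming http_ports behaves like a dict (as the output above suggests). find_component_url is a hypothetical helper, not part of the google-cloud-dataproc library, and the example URLs are placeholders.

```python
from typing import Mapping, Optional


def find_component_url(http_ports: Mapping[str, str], name: str) -> Optional[str]:
    """Return the URL whose key matches `name` (case-insensitive), or None."""
    wanted = name.strip().lower()
    for key, url in http_ports.items():
        if key.strip().lower() == wanted:
            return url
    return None


# Stand-in for response.config.endpoint_config.http_ports;
# these URLs are placeholders, not real endpoints.
example_ports = {
    "Jupyter": "https://example.invalid/jupyter/",
    "JupyterLab": "https://example.invalid/jupyter/lab/",
}
print(find_component_url(example_ports, "jupyter"))
```

With a live cluster, one would pass response.config.endpoint_config.http_ports in place of example_ports.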
You need v1beta2, with the component gateway enabled via:
'endpoint_config': {
'enable_http_port_access': True
},
Then the answer above will work:
client.get_cluster(project_id=project_id, region=region, cluster_name=cluster_name)
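For context, here is a rough sketch of where the endpoint_config fragment might sit in a cluster definition. The nesting under "config" is an assumption inferred from the response.config.endpoint_config path used above. build_cluster_spec and the sample project and cluster names are hypothetical, and a real cluster definition would need further settings that are not shown here.

```python
def build_cluster_spec(project_id: str, cluster_name: str) -> dict:
    """Build a minimal cluster definition with the component gateway enabled."""
    return {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": {
            # Assumed placement: mirrors response.config.endpoint_config.
            "endpoint_config": {
                "enable_http_port_access": True,
            },
        },
    }


# Hypothetical names, for illustration only.
spec = build_cluster_spec("my-project", "my-cluster")
print(spec["config"]["endpoint_config"])
```

A dict like this would be supplied when the cluster is created. Once the cluster is running, get_cluster should then return the populated http_ports.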