gcloud equivalent for creating a Dataproc cluster in Python



How do I replicate the following gcloud command in Python?

gcloud beta dataproc clusters create spark-nlp-cluster \
  --region global \
  --metadata 'PIP_PACKAGES=google-cloud-storage spark-nlp==2.5.3' \
  --worker-machine-type n1-standard-1 \
  --num-workers 2 \
  --image-version 1.4-debian10 \
  --initialization-actions gs://dataproc-initialization-actions/python/pip-install.sh \
  --optional-components=JUPYTER,ANACONDA \
  --enable-component-gateway

Here is what I have in Python so far:


cluster_data = {
    "project_id": project,
    "cluster_name": cluster_name,
    "config": {
        "gce_cluster_config": {"zone_uri": zone_uri},
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-1"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-1"},
        "software_config": {
            "image_version": "1.4-debian10",
            "optional_components": {"JUPYTER", "ANACONDA"},
        },
    },
}
cluster = dataproc.create_cluster(
    request={"project_id": project, "region": region, "cluster": cluster_data}
)

I don't know how to translate these gcloud flags into Python:

--metadata 'PIP_PACKAGES=google-cloud-storage spark-nlp==2.5.3'   
--initialization-actions gs://dataproc-initialization-actions/python/pip-install.sh 
--enable-component-gateway 

You can try it like this:

cluster_data = {
    "project_id": project,
    "cluster_name": cluster_name,
    "config": {
        "gce_cluster_config": {
            "zone_uri": zone_uri,
            # --metadata: a map of key/value pairs, not a single string.
            # The package list stays space-separated, as pip-install.sh expects.
            "metadata": {"PIP_PACKAGES": "google-cloud-storage spark-nlp==2.5.3"},
        },
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-1"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-1"},
        "software_config": {
            "image_version": "1.4-debian10",
            "optional_components": ["JUPYTER", "ANACONDA"],
        },
        # --initialization-actions takes a list of actions.
        "initialization_actions": [
            {"executable_file": "gs://dataproc-initialization-actions/python/pip-install.sh"}
        ],
        # --enable-component-gateway
        "endpoint_config": {"enable_http_port_access": True},
    },
}

Note that `gce_cluster_config` must appear only once: put `zone_uri` and `metadata` in the same dict, otherwise the second key silently overwrites the first.

For more details, see the GCP cluster configuration reference.
