Unable to create a cluster with properties using the Dataproc API



I'm trying to create a cluster programmatically in Python:

import googleapiclient.discovery

dataproc = googleapiclient.discovery.build('dataproc', 'v1')

zone_uri = 'https://www.googleapis.com/compute/v1/projects/{project_id}/zone/{zone}'.format(
    project_id=my_project_id,
    zone=my_zone,
)

cluster_data = {
    'projectId': my_project_id,
    'clusterName': my_cluster_name,
    'config': {
        'gceClusterConfig': {
            'zoneUri': zone_uri
        },
        'softwareConfig': {
            'properties': {'string': {'spark:spark.executor.memory': '10gb'}},
        },
    },
}

result = dataproc \
    .projects() \
    .regions() \
    .clusters() \
    .create(
        projectId=my_project_id,
        region=my_region,
        body=cluster_data,
    ) \
    .execute()

I keep getting this error: Invalid JSON payload received. Unknown name "spark:spark.executor.memory" at 'cluster.config.software_config.properties[0].value': Cannot find field.

The API documentation is here: https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.clusters#SoftwareConfig

properties

Keys are specified in prefix:property format, for example core:fs.defaultFS.

Even when I change properties to {'string' : {'core:fs.defaultFS' : 'hdfs://'}}, I get the same error.

properties is a key/value mapping:

'properties': {
'spark:spark.executor.memory': 'foo'
}
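
Applied to the question's cluster_data, only the softwareConfig block needs to change: drop the extra 'string' wrapper so each prefix:property key maps straight to its value. A minimal sketch, reusing the question's variables and value:

cluster_data = {
    'projectId': my_project_id,
    'clusterName': my_cluster_name,
    'config': {
        'gceClusterConfig': {
            'zoneUri': zone_uri
        },
        'softwareConfig': {
            # Keys in 'prefix:property' format map directly to string values;
            # there is no intermediate 'string' object.
            'properties': {'spark:spark.executor.memory': '10gb'},
        },
    },
}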

The documentation could use a better example. In general, the best way to see what the API expects is to click "Equivalent REST" in the Cloud Console, or to pass --log-http when using gcloud. For example:

$ gcloud dataproc clusters create clustername --properties spark:spark.executor.memory=foo --log-http
=======================
==== request start ====
uri: https://dataproc.googleapis.com/v1/projects/projectid/regions/global/clusters?alt=json
method: POST
== body start ==
{"clusterName": "clustername", "config": {"gceClusterConfig": {"internalIpOnly": false, "zoneUri": "us-east1-d"}, "masterConfig": {"diskConfig": {}}, "softwareConfig": {"properties": {"spark:spark.executor.memory": "foo"}}, "workerConfig": {"diskConfig": {}}}, "projectId": "projectid"}
== body end ==
==== request end ====
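
As a side note, clusters.create returns a long-running Operation rather than the cluster itself, so the script may want to wait for it to finish before using the cluster. A rough sketch with the same discovery client (the polling interval and error handling here are illustrative assumptions, not part of the original question):

import time

# 'result' is the Operation resource returned by clusters().create(...).execute()
operation_name = result['name']

while True:
    operation = dataproc \
        .projects() \
        .regions() \
        .operations() \
        .get(name=operation_name) \
        .execute()
    if operation.get('done'):
        if 'error' in operation:
            raise RuntimeError(operation['error'].get('message', 'Cluster creation failed'))
        break
    time.sleep(5)  # wait before polling the operation again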
