Trying to run the Google Dataflow examples with the Python SDK.
I can run them locally:
python -m apache_beam.examples.wordcount --output OUTPUT_FILE
However, when I try to run on GCP:
python -m apache_beam.examples.wordcount \
    --project myproject \
    --job_name myproject-wordcount \
    --runner DataflowRunner \
    --staging_location gs://myproject/staging \
    --output gs://myproject/output \
    --network myproject-network \
    --zone europe-west1-b \
    --subnetwork regions/europe-west1/subnetworks/europe-west1 \
    --temp_location gs://myproject/temp
I get the following error:
apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
(9d6636d5e214c789): Workflow failed. Causes: (ab9869cb8161ec27): Error:
Message: Invalid value for field 'resource.properties.networkInterfaces[0].subnetwork': ''. Network interface must specify a subnet if the network resource is in custom subnet mode.
HTTP Code: 400
I am using the apache-beam Python SDK 0.6.0.
Can anyone help?
Adding --subnetwork with the subnet default solved my problem:
mvn compile exec:java \
    -Dexec.mainClass=my.MainClass \
    -Dexec.args="--runner=DataflowRunner \
    --project=my-project \
    --stagingLocation=gs://my-bucket/staging \
    --templateLocation=gs://my-bucket/template \
    --gcpTempLocation=gs://my-bucket/tmp \
    --region=us-central1 \
    --subnetwork=regions/us-central1/subnetworks/default" \
    -P dataflow-runner
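For reference, Dataflow accepts the subnetwork either as a region-relative path or as a fully qualified Compute Engine URL; the fully qualified form is what you need when the subnet lives in a different (e.g. Shared VPC host) project. The project name below is a placeholder, not something from this question:

```shell
# Short form: subnet in the same project as the Dataflow job.
--subnetwork=regions/us-central1/subnetworks/default

# Fully qualified form, e.g. for a Shared VPC host project
# ("my-host-project" is a hypothetical placeholder):
--subnetwork=https://www.googleapis.com/compute/v1/projects/my-host-project/regions/us-central1/subnetworks/default
```

Either way, the region in the subnetwork path must match the region the job runs in, or the workers cannot attach to the subnet.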
This may be too late for you, but in case anyone else runs into the same problem, here is the issue:
This line: --zone europe-west1-b
should instead be:
--region europe-west1
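Putting the fix together with the question's original flags, the corrected invocation would look roughly like this (a sketch using the question's own placeholder project, bucket, and network names; the subnet named europe-west1 must actually exist in that region):

```shell
python -m apache_beam.examples.wordcount \
    --project myproject \
    --job_name myproject-wordcount \
    --runner DataflowRunner \
    --staging_location gs://myproject/staging \
    --temp_location gs://myproject/temp \
    --output gs://myproject/output \
    --network myproject-network \
    --region europe-west1 \
    --subnetwork regions/europe-west1/subnetworks/europe-west1
```

The key change is passing --region (a region like europe-west1) rather than --zone (a zone like europe-west1-b), so that it matches the region in the --subnetwork path.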