Apache Beam job on DataflowRunner never starts and generates no logs, but only on some machines? WinError



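I'm trying to run a simple Beam pipeline from PowerShell. The service account I'm using has access to all the GCS buckets it needs. This works fine on my personal laptop, but on my work laptop I get the INFO output below, the job never shows up in the Dataflow console, and no logs are generated in GCP or anywhere else I can find.

I'm just wondering what could cause this on one laptop but not the other.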

(virtualenv) PS C:\apps\beam> python -m apache_beam.examples.wordcount --input gs://dataflow-samples/shakespeare/kinglear.txt --output gs://dw_json/counts --runner DataflowRunner --project 'inspired-studio-111111' --region 'us-west1' --temp_location gs://dw_json_temp/tmp/
INFO:apache_beam.internal.gcp.auth:Setting socket default timeout to 60 seconds.
INFO:apache_beam.internal.gcp.auth:socket default timeout is 60.0 seconds.
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token

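Edit: I was able to add some logging to output the traceback. I found that while validating the pipeline options, the application cannot reach this GCS bucket: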

https://www.googleapis.com/storage/v1/b/dataflow-staging-us-central1-9b3b14cdbfe093a43e2e0e83d1f47d1e?alt=json

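[WinError 10061] No connection could be made because the target machine actively refused it

The service account in the local JSON key I'm using has full access to this bucket.

Any idea what is blocking this?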

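I assume gsutil ls gs://dw_json/counts works for you? I wonder whether this is related to https://issues.apache.org/jira/browse/BEAM-2264. There isn't much to go on here; maybe you could add some extra logging to see how far it gets.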
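
WinError 10061 is a TCP-level "connection refused", not a permissions error, so it may be worth comparing plain network reachability on the two laptops before looking at the service account. Below is a minimal sketch using only the Python standard library. It assumes (the thread does not confirm this) that the difference could be a proxy or firewall on the work network; proxy_settings and can_connect are hypothetical helper names, not part of Beam or gsutil.

```python
import os
import socket


def proxy_settings(env=None):
    """Return the proxy-related environment variables that are set.

    Keys are normalised to upper case, because the variable names are
    conventionally written in either case.
    """
    env = os.environ if env is None else env
    names = ("HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY")
    found = {}
    for key, value in env.items():
        if key.upper() in names and value:
            found[key.upper()] = value
    return found


def can_connect(host, port, timeout=5.0):
    """Try a plain TCP connection and return (ok, error_message).

    A refused connection (WinError 10061 on Windows) is reported as
    (False, "<message>") instead of raising.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except OSError as exc:
        return False, str(exc)
```

Running print(proxy_settings()) and print(can_connect("www.googleapis.com", 443)) in the same virtualenv on both laptops would show whether the work laptop can open a direct connection at all, and whether any proxy variables differ between the two. This only tests a raw TCP connection; it does not exercise authentication or the Beam code path.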
