在Kubernetes中运行Apache Beam python管道



这个问题看起来可能与此重复。

我正试图在Kubernetes的一个离线实例上使用flink来运行ApacheBeam-python管道。然而,由于我有具有外部依赖关系的用户代码,我将Python SDK工具用作外部服务,这会导致错误(如下所述(。

我用来启动beam-python SDK的kubernetes清单:

apiVersion: apps/v1
kind: Deployment
metadata:
name: beam-sdk
spec:
replicas: 1
selector:
matchLabels:
app: beam
component: python-beam-sdk
template:
metadata:
labels:
app: beam
component: python-beam-sdk
spec:
hostNetwork: True
containers:
- name: python-beam-sdk
image: apachebeam/python3.7_sdk:latest
imagePullPolicy: "Never"
command: ["/opt/apache/beam/boot", "--worker_pool"]
ports:
- containerPort: 50000
name: yay
apiVersion: v1
kind: Service
metadata:
name: beam-python-service
spec:
type: NodePort
ports:
- name: yay
port: 50000
targetPort: 50000
selector:
app: beam
component: python-beam-sdk

当我用以下选项启动我的管道时:

beam_options = PipelineOptions([
"--runner=FlinkRunner",
"--flink_version=1.9",
"--flink_master=10.101.28.28:8081",
"--environment_type=EXTERNAL",
"--environment_config=10.97.176.105:50000",
"--setup_file=./setup.py"
])

我收到以下错误消息(在python-sdk服务中(:

NAME                                 READY   STATUS    RESTARTS   AGE
beam-sdk-666779599c-w65g5            1/1     Running   1          4d20h
flink-jobmanager-74d444cccf-m4g8k    1/1     Running   1          4d20h
flink-taskmanager-5487cc9bc9-fsbts   1/1     Running   2          4d20h
flink-taskmanager-5487cc9bc9-zmnv7   1/1     Running   2          4d20h
(base) [~]$ sudo kubectl logs -f beam-sdk-666779599c-w65g5                                                                                                                   
2020/02/26 07:56:44 Starting worker pool 1: python -m apache_beam.runners.worker.worker_pool_main --service_port=50000 --container_executable=/opt/apache/beam/boot
Starting worker with command ['/opt/apache/beam/boot', '--id=1-1', '--logging_endpoint=localhost:39283', '--artifact_endpoint=localhost:41533', '--provision_endpoint=localhost:42233', '--control_endpoint=localhost:44977']
2020/02/26 09:09:07 Initializing python harness: /opt/apache/beam/boot --id=1-1 --logging_endpoint=localhost:39283 --artifact_endpoint=localhost:41533 --provision_endpoint=localhost:42233 --control_endpoint=localhost:44977
2020/02/26 09:11:07 Failed to obtain provisioning information: failed to dial server at localhost:42233
caused by:
context deadline exceeded

我不知道日志或工件端点(等等(是什么。通过检查源代码,似乎已经将端点硬编码为位于localhost。

(你在评论中说引用帖子的答案是有效的,所以我只会解决你遇到的特定错误,以防其他人点击它。(

你的理解是正确的;日志记录、工件等端点本质上是硬编码的,以使用localhost。这些端点仅由Beam内部使用,不可配置。因此,Beam工作人员被隐含地假设与Flink任务管理器在同一主机上。通常,这是通过使Beam工作池成为Flink任务管理器pod的一个sidecar而不是一个单独的服务来实现的。

最新更新