Flink cluster startup error: Could not resolve ResourceManager address akka



I need help with the error below, because I cannot find where the actual problem lies. I am trying to run a Flink cluster on Docker Desktop on Windows 10 Pro.

Dockerfile:

FROM SOME-LOCAL-REGISTERY-URL/flink:1.11
ADD build/libs/demoapp-service-all.jar /opt/flink/usrlib/demoapp-service-all.jar
VOLUME /tmp
ADD conf/flink-conf.yaml /opt/flink/conf/flink-conf.yaml
ADD conf/log4j.properties /opt/flink/conf/log4j.properties

flink-conf.yaml:

jobmanager.rpc.address: jobmanager
jobmanager.rpc.port: 8092
jobmanager.memory.process.size: 1600m
taskmanager.memory.process.size: 1728m
taskmanager.numberOfTaskSlots: 1
parallelism.default: 1
state.backend: rocksdb
state.checkpoints.dir: file:///c:/Users/demo/checkpoint_dir
state.backend.rocksdb.memory.managed: true

I build the image "demo/demoapp:1.0" manually from the Dockerfile, and then start the Flink cluster from that image with docker-compose.

docker-compose.yml:

version: "2.2"
services:
  jobmanager:
    image: demo/demoapp:1.0
    ports:
      - "8092:8092"
    command: ["standalone-job", "-Dspring.profiles.active=dev"]

  taskmanager:
    image: demo/demoapp:1.0
    depends_on:
      - jobmanager
    command: ["taskmanager", "-Dspring.profiles.active=dev"]
    scale: 1

Logs:

jobmanager_1   | Starting Job Manager
taskmanager_1  | Starting Task Manager
jobmanager_1   | Starting standalonejob as a console application on host aaf9a34c154f.
taskmanager_1  | Starting taskexecutor as a console application on host a96dd08d9ae6.
---------------------------------------------------------------------------------------------
taskmanager_1  | TM_RESOURCE_PARAMS extraction logs:
taskmanager_1  | jvm_params: -Xmx536870902 -Xms536870902 -XX:MaxDirectMemorySize=268435458 -XX:MaxMetaspaceSize=268435456
taskmanager_1  | dynamic_configs: -D taskmanager.memory.framework.off-heap.size=134217728b -D taskmanager.memory.network.max=134217730b -D taskmanager.memory.network.min=134217730b -D taskmanager.memory.framework.heap.size=134217728b -D taskmanager.memory.managed.size=536870920b -D taskmanager.cpu.cores=2.0 -D taskmanager.memory.task.heap.size=402653174b -D taskmanager.memory.task.off-heap.size=0b
taskmanager_1  | logs: INFO  [] - Loading configuration property: jobmanager.rpc.address, a96dd08d9ae6
taskmanager_1  | INFO  [] - Loading configuration property: jobmanager.rpc.port, 8092
taskmanager_1  | INFO  [] - Loading configuration property: jobmanager.memory.process.size, 1600m
taskmanager_1  | INFO  [] - Loading configuration property: taskmanager.memory.process.size, 1728m
taskmanager_1  | INFO  [] - Loading configuration property: taskmanager.numberOfTaskSlots, 2
taskmanager_1  | INFO  [] - Loading configuration property: parallelism.default, 1
taskmanager_1  | INFO  [] - Loading configuration property: state.backend, rocksdb
taskmanager_1  | INFO  [] - Loading configuration property: state.checkpoints.dir, file:///c:/Users/demo/checkpoint_dir
taskmanager_1  | INFO  [] - Loading configuration property: state.backend.rocksdb.memory.managed, true
taskmanager_1  | INFO  [] - Loading configuration property: blob.server.port, 6124
taskmanager_1  | INFO  [] - Loading configuration property: query.server.port, 6125
-------------------------------------------------------------------------------------- 
jobmanager_1   | JM_RESOURCE_PARAMS extraction logs:
jobmanager_1   | jvm_params: -Xmx1073741824 -Xms1073741824 -XX:MaxMetaspaceSize=268435456
jobmanager_1   | logs: INFO  [] - Loading configuration property: jobmanager.rpc.address, aaf9a34c154f
jobmanager_1   | INFO  [] - Loading configuration property: jobmanager.rpc.port, 8092
jobmanager_1   | INFO  [] - Loading configuration property: jobmanager.memory.process.size, 1600m
jobmanager_1   | INFO  [] - Loading configuration property: taskmanager.memory.process.size, 1728m
jobmanager_1   | INFO  [] - Loading configuration property: taskmanager.numberOfTaskSlots, 1
jobmanager_1   | INFO  [] - Loading configuration property: parallelism.default, 1
jobmanager_1   | INFO  [] - Loading configuration property: state.backend, rocksdb
jobmanager_1   | INFO  [] - Loading configuration property: state.checkpoints.dir, file:///c:/Users/demo/checkpoint_dir
jobmanager_1   | INFO  [] - Loading configuration property: state.backend.rocksdb.memory.managed, true
jobmanager_1   | INFO  [] - Loading configuration property: blob.server.port, 6124
jobmanager_1   | INFO  [] - Loading configuration property: query.server.port, 6125
---------------------------------------------------------------------------------------------

Error log:

taskmanager_1  | 2020-11-25 10:15:41,179 INFO  org.apache.flink.runtime.net.ConnectionUtils                 [] - Trying to connect to address a96dd08d9ae6/172.18.0.3:8092
taskmanager_1  | 2020-11-25 10:15:41,180 INFO  org.apache.flink.runtime.net.ConnectionUtils                 [] - Failed to connect from address 'a96dd08d9ae6/172.18.0.3': Connection refused (Connection refused)
taskmanager_1  | 2020-11-25 10:15:41,181 INFO  org.apache.flink.runtime.net.ConnectionUtils                 [] - Failed to connect from address '/172.18.0.3': Connection refused (Connection refused)
taskmanager_1  | 2020-11-25 10:15:41,181 INFO  org.apache.flink.runtime.net.ConnectionUtils                 [] - Failed to connect from address '/172.18.0.3': Connection refused (Connection refused)
taskmanager_1  | 2020-11-25 10:15:41,182 INFO  org.apache.flink.runtime.net.ConnectionUtils                 [] - Failed to connect from address '/127.0.0.1': Connection refused (Connection refused)
taskmanager_1  | 2020-11-25 10:15:41,183 INFO  org.apache.flink.runtime.net.ConnectionUtils                 [] - Failed to connect from address '/172.18.0.3': Connection refused (Connection refused)
taskmanager_1  | 2020-11-25 10:15:41,183 INFO  org.apache.flink.runtime.net.ConnectionUtils                 [] - Failed to connect from address '/127.0.0.1': Connection refused (Connection refused)
taskmanager_1  | 2020-11-25 10:16:19,730 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address akka.tcp://flink@a96dd08d9ae6:8092/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@a96dd08d9ae6:8092/user/rpc/resourcemanager_*.

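Also, apart from the error, I do not understand from the logs why the taskmanager reads values for "jobmanager.rpc.address" and "taskmanager.numberOfTaskSlots" that differ from those in flink-conf.yaml, while the JobManager reads them correctly.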

Please help me figure out what I am missing here.

Instead of defining jobmanager.rpc.address in flink-conf.yaml, defining it in the docker-compose.yml file solved the problem for me:

Dockerfile:

FROM flink:1.12.2-scala_2.12-java8
COPY --chown=flink:flink ./path/to/assembly.jar /opt/flink/usrlib/
COPY --chown=flink:flink ./conf/* /opt/flink/conf/

docker-compose.yml:

environment:
  FLINK_PROPERTIES: |-
    jobmanager.rpc.address: jobmanager

flink-conf.yaml:

# Other configurations.
# ...
# Leave last line empty.
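
Combining this answer with the services from the question, the compose file might look like the sketch below. It is untested and rests on an assumption: that the image uses the official Flink Docker entrypoint, which, as far as I know, rewrites jobmanager.rpc.address at startup (falling back to the container's own hostname when nothing else is supplied) and then appends the contents of FLINK_PROPERTIES to flink-conf.yaml. That fallback would explain why the logs in the question show each container loading its own hostname as jobmanager.rpc.address. The taskmanager.numberOfTaskSlots line is not part of the answer; it is added on the further assumption that the entrypoint otherwise derives the slot count from the number of CPU cores.

```yaml
version: "2.2"
services:
  jobmanager:
    image: demo/demoapp:1.0
    ports:
      - "8092:8092"
    command: ["standalone-job", "-Dspring.profiles.active=dev"]
    environment:
      # Assumption: the entrypoint appends these lines to flink-conf.yaml,
      # so they take effect in place of the hostname it wrote earlier.
      FLINK_PROPERTIES: |-
        jobmanager.rpc.address: jobmanager

  taskmanager:
    image: demo/demoapp:1.0
    depends_on:
      - jobmanager
    command: ["taskmanager", "-Dspring.profiles.active=dev"]
    scale: 1
    environment:
      # "jobmanager" is the compose service name, which other containers
      # on the same compose network can resolve as a hostname.
      FLINK_PROPERTIES: |-
        jobmanager.rpc.address: jobmanager
        taskmanager.numberOfTaskSlots: 1
```

After starting the cluster, the taskmanager log should show it trying to connect to jobmanager:8092 rather than to its own container ID.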
