当我在kubernetes(v1.15.2(集群中启动我的apache flink 1.10任务管理器服务时,它显示的日志如下:
2020-05-01 08:34:55,847 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/resourcemanager..
2020-05-01 08:34:55,847 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.NoRouteToHostException: No route to host
2020-05-01 08:34:55,848 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@flink-jobmanager:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@flink-jobmanager:6123]] Caused by: [java.net.NoRouteToHostException: No route to host]
2020-05-01 08:35:08,874 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.NoRouteToHostException: No route to host
2020-05-01 08:35:08,877 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@flink-jobmanager:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@flink-jobmanager:6123]] Caused by: [java.net.NoRouteToHostException: No route to host]
2020-05-01 08:35:08,878 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/resourcemanager..
2020-05-01 08:35:21,907 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.NoRouteToHostException: No route to host
而taskmanager无法注册成功,我登录taskmanager,发现我可以成功ping jobmanager,如下所示:
flink@flink-taskmanager-54d85f57c7-nl9cf:~$ ping flink-jobmanager
PING flink-jobmanager.dabai-fat.svc.cluster.local (10.254.58.171) 56(84) bytes of data.
64 bytes from flink-jobmanager.dabai-fat.svc.cluster.local (10.254.58.171): icmp_seq=1 ttl=64 time=0.045 ms
64 bytes from flink-jobmanager.dabai-fat.svc.cluster.local (10.254.58.171): icmp_seq=2 ttl=64 time=0.076 ms
64 bytes from flink-jobmanager.dabai-fat.svc.cluster.local (10.254.58.171): icmp_seq=3 ttl=64 time=0.079 ms
那么为什么会发生这种情况,我该怎么办才能解决呢?
尝试在kubernetes taskmanger的pod容器中安装nmap:
apt-get udpate
apt-get install nmap -y
然后扫描作业管理器,确保pod的暴露端口6123是可访问的(在我的情况下,我发现无法从当前pod访问端口6123(。
nmap -T4 <your-jobmanager's-pod-ip>
希望得到帮助。