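When I try to run the example from the Flink documentation (Native Kubernetes), I get the error below.
With the help of this post, I added a few extra parameters and successfully ran the first command from the documentation.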
user@local:~/flink-1.14.4$ ./bin/kubernetes-session.sh \
    -Dkubernetes.cluster-id=dproc-example-flink-cluster-id \
    -Dtaskmanager.memory.process.size=4096m \
    -Dkubernetes.taskmanager.cpu=2 \
    -Dtaskmanager.numberOfTaskSlots=4 \
    -Dresourcemanager.taskmanager-timeout=3600000 \
    -Dkubernetes.namespace=sdt-dproc-flink-test \
    -Dkubernetes.config.file=/home/devuser/.kube/config \
    -Dkubernetes.jobmanager.service-account=flink-service-account
After running the command above, I listed the new pod, as shown below.
user@local:~/flink-1.14.4$ kubectl get pods
NAME                                             READY   STATUS    RESTARTS   AGE
dproc-example-flink-cluster-id-68c79bf67-mwh52   1/1     Running   0          1m
Then I ran the command below to submit the example job.
user@local:~/flink-1.14.4$ ./bin/flink run --target kubernetes-session \
    -Dkubernetes.service-account=flink-service-account \
    -Dkubernetes.cluster-id=dproc-example-flink-cluster-id \
    -Dkubernetes.namespace=sdt-dproc-flink-test \
    -Dkubernetes.config.file=/home/devuser/.kube/config \
    examples/batch/WordCount.jar --input /home/user/sometexts.txt --output /tmp/flinksample
After a while, I got the following log:
2022-03-25 12:38:00,538 INFO org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Retrieve flink cluster dproc-example-flink-cluster-id successfully, JobManager Web Interface: http://10.150.140.248:8081
------------------------------------------------------------
The program finished with the following exception:
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:812)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:246)
at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1054)
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:316)
at org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1061)
at org.apache.flink.client.program.ContextEnvironment.executeAsync(ContextEnvironment.java:131)
at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:70)
at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:93)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
... 8 more
Caused by: java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
at org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1056)
... 16 more
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$11(RestClusterClient.java:433)
at java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986)
at java.base/java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at org.apache.flink.util.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:399)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:476)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:262)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.flink.util.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.
at org.apache.flink.util.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:395)
... 21 more
Caused by: java.util.concurrent.CompletionException: org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException: connection timed out: /10.150.140.248:8081
at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1063)
... 19 more
Caused by: org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException: connection timed out: /10.150.140.248:8081
at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:261)
... 8 more
From the last part of this error, I concluded that the JobManager Web Interface URL is wrong, because when I check the Kubernetes services, the port is different.
user@local:~/flink-1.14.4$ kubectl get svc
NAME                                  TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
dproc-example-flink-cluster-id        ClusterIP      None            <none>        6123/TCP,6124/TCP   6h32m
dproc-example-flink-cluster-id-rest   LoadBalancer   10.97.100.197   <pending>     8081:30976/TCP      6h32m
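The port should be 30976, not 8081. I have tried setting rest.port to this value both in flink-conf.yaml and as a command-line parameter, but nothing changed. I always get this error.
How can I force the Flink client to use the correct JobManager URL?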
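As a reading aid for the service listing above: kubectl prints each PORT(S) entry of a NodePort or LoadBalancer service as service port, then node port, then protocol. The snippet below is only an illustration that splits the value copied from the listing; the variable names are made up for this example.

```shell
#!/bin/sh
# Split one PORT(S) entry from `kubectl get svc` into its parts.
# The value is copied from the listing above; variable names are illustrative.
entry="8081:30976/TCP"

mapping="${entry%/*}"          # drop the protocol suffix: 8081:30976
protocol="${entry##*/}"        # keep only the protocol: TCP
service_port="${mapping%%:*}"  # port exposed by the Service itself: 8081
node_port="${mapping##*:}"     # port opened on the cluster nodes: 30976

echo "service port: ${service_port}, node port: ${node_port}, protocol: ${protocol}"
```

So 8081 is the port of the Service, while 30976 is the port opened on the nodes. With EXTERNAL-IP still at <pending>, no load balancer address has been assigned to the service.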
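Yang Wang (danrtsey.wy@gmail.com) from user@flink.apache.org has answered my question. Many thanks. I am sharing the answer below. I tried the first option, setting the service exposed type to NodePort, and the job ran successfully.
The root cause might be that the LoadBalancer does not really work in your environment. We already have a ticket to track this [1] and will try to get it resolved in the next release.
For now, could you please try adding "-Dkubernetes.rest-service.exposed.type=NodePort" to your session and submission commands?
Maybe you are also interested in the new flink-kubernetes-operator project [2]. It should make it easier to run Flink applications on K8s.
[1] https://issues.apache.org/jira/browse/FLINK-17231
[2] https://github.com/apache/flink-kubernetes-operator
Best, Yang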
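The NodePort suggestion above could be applied roughly as in the following sketch. The cluster id, namespace, kubeconfig path, service account, and job arguments are copied from the question. The `run` helper and the `DRY_RUN` variable are hypothetical conveniences added here and are not part of Flink: by default the script only prints the two commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch only. All option values are copied from the question; adjust them
# for your environment.
set -eu

# Options shared by the session and the submission command. The last one
# asks Flink to expose the REST service as NodePort rather than the
# LoadBalancer type seen in the service listing.
COMMON_OPTS="-Dkubernetes.cluster-id=dproc-example-flink-cluster-id \
-Dkubernetes.namespace=sdt-dproc-flink-test \
-Dkubernetes.config.file=/home/devuser/.kube/config \
-Dkubernetes.rest-service.exposed.type=NodePort"

# Hypothetical helper: print the command unless DRY_RUN=0 is set, in which
# case the command is actually executed.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# 1. Start the session cluster.
#    $COMMON_OPTS is left unquoted on purpose so it splits into separate
#    arguments (none of the values contain spaces).
run ./bin/kubernetes-session.sh $COMMON_OPTS \
  -Dtaskmanager.memory.process.size=4096m \
  -Dkubernetes.taskmanager.cpu=2 \
  -Dtaskmanager.numberOfTaskSlots=4 \
  -Dresourcemanager.taskmanager-timeout=3600000 \
  -Dkubernetes.jobmanager.service-account=flink-service-account

# 2. Submit the example job to that session cluster.
run ./bin/flink run --target kubernetes-session $COMMON_OPTS \
  -Dkubernetes.service-account=flink-service-account \
  examples/batch/WordCount.jar --input /home/user/sometexts.txt --output /tmp/flinksample
```

Run it from the Flink distribution directory with `DRY_RUN=0` to execute the commands for real. If a session cluster with the same cluster id is still running from an earlier attempt, it presumably has to be removed first (the Flink documentation uses `kubectl delete deployment/<cluster-id>` for this), since its REST service was already created as LoadBalancer.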