Flink: Cannot cancel a running job (streaming)



I want to run a streaming job.
When I run it locally using start-cluster.sh and the Flink Web Interface, I have no problem.

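However, I am now trying to run my job with Flink on YARN (deployed on Google Dataproc), and when I try to cancel it, the job stays in the CANCELING state forever.

This is what I get in the TaskManager log: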

2016-10-18 16:56:04,053 INFO org.apache.flink.runtime.taskmanager.Task - Attempting to cancel task Source: pubSubMessageAcknowledgingSource -> TrackingDisplayPushDeduplicater -> TrackingDisplayPushDeserializer -> (Sink: TrackingDisplayPushErrorFlumeSink, Map -> Sink: TrackingDisplayPushValidFlumeSink) (1/1)
2016-10-18 16:56:04,053 INFO org.apache.flink.runtime.taskmanager.Task - Source: pubSubMessageAcknowledgingSource -> TrackingDisplayPushDeduplicater -> TrackingDisplayPushDeserializer -> (Sink: TrackingDisplayPushErrorFlumeSink, Map -> Sink: TrackingDisplayPushValidFlumeSink) (1/1) switched to CANCELING
2016-10-18 16:56:04,053 INFO org.apache.flink.runtime.taskmanager.Task - Triggering cancellation of task code Source: pubSubMessageAcknowledgingSource -> TrackingDisplayPushDeduplicater -> TrackingDisplayPushDeserializer -> (Sink: TrackingDisplayPushErrorFlumeSink, Map -> Sink: TrackingDisplayPushValidFlumeSink) (1/1) (38bf32d9199a0c9383a8b1e8d73a1f65).
2016-10-18 16:56:34,055 WARN org.apache.flink.runtime.taskmanager.Task - Task 'Source: pubSubMessageAcknowledgingSource -> TrackingDisplayPushDeduplicater -> TrackingDisplayPushDeserializer -> (Sink: TrackingDisplayPushErrorFlumeSink, Map -> Sink: TrackingDisplayPushValidFlumeSink) (1/1)' did not react to cancelling signal, but is stuck in method:
java.net.PlainSocketImpl.socketConnect(Native Method)
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
java.net.Socket.connect(Socket.java:589)
java.net.Socket.connect(Socket.java:538)
sun.net.NetworkClient.doConnect(NetworkClient.java:180)
sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
sun.net.www.http.HttpClient.New(HttpClient.java:308)
sun.net.www.http.HttpClient.New(HttpClient.java:326)
sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1169)
sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1105)
sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:999)
sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:933)
sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1283)
sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1258)
com.accengage.bigdata.flink.streaming.sinks.FlumeSink.flush(FlumeSink.java:107)
com.accengage.bigdata.flink.streaming.sinks.FlumeSink.invoke(FlumeSink.java:80)
com.accengage.bigdata.flink.streaming.sinks.FlumeSink.invoke(FlumeSink.java:25)
org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:39)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:373)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:358)
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:346)
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:329)
org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:39)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:373)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:358)
org.apache.flink.streaming.api.collector.selector.DirectedOutput.collect(DirectedOutput.java:126)
org.apache.flink.streaming.api.collector.selector.DirectedOutput.collect(DirectedOutput.java:35)
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:346)
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:329)
org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:39)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:373)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:358)
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:346)
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:329)
org.apache.flink.streaming.api.operators.StreamFilter.processElement(StreamFilter.java:38)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:373)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:358)
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:346)
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:329)
org.apache.flink.streaming.api.operators.StreamSource$NonTimestampContext.collect(StreamSource.java:160)
com.accengage.bigdata.flink.streaming.sources.PubSubAcknowledgingSource.run(PubSubAcknowledgingSource.java:148)
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:80)
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:53)
org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:56)
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:266)
org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
java.lang.Thread.run(Thread.java:745)

Do you have any idea what I am doing wrong?
What should I do?

Thanks.

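I assume you are using a custom sink (com.accengage.bigdata.flink.streaming.sinks.FlumeSink) that uses an HTTP library to communicate with Flume; according to the stack trace, that is java.net.HttpURLConnection.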
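The stack trace shows the sink blocked while HttpURLConnection opens its connection, and by default HttpURLConnection waits forever for both the connect and the response. Setting timeouts does not make the call interruptible, but it bounds how long it can hang. Below is a minimal sketch of the HTTP part of such a flush(), assuming a plain JSON POST to a Flume HTTP endpoint; the class name, endpoint and payload are hypothetical, not the poster's code.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical stand-in for the HTTP call inside a Flume sink's flush().
 * The endpoint and the JSON payload are placeholders, not the poster's code.
 */
public class TimedFlumePost {

    /** POSTs one batch and returns the HTTP status code; fails instead of hanging when Flume is unreachable. */
    static int post(String endpoint, String jsonBatch, int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(endpoint).toURL().openConnection();
        // By default both timeouts are 0, meaning "wait forever"; that is what lets
        // Socket.connect() pin the task thread while Flink is trying to cancel it.
        conn.setConnectTimeout(connectTimeoutMs);
        conn.setReadTimeout(readTimeoutMs);
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try {
            try (OutputStream out = conn.getOutputStream()) {
                out.write(jsonBatch.getBytes(StandardCharsets.UTF_8));
            }
            // Throws SocketTimeoutException if the server stays silent for readTimeoutMs.
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

With a connect timeout, an unreachable Flume agent makes the flush fail after connectTimeoutMs instead of blocking the task thread indefinitely, so the job can fail or cancel normally. The concrete timeout values are an assumption you would tune for your setup.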

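Most likely, the HTTP library is stuck in a loop or somewhere else when the interrupt is sent to the thread (this happens, for example, when an InterruptedException is ignored). In your trace the thread is blocked in Socket.connect(), and a blocking java.net socket call does not react to Thread.interrupt() at all, so the cancellation signal cannot get it out of there.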
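To illustrate the "ignored InterruptedException" case, here is a hypothetical retry loop of the kind a sink might wrap around its HTTP call (the class and method names are made up for this example). The first variant swallows the interrupt, so the cancellation signal is lost; the second restores the interrupt flag and stops.

```java
/**
 * Hypothetical retry loop for a sink's flush(). Shows why swallowing
 * InterruptedException keeps a task from ever finishing its cancellation.
 */
public class InterruptAwareRetry {

    /** Anti-pattern: sleep() clears the interrupt flag when it throws, the flag is never restored, so the loop never stops. */
    static void retrySwallowing(Runnable attempt, long backoffMs) {
        while (true) {
            try {
                attempt.run();
                return;
            } catch (RuntimeException e) {
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException ignored) {
                    // cancellation signal is lost here
                }
            }
        }
    }

    /** Honours cancellation: returns true on success, false if the thread was interrupted. */
    static boolean retryHonouringInterrupt(Runnable attempt, long backoffMs) {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                attempt.run();
                return true;
            } catch (RuntimeException e) {
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // keep the flag so callers also see the cancellation
                    return false;
                }
            }
        }
        return false;
    }
}
```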

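To solve this, you can either use an HTTP library that handles interrupts properly, or call the library from a separate thread, so that the task's main thread, which is the one that receives the interrupt, is never blocked inside the HTTP call.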
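A sketch of the second option, assuming a plain ExecutorService (OffloadedCall is a hypothetical name, not a Flink or library class): the blocking HTTP call runs on a worker thread, and the task thread only waits in Future.get(), which does throw InterruptedException when the task is cancelled. The worker thread itself may stay stuck in native I/O; it is abandoned, not freed.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical helper: runs a possibly uninterruptible call on a worker thread so
 * the calling (task) thread only waits in Future.get(), which reacts to interrupts.
 */
public class OffloadedCall implements AutoCloseable {

    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "flume-http-worker");
        t.setDaemon(true); // a worker stuck in native I/O must not keep the JVM alive
        return t;
    });

    /** Waits at most timeoutMs; an interrupt of the calling thread ends the wait immediately. */
    <T> T call(Callable<T> blockingCall, long timeoutMs)
            throws InterruptedException, ExecutionException, TimeoutException {
        Future<T> future = worker.submit(blockingCall);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException | TimeoutException e) {
            future.cancel(true); // best effort: the worker may remain blocked in the socket call
            throw e;
        }
    }

    @Override
    public void close() {
        worker.shutdownNow();
    }
}
```

In the sink you might wrap the existing flush body, e.g. `off.call(() -> { flush(); return null; }, 30_000)`, creating and closing the helper in the sink's open() and close() methods. Because the executor has a single thread, a worker that never returns also blocks later calls, so this works best combined with HTTP timeouts.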

In Flink 1.2, there will be some additional mechanisms to prevent the system from getting stuck in cancel() calls. See FLINK-4715.
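The general idea behind such a safeguard can be illustrated in plain Java: keep re-sending the interrupt at an interval and, if the thread is still alive after a hard deadline, report it so the caller can escalate instead of waiting forever. This is a hypothetical sketch of the concept only, not Flink's implementation; FLINK-4715 describes what Flink 1.2 actually does.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical cancellation watchdog (not Flink code): keeps interrupting a task
 * thread and reports whether it terminated before a hard deadline.
 */
public class CancellationWatchdog {

    /**
     * @return true if the thread exited in time; false means the caller has to escalate,
     *         e.g. by failing the whole process, since a thread blocked in native I/O cannot be forced out.
     */
    static boolean cancel(Thread taskThread, long intervalMs, long timeoutMs) throws InterruptedException {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("intervalMs must be positive");
        }
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (taskThread.isAlive()) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) {
                return false; // still stuck after the deadline
            }
            taskThread.interrupt(); // repeat the signal in case an earlier one was swallowed
            taskThread.join(Math.min(intervalMs, remainingMs));
        }
        return true;
    }
}
```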
