What to do if an executor node suddenly dies in Spark Streaming



I am using Spark Streaming 1.6.

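A few days ago, my Spark Streaming application (the context) shut down suddenly. Looking at the logs, it seems that one executor had gone down. (In fact, the machine itself was shut down.)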

What should I do when this happens? (Note that the dynamic allocation option is not available.)

If an executor goes down, I would like the job itself to be reassigned to another executor. My application is running in yarn-client mode.

## log example, at the time of shutdown.
WARN TransportChannelHandler: Exception in connection from xxxx-hostname/12.34.56.789:12345
ERROR TransportResponseHandler: Still have 2 requests outstanding when connection from xxxx-hostname/12.34.56.789:12345 is closed
ERROR ContextCleaner: Error cleaning broadcast 1123293
WARN BlockManagerMaster: Failed to remove RDD 262104
...
ERROR TransportClient: Failed to send RPC 5940957964172608257 to xxxx-hostname/12.34.56.789:12345: java.nio.channels.ClosedChannelException
...
WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to get executor loss reason for executor id 5 at RPC address xxxx-hostname:12345, but got no response. Marking as slave lost. org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
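
When triaging a log like the one above, it can help to pull out which executors the driver gave up on. The following is only a minimal sketch: it assumes the driver log is plain text and that the YarnSchedulerBackend warning has exactly the wording shown in the excerpt. The function name `find_lost_executors` is hypothetical and not part of Spark.

```python
import re

# Pattern based on the YarnSchedulerBackend warning in the excerpt above.
# Assumption: the message wording matches that line exactly.
LOSS_PATTERN = re.compile(
    r"executor loss reason for executor id (\d+) at RPC address ([^\s,]+),"
)


def find_lost_executors(log_lines):
    """Return (executor_id, rpc_address) pairs found in the given log lines."""
    lost = []
    for line in log_lines:
        match = LOSS_PATTERN.search(line)
        if match:
            lost.append((int(match.group(1)), match.group(2)))
    return lost


if __name__ == "__main__":
    sample = [
        "WARN BlockManagerMaster: Failed to remove RDD 262104",
        "WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to get "
        "executor loss reason for executor id 5 at RPC address "
        "xxxx-hostname:12345, but got no response. Marking as slave lost.",
    ]
    print(find_lost_executors(sample))  # prints [(5, 'xxxx-hostname:12345')]
```

This only identifies which executor was lost and on which host. It does not explain why the host went away, so the host's own system logs would still need to be checked.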

You ran out of HDFS filesystem space (DataNode space).
