Hadoop distcp job succeeds, but attempt_xxx is killed by the ApplicationMaster



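When running a distcp job I run into the following problem: almost all map tasks are marked as successful, but with a note saying that the container was killed.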

In the web UI, the log for the map tasks shows: Progress 100.00, State SUCCEEDED

But under Note, almost every attempt (~200) shows: Container killed by the ApplicationMaster. Container killed by the ApplicationMaster. Container killed on request. Exit code is 143

In the log file associated with the attempt, I can see a line saying that task "attempt_xxxxxxxxx_0" is done.

The stderr output is empty for all jobs/attempts.

When I look at the application master log and follow one of the successful (but killed) attempts, I find the following entries:

2017-01-05 10:27:22,772 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1483370705805_4012_m_000000_0
2017-01-05 10:27:22,773 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1483370705805_4012_m_000000 Task Transitioned from RUNNING to SUCCEEDED
2017-01-05 10:27:22,775 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 1
2017-01-05 10:27:22,775 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1483370705805_4012Job Transitioned from RUNNING to COMMITTING
2017-01-05 10:27:22,776 INFO [CommitterEvent Processor #1] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_COMMIT
2017-01-05 10:27:23,118 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:1 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:0
2017-01-05 10:27:24,125 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_e116_1483370705805_4012_01_000002
2017-01-05 10:27:24,126 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:1 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:0
2017-01-05 10:27:24,126 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1483370705805_4012_m_000000_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
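
The pattern in this excerpt (the task is recorded as succeeded first, and the kill diagnostic for the same attempt arrives afterwards) can be checked across the whole log. Below is a minimal shell sketch, assuming the application master log has been saved locally; the function name killed_after_success and the file name am.log are hypothetical, and the matching relies only on the two message formats visible in the excerpt above.

```shell
#!/bin/sh
# Print every attempt id that was first reported as succeeded and later
# received a "Container killed by the ApplicationMaster" diagnostic.
# Reads the application master log on stdin.
# Usage (am.log is a hypothetical local copy of the log):
#   killed_after_success < am.log
killed_after_success() {
  awk '
    # "... Task succeeded with attempt attempt_..._0": id is the last field
    /Task succeeded with attempt/ { ok[$NF] = 1 }

    # "... Diagnostics report from attempt_..._0: Container killed by ..."
    /Diagnostics report from attempt_.*Container killed by the ApplicationMaster/ {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^attempt_/) {
          id = $i
          sub(/:$/, "", id)        # drop the trailing colon after the id
          if (id in ok) print id   # only attempts already marked succeeded
        }
      }
    }
  '
}
```

If every killed attempt shows up in this output, the kills all happened after the work was already recorded as done.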

I have already set "mapreduce.map.speculative=false"!

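All MAP tasks have SUCCEEDED (the distcp job has no REDUCE), but the MapReduce job keeps running for a long time (a few hours) before it succeeds and the distcp job completes.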

I am running "yarn version" = Hadoop 2.5.0-cdh5.3.1

Should I be worried? What is causing the containers to be killed? Any suggestions would be greatly appreciated!

Those killed attempts are probably due to speculative execution. If so, there is nothing to worry about.

To confirm that this is the cause, try running distcp like this:

hadoop distcp  -Dmapreduce.map.speculative=false ...

You should stop seeing those killed attempts.
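
As background for the "Exit code is 143" message: by the usual Unix shell convention, an exit status above 128 means the process was terminated by signal (status - 128), so 143 corresponds to signal 15, SIGTERM. That matches a container that was asked to stop rather than one that crashed. The sketch below only illustrates this convention; the function names describe_exit_code and sigterm_exit_status are made up for the example and are not part of Hadoop.

```shell
#!/bin/sh
# Decode an exit status using the shell convention
# "status > 128 means terminated by signal (status - 128)".
describe_exit_code() {
  code=$1
  if [ "$code" -gt 128 ]; then
    sig=$((code - 128))
    case "$sig" in
      9)  name="SIGKILL" ;;
      15) name="SIGTERM" ;;
      *)  name="signal number $sig" ;;
    esac
    echo "terminated by signal $sig ($name)"
  else
    echo "exited with status $code"
  fi
}

# Reproduce the convention locally: a child shell sends SIGTERM to itself,
# and the parent prints the status it observed for that child.
sigterm_exit_status() {
  status=0
  sh -c 'kill -TERM $$' || status=$?
  echo "$status"
}

describe_exit_code 143
```

Calling describe_exit_code with the status printed by sigterm_exit_status gives the same description as for the containers in the question.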
