I ran a simple sort program, but I hit the following errors.
12/06/15 01:13:17 WARN mapred.JobClient: Error reading task outputServer returned HTTP response code: 403 for URL: _http://192.168.1.106:50060/tasklog?plaintext=true&attemptid=attempt_201206150102_0002_m_000001_1&filter=stdout
12/06/15 01:13:18 WARN mapred.JobClient: Error reading task outputServer returned HTTP response code: 403 for URL: _http://192.168.1.106:50060/tasklog?plaintext=true&attemptid=attempt_201206150102_0002_m_000001_1&filter=stderr
12/06/15 01:13:20 INFO mapred.JobClient: map 50% reduce 0%
12/06/15 01:13:23 INFO mapred.JobClient: map 100% reduce 0%
12/06/15 01:14:19 INFO mapred.JobClient: Task Id : attempt_201206150102_0002_m_000000_2, Status : FAILED
Too many fetch-failures
12/06/15 01:14:20 WARN mapred.JobClient: Error reading task outputServer returned HTTP response code: 403 for URL: _http://192.168.1.106:50060/tasklog?plaintext=true&attemptid=attempt_201206150102_0002_m_000000_2&filter=stdout
Does anyone know what the cause is and how to fix it?
------- Update with more log information ------------------
2012-06-15 19:56:07,039 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2012-06-15 19:56:07,258 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2012-06-15 19:56:07,339 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : null
2012-06-15 19:56:07,346 INFO org.apache.hadoop.mapred.ReduceTask: ShuffleRamManager: MemoryLimit=144965632, MaxSingleShuffleLimit=36241408
2012-06-15 19:56:07,351 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 Thread started: Thread for merging on-disk files
2012-06-15 19:56:07,351 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 Thread started: Thread for merging in memory files
2012-06-15 19:56:07,351 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 Thread waiting: Thread for merging on-disk files
2012-06-15 19:56:07,352 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 Need another 2 map output(s) where 0 is already in progress
2012-06-15 19:56:07,352 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 Thread started: Thread for polling Map Completion Events
2012-06-15 19:56:07,352 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and 0 dup hosts)
2012-06-15 19:56:12,353 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 Scheduled 1 outputs (0 slow hosts and 0 dup hosts)
2012-06-15 19:56:32,076 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 copy failed: attempt_201206151954_0001_m_000000_0 from 192.168.1.106
2012-06-15 19:56:32,077 WARN org.apache.hadoop.mapred.ReduceTask: java.io.IOException: Server returned HTTP response code: 403 for URL: _http://192.168.1.106:50060/mapOutput?job=job_201206151954_0001&map=attempt_201206151954_0001_m_000000_0&reduce=0
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1639)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.setupSecureConnection(ReduceTask.java:1575)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1483)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1394)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1326)
2012-06-15 19:56:32,077 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_201206151954_0001_r_000000_0: Failed fetch #1 from attempt_201206151954_0001_m_000000_0
2012-06-15 19:56:32,077 INFO org.apache.hadoop.mapred.ReduceTask: Failed to fetch map-output from attempt_201206151954_0001_m_000000_0 even after MAX_FETCH_RETRIES_PER_MAP retries... or it is a read error, reporting to the JobTracker
2012-06-15 19:56:32,077 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 adding host 192.168.1.106 to penalty box, next contact in 12 seconds
2012-06-15 19:56:32,077 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0: Got 1 map-outputs from previous failures
2012-06-15 19:56:47,080 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 Scheduled 1 outputs (0 slow hosts and 0 dup hosts)
2012-06-15 19:56:56,048 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 copy failed: attempt_201206151954_0001_m_000000_0 from 192.168.1.106
2012-06-15 19:56:56,049 WARN org.apache.hadoop.mapred.ReduceTask: java.io.IOException: Server returned HTTP response code: 403 for URL: _http://192.168.1.106:50060/mapOutput?job=job_201206151954_0001&map=attempt_201206151954_0001_m_000000_0&reduce=0
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1639)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.setupSecureConnection(ReduceTask.java:1575)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1483)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1394)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1326)
2012-06-15 19:56:56,049 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_201206151954_0001_r_000000_0: Failed fetch #2 from attempt_201206151954_0001_m_000000_0
2012-06-15 19:56:56,049 INFO org.apache.hadoop.mapred.ReduceTask: Failed to fetch map-output from attempt_201206151954_0001_m_000000_0 even after MAX_FETCH_RETRIES_PER_MAP retries... or it is a read error, reporting to the JobTracker
2012-06-15 19:56:56,049 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 adding host 192.168.1.106 to penalty box, next contact in 16 seconds
2012-06-15 19:56:56,049 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0: Got 1 map-outputs from previous failures
2012-06-15 19:57:11,053 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 Need another 2 map output(s) where 0 is already in progress
2012-06-15 19:57:11,053 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 Scheduled 0 outputs (1 slow hosts and 0 dup hosts)
2012-06-15 19:57:11,053 INFO org.apache.hadoop.mapred.ReduceTask: Penalized (slow) Hosts:
2012-06-15 19:57:11,053 INFO org.apache.hadoop.mapred.ReduceTask: 192.168.1.106 Will be considered after: 1 seconds.
2012-06-15 19:57:16,055 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 Scheduled 1 outputs (0 slow hosts and 0 dup hosts)
2012-06-15 19:57:25,984 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 copy failed: attempt_201206151954_0001_m_000000_0 from 192.168.1.106
2012-06-15 19:57:25,984 WARN org.apache.hadoop.mapred.ReduceTask: java.io.IOException: Server returned HTTP response code: 403 for URL: _http://192.168.1.106:50060/mapOutput?job=job_201206151954_0001&map=attempt_201206151954_0001_m_000000_0&reduce=0
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1639)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.setupSecureConnection(ReduceTask.java:1575)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1483)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1394)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1326)
Best regards,
I ran into the same problem. After digging deeper, I found that the issue was with name resolution of the hosts. Check the log of the particular attempt in
$HADOOP_HOME/logs/userlogs/JobXXX/attemptXXX/syslog
If it contains something like
WARN org.apache.hadoop.mapred.ReduceTask: java.net.UnknownHostException: slave-1.local.lan
then just add the proper entries to /etc/hosts. After doing so the error was resolved, and on the next attempt everything went fine.
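For illustration, here is a minimal sketch of both steps, assuming the unresolved host is the slave-1.local.lan / 192.168.1.106 node seen in the logs above; the exact hostname-to-IP mapping is hypothetical, so substitute the names and addresses of your own cluster:

    # search the failed attempt's syslog for name-resolution errors
    grep UnknownHostException $HADOOP_HOME/logs/userlogs/job_201206151954_0001/attempt_201206151954_0001_r_000000_0/syslog

    # /etc/hosts entry, added on every node of the cluster (hypothetical mapping)
    192.168.1.106   slave-1.local.lan   slave-1

If the name previously resolved to a wrong address (for example a stale 127.0.1.1 entry), restarting the TaskTracker on that node before rerunning the job may also be needed.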