I am running a test on a 5-node cluster on Hadoop 1.0.3. The test consists of 3 jobs. The first job runs fine. The second job takes the output of the first job (about 100 MB) as its input. After the map smoothly reaches 100%, the job gets stuck between the map phase and the reduce phase: reaching reduce 5% takes a very long time. Below is the complete Hadoop output over time.
13/11/19 13:39:25 INFO mapred.JobClient: map 0% reduce 0%
13/11/19 13:40:12 INFO mapred.JobClient: map 1% reduce 0%
13/11/19 13:40:20 INFO mapred.JobClient: map 2% reduce 0%
13/11/19 13:40:24 INFO mapred.JobClient: map 3% reduce 0%
13/11/19 13:40:29 INFO mapred.JobClient: map 4% reduce 0%
13/11/19 13:40:39 INFO mapred.JobClient: map 5% reduce 0%
13/11/19 13:40:42 INFO mapred.JobClient: map 6% reduce 0%
13/11/19 13:40:51 INFO mapred.JobClient: map 7% reduce 0%
13/11/19 13:41:01 INFO mapred.JobClient: map 8% reduce 0%
13/11/19 13:41:06 INFO mapred.JobClient: map 9% reduce 0%
13/11/19 13:41:18 INFO mapred.JobClient: map 10% reduce 0%
13/11/19 13:41:22 INFO mapred.JobClient: map 11% reduce 0%
13/11/19 13:41:33 INFO mapred.JobClient: map 12% reduce 0%
13/11/19 13:41:42 INFO mapred.JobClient: map 13% reduce 0%
13/11/19 13:41:50 INFO mapred.JobClient: map 14% reduce 0%
13/11/19 13:41:55 INFO mapred.JobClient: map 15% reduce 0%
13/11/19 13:42:04 INFO mapred.JobClient: map 16% reduce 0%
13/11/19 13:42:11 INFO mapred.JobClient: map 17% reduce 0%
13/11/19 13:42:18 INFO mapred.JobClient: map 18% reduce 0%
13/11/19 13:42:29 INFO mapred.JobClient: map 19% reduce 0%
13/11/19 13:42:36 INFO mapred.JobClient: map 20% reduce 0%
13/11/19 13:42:42 INFO mapred.JobClient: map 21% reduce 0%
13/11/19 13:42:50 INFO mapred.JobClient: map 22% reduce 0%
13/11/19 13:42:57 INFO mapred.JobClient: map 23% reduce 0%
13/11/19 13:43:07 INFO mapred.JobClient: map 24% reduce 0%
13/11/19 13:43:17 INFO mapred.JobClient: map 25% reduce 0%
13/11/19 13:43:27 INFO mapred.JobClient: map 26% reduce 0%
13/11/19 13:43:37 INFO mapred.JobClient: map 27% reduce 0%
13/11/19 13:43:47 INFO mapred.JobClient: map 28% reduce 0%
13/11/19 13:43:54 INFO mapred.JobClient: map 29% reduce 0%
13/11/19 13:44:03 INFO mapred.JobClient: map 30% reduce 0%
13/11/19 13:44:12 INFO mapred.JobClient: map 31% reduce 0%
13/11/19 13:44:18 INFO mapred.JobClient: map 32% reduce 0%
13/11/19 13:44:28 INFO mapred.JobClient: map 33% reduce 0%
13/11/19 13:44:38 INFO mapred.JobClient: map 34% reduce 0%
13/11/19 13:44:48 INFO mapred.JobClient: map 35% reduce 0%
13/11/19 13:44:54 INFO mapred.JobClient: map 36% reduce 0%
13/11/19 13:45:02 INFO mapred.JobClient: map 37% reduce 0%
13/11/19 13:45:16 INFO mapred.JobClient: map 38% reduce 0%
13/11/19 13:45:21 INFO mapred.JobClient: map 39% reduce 0%
13/11/19 13:45:33 INFO mapred.JobClient: map 40% reduce 0%
13/11/19 13:45:39 INFO mapred.JobClient: map 41% reduce 0%
13/11/19 13:45:50 INFO mapred.JobClient: map 42% reduce 0%
13/11/19 13:45:58 INFO mapred.JobClient: map 43% reduce 0%
13/11/19 13:46:06 INFO mapred.JobClient: map 44% reduce 0%
13/11/19 13:46:17 INFO mapred.JobClient: map 45% reduce 0%
13/11/19 13:46:23 INFO mapred.JobClient: map 46% reduce 0%
13/11/19 13:46:32 INFO mapred.JobClient: map 47% reduce 0%
13/11/19 13:46:39 INFO mapred.JobClient: map 48% reduce 0%
13/11/19 13:46:44 INFO mapred.JobClient: map 49% reduce 0%
13/11/19 13:46:54 INFO mapred.JobClient: map 50% reduce 0%
13/11/19 13:47:01 INFO mapred.JobClient: map 51% reduce 0%
13/11/19 13:47:09 INFO mapred.JobClient: map 52% reduce 0%
13/11/19 13:47:20 INFO mapred.JobClient: map 53% reduce 0%
13/11/19 13:47:26 INFO mapred.JobClient: map 54% reduce 0%
13/11/19 13:47:36 INFO mapred.JobClient: map 55% reduce 0%
13/11/19 13:47:47 INFO mapred.JobClient: map 56% reduce 0%
13/11/19 13:47:59 INFO mapred.JobClient: map 57% reduce 0%
13/11/19 13:48:02 INFO mapred.JobClient: map 58% reduce 0%
13/11/19 13:48:14 INFO mapred.JobClient: map 59% reduce 0%
13/11/19 13:48:25 INFO mapred.JobClient: map 60% reduce 0%
13/11/19 13:48:37 INFO mapred.JobClient: map 61% reduce 0%
13/11/19 13:48:48 INFO mapred.JobClient: map 62% reduce 0%
13/11/19 13:48:56 INFO mapred.JobClient: map 63% reduce 0%
13/11/19 13:49:07 INFO mapred.JobClient: map 64% reduce 0%
13/11/19 13:49:17 INFO mapred.JobClient: map 65% reduce 0%
13/11/19 13:49:27 INFO mapred.JobClient: map 66% reduce 0%
13/11/19 13:49:36 INFO mapred.JobClient: map 67% reduce 0%
13/11/19 13:49:45 INFO mapred.JobClient: map 68% reduce 0%
13/11/19 13:49:55 INFO mapred.JobClient: map 69% reduce 0%
13/11/19 13:50:03 INFO mapred.JobClient: map 70% reduce 0%
13/11/19 13:50:17 INFO mapred.JobClient: map 71% reduce 0%
13/11/19 13:50:26 INFO mapred.JobClient: map 72% reduce 0%
13/11/19 13:50:35 INFO mapred.JobClient: map 73% reduce 0%
13/11/19 13:50:46 INFO mapred.JobClient: map 74% reduce 0%
13/11/19 13:50:56 INFO mapred.JobClient: map 75% reduce 0%
13/11/19 13:51:04 INFO mapred.JobClient: map 76% reduce 0%
13/11/19 13:51:13 INFO mapred.JobClient: map 77% reduce 0%
13/11/19 13:51:19 INFO mapred.JobClient: map 78% reduce 0%
13/11/19 13:51:33 INFO mapred.JobClient: map 79% reduce 0%
13/11/19 13:51:41 INFO mapred.JobClient: map 80% reduce 0%
13/11/19 13:51:51 INFO mapred.JobClient: map 81% reduce 0%
13/11/19 13:52:02 INFO mapred.JobClient: map 82% reduce 0%
13/11/19 13:52:07 INFO mapred.JobClient: map 83% reduce 0%
13/11/19 13:52:18 INFO mapred.JobClient: map 84% reduce 0%
13/11/19 13:52:30 INFO mapred.JobClient: map 85% reduce 0%
13/11/19 13:52:41 INFO mapred.JobClient: map 86% reduce 0%
13/11/19 13:52:54 INFO mapred.JobClient: map 87% reduce 0%
13/11/19 13:53:06 INFO mapred.JobClient: map 88% reduce 0%
13/11/19 13:53:22 INFO mapred.JobClient: map 89% reduce 0%
13/11/19 13:53:32 INFO mapred.JobClient: map 90% reduce 0%
13/11/19 13:53:37 INFO mapred.JobClient: map 91% reduce 0%
13/11/19 13:53:54 INFO mapred.JobClient: map 92% reduce 0%
13/11/19 13:54:09 INFO mapred.JobClient: map 93% reduce 0%
13/11/19 13:54:25 INFO mapred.JobClient: map 94% reduce 0%
13/11/19 13:54:34 INFO mapred.JobClient: map 95% reduce 0%
13/11/19 13:54:49 INFO mapred.JobClient: map 96% reduce 0%
13/11/19 13:55:12 INFO mapred.JobClient: map 97% reduce 0%
13/11/19 13:55:28 INFO mapred.JobClient: map 98% reduce 0%
13/11/19 13:56:00 INFO mapred.JobClient: map 99% reduce 0%
13/11/19 13:56:58 INFO mapred.JobClient: map 100% reduce 0%
13/11/19 14:19:20 INFO mapred.JobClient: map 100% reduce 1%
13/11/19 14:23:39 INFO mapred.JobClient: map 100% reduce 2%
13/11/19 14:25:37 INFO mapred.JobClient: map 100% reduce 3%
13/11/19 14:31:12 INFO mapred.JobClient: map 100% reduce 4%
13/11/19 14:34:26 INFO mapred.JobClient: map 100% reduce 5%
13/11/19 14:35:58 INFO mapred.JobClient: map 89% reduce 5%
13/11/19 14:46:54 INFO mapred.JobClient: map 79% reduce 5%
13/11/19 14:46:55 INFO mapred.JobClient: map 79% reduce 6%
13/11/19 14:53:09 INFO mapred.JobClient: map 79% reduce 7%
13/11/19 14:56:08 INFO mapred.JobClient: map 79% reduce 8%
13/11/19 14:56:50 INFO mapred.JobClient: Task Id : attempt_201310311057_0040_m_000006_0, Status : FAILED
Task attempt_201310311057_0040_m_000006_0 failed to report status for 1225 seconds. Killing!
Task attempt_201310311057_0040_m_000006_0 failed to report status for 1249 seconds. Killing!
13/11/19 14:57:59 WARN mapred.JobClient: Error reading task outputRead timed out
13/11/19 14:59:00 WARN mapred.JobClient: Error reading task outputRead timed out
13/11/19 14:59:01 INFO mapred.JobClient: map 70% reduce 8%
13/11/19 14:59:20 INFO mapred.JobClient: map 71% reduce 8%
13/11/19 15:00:50 INFO mapred.JobClient: map 71% reduce 9%
13/11/19 15:01:41 INFO mapred.JobClient: map 71% reduce 10%
13/11/19 15:01:54 INFO mapred.JobClient: map 72% reduce 10%
13/11/19 15:02:25 INFO mapred.JobClient: map 73% reduce 10%
13/11/19 15:02:34 INFO mapred.JobClient: Task Id : attempt_201310311057_0040_m_000005_0, Status : FAILED
Task attempt_201310311057_0040_m_000005_0 failed to report status for 1212 seconds. Killing!
13/11/19 15:03:16 INFO mapred.JobClient: map 74% reduce 10%
13/11/19 15:04:08 INFO mapred.JobClient: map 75% reduce 10%
13/11/19 15:04:48 INFO mapred.JobClient: map 76% reduce 10%
13/11/19 15:06:19 INFO mapred.JobClient: map 77% reduce 10%
13/11/19 15:07:35 INFO mapred.JobClient: map 77% reduce 11%
13/11/19 15:07:46 INFO mapred.JobClient: map 78% reduce 11%
13/11/19 15:09:46 INFO mapred.JobClient: map 79% reduce 11%
13/11/19 15:10:11 INFO mapred.JobClient: map 79% reduce 12%
13/11/19 15:12:00 INFO mapred.JobClient: map 80% reduce 12%
13/11/19 15:12:56 INFO mapred.JobClient: map 81% reduce 12%
13/11/19 15:13:46 INFO mapred.JobClient: map 82% reduce 12%
13/11/19 15:14:37 INFO mapred.JobClient: map 83% reduce 12%
13/11/19 15:15:36 INFO mapred.JobClient: map 84% reduce 12%
13/11/19 15:16:41 INFO mapred.JobClient: map 85% reduce 12%
13/11/19 15:17:44 INFO mapred.JobClient: map 86% reduce 12%
13/11/19 15:18:45 INFO mapred.JobClient: map 87% reduce 12%
13/11/19 15:20:22 INFO mapred.JobClient: map 88% reduce 12%
13/11/19 15:22:41 INFO mapred.JobClient: map 89% reduce 12%
13/11/19 15:23:57 INFO mapred.JobClient: Task Id : attempt_201310311057_0040_m_000004_0, Status : FAILED
Task attempt_201310311057_0040_m_000004_0 failed to report status for 1378 seconds. Killing!
Task attempt_201310311057_0040_m_000004_0 failed to report status for 1292 seconds. Killing!
13/11/19 15:24:00 INFO mapred.JobClient: map 89% reduce 13%
13/11/19 15:25:08 INFO mapred.JobClient: map 79% reduce 13%
13/11/19 15:26:44 INFO mapred.JobClient: map 69% reduce 13%
13/11/19 15:28:15 INFO mapred.JobClient: map 70% reduce 13%
13/11/19 15:28:40 INFO mapred.JobClient: map 71% reduce 13%
13/11/19 15:29:06 INFO mapred.JobClient: map 71% reduce 12%
13/11/19 15:29:31 INFO mapred.JobClient: map 72% reduce 12%
13/11/19 15:30:13 INFO mapred.JobClient: map 73% reduce 12%
13/11/19 15:30:36 INFO mapred.JobClient: Task Id : attempt_201310311057_0040_m_000003_0, Status : FAILED
Task attempt_201310311057_0040_m_000003_0 failed to report status for 1203 seconds. Killing!
13/11/19 15:30:36 INFO mapred.JobClient: Task Id : attempt_201310311057_0040_m_000002_0, Status : FAILED
Task attempt_201310311057_0040_m_000002_0 failed to report status for 1200 seconds. Killing!
13/11/19 15:30:36 INFO mapred.JobClient: Task Id : attempt_201310311057_0040_r_000006_0, Status : FAILED
Task attempt_201310311057_0040_r_000006_0 failed to report status for 1202 seconds. Killing!
13/11/19 15:31:14 INFO mapred.JobClient: map 74% reduce 12%
13/11/19 15:31:39 INFO mapred.JobClient: map 75% reduce 12%
13/11/19 15:32:29 INFO mapred.JobClient: map 76% reduce 12%
13/11/19 15:33:43 INFO mapred.JobClient: map 77% reduce 12%
13/11/19 15:34:24 INFO mapred.JobClient: map 77% reduce 13%
13/11/19 15:34:42 INFO mapred.JobClient: map 78% reduce 13%
13/11/19 15:35:02 INFO mapred.JobClient: map 78% reduce 14%
13/11/19 15:35:34 INFO mapred.JobClient: map 79% reduce 14%
13/11/19 15:36:29 INFO mapred.JobClient: map 80% reduce 14%
13/11/19 15:36:51 INFO mapred.JobClient: map 80% reduce 15%
13/11/19 15:37:12 INFO mapred.JobClient: map 81% reduce 15%
13/11/19 15:37:46 INFO mapred.JobClient: map 82% reduce 15%
13/11/19 15:38:12 INFO mapred.JobClient: map 83% reduce 15%
13/11/19 15:38:39 INFO mapred.JobClient: map 84% reduce 15%
13/11/19 15:39:18 INFO mapred.JobClient: map 85% reduce 15%
13/11/19 15:39:50 INFO mapred.JobClient: map 86% reduce 15%
13/11/19 15:40:16 INFO mapred.JobClient: map 87% reduce 15%
13/11/19 15:40:52 INFO mapred.JobClient: map 88% reduce 15%
13/11/19 15:41:18 INFO mapred.JobClient: map 89% reduce 15%
13/11/19 15:41:48 INFO mapred.JobClient: map 90% reduce 15%
13/11/19 15:42:47 INFO mapred.JobClient: map 91% reduce 15%
13/11/19 15:43:58 INFO mapred.JobClient: map 92% reduce 15%
13/11/19 15:45:36 INFO mapred.JobClient: map 93% reduce 15%
13/11/19 15:46:29 INFO mapred.JobClient: map 93% reduce 16%
13/11/19 15:46:53 INFO mapred.JobClient: map 94% reduce 16%
13/11/19 15:48:25 INFO mapred.JobClient: map 94% reduce 17%
13/11/19 15:48:56 INFO mapred.JobClient: map 95% reduce 17%
13/11/19 15:50:37 INFO mapred.JobClient: map 96% reduce 17%
13/11/19 15:51:46 INFO mapred.JobClient: map 96% reduce 18%
13/11/19 15:52:15 INFO mapred.JobClient: map 97% reduce 18%
13/11/19 15:53:08 INFO mapred.JobClient: map 97% reduce 19%
13/11/19 15:56:03 INFO mapred.JobClient: map 97% reduce 20%
13/11/19 15:56:54 INFO mapred.JobClient: map 98% reduce 20%
13/11/19 15:57:10 INFO mapred.JobClient: map 98% reduce 21%
13/11/19 15:59:26 INFO mapred.JobClient: map 99% reduce 21%
13/11/19 16:02:58 INFO mapred.JobClient: map 100% reduce 21%
13/11/19 16:03:57 INFO mapred.JobClient: map 100% reduce 22%
13/11/19 16:30:35 INFO mapred.JobClient: map 100% reduce 23%
13/11/19 16:35:00 INFO mapred.JobClient: map 100% reduce 24%
13/11/19 16:40:35 INFO mapred.JobClient: map 100% reduce 25%
13/11/19 16:40:38 INFO mapred.JobClient: map 100% reduce 26%
13/11/19 16:44:38 INFO mapred.JobClient: map 100% reduce 27%
13/11/19 16:49:08 INFO mapred.JobClient: map 100% reduce 28%
13/11/19 16:49:30 INFO mapred.JobClient: map 100% reduce 29%
13/11/19 16:52:25 INFO mapred.JobClient: map 100% reduce 33%
13/11/19 16:53:54 INFO mapred.JobClient: map 100% reduce 38%
13/11/19 16:54:10 INFO mapred.JobClient: map 100% reduce 42%
13/11/19 16:55:21 INFO mapred.JobClient: map 100% reduce 43%
13/11/19 16:55:36 INFO mapred.JobClient: map 100% reduce 47%
13/11/19 16:55:39 INFO mapred.JobClient: map 100% reduce 56%
13/11/19 16:56:40 INFO mapred.JobClient: map 100% reduce 57%
13/11/19 16:58:04 INFO mapred.JobClient: map 100% reduce 58%
13/11/19 17:01:25 INFO mapred.JobClient: map 100% reduce 59%
13/11/19 17:04:47 INFO mapred.JobClient: map 100% reduce 64%
13/11/19 17:05:01 INFO mapred.JobClient: map 100% reduce 69%
13/11/19 17:07:39 INFO mapred.JobClient: map 100% reduce 70%
13/11/19 17:10:32 INFO mapred.JobClient: map 100% reduce 71%
13/11/19 17:13:21 INFO mapred.JobClient: map 100% reduce 72%
13/11/19 17:16:08 INFO mapred.JobClient: map 100% reduce 73%
13/11/19 17:19:03 INFO mapred.JobClient: map 100% reduce 74%
13/11/19 17:21:55 INFO mapred.JobClient: map 100% reduce 75%
13/11/19 17:24:46 INFO mapred.JobClient: map 100% reduce 76%
13/11/19 17:27:35 INFO mapred.JobClient: map 100% reduce 77%
13/11/19 17:30:24 INFO mapred.JobClient: map 100% reduce 78%
13/11/19 17:33:14 INFO mapred.JobClient: map 100% reduce 79%
13/11/19 17:36:07 INFO mapred.JobClient: map 100% reduce 80%
13/11/19 17:39:00 INFO mapred.JobClient: map 100% reduce 81%
13/11/19 17:41:51 INFO mapred.JobClient: map 100% reduce 82%
13/11/19 17:44:39 INFO mapred.JobClient: map 100% reduce 83%
13/11/19 17:47:27 INFO mapred.JobClient: map 100% reduce 84%
13/11/19 17:50:22 INFO mapred.JobClient: map 100% reduce 85%
13/11/19 17:53:09 INFO mapred.JobClient: map 100% reduce 86%
13/11/19 17:55:54 INFO mapred.JobClient: map 100% reduce 87%
13/11/19 17:58:44 INFO mapred.JobClient: map 100% reduce 88%
13/11/19 18:01:35 INFO mapred.JobClient: map 100% reduce 89%
13/11/19 18:04:21 INFO mapred.JobClient: map 100% reduce 90%
13/11/19 18:07:16 INFO mapred.JobClient: map 100% reduce 91%
13/11/19 18:10:08 INFO mapred.JobClient: map 100% reduce 92%
13/11/19 18:12:55 INFO mapred.JobClient: map 100% reduce 93%
13/11/19 18:15:51 INFO mapred.JobClient: map 100% reduce 94%
13/11/19 18:18:45 INFO mapred.JobClient: map 100% reduce 95%
13/11/19 18:21:36 INFO mapred.JobClient: map 100% reduce 96%
13/11/19 18:24:25 INFO mapred.JobClient: map 100% reduce 97%
13/11/19 18:27:42 INFO mapred.JobClient: map 100% reduce 98%
13/11/19 18:31:25 INFO mapred.JobClient: map 100% reduce 99%
13/11/19 18:41:13 INFO mapred.JobClient: map 100% reduce 100%
During this period (from map 100% reduce 0% to map 100% reduce 5%), I observed that only 5 of the 10 map tasks had actually completed; the other 5 failed due to timeouts and were then re-run. I know this could be worked around by raising the timeout, but that is not the point of my question.
I know that between map and reduce the data is committed, shuffled, and sorted. First question: with data of this size, is it normal to wait this long between the map phase and the reduce phase? It doesn't feel right.
My reducer is somewhat computationally heavy, so I replaced it with an identity reducer, but that did not seem to help much. This makes me think the problem lies either in my mapper or in the shuffle/sort. Here is my mapper:
public static class CliquesMapper extends
        Mapper<YearTermKey, SetWritable, YearTermKey, MapWritable> {

    private YearTermKey outputKEY = new YearTermKey();

    @Override
    public void map(YearTermKey key, SetWritable value, Context context)
            throws IOException, InterruptedException {
        Set<Writable> neighbors = value.keySet();
        int listSize = neighbors.size();
        if (listSize != 1) {
            for (Writable keyTerm : neighbors) {
                IntWritable KEYTerm = (IntWritable) keyTerm;
                outputKEY.set(new Text(key.getYear()), KEYTerm);
                MapWritable outputVALUE = new MapWritable();
                outputVALUE.put(key.getTerm(), value);
                context.write(outputKEY, outputVALUE);
            }
        } else {
            IntWritable finalTerm = new IntWritable();
            for (Writable t : neighbors) {
                finalTerm.set(((IntWritable) t).get());
            }
            outputKEY.set(key.getYear(), finalTerm);
            NullWritable nw = NullWritable.get();
            MapWritable outputVALUE = new MapWritable();
            outputVALUE.put(key.getTerm(), nw);
            context.write(outputKEY, outputVALUE);
        }
    }
}
Second question: could they — the key-value pairs I emit from the mapper — be causing this delay? If not, why is this happening?
In any case, after all 10 map tasks finished (around map 100% reduce 33%), the reducer took almost 2 hours to complete. Since it is an identity reducer, how is that possible?
You are asking several questions; although they are related, they have different answers. I address them one by one below.
With data of this size, is it normal to wait this long between the map phase and the reduce phase?
There is a barrier between the map and reduce phases: the reducers cannot start until all mappers have completed. Some of your mappers are failing, which slows down the entire map phase and holds up the reduce phase. Once you fix that, your reduce phase should start earlier.
Why are the map tasks failing? Apparently, they are not reporting progress:
[...] failed to report status for 1225 seconds. Killing!
它们——我从映射器中获得的键值对——是否可能导致了这种延迟?否则,为什么会发生这种情况?
我不确定,但我确实看过你的代码,你可以让它运行得更快,如下所示:
1) 将您的Text
转换为IntWritable
;它看起来像是数字数据(一年),这样做将减少从mapper发送到reducers的数据量。有关提高Hadoop性能的提示,请参阅本页中的提示5。
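To see why the key type matters for shuffle volume: an int always serializes to 4 bytes, while a string key carries a length prefix plus its UTF-8 bytes. The plain-Java sketch below needs no Hadoop on the classpath; writeUTF is only a rough stand-in for what Text writes, but the size comparison makes the point:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class KeySize {
    // Size of the year serialized as a 4-byte int (what IntWritable writes).
    public static int asInt(int year) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            new DataOutputStream(buf).writeInt(year);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.size();
    }

    // Size of the same year as a length-prefixed string; writeUTF is a rough
    // stand-in for Text's encoding (length prefix + UTF-8 bytes).
    public static int asString(String year) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            new DataOutputStream(buf).writeUTF(year);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.size();
    }

    public static void main(String[] args) {
        System.out.println(asInt(2013));      // 4 bytes
        System.out.println(asString("2013")); // 6 bytes (2-byte length + 4 chars)
    }
}
```

Multiply that per-key saving by every pair emitted in the inner loop over neighbors and it adds up over the whole shuffle.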
2) Reuse your Writables. You are creating a new Text on every iteration. You would be surprised how inefficient this is: constantly creating and deallocating objects on the heap leads to poor performance. The idea is to create a Writable once and then reuse it. See tip 6 on this page for improving Hadoop performance.
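The reuse pattern can be illustrated without Hadoop. The Holder class below is a hypothetical stand-in for a mutable Writable such as Text: it is allocated once outside the loop and mutated on each iteration, which is safe because the consumer (like context.write, which serializes immediately) takes the current contents right away:

```java
import java.util.ArrayList;
import java.util.List;

public class WritableReuse {
    // Hypothetical mutable container standing in for a Hadoop Writable.
    static final class Holder {
        private String value;
        void set(String v) { this.value = v; }
        String get() { return value; }
    }

    // Emits every input once, reusing a single Holder instead of allocating
    // a fresh object per iteration.
    public static List<String> emitAll(List<String> inputs) {
        List<String> out = new ArrayList<>();
        Holder reused = new Holder();   // allocated once, outside the loop
        for (String s : inputs) {
            reused.set(s);              // mutate instead of `new Holder()`
            out.add(reused.get());      // contents consumed immediately, so
                                        // reuse across iterations is safe
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(emitAll(List.of("2011", "2012", "2013")));
    }
}
```

In your mapper, the same idea means hoisting the new Text(...) and new MapWritable() out of the per-neighbor loop and calling set()/clear() on them instead.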
Although I cannot be certain of this, I suspect it may also be why some of your mappers are failing: garbage collection can pause the program while it runs, and a paused task reports no progress, so Hadoop kills it.
3) Use multiple reducers if you are not already doing so. See tip 3 on the page linked above for some heuristics on setting an appropriate number of map and reduce tasks for your job.
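With the Hadoop 1.x API this is a one-line job setting. A configuration sketch (it will not compile without the Hadoop jars; the job name and the value 8 are illustrative only, the real number should come from your cluster's slot count):

```java
// Hadoop 1.x job setup fragment (requires hadoop-core on the classpath).
Job job = new Job(new Configuration(), "cliques-job");  // hypothetical name
// Common rule of thumb: roughly
//   0.95 * (nodes * mapred.tasktracker.reduce.tasks.maximum)
// so all reduces can run in a single wave.
job.setNumReduceTasks(8);  // illustrative value for a 5-node cluster
```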
In any case, after all 10 map tasks finished (around map 100% reduce 33%), the reducer took almost 2 hours to complete. Since it is an identity reducer, how is that possible?
This can be normal behavior if one (or a few) of the reducers receives too much data. Shuffling implies sorting on the reduce side. Try using sort on a large file on a Linux box: it can take a very long time, and that is what is happening in your job.