Hadoop slows down after map reaches 100%



I am running a test on a 5-node cluster with Hadoop 1.0.3. The test consists of three jobs. The first job runs fine. The second job takes the output of the first job (roughly 100 MB) as its input. After the map phase reaches 100% without any trouble, the job gets stuck between the map and reduce phases, and it takes a very long time for reduce to reach 5%. Below is the full Hadoop output over time.

13/11/19 13:39:25 INFO mapred.JobClient:  map 0% reduce 0%
13/11/19 13:40:12 INFO mapred.JobClient:  map 1% reduce 0%
13/11/19 13:40:20 INFO mapred.JobClient:  map 2% reduce 0%
13/11/19 13:40:24 INFO mapred.JobClient:  map 3% reduce 0%
13/11/19 13:40:29 INFO mapred.JobClient:  map 4% reduce 0%
13/11/19 13:40:39 INFO mapred.JobClient:  map 5% reduce 0%
13/11/19 13:40:42 INFO mapred.JobClient:  map 6% reduce 0%
13/11/19 13:40:51 INFO mapred.JobClient:  map 7% reduce 0%
13/11/19 13:41:01 INFO mapred.JobClient:  map 8% reduce 0%
13/11/19 13:41:06 INFO mapred.JobClient:  map 9% reduce 0%
13/11/19 13:41:18 INFO mapred.JobClient:  map 10% reduce 0%
13/11/19 13:41:22 INFO mapred.JobClient:  map 11% reduce 0%
13/11/19 13:41:33 INFO mapred.JobClient:  map 12% reduce 0%
13/11/19 13:41:42 INFO mapred.JobClient:  map 13% reduce 0%
13/11/19 13:41:50 INFO mapred.JobClient:  map 14% reduce 0%
13/11/19 13:41:55 INFO mapred.JobClient:  map 15% reduce 0%
13/11/19 13:42:04 INFO mapred.JobClient:  map 16% reduce 0%
13/11/19 13:42:11 INFO mapred.JobClient:  map 17% reduce 0%
13/11/19 13:42:18 INFO mapred.JobClient:  map 18% reduce 0%
13/11/19 13:42:29 INFO mapred.JobClient:  map 19% reduce 0%
13/11/19 13:42:36 INFO mapred.JobClient:  map 20% reduce 0%
13/11/19 13:42:42 INFO mapred.JobClient:  map 21% reduce 0%
13/11/19 13:42:50 INFO mapred.JobClient:  map 22% reduce 0%
13/11/19 13:42:57 INFO mapred.JobClient:  map 23% reduce 0%
13/11/19 13:43:07 INFO mapred.JobClient:  map 24% reduce 0%
13/11/19 13:43:17 INFO mapred.JobClient:  map 25% reduce 0%
13/11/19 13:43:27 INFO mapred.JobClient:  map 26% reduce 0%
13/11/19 13:43:37 INFO mapred.JobClient:  map 27% reduce 0%
13/11/19 13:43:47 INFO mapred.JobClient:  map 28% reduce 0%
13/11/19 13:43:54 INFO mapred.JobClient:  map 29% reduce 0%
13/11/19 13:44:03 INFO mapred.JobClient:  map 30% reduce 0%
13/11/19 13:44:12 INFO mapred.JobClient:  map 31% reduce 0%
13/11/19 13:44:18 INFO mapred.JobClient:  map 32% reduce 0%
13/11/19 13:44:28 INFO mapred.JobClient:  map 33% reduce 0%
13/11/19 13:44:38 INFO mapred.JobClient:  map 34% reduce 0%
13/11/19 13:44:48 INFO mapred.JobClient:  map 35% reduce 0%
13/11/19 13:44:54 INFO mapred.JobClient:  map 36% reduce 0%
13/11/19 13:45:02 INFO mapred.JobClient:  map 37% reduce 0%
13/11/19 13:45:16 INFO mapred.JobClient:  map 38% reduce 0%
13/11/19 13:45:21 INFO mapred.JobClient:  map 39% reduce 0%
13/11/19 13:45:33 INFO mapred.JobClient:  map 40% reduce 0%
13/11/19 13:45:39 INFO mapred.JobClient:  map 41% reduce 0%
13/11/19 13:45:50 INFO mapred.JobClient:  map 42% reduce 0%
13/11/19 13:45:58 INFO mapred.JobClient:  map 43% reduce 0%
13/11/19 13:46:06 INFO mapred.JobClient:  map 44% reduce 0%
13/11/19 13:46:17 INFO mapred.JobClient:  map 45% reduce 0%
13/11/19 13:46:23 INFO mapred.JobClient:  map 46% reduce 0%
13/11/19 13:46:32 INFO mapred.JobClient:  map 47% reduce 0%
13/11/19 13:46:39 INFO mapred.JobClient:  map 48% reduce 0%
13/11/19 13:46:44 INFO mapred.JobClient:  map 49% reduce 0%
13/11/19 13:46:54 INFO mapred.JobClient:  map 50% reduce 0%
13/11/19 13:47:01 INFO mapred.JobClient:  map 51% reduce 0%
13/11/19 13:47:09 INFO mapred.JobClient:  map 52% reduce 0%
13/11/19 13:47:20 INFO mapred.JobClient:  map 53% reduce 0%
13/11/19 13:47:26 INFO mapred.JobClient:  map 54% reduce 0%
13/11/19 13:47:36 INFO mapred.JobClient:  map 55% reduce 0%
13/11/19 13:47:47 INFO mapred.JobClient:  map 56% reduce 0%
13/11/19 13:47:59 INFO mapred.JobClient:  map 57% reduce 0%
13/11/19 13:48:02 INFO mapred.JobClient:  map 58% reduce 0%
13/11/19 13:48:14 INFO mapred.JobClient:  map 59% reduce 0%
13/11/19 13:48:25 INFO mapred.JobClient:  map 60% reduce 0%
13/11/19 13:48:37 INFO mapred.JobClient:  map 61% reduce 0%
13/11/19 13:48:48 INFO mapred.JobClient:  map 62% reduce 0%
13/11/19 13:48:56 INFO mapred.JobClient:  map 63% reduce 0%
13/11/19 13:49:07 INFO mapred.JobClient:  map 64% reduce 0%
13/11/19 13:49:17 INFO mapred.JobClient:  map 65% reduce 0%
13/11/19 13:49:27 INFO mapred.JobClient:  map 66% reduce 0%
13/11/19 13:49:36 INFO mapred.JobClient:  map 67% reduce 0%
13/11/19 13:49:45 INFO mapred.JobClient:  map 68% reduce 0%
13/11/19 13:49:55 INFO mapred.JobClient:  map 69% reduce 0%
13/11/19 13:50:03 INFO mapred.JobClient:  map 70% reduce 0%
13/11/19 13:50:17 INFO mapred.JobClient:  map 71% reduce 0%
13/11/19 13:50:26 INFO mapred.JobClient:  map 72% reduce 0%
13/11/19 13:50:35 INFO mapred.JobClient:  map 73% reduce 0%
13/11/19 13:50:46 INFO mapred.JobClient:  map 74% reduce 0%
13/11/19 13:50:56 INFO mapred.JobClient:  map 75% reduce 0%
13/11/19 13:51:04 INFO mapred.JobClient:  map 76% reduce 0%
13/11/19 13:51:13 INFO mapred.JobClient:  map 77% reduce 0%
13/11/19 13:51:19 INFO mapred.JobClient:  map 78% reduce 0%
13/11/19 13:51:33 INFO mapred.JobClient:  map 79% reduce 0%
13/11/19 13:51:41 INFO mapred.JobClient:  map 80% reduce 0%
13/11/19 13:51:51 INFO mapred.JobClient:  map 81% reduce 0%
13/11/19 13:52:02 INFO mapred.JobClient:  map 82% reduce 0%
13/11/19 13:52:07 INFO mapred.JobClient:  map 83% reduce 0%
13/11/19 13:52:18 INFO mapred.JobClient:  map 84% reduce 0%
13/11/19 13:52:30 INFO mapred.JobClient:  map 85% reduce 0%
13/11/19 13:52:41 INFO mapred.JobClient:  map 86% reduce 0%
13/11/19 13:52:54 INFO mapred.JobClient:  map 87% reduce 0%
13/11/19 13:53:06 INFO mapred.JobClient:  map 88% reduce 0%
13/11/19 13:53:22 INFO mapred.JobClient:  map 89% reduce 0%
13/11/19 13:53:32 INFO mapred.JobClient:  map 90% reduce 0%
13/11/19 13:53:37 INFO mapred.JobClient:  map 91% reduce 0%
13/11/19 13:53:54 INFO mapred.JobClient:  map 92% reduce 0%
13/11/19 13:54:09 INFO mapred.JobClient:  map 93% reduce 0%
13/11/19 13:54:25 INFO mapred.JobClient:  map 94% reduce 0%
13/11/19 13:54:34 INFO mapred.JobClient:  map 95% reduce 0%
13/11/19 13:54:49 INFO mapred.JobClient:  map 96% reduce 0%
13/11/19 13:55:12 INFO mapred.JobClient:  map 97% reduce 0%
13/11/19 13:55:28 INFO mapred.JobClient:  map 98% reduce 0%
13/11/19 13:56:00 INFO mapred.JobClient:  map 99% reduce 0%
13/11/19 13:56:58 INFO mapred.JobClient:  map 100% reduce 0%
13/11/19 14:19:20 INFO mapred.JobClient:  map 100% reduce 1%
13/11/19 14:23:39 INFO mapred.JobClient:  map 100% reduce 2%
13/11/19 14:25:37 INFO mapred.JobClient:  map 100% reduce 3%
13/11/19 14:31:12 INFO mapred.JobClient:  map 100% reduce 4%
13/11/19 14:34:26 INFO mapred.JobClient:  map 100% reduce 5%
13/11/19 14:35:58 INFO mapred.JobClient:  map 89% reduce 5%
13/11/19 14:46:54 INFO mapred.JobClient:  map 79% reduce 5%
13/11/19 14:46:55 INFO mapred.JobClient:  map 79% reduce 6%
13/11/19 14:53:09 INFO mapred.JobClient:  map 79% reduce 7%
13/11/19 14:56:08 INFO mapred.JobClient:  map 79% reduce 8%
13/11/19 14:56:50 INFO mapred.JobClient: Task Id : attempt_201310311057_0040_m_000006_0, Status : FAILED
Task attempt_201310311057_0040_m_000006_0 failed to report status for 1225 seconds. Killing!
Task attempt_201310311057_0040_m_000006_0 failed to report status for 1249 seconds. Killing!
13/11/19 14:57:59 WARN mapred.JobClient: Error reading task outputRead timed out
13/11/19 14:59:00 WARN mapred.JobClient: Error reading task outputRead timed out
13/11/19 14:59:01 INFO mapred.JobClient:  map 70% reduce 8%
13/11/19 14:59:20 INFO mapred.JobClient:  map 71% reduce 8%
13/11/19 15:00:50 INFO mapred.JobClient:  map 71% reduce 9%
13/11/19 15:01:41 INFO mapred.JobClient:  map 71% reduce 10%
13/11/19 15:01:54 INFO mapred.JobClient:  map 72% reduce 10%
13/11/19 15:02:25 INFO mapred.JobClient:  map 73% reduce 10%
13/11/19 15:02:34 INFO mapred.JobClient: Task Id : attempt_201310311057_0040_m_000005_0, Status : FAILED
Task attempt_201310311057_0040_m_000005_0 failed to report status for 1212 seconds. Killing!
13/11/19 15:03:16 INFO mapred.JobClient:  map 74% reduce 10%
13/11/19 15:04:08 INFO mapred.JobClient:  map 75% reduce 10%
13/11/19 15:04:48 INFO mapred.JobClient:  map 76% reduce 10%
13/11/19 15:06:19 INFO mapred.JobClient:  map 77% reduce 10%
13/11/19 15:07:35 INFO mapred.JobClient:  map 77% reduce 11%
13/11/19 15:07:46 INFO mapred.JobClient:  map 78% reduce 11%
13/11/19 15:09:46 INFO mapred.JobClient:  map 79% reduce 11%
13/11/19 15:10:11 INFO mapred.JobClient:  map 79% reduce 12%
13/11/19 15:12:00 INFO mapred.JobClient:  map 80% reduce 12%
13/11/19 15:12:56 INFO mapred.JobClient:  map 81% reduce 12%
13/11/19 15:13:46 INFO mapred.JobClient:  map 82% reduce 12%
13/11/19 15:14:37 INFO mapred.JobClient:  map 83% reduce 12%
13/11/19 15:15:36 INFO mapred.JobClient:  map 84% reduce 12%
13/11/19 15:16:41 INFO mapred.JobClient:  map 85% reduce 12%
13/11/19 15:17:44 INFO mapred.JobClient:  map 86% reduce 12%
13/11/19 15:18:45 INFO mapred.JobClient:  map 87% reduce 12%
13/11/19 15:20:22 INFO mapred.JobClient:  map 88% reduce 12%
13/11/19 15:22:41 INFO mapred.JobClient:  map 89% reduce 12%
13/11/19 15:23:57 INFO mapred.JobClient: Task Id : attempt_201310311057_0040_m_000004_0, Status : FAILED
Task attempt_201310311057_0040_m_000004_0 failed to report status for 1378 seconds. Killing!
Task attempt_201310311057_0040_m_000004_0 failed to report status for 1292 seconds. Killing!
13/11/19 15:24:00 INFO mapred.JobClient:  map 89% reduce 13%
13/11/19 15:25:08 INFO mapred.JobClient:  map 79% reduce 13%
13/11/19 15:26:44 INFO mapred.JobClient:  map 69% reduce 13%
13/11/19 15:28:15 INFO mapred.JobClient:  map 70% reduce 13%
13/11/19 15:28:40 INFO mapred.JobClient:  map 71% reduce 13%
13/11/19 15:29:06 INFO mapred.JobClient:  map 71% reduce 12%
13/11/19 15:29:31 INFO mapred.JobClient:  map 72% reduce 12%
13/11/19 15:30:13 INFO mapred.JobClient:  map 73% reduce 12%
13/11/19 15:30:36 INFO mapred.JobClient: Task Id : attempt_201310311057_0040_m_000003_0, Status : FAILED
Task attempt_201310311057_0040_m_000003_0 failed to report status for 1203 seconds. Killing!
13/11/19 15:30:36 INFO mapred.JobClient: Task Id : attempt_201310311057_0040_m_000002_0, Status : FAILED
Task attempt_201310311057_0040_m_000002_0 failed to report status for 1200 seconds. Killing!
13/11/19 15:30:36 INFO mapred.JobClient: Task Id : attempt_201310311057_0040_r_000006_0, Status : FAILED
Task attempt_201310311057_0040_r_000006_0 failed to report status for 1202 seconds. Killing!
13/11/19 15:31:14 INFO mapred.JobClient:  map 74% reduce 12%
13/11/19 15:31:39 INFO mapred.JobClient:  map 75% reduce 12%
13/11/19 15:32:29 INFO mapred.JobClient:  map 76% reduce 12%
13/11/19 15:33:43 INFO mapred.JobClient:  map 77% reduce 12%
13/11/19 15:34:24 INFO mapred.JobClient:  map 77% reduce 13%
13/11/19 15:34:42 INFO mapred.JobClient:  map 78% reduce 13%
13/11/19 15:35:02 INFO mapred.JobClient:  map 78% reduce 14%
13/11/19 15:35:34 INFO mapred.JobClient:  map 79% reduce 14%
13/11/19 15:36:29 INFO mapred.JobClient:  map 80% reduce 14%
13/11/19 15:36:51 INFO mapred.JobClient:  map 80% reduce 15%
13/11/19 15:37:12 INFO mapred.JobClient:  map 81% reduce 15%
13/11/19 15:37:46 INFO mapred.JobClient:  map 82% reduce 15%
13/11/19 15:38:12 INFO mapred.JobClient:  map 83% reduce 15%
13/11/19 15:38:39 INFO mapred.JobClient:  map 84% reduce 15%
13/11/19 15:39:18 INFO mapred.JobClient:  map 85% reduce 15%
13/11/19 15:39:50 INFO mapred.JobClient:  map 86% reduce 15%
13/11/19 15:40:16 INFO mapred.JobClient:  map 87% reduce 15%
13/11/19 15:40:52 INFO mapred.JobClient:  map 88% reduce 15%
13/11/19 15:41:18 INFO mapred.JobClient:  map 89% reduce 15%
13/11/19 15:41:48 INFO mapred.JobClient:  map 90% reduce 15%
13/11/19 15:42:47 INFO mapred.JobClient:  map 91% reduce 15%
13/11/19 15:43:58 INFO mapred.JobClient:  map 92% reduce 15%
13/11/19 15:45:36 INFO mapred.JobClient:  map 93% reduce 15%
13/11/19 15:46:29 INFO mapred.JobClient:  map 93% reduce 16%
13/11/19 15:46:53 INFO mapred.JobClient:  map 94% reduce 16%
13/11/19 15:48:25 INFO mapred.JobClient:  map 94% reduce 17%
13/11/19 15:48:56 INFO mapred.JobClient:  map 95% reduce 17%
13/11/19 15:50:37 INFO mapred.JobClient:  map 96% reduce 17%
13/11/19 15:51:46 INFO mapred.JobClient:  map 96% reduce 18%
13/11/19 15:52:15 INFO mapred.JobClient:  map 97% reduce 18%
13/11/19 15:53:08 INFO mapred.JobClient:  map 97% reduce 19%
13/11/19 15:56:03 INFO mapred.JobClient:  map 97% reduce 20%
13/11/19 15:56:54 INFO mapred.JobClient:  map 98% reduce 20%
13/11/19 15:57:10 INFO mapred.JobClient:  map 98% reduce 21%
13/11/19 15:59:26 INFO mapred.JobClient:  map 99% reduce 21%
13/11/19 16:02:58 INFO mapred.JobClient:  map 100% reduce 21%
13/11/19 16:03:57 INFO mapred.JobClient:  map 100% reduce 22%
13/11/19 16:30:35 INFO mapred.JobClient:  map 100% reduce 23%
13/11/19 16:35:00 INFO mapred.JobClient:  map 100% reduce 24%
13/11/19 16:40:35 INFO mapred.JobClient:  map 100% reduce 25%
13/11/19 16:40:38 INFO mapred.JobClient:  map 100% reduce 26%
13/11/19 16:44:38 INFO mapred.JobClient:  map 100% reduce 27%
13/11/19 16:49:08 INFO mapred.JobClient:  map 100% reduce 28%
13/11/19 16:49:30 INFO mapred.JobClient:  map 100% reduce 29%
13/11/19 16:52:25 INFO mapred.JobClient:  map 100% reduce 33%
13/11/19 16:53:54 INFO mapred.JobClient:  map 100% reduce 38%
13/11/19 16:54:10 INFO mapred.JobClient:  map 100% reduce 42%
13/11/19 16:55:21 INFO mapred.JobClient:  map 100% reduce 43%
13/11/19 16:55:36 INFO mapred.JobClient:  map 100% reduce 47%
13/11/19 16:55:39 INFO mapred.JobClient:  map 100% reduce 56%
13/11/19 16:56:40 INFO mapred.JobClient:  map 100% reduce 57%
13/11/19 16:58:04 INFO mapred.JobClient:  map 100% reduce 58%
13/11/19 17:01:25 INFO mapred.JobClient:  map 100% reduce 59%
13/11/19 17:04:47 INFO mapred.JobClient:  map 100% reduce 64%
13/11/19 17:05:01 INFO mapred.JobClient:  map 100% reduce 69%
13/11/19 17:07:39 INFO mapred.JobClient:  map 100% reduce 70%
13/11/19 17:10:32 INFO mapred.JobClient:  map 100% reduce 71%
13/11/19 17:13:21 INFO mapred.JobClient:  map 100% reduce 72%
13/11/19 17:16:08 INFO mapred.JobClient:  map 100% reduce 73%
13/11/19 17:19:03 INFO mapred.JobClient:  map 100% reduce 74%
13/11/19 17:21:55 INFO mapred.JobClient:  map 100% reduce 75%
13/11/19 17:24:46 INFO mapred.JobClient:  map 100% reduce 76%
13/11/19 17:27:35 INFO mapred.JobClient:  map 100% reduce 77%
13/11/19 17:30:24 INFO mapred.JobClient:  map 100% reduce 78%
13/11/19 17:33:14 INFO mapred.JobClient:  map 100% reduce 79%
13/11/19 17:36:07 INFO mapred.JobClient:  map 100% reduce 80%
13/11/19 17:39:00 INFO mapred.JobClient:  map 100% reduce 81%
13/11/19 17:41:51 INFO mapred.JobClient:  map 100% reduce 82%
13/11/19 17:44:39 INFO mapred.JobClient:  map 100% reduce 83%
13/11/19 17:47:27 INFO mapred.JobClient:  map 100% reduce 84%
13/11/19 17:50:22 INFO mapred.JobClient:  map 100% reduce 85%
13/11/19 17:53:09 INFO mapred.JobClient:  map 100% reduce 86%
13/11/19 17:55:54 INFO mapred.JobClient:  map 100% reduce 87%
13/11/19 17:58:44 INFO mapred.JobClient:  map 100% reduce 88%
13/11/19 18:01:35 INFO mapred.JobClient:  map 100% reduce 89%
13/11/19 18:04:21 INFO mapred.JobClient:  map 100% reduce 90%
13/11/19 18:07:16 INFO mapred.JobClient:  map 100% reduce 91%
13/11/19 18:10:08 INFO mapred.JobClient:  map 100% reduce 92%
13/11/19 18:12:55 INFO mapred.JobClient:  map 100% reduce 93%
13/11/19 18:15:51 INFO mapred.JobClient:  map 100% reduce 94%
13/11/19 18:18:45 INFO mapred.JobClient:  map 100% reduce 95%
13/11/19 18:21:36 INFO mapred.JobClient:  map 100% reduce 96%
13/11/19 18:24:25 INFO mapred.JobClient:  map 100% reduce 97%
13/11/19 18:27:42 INFO mapred.JobClient:  map 100% reduce 98%
13/11/19 18:31:25 INFO mapred.JobClient:  map 100% reduce 99%
13/11/19 18:41:13 INFO mapred.JobClient:  map 100% reduce 100%

During this period (from map 100% reduce 0% to map 100% reduce 5%) I observed that only 5 of the 10 map tasks actually completed; the other 5 then failed due to timeouts and were run again. I know this could be worked around by increasing the timeout, but that is not the point of my question.

I know that between map and reduce the data gets committed, shuffled, and sorted. First question: for data of this size, is it normal to wait this long between the map and reduce phases? It does not feel right.

My reducer is somewhat computationally heavy, so I swapped it for an identity reducer. That did not seem to help much, which makes me think the problem is either in my mapper or in the shuffle/sort. Here is my mapper:

  public static class CliquesMapper extends
      Mapper<YearTermKey, SetWritable, YearTermKey, MapWritable> {
    private YearTermKey outputKEY=new YearTermKey();

    public void map(YearTermKey key, SetWritable value, Context context)
        throws IOException, InterruptedException {
        Set<Writable> neighbors=value.keySet();
        int listSize=neighbors.size();
        if(listSize!=1){
            for(Writable keyTerm:neighbors){
                IntWritable KEYTerm=(IntWritable) keyTerm;
                outputKEY.set(new Text(key.getYear()), KEYTerm);
                MapWritable outputVALUE=new MapWritable();
                outputVALUE.put(key.getTerm(), value);                  
                context.write(outputKEY, outputVALUE);                                
            }
        }else{
            IntWritable finalTerm=new IntWritable();
            for(Writable t:neighbors){
                finalTerm.set(((IntWritable) t).get());
            }
            outputKEY.set(key.getYear(), finalTerm);
            NullWritable nw=NullWritable.get();
            MapWritable outputVALUE=new MapWritable();
            outputVALUE.put(key.getTerm(), nw);
            context.write(outputKEY,outputVALUE);
        }             
    }
  }

Second question: could the key-value pairs emitted by my mapper be causing this delay? If not, why else would this be happening?

In any case, after all 10 map tasks finally finished (around map 100% reduce 33%), the reducers took almost 2 more hours to complete. How is that possible, given that it is an identity reducer?

You are asking several questions; they are related, but they have different answers. I address them one by one below.

For data of this size, is it normal to wait this long between the map and reduce phases?

There is a barrier between the map and reduce phases: the reduce function cannot start until all mappers have finished. Some of your mappers are failing, which slows down the whole map phase and holds up the reduce phase. Once you fix that, your reduce phase should start earlier.

Why are the map tasks failing? Apparently, they are not reporting progress:

[...] failed to report status for 1225 seconds. Killing!
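
If the real cause is simply slow per-record work rather than a genuine hang, one standard workaround is to have the map task tell the framework it is still alive while it grinds. A minimal sketch, not taken from your job (the types, the loop, and the reporting interval are all made up for illustration):

  import java.io.IOException;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  public class LongRunningMapper extends
      Mapper<LongWritable, Text, Text, LongWritable> {

    public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      for (int i = 0; i < 1000000; i++) {
        // ... expensive per-record work goes here ...
        if (i % 10000 == 0) {
          context.progress();                  // resets the task timeout
          context.setStatus("processed " + i); // optional: visible in the web UI
        }
      }
      context.write(value, key);
    }
  }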

Could the key-value pairs emitted by my mapper be causing this delay? If not, why else would this be happening?

I am not sure, but I did look at your code, and you can make it run faster as follows:

1) Convert your Text to an IntWritable. It looks like numeric data (a year), and doing so will reduce the amount of data sent from the mappers to the reducers. See tip 5 of this page for more on improving Hadoop performance.
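
To see why this helps, here is a small standalone snippet (my own example, not part of your job) comparing the serialized size of a year stored as Text versus IntWritable. The saving per key is small, but it is paid on every emitted pair, and IntWritable keys also compare faster during the sort:

  import org.apache.hadoop.io.DataOutputBuffer;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;

  public class KeySizeDemo {
    public static void main(String[] args) throws Exception {
      DataOutputBuffer asText = new DataOutputBuffer();
      new Text("2013").write(asText);       // vint length + UTF-8 bytes (5 bytes here)

      DataOutputBuffer asInt = new DataOutputBuffer();
      new IntWritable(2013).write(asInt);   // always 4 bytes

      System.out.println("Text:        " + asText.getLength() + " bytes");
      System.out.println("IntWritable: " + asInt.getLength() + " bytes");
    }
  }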

2) Reuse your Writables. You are creating a new Text on every iteration. You would be surprised how inefficient that is, and how much performance you lose to constantly allocating and garbage-collecting objects on the heap. The idea is to create a Writable once and then reuse it; see tip 6 of the same page for details, and the sketch below for the pattern.
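
A generic sketch using standard Hadoop types rather than your YearTermKey/SetWritable classes:

  import java.io.IOException;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  public class ReuseWritablesMapper extends
      Mapper<LongWritable, Text, Text, IntWritable> {
    // Allocated once per task, not once per record or per loop iteration.
    private final Text outKey = new Text();
    private final IntWritable outValue = new IntWritable(1);

    public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        outKey.set(token);                 // overwrite the existing object
        context.write(outKey, outValue);   // write() serializes immediately,
                                           // so reusing the objects is safe here
      }
    }
  }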

I cannot be certain, but I suspect this may be why some of your mappers are failing: garbage collection can pause the task long enough that it stops reporting progress, and as a result Hadoop kills it.

3) If you are not already doing so, use more than one reducer. See tip 3 of the page I linked above for heuristics on setting an appropriate number of map and reduce tasks for your job.
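
For example, in the job driver (the job name and the number of reducers here are made up; tune the count to your cluster's reduce slots):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;

  public class Driver {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      Job job = new Job(conf, "cliques");  // Hadoop 1.x constructor

      // Common heuristic: roughly 0.95 or 1.75 x (nodes x reduce slots per node).
      job.setNumReduceTasks(8);            // illustrative value for a 5-node cluster

      // ... setMapperClass, setReducerClass, input/output paths, etc. ...
    }
  }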

After all 10 map tasks finally finished (around map 100% reduce 33%), the reducers took almost 2 more hours to complete. How is that possible, given that it is an identity reducer?

This can be normal behavior if one (or a few) of the reducers ends up with too much data. Shuffling implies sorting on the reduce side; try sorting a big file on a Linux box with sort and you will see it can take a very long time. That is what is happening in your job.
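
If you want to check how keys get spread across reducers once you run with several of them, you can poke at the default HashPartitioner directly. A hypothetical snippet (Text keys and 4 reducers are assumptions for illustration; your job uses YearTermKey):

  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

  public class PartitionCheck {
    public static void main(String[] args) {
      HashPartitioner<Text, IntWritable> p = new HashPartitioner<Text, IntWritable>();
      int numReducers = 4;
      for (String year : new String[] {"2010", "2011", "2012", "2013"}) {
        // Same computation the framework uses to route a map output record.
        int partition = p.getPartition(new Text(year), new IntWritable(0), numReducers);
        System.out.println(year + " -> reducer " + partition);
      }
    }
  }

If many keys land in the same partition, that reducer has to shuffle and sort a disproportionate share of the data, which is exactly the slow tail described above.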
