映射()函数的调用次数和MR作业中发射的映射任务数之间的联系



我写了一个MR程序来估计PI(3.141592……),如下所示,但我遇到了一个问题:

这个框架发射的map任务数量是11,下面是输出(总共35行)。但我预计输出是11行。有什么我想念的吗?

内圆78534096内圆78539304内圆78540871内圆78537925内圆78537161内圆78544419内圆78537045内圆78534861内圆78545779内圆78528890内圆78540007内圆78542686内圆78534539内圆78538255内圆78543392内圆78543191内圆78540938内圆78534882内圆78536155内圆78545739内圆78541807内圆78540635内圆78547561内圆78540521内圆78541320内圆78537605内圆78541379内圆78540408内圆78536238内圆78539614内圆78539773内圆78537169内圆78541707内圆78537141内圆78538045

//猪肉开始进口

公共类PiEstimation{

    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, LongWritable> {
            private final static Text  INCIRCLE             = new Text("INCIRCLE");
            private final static LongWritable TimesInAMap   = new LongWritable(100000000);
            private static Random random = new Random();
            public  class MyPoint {
                    private double  x = 0.0;
                    private double  y = 0.0;
                    MyPoint(double _x,double _y) {
                            this.x = _x;
                            this.y = _y;
                    }
                    public boolean inCircle() {
                            if ( ((x-0.5)*(x-0.5) + (y-0.5)*(y-0.5)) <= 0.25 )
                                    return true;
                            else
                                    return false;
                    }
                    public void setPoint(double _x,double _y) {
                            this.x = _x;
                            this.y = _y;
                    }
            }
            public void map(LongWritable key, Text value, OutputCollector<Text, LongWritable> output, Reporter reporter) throws IOException {
                            long i = 0;
                            long N = TimesInAMap.get();
                            MyPoint myPoint = new MyPoint(random.nextDouble(),random.nextDouble());
                            long sum = 0;
                            while (i < N ) {
                            if (myPoint.inCircle()) {                                           
                                sum++;
                            }
                            myPoint.setPoint(random.nextDouble(),random.nextDouble());
                            i++;
                            }
                            output.collect(INCIRCLE, new LongWritable(sum));
                            }
            }

    public static class Reduce extends MapReduceBase implements Reducer<Text, LongWritable, Text, LongWritable> {
  public void reduce(Text key, Iterator<LongWritable> values, OutputCollector<Text, LongWritable> output, Reporter reporter) throws IOException {
      long sum = 0;
      while (values.hasNext()) {
        //sum += values.next().get();
        output.collect(key, values.next());
      }
      //output.collect(key, new LongWritable(sum));
  }
  }
    public static void main(String[] args) throws Exception {
  JobConf conf = new JobConf(PiEstimation.class);
  conf.setJobName("PiEstimation");
  conf.setOutputKeyClass(Text.class);
  conf.setOutputValueClass(LongWritable.class);
  conf.setMapperClass(Map.class);
  conf.setCombinerClass(Reduce.class);
  conf.setReducerClass(Reduce.class);
  conf.setInputFormat(TextInputFormat.class);
  conf.setOutputFormat(TextOutputFormat.class);
  conf.setNumMapTasks(10);
  conf.setNumReduceTasks(1);
  FileInputFormat.setInputPaths(conf, new Path(args[0]));
  FileOutputFormat.setOutputPath(conf, new Path(args[1]));
  JobClient.runJob(conf);
}

}

启动的映射任务的数量由许多因素决定,主要是输入格式、将输入文件分块到的相关块大小,以及输入文件本身是否是"可拆分的"

另外,调用映射的次数取决于每个映射拆分中的记录数(映射器正在处理的数据)。

假设您有一个100行的文本文件用于输入-很可能这将由一个Mapper处理,但映射方法被调用100次-输入文件中的每行调用一次

如果计算输入文件中的行数,即在所有映射器中调用映射的次数。很难确定每个Mapper中会调用多少次map。

相关内容

  • 没有找到相关文章

最新更新