我写了一个MR程序来估计PI(3.141592……),如下所示,但我遇到了一个问题:
这个框架发射的map任务数量是11,下面是输出(总共35行)。但我预计输出是11行。有什么我想念的吗?
内圆78534096内圆78539304内圆78540871内圆78537925内圆78537161内圆78544419内圆78537045内圆78534861内圆78545779内圆78528890内圆78540007内圆78542686内圆78534539内圆78538255内圆78543392内圆78543191内圆78540938内圆78534882内圆78536155内圆78545739内圆78541807内圆78540635内圆78547561内圆78540521内圆78541320内圆78537605内圆78541379内圆78540408内圆78536238内圆78539614内圆78539773内圆78537169内圆78541707内圆78537141内圆78538045
//猪肉开始进口
公共类PiEstimation{
public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, LongWritable> {
private final static Text INCIRCLE = new Text("INCIRCLE");
private final static LongWritable TimesInAMap = new LongWritable(100000000);
private static Random random = new Random();
public class MyPoint {
private double x = 0.0;
private double y = 0.0;
MyPoint(double _x,double _y) {
this.x = _x;
this.y = _y;
}
public boolean inCircle() {
if ( ((x-0.5)*(x-0.5) + (y-0.5)*(y-0.5)) <= 0.25 )
return true;
else
return false;
}
public void setPoint(double _x,double _y) {
this.x = _x;
this.y = _y;
}
}
public void map(LongWritable key, Text value, OutputCollector<Text, LongWritable> output, Reporter reporter) throws IOException {
long i = 0;
long N = TimesInAMap.get();
MyPoint myPoint = new MyPoint(random.nextDouble(),random.nextDouble());
long sum = 0;
while (i < N ) {
if (myPoint.inCircle()) {
sum++;
}
myPoint.setPoint(random.nextDouble(),random.nextDouble());
i++;
}
output.collect(INCIRCLE, new LongWritable(sum));
}
}
public static class Reduce extends MapReduceBase implements Reducer<Text, LongWritable, Text, LongWritable> {
public void reduce(Text key, Iterator<LongWritable> values, OutputCollector<Text, LongWritable> output, Reporter reporter) throws IOException {
long sum = 0;
while (values.hasNext()) {
//sum += values.next().get();
output.collect(key, values.next());
}
//output.collect(key, new LongWritable(sum));
}
}
public static void main(String[] args) throws Exception {
JobConf conf = new JobConf(PiEstimation.class);
conf.setJobName("PiEstimation");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(LongWritable.class);
conf.setMapperClass(Map.class);
conf.setCombinerClass(Reduce.class);
conf.setReducerClass(Reduce.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
conf.setNumMapTasks(10);
conf.setNumReduceTasks(1);
FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
JobClient.runJob(conf);
}
}
启动的映射任务的数量由许多因素决定,主要是输入格式、将输入文件分块到的相关块大小,以及输入文件本身是否是"可拆分的"
另外,调用映射的次数取决于每个映射拆分中的记录数(映射器正在处理的数据)。
假设您有一个100行的文本文件用于输入-很可能这将由一个Mapper处理,但映射方法被调用100次-输入文件中的每行调用一次
如果计算输入文件中的行数,即在所有映射器中调用映射的次数。很难确定每个Mapper中会调用多少次map。