What happens if we skip the reducer in MapReduce but keep the mapper and combiner?

My input file is 10 GB in size and is located at

/user/cloudera/inputfiles/records.txt

Here is my driver class code:

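import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCountMain {

    /**
     * @param args args[0] = input path, args[1] = output path
     */
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path inputFilePath = new Path(args[0]);
        Path outputFilePath = new Path(args[1]);

        // Deprecated in Hadoop 2.x in favor of Job.getInstance(conf, "word count")
        Job job = new Job(conf, "word count");
        job.getConfiguration().set("mapred.job.queue.name", "omega");
        job.setJarByClass(WordCountMain.class);

        FileInputFormat.addInputPath(job, inputFilePath);
        FileOutputFormat.setOutputPath(job, outputFilePath);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountCombiner.class);
        job.setNumReduceTasks(0); // map-only job: no reduce phase
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}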

I have code for the Mapper and the Combiner, and I have set the number of reducers to zero.

Here is my Mapper code:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable one = new IntWritable(1);

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split each input line on '|' and emit (token, 1) for every token
        StringTokenizer st = new StringTokenizer(value.toString(), "|");
        while (st.hasMoreTokens()) {
            context.write(new Text(st.nextToken()), one);
        }
    }
}

I have written my own combiner.

Here is my Combiner code:

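import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the partial counts for this key
        int count = 0;
        for (IntWritable i : values) {
            count += i.get();
        }
        context.write(key, new IntWritable(count));
    }
}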
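
To make the two candidate outputs concrete, here is a plain-Java sketch (no Hadoop dependencies) of what the mapper emits for one record and what the combiner would produce if it were run over those pairs. The class MapVsCombineSketch, its method names, and the sample line are hypothetical and only for illustration; real Hadoop works on Text and IntWritable rather than String and Integer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

class MapVsCombineSketch {

    // Mirrors WordCountMapper: one "token<TAB>1" record per '|'-separated token.
    static List<String> mapOutput(String line) {
        List<String> records = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, "|");
        while (st.hasMoreTokens()) {
            records.add(st.nextToken() + "\t1");
        }
        return records;
    }

    // Mirrors WordCountCombiner: sums the counts per token, sorted by key.
    static Map<String, Integer> combine(String line) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer st = new StringTokenizer(line, "|");
        while (st.hasMoreTokens()) {
            counts.merge(st.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String line = "a|b|a"; // hypothetical sample record
        System.out.println(mapOutput(line)); // three records, duplicates kept
        System.out.println(combine(line));   // {a=2, b=1}
    }
}
```

The mapper-style output keeps one record per token, duplicates included, while the combiner-style output has one summed record per distinct token.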

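My question is: which output will be stored?

The mapper's output or the combiner's output?

Or is the combiner executed only when a reducer phase is written?

Please help.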

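In general you cannot be sure how many times the combiner function will run, or whether it will run at all; Hadoop treats it as an optional optimization. With the number of reduce tasks set to 0, however, there is no sort and shuffle phase: each map task writes its output straight to the output directory, so the combiner is not applied and what gets stored is the mapper output. In your case the job will simply produce 160 output files (part-m-00000 and so on), one per map task, assuming a 64 MB split size (10240 / 64 = 160).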
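
The 160 figure is a ceiling division of the input size by the split size. A minimal sketch of that arithmetic follows, assuming a 64 MB split; the class SplitEstimate and its method are hypothetical. The real split count depends on the block size and split settings of the cluster, and FileInputFormat may fold a small trailing remainder into the last split, so treat the result as an estimate.

```java
class SplitEstimate {

    // Ceiling division: a final partial split still needs its own map task.
    static long estimateSplits(long fileSizeMb, long splitSizeMb) {
        return (fileSizeMb + splitSizeMb - 1) / splitSizeMb;
    }

    public static void main(String[] args) {
        long fileSizeMb = 10L * 1024; // 10 GB input from the question
        long splitSizeMb = 64;        // assumed split size, not taken from the cluster
        System.out.println(estimateSplits(fileSizeMb, splitSizeMb)); // prints 160
    }
}
```

With a 128 MB split size the same calculation gives 80 map tasks, and therefore 80 output files for a map-only job.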

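If you skip the mapper and reducer settings, Hadoop falls back to its defaults. For example:

The default mapper is IdentityMapper.class in the old org.apache.hadoop.mapred API; in the new org.apache.hadoop.mapreduce API it is the base Mapper class, which passes records through unchanged.
The default input format is TextInputFormat.
The default partitioner is HashPartitioner.
By default there is a single reducer, and therefore a single partition.
The default reducer is Reducer, which is also a generic type.
The default output format is TextOutputFormat, which writes one record per line by converting the key and value to strings and separating them with a tab.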
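
The partitioning and output-format defaults can be illustrated without a cluster. The plain-Java sketch below mirrors the formula HashPartitioner uses, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, and the key, tab, value line layout of TextOutputFormat. The class DefaultsSketch and its method names are hypothetical, and it uses String and Integer keys rather than Hadoop's Text, whose hashCode differs, so the partition numbers on a real cluster will not match.

```java
class DefaultsSketch {

    // Same formula as HashPartitioner: mask off the sign bit, then take the modulus.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Same line layout as TextOutputFormat: key, a tab, then the value.
    static String formatRecord(Object key, Object value) {
        return key + "\t" + value;
    }

    public static void main(String[] args) {
        // With the default single reducer, every key lands in partition 0.
        System.out.println(partitionFor("hadoop", 1)); // prints 0
        System.out.println(formatRecord("hadoop", 3));
    }
}
```

Masking with Integer.MAX_VALUE keeps the result non-negative even when hashCode() is negative, which a plain modulus would not guarantee.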
