Reducer class not working as expected in Hadoop MapReduce



I am trying to implement a simple group-by in MapReduce.

My input file looks like this:

7369,SMITH,CLERK,800,20
7499,ALLEN,SALESMAN,1600,30
7521,WARD,SALESMAN,1250,30
7566,JONES,MANAGER,2975,20
7654,MARTIN,SALESMAN,1250,30
7698,BLAKE,MANAGER,2850,30
7782,CLARK,MANAGER,2450,10
7788,SCOTT,ANALYST,3000,20
7839,KING,PRESIDENT,5000,10
7844,TURNER,SALESMAN,1500,30
7876,ADAMS,CLERK,1100,20
7900,JAMES,CLERK,950,30
7902,FORD,ANALYST,3000,20
7934,MILLER,CLERK,1300,10

My Mapper class:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Groupmapper extends Mapper<Object, Text, IntWritable, IntWritable> {
    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        String[] parts = line.split(",");
        int sal = Integer.parseInt(parts[3]);     // salary column
        int deptno = Integer.parseInt(parts[4]);  // department number column
        context.write(new IntWritable(deptno), new IntWritable(sal));
    }
}

My Reducer class:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class Groupreducer extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
    IntWritable result = new IntWritable();

    public void Reduce(IntWritable key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}

My Driver class:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Group {
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Group");
        job.setJarByClass(Group.class);
        job.setMapperClass(Groupmapper.class);
        job.setCombinerClass(Groupreducer.class);
        job.setReducerClass(Groupreducer.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Expected output:

10      8750
20      10875
30      9400
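As a sanity check (a plain-Java sketch, independent of Hadoop; the class name `GroupBySketch` is mine), these totals can be reproduced directly from the sample rows:

```java
import java.util.Map;
import java.util.TreeMap;

public class GroupBySketch {
    // Sum field 3 (salary) per field 4 (deptno), mirroring what the job should do
    static Map<Integer, Integer> totals(String[] rows) {
        Map<Integer, Integer> sums = new TreeMap<>();
        for (String row : rows) {
            String[] parts = row.split(",");
            sums.merge(Integer.parseInt(parts[4]), Integer.parseInt(parts[3]), Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        String[] rows = {
            "7369,SMITH,CLERK,800,20", "7499,ALLEN,SALESMAN,1600,30",
            "7521,WARD,SALESMAN,1250,30", "7566,JONES,MANAGER,2975,20",
            "7654,MARTIN,SALESMAN,1250,30", "7698,BLAKE,MANAGER,2850,30",
            "7782,CLARK,MANAGER,2450,10", "7788,SCOTT,ANALYST,3000,20",
            "7839,KING,PRESIDENT,5000,10", "7844,TURNER,SALESMAN,1500,30",
            "7876,ADAMS,CLERK,1100,20", "7900,JAMES,CLERK,950,30",
            "7902,FORD,ANALYST,3000,20", "7934,MILLER,CLERK,1300,10"
        };
        // Prints: 10 8750, 20 10875, 30 9400 (one department per line)
        totals(rows).forEach((dept, sum) -> System.out.println(dept + "\t" + sum));
    }
}
```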

But it prints the output below instead. It does not sum the values per department; it behaves like an identity reducer.

10      1300
10      5000
10      2450
20      1100
20      3000
20      800
20      2975
20      3000
30      1500
30      1600
30      2850
30      1250
30      1250
30      950

The reducer is not working

It looks like your reduce method is never being used, so the next debugging step is to look closely at the reducer.

If you add an @Override annotation to your reduce method (as you did for your map method), you will see that you get a "Method does not override method from its superclass" error. That means Hadoop will not call your reduce method; it falls back to the default identity implementation instead.

The problem is that you have:

public void Reduce(IntWritable key, Iterable<IntWritable> values, Context context)

when it should be:

public void reduce(IntWritable key, Iterable<IntWritable> values, Context context)

The only difference is that the method name must start with a lowercase r.
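To see why the capitalization matters, here is a minimal stand-in (hypothetical classes, not the real Hadoop API): a subclass method named `Reduce` does not override `reduce`, so a framework-style call to `reduce` runs the base class's identity behavior instead of the sum:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in for Hadoop's Reducer: the framework calls reduce(), whose
// default implementation is an identity pass-through.
class BaseReducer {
    List<String> out = new ArrayList<>();

    public void reduce(int key, Iterable<Integer> values) {
        for (int v : values) out.add(key + "\t" + v);  // emit each value unchanged
    }
}

// Same mistake as in the question: capital R, so this does NOT override reduce()
class BrokenReducer extends BaseReducer {
    public void Reduce(int key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        out.add(key + "\t" + sum);
    }
}

// Lowercase r: this overrides, so the sum is emitted
class FixedReducer extends BaseReducer {
    @Override
    public void reduce(int key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        out.add(key + "\t" + sum);
    }
}

public class OverrideDemo {
    public static void main(String[] args) {
        List<Integer> dept10 = Arrays.asList(2450, 5000, 1300);

        BaseReducer broken = new BrokenReducer();
        broken.reduce(10, dept10);       // dispatches to the base identity version
        System.out.println(broken.out);  // three identity records, no sum

        BaseReducer fixed = new FixedReducer();
        fixed.reduce(10, dept10);        // dispatches to the overriding method
        System.out.println(fixed.out);   // one summed record: 10 -> 8750
    }
}
```

The same fix applies in the real job: rename `Reduce` to `reduce`, and keep the @Override annotation so the compiler catches this class of mistake for you.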
