I am trying to implement a simple group-by in MapReduce.
My input file is as follows:
7369,SMITH,CLERK,800,20
7499,ALLEN,SALESMAN,1600,30
7521,WARD,SALESMAN,1250,30
7566,JONES,MANAGER,2975,20
7654,MARTIN,SALESMAN,1250,30
7698,BLAKE,MANAGER,2850,30
7782,CLARK,MANAGER,2450,10
7788,SCOTT,ANALYST,3000,20
7839,KING,PRESIDENT,5000,10
7844,TURNER,SALESMAN,1500,30
7876,ADAMS,CLERK,1100,20
7900,JAMES,CLERK,950,30
7902,FORD,ANALYST,3000,20
7934,MILLER,CLERK,1300,10
My Mapper Class:
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Groupmapper extends Mapper<Object, Text, IntWritable, IntWritable> {
    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        // Parse the CSV record and emit (deptno, sal)
        String line = value.toString();
        String[] parts = line.split(",");
        String token1 = parts[3];
        String token2 = parts[4];
        int deptno = Integer.parseInt(token2);
        int sal = Integer.parseInt(token1);
        context.write(new IntWritable(deptno), new IntWritable(sal));
    }
}
My Reducer Class:
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class Groupreducer extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
    IntWritable result = new IntWritable();

    public void Reduce(IntWritable key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Sum the salaries for each department key
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
Driver Class:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Group {
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Group");
        job.setJarByClass(Group.class);
        job.setMapperClass(Groupmapper.class);
        job.setCombinerClass(Groupreducer.class);
        job.setReducerClass(Groupreducer.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Expected output:
10 8750
20 10875
30 9400
But it prints the output shown below. It does not add the values up; it behaves like an identity reducer.
10 1300
10 5000
10 2450
20 1100
20 3000
20 800
20 2975
20 3000
30 1500
30 1600
30 2850
30 1250
30 1250
30 950
The reducer is not working properly.
It looks as if reduce is never being used, so the next step in debugging would be to look closely at your reducer.
If you add an @Override annotation to the reduce method (as you did for the map method), you will see that you get a "Method does not override method from its superclass" error. This means Hadoop will not use your reduce method and will instead fall back to the default identity implementation.
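To illustrate, adding the annotation to the reducer above makes the mismatch visible at compile time (a shortened sketch of the same class):

public class Groupreducer extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
    @Override   // compile error: Method does not override method from its superclass
    public void Reduce(IntWritable key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // ...
    }
}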
The problem is that you have:
public void Reduce(IntWritable key, Iterable<IntWritable> values, Context context)
when it should be:
public void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
The only difference is that the method name must start with a lowercase r.
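For reference, here is a minimal sketch of the corrected reducer, with @Override in place so the compiler will catch this kind of mistake in the future:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class Groupreducer extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum all salaries emitted for this department key
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}

With this change both the combiner and the reducer aggregate the values, and the job produces the per-department totals shown in the expected output.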