I am learning Hadoop. I have two Mappers, each processing a different file, and one Reducer that combines the records coming from both Mappers.
Input:
File 1:
1,Abc
2,Mno
3,Xyz
File 2:
1,CS
2,EE
3,CS
Expected output:
1 1,Abc,CS
2 2,Mno,EE
3 3,Xyz,CS
Actual output:
1 1,,CS
2 2,Mno,
3 3,Xyz,
My code:
Mapper 1:
public class NameMapper extends MapReduceBase implements
        Mapper<LongWritable, Text, LongWritable, UserWritable> {

    @Override
    public void map(LongWritable key, Text value,
            OutputCollector<LongWritable, UserWritable> output, Reporter reporter)
            throws IOException {
        String val[] = value.toString().split(",");
        LongWritable id = new LongWritable(Long.parseLong(val[0]));
        Text name = new Text(val[1]);
        output.collect(id, new UserWritable(id, name, new Text("")));
    }
}
Mapper 2:
public class DepartmentMapper extends MapReduceBase implements
        Mapper<LongWritable, Text, LongWritable, UserWritable> {

    @Override
    public void map(LongWritable key, Text value,
            OutputCollector<LongWritable, UserWritable> output, Reporter reporter)
            throws IOException {
        String val[] = value.toString().split(",");
        LongWritable id = new LongWritable(Integer.parseInt(val[0]));
        Text department = new Text(val[1]);
        output.collect(id, new UserWritable(id, new Text(""), department));
    }
}
Reducer:
public class JoinReducer extends MapReduceBase implements
        Reducer<LongWritable, UserWritable, LongWritable, UserWritable> {

    @Override
    public void reduce(LongWritable key, Iterator<UserWritable> values,
            OutputCollector<LongWritable, UserWritable> output,
            Reporter reporter) throws IOException {
        UserWritable user = new UserWritable();
        while (values.hasNext()) {
            UserWritable u = values.next();
            user.setId(u.getId());
            if (!(u.getName().equals(""))) {
                user.setName(u.getName());
            }
            if (!(u.getDepartment().equals(""))) {
                user.setDepartment(u.getDepartment());
            }
        }
        output.collect(user.getId(), user);
    }
}
Driver:
public class Driver extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        JobConf conf = new JobConf(getConf(), Driver.class);
        conf.setJobName("File Join");
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(UserWritable.class);
        conf.setReducerClass(JoinReducer.class);
        MultipleInputs.addInputPath(conf, new Path("/user/hadoop/join/f1"),
                TextInputFormat.class, NameMapper.class);
        MultipleInputs.addInputPath(conf, new Path("/user/hadoop/join/f2"),
                TextInputFormat.class, DepartmentMapper.class);
        Path output = new Path("/user/hadoop/join/output");
        FileSystem.get(new URI(output.toString()), conf).delete(output);
        FileOutputFormat.setOutputPath(conf, output);
        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int result = ToolRunner.run(new Configuration(), new Driver(), args);
        System.exit(result);
    }
}
UserWritable:
public class UserWritable implements Writable {

    private LongWritable id;
    private Text name;
    private Text department;

    public UserWritable() {
    }

    public UserWritable(LongWritable id, Text name, Text department) {
        super();
        this.id = id;
        this.name = name;
        this.department = department;
    }

    public LongWritable getId() {
        return id;
    }

    public void setId(LongWritable id) {
        this.id = id;
    }

    public Text getName() {
        return name;
    }

    public void setName(Text name) {
        this.name = name;
    }

    public Text getDepartment() {
        return department;
    }

    public void setDepartment(Text department) {
        this.department = department;
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        id = new LongWritable(in.readLong());
        name = new Text(in.readUTF());
        department = new Text(in.readUTF());
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(id.get());
        out.writeUTF(name.toString());
        out.writeUTF(department.toString());
    }

    @Override
    public String toString() {
        return id.get() + "," + name.toString() + "," + department.toString();
    }
}
The Reducer should receive two UserWritable objects for each user id: the first with the id and name, the second with the id and department. Can someone explain where I went wrong?
I found the problem in my code: u.getName() returns a Text object, not a String, so the comparison u.getName().equals("") is always false. Changing it to u.getName().toString().equals("") (and the same for the department) solved the problem.
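For anyone else hitting this: equals() across unrelated types always returns false, because Hadoop's Text is not a String and its equals(Object) only matches other binary-comparable objects. The same pitfall can be reproduced with plain JDK types, with no Hadoop dependency, using StringBuilder in place of Text:

```java
// Cross-type equals() is the culprit: comparing a non-String object
// to a String literal is always false, whatever characters it holds.
public class EqualsPitfall {
    public static void main(String[] args) {
        StringBuilder empty = new StringBuilder("");

        // Always false: a StringBuilder (like Hadoop's Text) is not a String,
        // so the "is this field empty?" check in the reducer never fired.
        System.out.println(empty.equals(""));            // false

        // Convert to String first, then compare contents: works as intended.
        System.out.println(empty.toString().equals("")); // true
    }
}
```

Because the check was always false, every record in the reducer loop overwrote both fields, including the empty placeholders, so whichever UserWritable arrived last wiped out the name or department set by the other mapper.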