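I'm new to Hadoop, and I've been struggling for two days to figure out why output.collect is not collecting the right values.
Let me explain. Simplified, my map method looks like this: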
    public void map(LongWritable key, Text value, OutputCollector<Text, MyObject> output, Reporter reporter)
            throws IOException {
        try {
            ForXmlHandling message = (ForXmlHandling) unmarshaller.unmarshal(new StringReader(value.toString()));
            MyObject row = XmlParser.parse(message);
            row.setOrigin(true);
            output.collect(new Text(row.getPnrRecordKey().toString()), row);
        } catch (JAXBException e) {
            LOG.debug(e);
        }
    }
where MyObject is a class I wrote:
    public class MyObject {
        private boolean original;
        private boolean split;
        ....
    }
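When I run only the mapper in debug mode, the row (MyObject) that comes out of output.collect always has its origin property set to false (the default value for a boolean), even though I set it to true just before the call. I don't understand what output.collect is doing here.
Any help is very welcome. Thanks!
Thanks for your answers! The problem actually came from my implementations of readFields and write, because I was not calling: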
    // write
    _original.write(out);
    _split.write(out);

    // readFields
    _original = new BooleanWritable();
    _split = new BooleanWritable();
    _original.readFields(in);
    _split.readFields(in);
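
For anyone who hits the same thing, here is a minimal sketch of why the flag was lost. As far as I understand it, a value passed to output.collect is serialized with write and rebuilt with readFields, so any field those two methods skip comes back with its default value. The sketch below is not my real class: SimpleWritable is a hypothetical stand-in that only mirrors the two method signatures of org.apache.hadoop.io.Writable so the example runs without Hadoop on the classpath, RoundTrip is a made-up helper, and it uses plain boolean fields where my code uses BooleanWritable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for org.apache.hadoop.io.Writable (same two methods),
// declared here only so the sketch compiles without Hadoop.
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Simplified version of MyObject: only the two boolean flags from the question.
class MyObject implements SimpleWritable {
    private boolean original;
    private boolean split;

    public void setOriginal(boolean original) { this.original = original; }
    public boolean isOriginal() { return original; }
    public void setSplit(boolean split) { this.split = split; }
    public boolean isSplit() { return split; }

    @Override
    public void write(DataOutput out) throws IOException {
        // Every field that must survive the round trip has to be written...
        out.writeBoolean(original);
        out.writeBoolean(split);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // ...and read back in exactly the same order.
        original = in.readBoolean();
        split = in.readBoolean();
    }
}

// Hypothetical helper that imitates the serialize/deserialize round trip.
class RoundTrip {
    static MyObject copy(MyObject source) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                source.write(out);
            }
            MyObject target = new MyObject();
            try (DataInputStream in =
                     new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                target.readFields(in);
            }
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        MyObject row = new MyObject();
        row.setOriginal(true);
        MyObject received = copy(row);
        // prints "true false"
        System.out.println(received.isOriginal() + " " + received.isSplit());
    }
}
```

If the two lines in write and the two lines in readFields are removed, the same round trip prints "false false", which matches the behaviour I was seeing from the mapper.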