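I'm new to Hadoop and ran into this problem. I'm trying to change the reducer's default output types from Text, IntWritable to Text, Text. I want the mapper to emit Text, IntWritable pairs; then, in the reducer, I want to keep two counters that are incremented depending on the value, and write both counters into a Text for the collector.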
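import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class WordCountMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output,
            Reporter reporter) throws IOException {
        String line = value.toString();
        String[] words = line.split(",");
        String[] date = words[2].split(" ");
        // Key: the first three space-separated tokens of the third CSV field.
        word.set(date[0] + " " + date[1] + " " + date[2]);
        // Value: 0 if the first field contains "0", otherwise 4.
        if (words[0].contains("0"))
            one.set(0);
        else
            one.set(4);
        output.collect(word, one);
    }
}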
-----------------------------------------------------------------------------------
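import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WordCountReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, Text> {
    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, Text> output,
            Reporter reporter) throws IOException {
        int sad = 0;
        int happy = 0;
        // Count the values equal to 0 as "sad" and all others as "happy".
        while (values.hasNext()) {
            IntWritable value = values.next();
            if (value.get() == 0)
                sad++;
            else
                happy++;
        }
        output.collect(key, new Text("sad:" + sad + ", happy:" + happy));
    }
}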
---------------------------------------------------------------------------------
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class WordCount {
    public static void main(String[] args) {
        JobClient client = new JobClient();
        JobConf conf = new JobConf(WordCount.class);
        // specify output types
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        // specify input and output dirs
        FileInputFormat.addInputPath(conf, new Path("input"));
        FileOutputFormat.setOutputPath(conf, new Path("output"));
        // specify a mapper
        conf.setMapperClass(WordCountMapper.class);
        // specify a reducer
        conf.setReducerClass(WordCountReducer.class);
        conf.setCombinerClass(WordCountReducer.class);
        client.setConf(conf);
        try {
            JobClient.runJob(conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
I get this error:
10-14-12 18:11:01 INFO mapred.JobClient: Task Id : attempt_201412100143_0008_m_000000_0, Status : FAILED
java.io.IOException: Spill failed
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:425)
    at WordCountMapper.map(WordCountMapper.java:31)
    at WordCountMapper.map(WordCountMapper.java:1)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
Caused by: java.io.IOException: wrong value class: class org.apache.hadoop.io.Text is not class org.apache.hadoop.io.IntWritable
    at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:143)
    at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:626)
    at WordCountReducer.reduce(WordCountReducer.java:29)
    at WordCountReducer.reduce(WordCountReducer.java:1)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:904)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:785)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1600(MapTask.java:286)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:712)
After this, the error repeats several times. Can someone explain why it happens? I searched for similar errors, but all I found were cases where the key/value types of the mapper and the reducer did not match, and as far as I can see they do match here. Thank you in advance.
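Try commenting out
conf.setCombinerClass(WordCountReducer.class);
and then run the job again.
The combiner runs when the map output buffer fills up and is spilled to disk, which is why this shows up as a "Spill failed" error. A combiner's output is written back as map output, so it has to emit the map output types (Text, IntWritable), but WordCountReducer emits Text values.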
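To make the type mismatch concrete, here is a minimal plain-Java sketch of it. It does not use Hadoop at all: SpillWriter is a hypothetical stand-in for the writer named in the stack trace (IFile$Writer.append), Integer and String stand in for IntWritable and Text, and the class and method names are made up for illustration. It is only a model of the check, under those assumptions, not Hadoop's actual implementation.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class CombinerTypeCheckSketch {

    /** Hypothetical stand-in for the spill writer: accepts only the declared map output value class. */
    static final class SpillWriter {
        private final Class<?> valueClass;
        int written = 0;

        SpillWriter(Class<?> valueClass) {
            this.valueClass = valueClass;
        }

        void append(Object key, Object value) throws IOException {
            if (value.getClass() != valueClass) {
                throw new IOException("wrong value class: " + value.getClass()
                        + " is not " + valueClass);
            }
            written++;
        }
    }

    /** Mirrors the question's reduce logic: Integer flags in, a String summary out. */
    static String reduce(List<Integer> values) {
        int sad = 0;
        int happy = 0;
        for (int v : values) {
            if (v == 0) {
                sad++;
            } else {
                happy++;
            }
        }
        return "sad:" + sad + ", happy:" + happy;
    }

    public static void main(String[] args) {
        // The map output value class is Integer (IntWritable in the question).
        SpillWriter writer = new SpillWriter(Integer.class);
        try {
            // Using the reducer as a combiner hands its String output to the spill writer.
            writer.append("some key", reduce(Arrays.asList(0, 4, 4)));
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main prints "wrong value class: class java.lang.String is not class java.lang.Integer", which has the same shape as the "Caused by" line in the question. A reducer can only double as a combiner when its output types equal its input types, which is not the case for WordCountReducer.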
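Also include
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
because the mapper and the reducer emit different key/value types. (With the old API used in the question, these are called on the JobConf object, conf.)
If both emit the same types, then
job.setOutputKeyClass();
job.setOutputValueClass();
with the shared classes as arguments is enough.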
In the WordCount class, the line that sets the output value class should be
conf.setOutputValueClass(Text.class);