I created a map method that reads the map output of the wordcount example [1]. It does not use the IdentityMapper.class provided by MapReduce, but this was the only way I found to build a working IdentityMapper for wordcount. The only problem is that this mapper takes more time than I expected, so I started to wonder whether I am doing something redundant. Any suggestions for improving my WordCountIdentityMapper code?

[1] The identity mapper
public class WordCountIdentityMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private Text word = new Text();

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        word.set(itr.nextToken());
        Integer val = Integer.valueOf(itr.nextToken());
        context.write(word, new IntWritable(val));
    }

    public void run(Context context) throws IOException, InterruptedException {
        while (context.nextKeyValue()) {
            map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
    }
}
[2] The Map class that generates the map output
public static class MyMap extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }

    public void run(Context context) throws IOException, InterruptedException {
        try {
            while (context.nextKeyValue()) {
                map(context.getCurrentKey(), context.getCurrentValue(), context);
            }
        } finally {
            cleanup(context);
        }
    }
}
Thanks,
The solution was to replace StringTokenizer with the indexOf() method. It performs much better.
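For reference, the indexOf()-based parsing could look like the sketch below. It assumes each input line is tab-separated ("word<TAB>count"), which is the default layout TextOutputFormat writes for wordcount output; the parseLine helper and class name are illustrative and not from the original code:

public class IndexOfParse {
    // Split one wordcount output line at the first tab, avoiding the
    // per-call object churn of StringTokenizer.
    static String[] parseLine(String line) {
        int sep = line.indexOf('\t');            // position of the separator
        String word = line.substring(0, sep);    // text before the tab
        String count = line.substring(sep + 1);  // text after the tab
        return new String[] { word, count };
    }

    public static void main(String[] args) {
        String[] kv = parseLine("hello\t3");
        System.out.println(kv[0] + " -> " + kv[1]); // prints "hello -> 3"
    }
}

In the mapper, word.set(...) and Integer.valueOf(...) would then be applied to the two substrings instead of to itr.nextToken().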