Separating output files in Hadoop MapReduce

My question may already have been asked, but I could not find a clear answer.

My MapReduce job is a basic WordCount. My current output file is:

// filename : 'part-r-00000'
789  a
755  #c   
456  d
123  #b

How can I change the output file name?

Also, is it possible to get two output files:

// First output file
789  a
456  d
// Second output file
123  #b
755  #c

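Here is my reducer class:

public static class SortReducer extends Reducer<IntWritable, Text, IntWritable, Text> {
    @Override
    public void reduce(IntWritable key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        // reduce() receives all values for a key, so write each word that shares this count.
        for (Text value : values) {
            context.write(key, value);
        }
    }
}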

Here is my partitioner class:

public class TweetPartitionner extends Partitioner<Text, IntWritable>{
    @Override
    public int getPartition(Text a_key, IntWritable a_value, int a_nbPartitions) {
        if(a_key.toString().startsWith("#"))
            return 1;
        return 0;
    }

}
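
One caveat about the partitioner above: it returns 1 for every hashtag key, which is only a valid partition index when the job runs with at least two reduce tasks. Below is a minimal plain-Java sketch of the same routing rule with a bounds guard. `PartitionRule` and `partitionFor` are hypothetical names used for illustration, and no Hadoop classes are involved.

```java
public class PartitionRule {
    /**
     * Mirrors the getPartition logic from the question: hashtag keys go to
     * partition 1, everything else to partition 0. The modulo keeps the
     * result inside [0, numPartitions) when only one partition exists.
     * (Hypothetical helper, not part of the Hadoop API.)
     */
    public static int partitionFor(String key, int numPartitions) {
        int wanted = key.startsWith("#") ? 1 : 0;
        return wanted % numPartitions;
    }

    public static void main(String[] args) {
        // Sample keys taken from the output shown in the question.
        for (String key : new String[] {"a", "#c", "d", "#b"}) {
            System.out.println(key + " -> partition " + partitionFor(key, 2));
        }
    }
}
```

In the real partitioner, the same guard would amount to applying `% a_nbPartitions` to the value that is returned.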

Thanks a lot!

As for the other question, how to change the output file name, have a look at MultipleOutputs: http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html#write(java.lang.String,K,V)
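
For orientation, reducer output files are named `<base>-r-<five-digit partition number>`, with `part` as the default base. When MultipleOutputs is used, the named output takes the place of `part`. The sketch below only reproduces that naming pattern in plain Java. `OutputNames`, `reducerFileName`, and the base names `words` and `hashtags` are hypothetical, chosen for illustration.

```java
import java.util.Locale;

public class OutputNames {
    /**
     * Builds a reducer output file name such as part-r-00000.
     * (Hypothetical helper that imitates the naming pattern; it does not
     * call any Hadoop code.)
     */
    public static String reducerFileName(String base, int partition) {
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%s-r-%05d", base, partition);
    }

    public static void main(String[] args) {
        System.out.println(reducerFileName("part", 0));     // default base name
        System.out.println(reducerFileName("words", 0));    // assumed named output
        System.out.println(reducerFileName("hashtags", 1)); // assumed named output
    }
}
```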

In the job driver, set:

job.setNumReduceTasks(2);

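Emit from the mapper (with the word as the key, since the partitioner checks the key).

Write a partitioner and add it to the job configuration. In the partitioner, check whether the key starts with #: return 1 if it does, otherwise 0.

In the reducer, swap the key and the value.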
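
The steps above can be traced without a cluster. The following plain-Java sketch assumes the mapper emits (word, count) pairs. It routes each pair with the partitioner rule, then swaps key and value as the reducer would. `TwoOutputsSketch` and `run` are hypothetical names, the lists stand in for the two reducer output files, and a `TreeMap` stands in for Hadoop's sort by key within each partition.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TwoOutputsSketch {
    /** Returns one list of "count<TAB>word" lines per simulated reducer. */
    public static List<List<String>> run(Map<String, Integer> mapperOutput, int numReducers) {
        // One sorted map per reducer, imitating the sort by key before reduce.
        List<TreeMap<String, Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new TreeMap<>());
        }
        // Partitioner step: hashtag keys go to reducer 1, the rest to reducer 0.
        // The modulo keeps the index valid if only one reducer is configured.
        for (Map.Entry<String, Integer> e : mapperOutput.entrySet()) {
            int p = (e.getKey().startsWith("#") ? 1 : 0) % numReducers;
            partitions.get(p).put(e.getKey(), e.getValue());
        }
        // Reducer step: swap key and value so the count is written first.
        List<List<String>> files = new ArrayList<>();
        for (TreeMap<String, Integer> partition : partitions) {
            List<String> lines = new ArrayList<>();
            for (Map.Entry<String, Integer> e : partition.entrySet()) {
                lines.add(e.getValue() + "\t" + e.getKey());
            }
            files.add(lines);
        }
        return files;
    }

    public static void main(String[] args) {
        // Word counts taken from the sample output in the question.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("a", 789);
        counts.put("#c", 755);
        counts.put("d", 456);
        counts.put("#b", 123);
        List<List<String>> files = run(counts, 2);
        System.out.println("reducer 0: " + files.get(0));
        System.out.println("reducer 1: " + files.get(1));
    }
}
```

With two simulated reducers, the plain words end up in the first list and the hashtags in the second, which corresponds to the two files asked for in the question.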
