I am using MultipleOutputs like this:
public int run(String[] args) throws Exception {
    ...
    job1.setInputFormatClass(TextInputFormat.class);
    job1.setOutputFormatClass(TextOutputFormat.class);
    MultipleOutputs.addNamedOutput(job1, "stopwords", TextOutputFormat.class, Text.class, IntWritable.class);
    ...
}
And in the reducer:
public static class ReduceWordCount extends Reducer<Text, IntWritable, Text, IntWritable> {
    private MultipleOutputs<Text, IntWritable> mos;

    @Override
    public void setup(Context context) {
        mos = new MultipleOutputs<Text, IntWritable>(context);
    }

    @Override
    public void reduce(Text word, Iterable<IntWritable> counts, Context context) throws IOException, InterruptedException {
        // Sum all counts emitted for this word
        int sum = 0;
        for (IntWritable count : counts) {
            sum += count.get();
        }
        if (sum > 4000) {
            context.write(word, new IntWritable(sum));
            // Wrap sum in IntWritable to match the value class declared in addNamedOutput
            mos.write("stopwords", new Text(word + ", "), new IntWritable(sum), "stopwords.csv");
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Close MultipleOutputs so the named output is flushed
        mos.close();
    }
}
The output file I get is stopwords.csv-r-00000. I need to get rid of the -r-00000 suffix. How can I do that?
For anyone who may be interested, I found an answer here in which the files are renamed after the job completes:
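FileSystem hdfs = FileSystem.get(getConf());
// args[1] is the job's output directory
FileStatus[] fs = hdfs.listStatus(new Path(args[1]));
if (fs != null) {
    for (FileStatus aFile : fs) {
        // Skip subdirectories; rename plain files only
        if (!aFile.isDir()) {
            // Note: this appends ".txt" to the existing name; it does not remove "-r-00000"
            hdfs.rename(aFile.getPath(), new Path(aFile.getPath().toString() + ".txt"));
        }
    }
}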
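
The snippet above appends ".txt" to each file name, so the -r-00000 part is still there. To actually drop it, the new name has to be computed by stripping the suffix. Below is a minimal sketch of just that name computation in plain Java, with no Hadoop dependency. The class name PartSuffix and the method strip are hypothetical names made up for illustration, and the regular expression assumes the suffix always has the form -r-<digits> or -m-<digits> at the very end of the file name (it will not match if a compression extension such as .gz follows it).

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop: computes a file name without
// the "-r-00000" / "-m-00000" style task suffix.
public class PartSuffix {
    // Assumption: the suffix is "-r-" or "-m-" followed by digits, at the end of the name
    private static final Pattern SUFFIX = Pattern.compile("-[mr]-\\d+$");

    public static String strip(String fileName) {
        // Names without a matching suffix are returned unchanged
        return SUFFIX.matcher(fileName).replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println(strip("stopwords.csv-r-00000")); // prints "stopwords.csv"
    }
}
```

Inside the rename loop, this could presumably be used as hdfs.rename(p, new Path(p.getParent(), PartSuffix.strip(p.getName()))) where p is aFile.getPath(). This only makes sense with a single reducer: with several reducers, stopwords.csv-r-00000 and stopwords.csv-r-00001 would both map to stopwords.csv. It would also turn the regular part-r-00000 file into part, so you may want to filter on the file name first.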