Hadoop MapReduce job produces no output



I wrote a MapReduce job in NetBeans and built a jar file (also in NetBeans). When I try to run this job on Hadoop (version 1.2.1), I execute the following command:

$ hadoop jar job.jar org.job.mainClass /home/user/in.txt /home/user/outdir

The command does not show any errors, but it never creates the output directory or any output files...

Here is the code of my job:

Mapper

public class Mapper extends MapReduceBase implements org.apache.hadoop.mapred.Mapper<LongWritable, Text, Text, IntWritable> {
            private final IntWritable one = new IntWritable(1);
            private Text company = new Text("");

            @Override
            public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
                company.set(value.toString());
                output.collect(value, one);
            }
        }

Reducer

public class Reducer extends MapReduceBase implements org.apache.hadoop.mapred.Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()){
            sum++;
            values.next();
        }
        output.collect(key, new IntWritable(sum));
    }
}

 public static void main(String[] args) {
    JobConf configuration = new JobConf(CdrMR.class);
    configuration.setJobName("Dedupe companies");
    configuration.setOutputKeyClass(Text.class);
    configuration.setOutputValueClass(IntWritable.class);
    configuration.setMapperClass(Mapper.class);
    configuration.setReducerClass(Reducer.class);
    configuration.setInputFormat(TextInputFormat.class);
    configuration.setOutputFormat(TextOutputFormat.class);
    FileInputFormat.setInputPaths(configuration, new Path(args[0]));
    FileOutputFormat.setOutputPath(configuration, new Path(args[1]));
}

The input file has the following format:

name1
name2
name3
...

I should also mention that I am running Hadoop in a virtual machine (Ubuntu 12.04) without root permissions. Could Hadoop be executing the job and storing the output files in a different directory?

According to this, you need to submit the JobConf with

JobClient.runJob(configuration);
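
For reference, a minimal sketch of where that call goes, appended at the end of the main method shown above (it assumes an extra import of org.apache.hadoop.mapred.RunningJob; `configuration` is the JobConf built in main):

// Submit the job and block until it finishes. Without this call the JobConf
// is only configured but never submitted, which is why no output appears.
RunningJob running = JobClient.runJob(configuration);
System.exit(running.isSuccessful() ? 0 : 1);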

The correct hadoop command is

hadoop jar myjar packagename.DriverClass input output

Case 1

MapReduceProject
    |
    |__ src
         |
         |__ package1
            - Driver
            - Mapper
            - Reducer

Then you can use

hadoop jar myjar input output

Case 2

MapReduceProject
    |
    |__ src
         |
         |__ package1
         |  - Driver1
         |  - Mapper1
         |  - Reducer1
         |
         |__ package2
            - Driver2
            - Mapper2
            - Reducer2

In this case, you have to specify the driver class in the hadoop command:

hadoop jar myjar packagename.DriverClass input output
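
An alternative for Case 2, not mentioned above, is to bundle a small dispatcher main class (declared in the jar manifest) that registers each driver under a short name via Hadoop's org.apache.hadoop.util.ProgramDriver (the same mechanism the bundled hadoop-examples jar uses). This is only a sketch; the Driver1/Driver2 names refer to the hypothetical classes from the tree above:

import org.apache.hadoop.util.ProgramDriver;

public class JobDispatcher {
    public static void main(String[] args) {
        int exitCode = -1;
        ProgramDriver pgd = new ProgramDriver();
        try {
            // Register each driver under a name; the first command-line argument
            // picks which one runs, the remaining arguments are passed through.
            pgd.addClass("job1", package1.Driver1.class, "Runs the first MapReduce job");
            pgd.addClass("job2", package2.Driver2.class, "Runs the second MapReduce job");
            pgd.driver(args);
            exitCode = 0;
        } catch (Throwable t) {
            t.printStackTrace();
        }
        System.exit(exitCode);
    }
}

With that in place, the command becomes hadoop jar myjar job1 input output.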

The correct hadoop command turned out to be

$ hadoop jar job.jar /home/user/in.txt /home/user/outdir

not

$ hadoop jar job.jar org.job.mainClass /home/user/in.txt /home/user/outdir

Because the jar built by NetBeans already declares the main class in its manifest, Hadoop treated org.job.mainClass as the input path and in.txt as the output path, so the run failed with a "File Already Exists: in.txt" error. This main method works:

public static void main(String[] args) throws FileNotFoundException, IOException {
    JobConf configuration = new JobConf(CdrMR.class);
    configuration.setJobName("Dedupe companies");
    configuration.setOutputKeyClass(Text.class);
    configuration.setOutputValueClass(IntWritable.class);
    configuration.setMapperClass(NameMapper.class);
    configuration.setReducerClass(NameReducer.class);
    configuration.setInputFormat(TextInputFormat.class);
    configuration.setOutputFormat(TextOutputFormat.class);
    FileInputFormat.setInputPaths(configuration, new Path(args[0]));
    FileOutputFormat.setOutputPath(configuration, new Path(args[1]));
    System.out.println("Hello Hadoop");
    System.exit(JobClient.runJob(configuration).isSuccessful() ? 0 : 1);
}
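
As a small safeguard against this kind of argument mix-up, the driver could also check that exactly two arguments were passed before configuring the job (a minimal sketch, assuming the same input/output convention):

// Fail fast with a usage message if the input/output paths are missing,
// or if an extra argument (such as a class name) has shifted them.
if (args.length != 2) {
    System.err.println("Usage: hadoop jar job.jar <input path> <output dir>");
    System.exit(2);
}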

Thanks to @AlexeyShestakov and @Y.Prithvi.
