MapReduce job fails in Oozie



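I have a map-only job that takes a sequence file as input (key is Text, value is BytesWritable) and writes its output as a sequence file (key is NullWritable, value is Text).

Java class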

import java.io.*;
import java.util.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
public class Test {
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "Test");
        job.setJarByClass(Test.class);
        job.setMapperClass(TestMapper.class);
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        job.setMapOutputKeyClass(NullWritable.class);
        job.setMapOutputValueClass(Text.class);
        job.setNumReduceTasks(0);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.submit();
    }
    public static class TestMapper extends Mapper<Text, BytesWritable, NullWritable, Text> {
        Text outValue = new Text("");
        int counter = 0;
        public void map(Text filename, BytesWritable data, Context context) throws IOException, InterruptedException {
            // logic
        }
    }
}

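The job works fine when run from the Unix command line, but when the same job is scheduled in Oozie I see the error below.

java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
    at Test$TestMapper.map(Test.java:56)

Job configuration in Oozie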

<configuration>
<property>
<name>mapred.input.dir</name>
<value>${input}</value>
</property>
<property>
<name>mapred.output.dir</name>
<value>/temp</value>
</property>
<property>
<name>mapreduce.map.class</name>
<value>Test$TestMapper</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>0</value>
</property>
<property>
<name>mapreduce.job.output.key.class</name>
<value>org.apache.hadoop.io.NullWritable</value>
</property>
<property>
<name>mapreduce.job.output.value.class</name>
<value>org.apache.hadoop.io.Text</value>
</property>
<property>
<name>mapreduce.job.inputformat.class</name>
<value>org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat</value>
</property>
<property>
<name>mapreduce.job.outputformat.class</name>
<value>org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat</value>
</property>
<property>
<name>mapreduce.job.mapinput.key.class</name>
<value>org.apache.hadoop.io.Text</value>
</property>
<property>
<name>mapreduce.job.mapinput.value.class</name>
<value>org.apache.hadoop.io.BytesWritable</value>
</property>
<property>
<name>mapred.reducer.new-api</name>
<value>true</value>
</property>
<property>
<name>mapred.mapper.new-api</name>
<value>true</value>
</property>
</configuration>

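Can someone tell me what is wrong here? Thanks.

The ClassCastException indicates that Oozie is still using the default input format, TextInputFormat, whose key type is LongWritable. Because the mapper declares its key type as Text, the types do not match when the mapper is invoked. So the configuration key mapreduce.job.inputformat.class is not the right one.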
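To illustrate why the failure only surfaces inside map() rather than when the job is configured, here is a minimal sketch with no Hadoop dependency. The classes FakeLongWritable, FakeText, SimpleMapper, TextKeyMapper and KeyMismatchDemo are hypothetical stand-ins, not Hadoop APIs. They only imitate a framework that hands whatever key the record reader produced to a generically typed mapper.

```java
public class KeyMismatchDemo {
    // Hypothetical stand-ins for LongWritable and Text (not the Hadoop classes).
    static class FakeLongWritable {
        final long value;
        FakeLongWritable(long value) { this.value = value; }
    }

    static class FakeText {
        final String value;
        FakeText(String value) { this.value = value; }
    }

    // Hypothetical generic mapper base, loosely modelled on a typed map(key, value) method.
    static abstract class SimpleMapper<K, V> {
        abstract String map(K key, V value);
    }

    // Mapper that, like Test$TestMapper, declares its key type as the Text stand-in.
    static class TextKeyMapper extends SimpleMapper<FakeText, byte[]> {
        @Override
        String map(FakeText key, byte[] value) {
            return key.value;
        }
    }

    // The "framework" side: generics are erased at runtime, so it passes plain Objects.
    // The cast to FakeText happens in the compiler-generated bridge method of the mapper.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static String runOneRecord(SimpleMapper mapper, Object key, Object value) {
        return mapper.map(key, value);
    }

    // Runs one record and reports whether the key type was accepted.
    static String describe(Object key) {
        try {
            return "ok: " + runOneRecord(new TextKeyMapper(), key, new byte[0]);
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        // Key type produced by a sequence-file-style reader: accepted.
        System.out.println(describe(new FakeText("part-00000")));   // ok: part-00000
        // Key type produced by a text-line-style reader: rejected at the map() call.
        System.out.println(describe(new FakeLongWritable(0L)));     // ClassCastException
    }
}
```

Nothing checks the key type until the first record is passed to the mapper, which is consistent with the stack trace pointing at Test$TestMapper.map.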

(After some trial and error)

We found that the correct property name is mapreduce.inputformat.class, i.e.:

<property>
    <name>mapreduce.inputformat.class</name>
    <value>org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat</value>
</property>
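One way to picture why the misnamed key failed silently is the sketch below. It assumes the framework looks up a single expected property name and falls back to a default when that name is absent. InputFormatLookupDemo and resolveInputFormat are hypothetical names, and java.util.Properties stands in for the job configuration. This is not Hadoop's or Oozie's actual lookup code.

```java
import java.util.Properties;

public class InputFormatLookupDemo {
    // The property name that worked in the answer above.
    static final String EXPECTED_KEY = "mapreduce.inputformat.class";
    // Assumed fallback, matching the default input format named in the answer.
    static final String DEFAULT_FORMAT = "TextInputFormat";

    // Hypothetical lookup: only the expected key is consulted; any other key is ignored.
    static String resolveInputFormat(Properties conf) {
        return conf.getProperty(EXPECTED_KEY, DEFAULT_FORMAT);
    }

    public static void main(String[] args) {
        Properties misnamed = new Properties();
        misnamed.setProperty("mapreduce.job.inputformat.class", "SequenceFileInputFormat");

        Properties corrected = new Properties();
        corrected.setProperty("mapreduce.inputformat.class", "SequenceFileInputFormat");

        System.out.println(resolveInputFormat(misnamed));   // TextInputFormat
        System.out.println(resolveInputFormat(corrected));  // SequenceFileInputFormat
    }
}
```

Under that assumption, an unrecognized property name produces no error at submission time. The job simply runs with the default input format and fails later, when the first record reaches the mapper.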
