我有一个仅映射作业,它将序列文件(键是文本,值是字节可写)作为序列文件的输入和输出数据(键是空可写的,值是文本)。
爪哇类
import java.io.*;
import java.util.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
public class Test {
public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
Configuration conf = new Configuration();
Job job = new Job(conf, "Test");
job.setJarByClass(Test.class);
job.setMapperClass(TestMapper.class);
job.setInputFormatClass(SequenceFileInputFormat.class);
job.setOutputFormatClass(SequenceFileOutputFormat.class);
job.setOutputKeyClass(NullWritable.class);
job.setOutputValueClass(Text.class);
job.setMapOutputKeyClass(NullWritable.class);
job.setMapOutputValueClass(Text.class);
job.setNumReduceTasks(0);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.submit();
}
public static class TestMapper extends Mapper<Text, BytesWritable, NullWritable, Text> {
Text outValue = new Text("");
int counter = 0;
public void map(Text filename, BytesWritable data, Context context) throws IOException, InterruptedException {
/ logic
}
}
}
从 unix 命令运行作业时它工作正常,当在 oozie 中安排的相同作业看到以下错误时
java.lang.ClassCastException: org.apache.hadoop.io.LongWritable 不能强制转换为 org.apache.hadoop.io.Textat Test$TestMapper.map(Test.java:56)
Oozie 中的作业配置
<configuration>
<property>
<name>mapred.input.dir</name>
<value>${input}</value>
</property>
<property>
<name>mapred.output.dir</name>
<value>/temp</value>
</property>
<property>
<name>mapreduce.map.class</name>
<value>Test$TestMapper</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>0</value>
</property>
<property>
<name>mapreduce.job.output.key.class</name>
<value>org.apache.hadoop.io.NullWritable</value>
</property>
<property>
<name>mapreduce.job.output.value.class</name>
<value>org.apache.hadoop.io.Text</value>
</property>
<property>
<name>mapreduce.job.inputformat.class</name>
<value>org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat</value>
</property>
<property>
<name>mapreduce.job.outputformat.class</name>
<value>org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat</value>
</property>
<property>
<name>mapreduce.job.mapinput.key.class</name>
<value>org.apache.hadoop.io.Text</value>
</property>
<property>
<name>mapreduce.job.mapinput.value.class</name>
<value>org.apache.hadoop.io.BytesWritable</value>
</property>
<property>
<name>mapred.reducer.new-api</name>
<value>true</value>
</property>
<property>
<name>mapred.mapper.new-api</name>
<value>true</value>
</property>
有人可以告诉我这里的错误是什么..谢谢
classcast 异常表示 Oozie 仍在使用 TextInputFormat 的默认输入格式,其键类型为 LongWwriteable。由于映射器的键类型为文本,因此映射器引入时存在类型不匹配。所以mapreduce.job.inputformat.class的配置键不正确。
(经过一些试验和错误)
我们发现正确的属性名称是mapreduce.inputformat.class,即:
<property>
<name>mapreduce.inputformat.class</name>
<value>org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat</value>
</property>