I have a MapReduce program that runs successfully in standalone (Eclipse) mode, but when I export the jar and try to run the same MR job on the cluster, it throws a NullPointerException like this:
13/06/26 05:46:22 ERROR mypackage.HHDriver: Error while configuring run method.
java.lang.NullPointerException
This is the code I use to invoke the run method:
public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
    Configuration configuration = new Configuration();
    Tool headOfHouseHold = new HHDriver();
    try {
        ToolRunner.run(configuration, headOfHouseHold, args);
    } catch (Exception exception) {
        exception.printStackTrace();
        LOGGER.error("Error while configuring run method", exception);
        // System.exit(1);
    }
}
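For reference, this main() only works if HHDriver implements Tool, since the instance is handed to ToolRunner.run. A minimal sketch of the class skeleton assumed here (extending Configured is my assumption and is not shown in the original):

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;

// Skeleton assumed by the main() above: HHDriver must implement Tool so that
// ToolRunner can call its run() method. Extending Configured is an assumption
// and simply gives the driver a getConf() accessor.
public class HHDriver extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        // job setup goes here -- see the run method below
        return 0;
    }
}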
The run method:
public int run(String[] args) throws Exception {
    if (args != null && args.length == 8) {
        // Setting the configuration from the command-line arguments
        GenericOptionsParser genericOptionsParser = new GenericOptionsParser(args);
        Configuration configuration = genericOptionsParser.getConfiguration();
        // Configuration configuration = new Configuration();
        configuration.set("fs.default.name", args[0]);
        configuration.set("mapred.job.tracker", args[1]);
        configuration.set("deltaFlag", args[2]);
        configuration.set("keyPrefix", args[3]);
        configuration.set("outfileName", args[4]);
        configuration.set("Inpath", args[5]);
        String outputPath = args[6];
        configuration.set("mapred.map.tasks.speculative.execution", "false");
        configuration.set("mapred.reduce.tasks.speculative.execution", "false");
        // To avoid the creation of _LOG and _SUCCESS files
        configuration.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false");
        configuration.set("hadoop.job.history.user.location", "none");
        configuration.set(Constants.MAX_NUM_REDUCERS, args[7]);

        // Configuration of the MR job
        Job job = new Job(configuration, "HH Job");
        job.setJarByClass(HHDriver.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setNumReduceTasks(HouseHoldingHelper.numReducer(configuration));
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        MultipleOutputs.addNamedOutput(job, configuration.get("outfileName"),
                TextOutputFormat.class, Text.class, Text.class);

        // Deletion of the output folder (if it exists)
        FileSystem fileSystem = FileSystem.get(configuration);
        Path path = new Path(outputPath);
        if (path != null /*&& path.depth() >= 5*/) {
            fileSystem.delete(path, true);
        }

        // Deletion of empty files in the output (if they exist)
        FileStatus[] fileStatus = fileSystem.listStatus(new Path(outputPath));
        for (FileStatus file : fileStatus) {
            if (file.getLen() == 0) {
                fileSystem.delete(file.getPath(), true);
            }
        }

        // Setting the input/output paths
        FileInputFormat.setInputPaths(job, new Path(configuration.get("Inpath")));
        FileOutputFormat.setOutputPath(job, new Path(outputPath));

        job.setMapperClass(HHMapper.class);
        job.setReducerClass(HHReducer.class);

        return job.waitForCompletion(true) ? 0 : 1;
    }
    // Wrong number of arguments
    return 1;
}
I double-checked that the arguments passed to the run method are not null, and the same code runs fine in standalone mode.
The problem is probably that the Hadoop configuration is not being passed to the program correctly. You could try putting this at the beginning of your driver class:
GenericOptionsParser genericOptionsParser = new GenericOptionsParser(args);
Configuration hadoopConfiguration = genericOptionsParser.getConfiguration();
Then use the hadoopConfiguration object when initializing the Job object.

For example:
public int run(String[] args) throws Exception {
    GenericOptionsParser genericOptionsParser = new GenericOptionsParser(args);
    Configuration hadoopConfiguration = genericOptionsParser.getConfiguration();

    Job job = new Job(hadoopConfiguration);
    // set other stuff ...
    return job.waitForCompletion(true) ? 0 : 1;
}
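As an alternative, since the driver is launched through ToolRunner.run, ToolRunner itself already applies GenericOptionsParser to the arguments and calls setConf() on the Tool before invoking run(). So if the driver extends Configured, you can simply read the parsed configuration back with getConf() instead of constructing a new parser. A minimal sketch of that pattern (class and job names follow the post, the rest of the job setup is elided):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Sketch: ToolRunner.run parses the generic options, sets the resulting
// Configuration on the Tool, and then calls run(), so getConf() already
// returns the cluster configuration inside run().
public class HHDriver extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        Configuration hadoopConfiguration = getConf();
        Job job = new Job(hadoopConfiguration, "HH Job");
        job.setJarByClass(HHDriver.class);
        // set mapper/reducer, input/output formats and paths here
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new HHDriver(), args));
    }
}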