Hi everyone! I have a question about a Hadoop program in Eclipse. The source code is:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] oargs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (oargs.length != 2) {
            System.err.println("Usage: word count <in> <out>");
        }
        System.out.println("input: " + oargs[0]);
        System.out.println("output: " + oargs[1]);

        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(oargs[0]));
        FileOutputFormat.setOutputPath(job, new Path(oargs[1]));

        System.out.println("==============================");
        System.out.println("start ...");
        boolean flag = job.waitForCompletion(true);
        System.out.println(flag);
        System.out.println("end ...");
        System.out.println("==============================");
    }
}
The result is as follows; see the log:
rory@0303 /cygdrive/f/develop/hadoop/hadoop-1.0.3
$ ./bin/hadoop jar ./jar/wordcount.jar /tmp/input /tmp/output
input: /tmp/input
output: /tmp/output
==============================
start ...
12/07/25 14:59:17 INFO input.FileInputFormat: Total input paths to process : 2
12/07/25 14:59:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/07/25 14:59:17 WARN snappy.LoadSnappy: Snappy native library not loaded
12/07/25 14:59:17 INFO mapred.JobClient: Running job: job_201207251447_0001
12/07/25 14:59:18 INFO mapred.JobClient: map 0% reduce 0%
The log just stays stuck there forever and never progresses. Why?
I am running the code in local mode, under Cygwin on Windows XP.
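One way I would try to narrow this down (my suggestion, not something from the thread above): force Hadoop 1.x's LocalJobRunner so the job executes inside the submitting JVM and any task-side exception is printed straight to the console, instead of disappearing into a stalled TaskTracker:

// Hadoop 1.x configuration keys: run the job in-process instead of
// submitting it to the JobTracker, and use the local filesystem, not HDFS.
Configuration conf = new Configuration();
conf.set("mapred.job.tracker", "local");
conf.set("fs.default.name", "file:///");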
@Rory, as Thomas asked, can you be more specific about what "happens next"? Is this the entire stack trace you see on screen? Do you mean you compiled it once and got a result, but then could not run it again? Did you give the program the correct input arguments in the Eclipse IDE, i.e. the input and output directories?
If you mean you cannot run the program a second time, you probably have not specified a different output directory; see the sketch below. But having seen the stack trace, I don't think that is the case here.
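For reference, a minimal sketch of clearing a stale output directory before resubmitting (assuming the job's Configuration object is in scope; MapReduce refuses to start if the output path already exists):

Path out = new Path(args[1]);
FileSystem fs = FileSystem.get(conf);
if (fs.exists(out)) {
    // true = recursively delete the previous run's results
    fs.delete(out, true);
}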
I think if you are asking why you never see the final "end ..." and "==============================" println lines, then check your code:
System.exit(job.waitForCompletion(true)?0:1);
System.out.println("end ...");
System.out.println("==============================");
You are wrapping the job.waitForCompletion(true) call in System.exit, so the JVM terminates before the last two System.out calls can execute.
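A minimal reordering (my sketch, not code from the thread) that keeps the exit status but lets the trailing printlns run:

// Capture the job result first so the prints below still execute.
int exitCode = job.waitForCompletion(true) ? 0 : 1;
System.out.println("end ...");
System.out.println("==============================");
// Terminate the JVM only after all output has been written.
System.exit(exitCode);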
EDIT
The log appender/logger messages here suggest that some other exception may be getting swallowed. You should amend your code to use the ToolRunner utility:
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount extends Configured implements Tool {
    public static void main(String[] args) throws Exception {
        // ToolRunner parses the generic options (-D, -fs, -jt, ...) and
        // hands the resulting Configuration to this Tool via setConf().
        System.exit(ToolRunner.run(new WordCount(), args));
    }

    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: word count <in> <out>");
            return 2; // bail out instead of falling through
        }
        System.out.println("input: " + args[0]);
        System.out.println("output: " + args[1]);

        Job job = new Job(getConf(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.out.println("==============================");
        System.out.println("start ...");
        int result = job.waitForCompletion(true) ? 0 : 1;
        System.out.println("end ...");
        System.out.println("==============================");
        return result;
    }
}
You should submit the job to the cluster with the $HADOOP_HOME/bin/hadoop script (as shown below; replace the jar name and use the fully qualified name of your WordCount class):
#> hadoop jar wordcount.jar WordCount input output
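Because ToolRunner runs GenericOptionsParser for you, the same command line then also accepts the standard generic options; for example (the -D setting here is just an illustration):

#> hadoop jar wordcount.jar WordCount -D mapred.reduce.tasks=2 input output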