Problem running a job with Spring Data Hadoop



I have created the following Mapper and Reducer using Mahout.

package mypackage.ItemSimilarity;

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.mahout.math.VarLongWritable;

public class ItemPrefMapper extends
        Mapper<LongWritable, Text, VarLongWritable, VarLongWritable> {

    // Matches each run of digits; the backslash must be doubled in a Java string literal.
    private static final Pattern NUMBERS = Pattern.compile("(\\d+)");

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        Matcher m = NUMBERS.matcher(line);
        // The first number on the line is the user ID; skip lines that contain no number.
        if (!m.find()) {
            return;
        }
        VarLongWritable userID = new VarLongWritable(Long.parseLong(m.group()));
        VarLongWritable itemID = new VarLongWritable();
        // Every remaining number is an item ID; emit one (userID, itemID) pair per item.
        while (m.find()) {
            itemID.set(Long.parseLong(m.group()));
            context.write(userID, itemID);
        }
    }
}
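The parsing logic in the mapper can be checked outside Hadoop. The following is a minimal sketch, not part of the original job: the class name ItemPrefLineParser, the parse helper, and the sample input line are all assumptions made for illustration. It applies the same regular expression and the same "first number is the user, the rest are items" rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone helper that mirrors the mapper's parsing, without Hadoop types.
public class ItemPrefLineParser {

    private static final Pattern NUMBERS = Pattern.compile("(\\d+)");

    /** Returns one {userID, itemID} pair per item on the line; empty if the line has no number. */
    public static List<long[]> parse(String line) {
        List<long[]> pairs = new ArrayList<>();
        Matcher m = NUMBERS.matcher(line);
        if (!m.find()) {
            return pairs; // no user ID on this line
        }
        long userID = Long.parseLong(m.group());
        while (m.find()) {
            pairs.add(new long[] {userID, Long.parseLong(m.group())});
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Assumed input format: a user ID followed by item IDs.
        for (long[] pair : parse("98955: 590 22 9059")) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

If this prints the expected pairs for a few lines of data.txt, the parsing is probably not what triggers the error further down.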

Reducer:

package mypackage.ItemSimilarity;

import java.io.IOException;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.VarLongWritable;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

public class UserVectorReducer extends
        Reducer<VarLongWritable, VarLongWritable, VarLongWritable, VectorWritable> {

    @Override
    public void reduce(VarLongWritable userID,
            Iterable<VarLongWritable> itemPrefs, Context context)
            throws IOException, InterruptedException {
        // Sparse vector indexed by item ID; each item the user is linked to gets the value 1.0.
        Vector userVector = new RandomAccessSparseVector(Integer.MAX_VALUE, 100);
        for (VarLongWritable itemPref : itemPrefs) {
            // Item IDs are cast to int, so they must fit in the int range.
            userVector.set((int) itemPref.get(), 1.0f);
        }
        context.write(userID, new VectorWritable(userVector));
    }
}

Spring configuration to run this:

<job id="mahoutJob" input-path="/home/ubuntu/input/data.txt" output-path="/home/ubuntu/output"
mapper="mypackage.ItemSimilarity.ItemPrefMapper" 
reducer="mypackage.ItemSimilarity.UserVectorReducer" 
jar-by-class="mypackage.ItemSimilarity.ItemPrefMapper"/>
<job-runner id="myjob-runner" pre-action="setupScript"  job-ref="mahoutJob" 
run-at-startup="true"/>

When I run this, I get the following error. I have extended the Hadoop Mapper class, but Spring says it is not a Mapper class.

java.lang.RuntimeException: class mypackage.ItemSimilarity.ItemPrefMapper not org.apache.hadoop.mapreduce.Mapper
    at org.apache.hadoop.conf.Configuration.setClass(Configuration.java:931)
    at org.apache.hadoop.mapreduce.Job.setMapperClass(Job.java:175)
    at org.springframework.data.hadoop.mapreduce.JobFactoryBean.afterPropertiesSet(JobFactoryBean.java:153)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1571)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1509)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:521)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:458)

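Are you sure about your jar-by-class attribute? It should point to something like the class that contains the main method, where the ApplicationContext is instantiated.

Also, are you sure about your package name?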

com.threepillar.abs.ItemSimilarity.ItemPrefMapper

mypackage.ItemSimilarity.ItemPrefMapper
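One way to check both points is to print where a class is actually loaded from. Judging by the stack trace, the exception appears to come from an assignability check in Configuration.setClass, so a mapper that extends a Mapper class loaded from a different jar or class loader could plausibly fail it even though the source code looks right. The sketch below is only illustrative: ClassOriginCheck, describe, and isSubtype are hypothetical names, and it uses JDK classes as stand-ins so that it runs without Hadoop on the classpath.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of Hadoop or Spring.
public class ClassOriginCheck {

    /** Describes which class loader loaded a class and from which location. */
    public static String describe(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        String location = (src == null || src.getLocation() == null)
                ? "<bootstrap or unknown>"
                : src.getLocation().toString();
        return c.getName() + " loaded by " + c.getClassLoader() + " from " + location;
    }

    /** True if candidate is the same type as, or a subtype of, expected. */
    public static boolean isSubtype(Class<?> expected, Class<?> candidate) {
        return expected.isAssignableFrom(candidate);
    }

    public static void main(String[] args) {
        System.out.println(isSubtype(Number.class, Integer.class)); // true
        System.out.println(isSubtype(Number.class, String.class));  // false
        System.out.println(describe(ClassOriginCheck.class));
    }
}
```

In the real application you would pass ItemPrefMapper.class and org.apache.hadoop.mapreduce.Mapper.class instead of the JDK stand-ins. If the two classes report different Hadoop jars or class loaders, that mismatch is worth ruling out before looking elsewhere.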
