Hadoop ArrayWritable gives me a ClassCastException



Edit: Problem solved - I had made a really stupid mistake.

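I have a MapReduce pipeline consisting of a map, a reduce, a map, and a reduce. I'm using SequenceFileOutputFormat for the first reduce and SequenceFileInputFormat for the second map. I've looked up how they are used, and it seems I'm using them correctly. The types I'm writing out are IntWritable and IntPairArrayWritable (a custom ArrayWritable subclass that uses Mahout's IntPairWritable). The problem is that, when reading the IntPairArrayWritable in the second map, I get a ClassCastException as soon as I try to cast out the individual IntPairWritables. I'm not sure whether this is due to an error in how I'm using the ArrayWritable class or in how I'm using the SequenceFile{Input,Output}Format. I've looked at a lot of examples here and elsewhere, and it looks to me like I'm doing the right thing, but I still get an error. Any help?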

Specific details:

Here is my first reducer class:

public static class WalkIdReducer extends MapReduceBase implements
        Reducer<IntWritable, IntPairWritable, IntWritable, IntPairArrayWritable> {
    @Override
    public void reduce(IntWritable walk_id, Iterator<IntPairWritable> values,
            OutputCollector<IntWritable, IntPairArrayWritable> output,
            Reporter reporter) throws IOException {
        ArrayList<IntPairWritable> value_array = new ArrayList<IntPairWritable>();
        while (values.hasNext()) {
            value_array.add(values.next());
        }
        output.collect(walk_id, IntPairArrayWritable.fromArrayList(value_array));
    }
}

And the second mapper class:

public static class NodePairMapper extends MapReduceBase implements
        Mapper<IntWritable, IntPairArrayWritable, IntPairWritable, Text> {
    @Override
    public void map(IntWritable key, IntPairArrayWritable value,
            OutputCollector<IntPairWritable, Text> output,
            Reporter reporter) throws IOException {
        // The following line gives a ClassCastException;
        // See IntPairArrayWritable.toArrayList(), below
        ArrayList<IntPairWritable> values = value.toArrayList();
        // other unimportant stuff
    }
}

The relevant parts of the job configuration for the first MapReduce:

    conf.setReducerClass(WalkIdReducer.class);
    conf.setOutputKeyClass(IntWritable.class);
    conf.setOutputValueClass(IntPairArrayWritable.class);
    conf.setOutputFormat(SequenceFileOutputFormat.class);

And for the second MapReduce:

    conf.setInputFormat(SequenceFileInputFormat.class);
    conf.setMapperClass(NodePairMapper.class);

And finally, my ArrayWritable subclass:

public static class IntPairArrayWritable extends ArrayWritable
{
    // These two methods are what people say is all you need for
    // creating an ArrayWritable subclass
    public IntPairArrayWritable() {
        super(IntPairArrayWritable.class);
    }
    public IntPairArrayWritable(IntPairWritable[] values) {
        super(IntPairArrayWritable.class, values);
    }
    // Some convenience methods, so I can use ArrayLists in
    // other parts of the code
    public static IntPairArrayWritable fromArrayList(
            ArrayList<IntPairWritable> array) {
        IntPairArrayWritable writable = new IntPairArrayWritable();
        IntPairWritable[] values = new IntPairWritable[array.size()];
        for (int i=0; i<array.size(); i++) {
            values[i] = array.get(i);
        }
        writable.set(values);
        return writable;
    }
    public ArrayList<IntPairWritable> toArrayList() {
        ArrayList<IntPairWritable> array = new ArrayList<IntPairWritable>();
        for (Writable pair : this.get()) {
            // This line is what kills it.  I get a ClassCastException here.
            IntPairWritable int_pair = (IntPairWritable) pair;
            array.add(int_pair);
        }
        return array;
    }
}

The specific error I get is:

java.lang.ClassCastException: WalkAnalyzer$IntPairArrayWritable cannot be cast to org.apache.mahout.common.IntPairWritable
at WalkAnalyzer$IntPairArrayWritable.toArrayList(WalkAnalyzer.java:231)
at WalkAnalyzer$NodePairMapper.map(WalkAnalyzer.java:84)
at WalkAnalyzer$NodePairMapper.map(WalkAnalyzer.java:77)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)

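I'm confused about why what comes out of the ArrayWritable's get() method is an instance of WalkAnalyzer$IntPairArrayWritable - I expected get() to return an array of the elements that the IntPairArrayWritable contains, as the API says.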

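EDIT

I found the problem. It was in how I wrote the constructors for IntPairArrayWritable. I was calling super(IntPairArrayWritable.class); when I should have been calling super(IntPairWritable.class);. The code should actually look like this: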

public static class IntPairArrayWritable extends ArrayWritable
{
    // These two methods are what people say is all you need for
    // creating an ArrayWritable subclass
    public IntPairArrayWritable() {
        super(IntPairWritable.class);
    }
    public IntPairArrayWritable(IntPairWritable[] values) {
        super(IntPairWritable.class, values);
    }
}

I guess I should have used a less confusingly similar name for the ArrayWritable subclass, so the error would have been easier to spot.
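Why the wrong class literal produces exactly this exception: as far as I understand it, ArrayWritable creates each element during deserialization by instantiating the value class that was passed to its constructor, so passing the array class itself gives back an array of array-writables. The sketch below mimics that behaviour with plain-Java stand-ins so it runs without Hadoop on the classpath. Element, Pair, ArrayHolder, BuggyPairArray and FixedPairArray are hypothetical names for illustration, not Hadoop or Mahout classes, and read() is only a rough analogue of readFields().

```java
class ValueClassDemo {
    // Hypothetical stand-in for Hadoop's Writable interface.
    interface Element {}

    // Hypothetical stand-in for IntPairWritable.
    static class Pair implements Element {
        public Pair() {}
    }

    // Hypothetical stand-in for ArrayWritable: it remembers the element class
    // and instantiates that class reflectively when "deserializing".
    static class ArrayHolder implements Element {
        private final Class<? extends Element> valueClass;
        private Element[] values = new Element[0];

        ArrayHolder(Class<? extends Element> valueClass) {
            this.valueClass = valueClass;
        }

        // Rough analogue of readFields(): one fresh valueClass instance per slot.
        void read(int count) throws ReflectiveOperationException {
            values = new Element[count];
            for (int i = 0; i < count; i++) {
                values[i] = valueClass.getDeclaredConstructor().newInstance();
            }
        }

        Element[] get() {
            return values;
        }
    }

    // Same mistake as in the question: the array class is passed as the element class.
    static class BuggyPairArray extends ArrayHolder {
        public BuggyPairArray() {
            super(BuggyPairArray.class);
        }
    }

    // Corrected version: the element class is passed.
    static class FixedPairArray extends ArrayHolder {
        public FixedPairArray() {
            super(Pair.class);
        }
    }

    // Reads two elements and reports whether the first one is really a Pair.
    static boolean firstElementIsPair(ArrayHolder holder) throws ReflectiveOperationException {
        holder.read(2);
        return holder.get()[0] instanceof Pair;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println("buggy: " + firstElementIsPair(new BuggyPairArray())); // prints "buggy: false"
        System.out.println("fixed: " + firstElementIsPair(new FixedPairArray())); // prints "fixed: true"
    }
}
```

In the buggy variant every element is itself a BuggyPairArray, so a cast to Pair would fail in the same way the cast to IntPairWritable fails in toArrayList().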

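Check your import statements for IntPairWritable. It looks like you may be picking up the wrong package name in the mapper, so it is not the same class even though it is also named IntPairWritable.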
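One quick way to test that suspicion is to print the fully qualified runtime class name of the object being cast, since the simple name alone cannot tell two same-named classes apart. The sketch below is only an illustration: it uses java.util.Date and java.sql.Date from the JDK as an example of two classes that share a simple name, and runtimeName is a hypothetical helper.

```java
class RuntimeClassCheck {
    // Returns the fully qualified runtime class name, which includes the package.
    static String runtimeName(Object value) {
        return value.getClass().getName();
    }

    public static void main(String[] args) {
        Object a = new java.util.Date(0L);
        Object b = new java.sql.Date(0L);
        // Both simple names are "Date"; only the full names differ.
        System.out.println(a.getClass().getSimpleName() + " / " + runtimeName(a)); // prints "Date / java.util.Date"
        System.out.println(b.getClass().getSimpleName() + " / " + runtimeName(b)); // prints "Date / java.sql.Date"
    }
}
```

Applied to the question, logging the class name of each element returned by get() inside toArrayList() would show WalkAnalyzer$IntPairArrayWritable, which matches the exception message and points at the element class rather than at an import.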
