readFields with a complex type in Hadoop



I have these classes:

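import java.io.DataInput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

public class Stripe implements WritableComparable<Stripe>{
    private List<Term> occorrenze = new ArrayList<Term>();
    public Stripe(){}
    @Override
    public void readFields(DataInput in) throws IOException {
        // How do I read the list of Terms back here?
    }
    // write() and compareTo() omitted
}

public class Term implements WritableComparable<Term> {
    // The fields must be initialized, otherwise readFields throws a NullPointerException
    private Text key = new Text();
    private IntWritable frequency = new IntWritable();
    @Override
    public void readFields(DataInput in) throws IOException {
        this.key.readFields(in);
        this.frequency.readFields(in);
    }
    // write() and compareTo() omitted
}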

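A Stripe is a list of Terms (each Term is a pair of a Text and an IntWritable). How can I implement the readFields method so that it reads this complex-typed Stripe from a DataInput?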

To serialize a list, you need to write out the length of the list, followed by the elements themselves. A simple readFields/write method pair could be:

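@Override
public void readFields(DataInput in) throws IOException {
    occorrenze.clear();              // the same Stripe instance may be reused for several records
    int cnt = in.readInt();          // number of Terms, written first by write()
    for (int x = 0; x < cnt; x++) {
        Term term = new Term();
        term.readFields(in);         // each Term reads its own key and frequency
        occorrenze.add(term);
    }
}

@Override
public void write(DataOutput out) throws IOException {
    out.writeInt(occorrenze.size()); // write the list length first...
    for (Term term : occorrenze) {
        term.write(out);             // ...then each element, in the order readFields reads them
    }
}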
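To check that write and readFields agree on the byte layout (a count, then the elements), here is a minimal round-trip sketch using only java.io. Assumption: so that it runs without Hadoop on the classpath, a String and an int (written with writeUTF/writeInt) stand in for Text and IntWritable, and the names StripeRoundTrip, toBytes and fromBytes are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Stdlib-only stand-ins: String/int replace Hadoop's Text/IntWritable so the
// length-prefixed list format can be checked without Hadoop.
class StripeRoundTrip {

    static class Term {
        String key = "";
        int frequency;

        Term() {}
        Term(String key, int frequency) { this.key = key; this.frequency = frequency; }

        void write(DataOutput out) throws IOException {
            out.writeUTF(key);       // plays the role of Text.write
            out.writeInt(frequency); // plays the role of IntWritable.write
        }

        void readFields(DataInput in) throws IOException {
            key = in.readUTF();
            frequency = in.readInt();
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Term)) return false;
            Term t = (Term) o;
            return key.equals(t.key) && frequency == t.frequency;
        }

        @Override public int hashCode() { return Objects.hash(key, frequency); }
    }

    static class Stripe {
        final List<Term> occorrenze = new ArrayList<>();

        void write(DataOutput out) throws IOException {
            out.writeInt(occorrenze.size()); // 1) the element count
            for (Term term : occorrenze) {   // 2) then each element
                term.write(out);
            }
        }

        void readFields(DataInput in) throws IOException {
            occorrenze.clear();              // discard any previous contents
            int cnt = in.readInt();
            for (int x = 0; x < cnt; x++) {
                Term term = new Term();
                term.readFields(in);
                occorrenze.add(term);
            }
        }
    }

    // Serializes a Stripe into a byte array.
    static byte[] toBytes(Stripe s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            s.write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads a Stripe back from bytes produced by toBytes.
    static Stripe fromBytes(byte[] bytes) {
        try {
            Stripe s = new Stripe();
            s.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return s;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With Hadoop's own types the structure stays the same; only Term's field serialization goes through Text and IntWritable.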

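You can make this more efficient by writing the count as a VInt instead of an int, and you can save on object creation/garbage collection by using a pool of reusable Term objects in the readFields method.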
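Both suggestions can be sketched roughly as follows, again with the standard library only. Assumptions: writeVarInt/readVarInt are hypothetical helpers using a simple 7-bits-per-byte encoding, standing in for Hadoop's WritableUtils.writeVInt/readVInt (whose byte format is different, so the two are not interchangeable); the "pool" is simply a list of Terms allocated by earlier readFields calls, which later calls overwrite; String/int again replace Text/IntWritable.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class PooledStripe {

    // Hypothetical stdlib stand-in for Term (String/int instead of Text/IntWritable).
    static class Term {
        String key = "";
        int frequency;

        void write(DataOutput out) throws IOException {
            out.writeUTF(key);
            out.writeInt(frequency);
        }

        void readFields(DataInput in) throws IOException {
            key = in.readUTF();
            frequency = in.readInt();
        }
    }

    // Hypothetical varint helpers: 7 data bits per byte, high bit set = "more bytes follow".
    // In Hadoop you would call WritableUtils.writeVInt/readVInt instead (different byte format).
    static void writeVarInt(DataOutput out, int value) throws IOException {
        if (value < 0) throw new IllegalArgumentException("count must be non-negative");
        while ((value & ~0x7F) != 0) {
            out.writeByte((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.writeByte(value);
    }

    static int readVarInt(DataInput in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 32; shift += 7) {
            int b = in.readUnsignedByte();
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
        throw new IOException("malformed varint");
    }

    final List<Term> occorrenze = new ArrayList<>();
    // Term objects allocated by earlier readFields calls, kept for reuse.
    private final List<Term> pool = new ArrayList<>();

    void write(DataOutput out) throws IOException {
        writeVarInt(out, occorrenze.size()); // counts below 128 take a single byte
        for (Term term : occorrenze) {
            term.write(out);
        }
    }

    void readFields(DataInput in) throws IOException {
        occorrenze.clear();
        int cnt = readVarInt(in);
        for (int x = 0; x < cnt; x++) {
            if (x == pool.size()) {
                pool.add(new Term()); // grow the pool only when it is too small
            }
            Term term = pool.get(x);  // reuse a previously allocated Term
            term.readFields(in);      // overwrite its old contents
            occorrenze.add(term);
        }
    }
}
```

The trade-off of reuse is that Term objects obtained from one readFields call are overwritten by the next one, so a caller that needs to keep them must copy them first. The same caveat applies to the key/value objects Hadoop reuses while iterating over reducer values.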

Alternatively, you could use ArrayWritable, which is an array of Writables that all have the same type.
