Compare two tables in HBase and write a summary to a third table using TableMapReduceUtil



I need to use MapReduce on HBase to compare two tables (Table1 and Table2) and write a summary to a third table (Table3).

I am using the TableMapReduceUtil pseudocode below, with Table1 as the mapper's input and Table3 as the reducer's output.

In the mapper, I need to compare each value from Table1 with Table2. Where should I instantiate Table2?

Also, do I have to instantiate Table3 in every mapper? I would like to instantiate Table3 only once for the whole MapReduce job.

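import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class CompareTables {

    // Driver: read table1 with MyMapper, write the summary to table3 with MyTableReducer.
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "compare table1 and table2");
        job.setJarByClass(CompareTables.class);
        Scan scan = new Scan();
        scan.setCaching(500);       // fetch more rows per RPC during the scan
        scan.setCacheBlocks(false); // don't fill the block cache from a full scan
        TableMapReduceUtil.initTableMapperJob(
            "table1",          // input table (name is a placeholder)
            scan,
            MyMapper.class,    // mapper class
            Text.class,        // mapper output key
            IntWritable.class, // mapper output value
            job);
        TableMapReduceUtil.initTableReducerJob(
            "table3",          // output table (name is a placeholder)
            MyTableReducer.class,
            job);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    public static class MyMapper extends TableMapper<Text, IntWritable> {
        public static final byte[] CF = Bytes.toBytes("cf");
        public static final byte[] ATTR1 = Bytes.toBytes("attr1");
        private final IntWritable ONE = new IntWritable(1);
        private final Text text = new Text();

        @Override
        public void map(ImmutableBytesWritable row, Result value, Context context)
                throws IOException, InterruptedException {
            String val = Bytes.toString(value.getValue(CF, ATTR1));
            if (val == null) {
                return; // row has no cf:attr1 cell
            }
            // Instantiate Table2 and compare its value with val.
            // Do I have to instantiate it for each mapper?
            String diff = val; // placeholder until the comparison is written
            text.set(diff);
            context.write(text, ONE);
        }
    }

    public static class MyTableReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
        public static final byte[] CF = Bytes.toBytes("cf");
        public static final byte[] COUNT = Bytes.toBytes("count");

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int i = 0;
            for (IntWritable val : values) {
                i += val.get();
            }
            Put put = new Put(Bytes.toBytes(key.toString()));
            put.addColumn(CF, COUNT, Bytes.toBytes(i)); // Put.add(...) is deprecated
            context.write(null, put); // TableOutputFormat ignores the key
        }
    }
}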
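On the question of where Table2 should live: a common Hadoop pattern is to open such a lookup resource once per map task in the mapper's setup() and reuse it in every map() call, rather than once per row. In a real TableMapper this would mean overriding setup(Context) to call ConnectionFactory.createConnection(context.getConfiguration()) and connection.getTable(TableName.valueOf(...)), issuing a Get per row in map(), and closing both in cleanup(Context). The sketch below is a dependency-free model of that flow, not HBase code: the tables are assumed to be in-memory maps (row key to cf:attr1 value), and the class names and summary labels (same, different, missing_in_table2) are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for MyMapper: holds one Table2 handle for the whole task.
final class Table2LookupMapper {
    private Map<String, String> table2; // stand-in for an HBase Table handle

    // Like Mapper.setup(): called once per map task, before any map() call.
    void setup(Map<String, String> openedTable2) {
        this.table2 = openedTable2;
    }

    // Like Mapper.map(): called once per Table1 row, reusing the handle from setup().
    String map(String rowKey, String table1Value) {
        String table2Value = table2.get(rowKey); // stand-in for a Get on Table2
        if (table2Value == null) {
            return "missing_in_table2";
        }
        return table2Value.equals(table1Value) ? "same" : "different";
    }

    // Like Mapper.cleanup(): called once per map task, after the last map() call.
    void cleanup() {
        this.table2 = null; // the real version would close the Table and Connection
    }
}

final class CompareJobSketch {
    // Runs one simulated map task over all of Table1 and sums the emitted 1s per key,
    // which is what MyTableReducer does before writing the counts to Table3.
    static Map<String, Integer> run(Map<String, String> table1, Map<String, String> table2) {
        Table2LookupMapper mapper = new Table2LookupMapper();
        Map<String, Integer> table3 = new TreeMap<>();
        mapper.setup(table2);
        for (Map.Entry<String, String> row : table1.entrySet()) {
            table3.merge(mapper.map(row.getKey(), row.getValue()), 1, Integer::sum);
        }
        mapper.cleanup();
        return table3;
    }
}
```

For example, with Table1 = {r1=a, r2=b, r3=c} and Table2 = {r1=a, r2=x}, run returns {different=1, missing_in_table2=1, same=1}. Rows that exist only in Table2 are never visited, because the job scans Table1 only.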

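If you try to create a table in HBase that already exists, HBaseAdmin.createTable throws a TableExistsException, which you can choose to ignore; see the HBaseAdmin documentation. So you are fine: the first mapper to create the table actually creates it, and the other mappers get an exception that you ignore.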
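The "first one wins, the rest ignore the exception" behaviour described above can be modelled without a cluster. In HBase, Admin.createTable (HBaseAdmin.createTable in older clients) throws org.apache.hadoop.hbase.TableExistsException when the table is already there. The sketch below replaces that with a hypothetical in-memory FakeCatalog and a hypothetical TableAlreadyExists exception, uses threads as stand-ins for concurrently starting mappers, and assumes, as the answer does, that the existence check and the creation happen as one atomic step.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for org.apache.hadoop.hbase.TableExistsException.
final class TableAlreadyExists extends Exception {
    TableAlreadyExists(String name) {
        super(name);
    }
}

// Hypothetical stand-in for the table catalog behind HBaseAdmin.
final class FakeCatalog {
    private final Set<String> tables = ConcurrentHashMap.newKeySet();

    // Like createTable(): fails if the table already exists. Set.add is atomic here.
    void createTable(String name) throws TableAlreadyExists {
        if (!tables.add(name)) {
            throw new TableAlreadyExists(name);
        }
    }

    boolean tableExists(String name) {
        return tables.contains(name);
    }
}

final class CreateOnceSketch {
    // Every simulated mapper tries to create the table; returns how many actually created it.
    static int createFromMappers(FakeCatalog catalog, String table, int mappers) {
        AtomicInteger created = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(mappers);
        for (int i = 0; i < mappers; i++) {
            pool.execute(() -> {
                try {
                    catalog.createTable(table);
                    created.incrementAndGet();
                } catch (TableAlreadyExists e) {
                    // Another mapper created it first: safe to ignore.
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return created.get();
    }
}
```

Calling CreateOnceSketch.createFromMappers(new FakeCatalog(), "table3", 8) returns 1: exactly one simulated mapper performs the create. Since the question only needs Table3 to exist once per job, the same try/catch could also run a single time in the driver before the job is submitted, which avoids the race altogether.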
