HBase Put from a Hadoop job, but the values don't show up in the HBase shell



I have a simple map/reduce job that scans one HBase table and modifies another HBase table. The Hadoop job appears to complete successfully, but when I check the HBase table, the entries never show up there.

Here is the Hadoop program:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class HBaseInsertTest extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        String table = "duplicates";
        Scan scan = new Scan();
        scan.setCaching(500);
        scan.setCacheBlocks(false);
        Job job = new Job(getConf(), "HBaseInsertTest");
        job.setJarByClass(HBaseInsertTest.class);
        TableMapReduceUtil.initTableMapperJob(table, scan, Mapper.class, /* mapper output key = */null,
            /* mapper output value= */null, job);
        TableMapReduceUtil.initTableReducerJob("tablecopy", /*output table=*/null, /*reducer class=*/job);
        job.setNumReduceTasks(0);
        // Note that these are the default.
        job.setOutputFormatClass(NullOutputFormat.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }
    private static class Mapper extends TableMapper<ImmutableBytesWritable, Put> {
        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            super.setup(context);
        }
        @Override
        public void map(ImmutableBytesWritable row, Result columns, Context context) throws IOException {
            long id = 1260018L;
            try {
                Put put = new Put(Bytes.toBytes(id));
                put.add(Bytes.toBytes("mapping"), Bytes.toBytes("foo"), Bytes.toBytes("bar"));
                context.write(row, put);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        int res = ToolRunner.run(config, new HBaseInsertTest(), args);
        System.exit(res);
    }
}

From the HBase shell:

hbase(main):008:0> get 'tablecopy', '1260018', 'mapping'
COLUMN                          CELL                                                                                    
0 row(s) in 0.0100 seconds

I've simplified the program considerably to try to demonstrate/isolate the problem. I'm also relatively new to Hadoop/HBase. I did verify that mapping is a column family that exists in the tablecopy table.

I think the problem is in how you're querying it:

hbase(main):008:0> get 'tablecopy', '1260018', 'mapping'

Instead, you should query it like this:

hbase(main):008:0> get 'tablecopy', 1260018, 'mapping'

HBase thinks it's a string key you're querying because of the quotes around it. Also, if you just ran a simple client job on your end to retrieve this key from HBase, it would correctly fetch the value for you if it were already present.
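
As a minimal sketch of such a client-side check, using the same old-style HBase client API the question already uses (the class name VerifyPut is made up; the table, family, and qualifier names come from the question):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class VerifyPut {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "tablecopy");
        try {
            // The mapper wrote the row key as an 8-byte long, so look it up
            // the same way -- not as the string "1260018".
            Get get = new Get(Bytes.toBytes(1260018L));
            get.addColumn(Bytes.toBytes("mapping"), Bytes.toBytes("foo"));
            Result result = table.get(get);
            byte[] value = result.getValue(Bytes.toBytes("mapping"), Bytes.toBytes("foo"));
            System.out.println(value == null ? "not found" : Bytes.toString(value));
        } finally {
            table.close();
        }
    }
}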

Your problem is the missing reducer. You need to create a class that extends TableReducer, which takes the Put as its input and writes the Put to the target table using context.write(ImmutableBytesWritable key, Put put).

I imagine it would look something like this:

public static class MyReducer extends TableReducer<ImmutableBytesWritable, Put, ImmutableBytesWritable> {
  @Override
  public void reduce(ImmutableBytesWritable key, Iterable<Put> values, Context context)
      throws IOException, InterruptedException {
    // Pass each Put emitted by the mapper through to the output table.
    for (Put record : values) {
      context.write(key, record);
    }
  }
}

Then change the table reducer initialization to: TableMapReduceUtil.initTableReducerJob("tablecopy", MyReducer.class, job); (see the sketch below for the full wiring).
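
For illustration, a sketch of how the run() wiring could look with the reducer in place (based on the question's code, not a tested drop-in). Note that job.setNumReduceTasks(0) and the NullOutputFormat override have to go: with zero reduce tasks the reducer never runs, and NullOutputFormat replaces the TableOutputFormat that initTableReducerJob configures, silently discarding anything written.

Job job = new Job(getConf(), "HBaseInsertTest");
job.setJarByClass(HBaseInsertTest.class);
// Declare the mapper's output types so the shuffle can serialize the Puts.
TableMapReduceUtil.initTableMapperJob("duplicates", scan, Mapper.class,
    ImmutableBytesWritable.class, Put.class, job);
// MyReducer (above) forwards each Put to the output table "tablecopy".
TableMapReduceUtil.initTableReducerJob("tablecopy", MyReducer.class, job);
return job.waitForCompletion(true) ? 0 : 1;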

The other option would be to keep going without a reducer, open an HTable object right in the mapper, and write the Put directly, like so:

// Note: use the mapper's 'context' instance (lowercase), not the Context class.
HTable table = new HTable(context.getConfiguration(), "output_table_name");
Put myPut = ...;
table.put(myPut);
table.close();
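
If you go this route, it's usually better to open the table once per task instead of once per map() call. A minimal sketch of what that might look like (DirectWriteMapper is a made-up name; the row key, family, and qualifier are copied from the question's mapper, and org.apache.hadoop.hbase.client.HTable would need to be imported):

private static class DirectWriteMapper extends TableMapper<ImmutableBytesWritable, Put> {
    private HTable outputTable;

    @Override
    protected void setup(Context context) throws IOException {
        // Open the output table once per task, not once per map() call.
        outputTable = new HTable(context.getConfiguration(), "tablecopy");
    }

    @Override
    public void map(ImmutableBytesWritable row, Result columns, Context context)
            throws IOException {
        Put put = new Put(Bytes.toBytes(1260018L));
        put.add(Bytes.toBytes("mapping"), Bytes.toBytes("foo"), Bytes.toBytes("bar"));
        // Write straight to HBase, bypassing the MapReduce output path.
        outputTable.put(put);
    }

    @Override
    protected void cleanup(Context context) throws IOException {
        // close() also flushes any client-side write buffer.
        outputTable.close();
    }
}

One caveat: writes made directly from tasks bypass the job's output commit, so speculative execution can issue the same Put more than once (harmless here, since identical Puts are idempotent).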

Hope this helps!
