Bulk-inserting data into HBase with MapReduce

I need to insert 400 million rows into an HBase table.

The schema looks like this:

where I generate the row key by simply concatenating two ints, and the value is System.nanoTime().
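If the two ints can vary independently, gluing their decimal strings together is ambiguous: (1, 11) and (11, 1) both produce the key "111". The stdlib-only sketch below shows the collision and a hypothetical fixed-width alternative that encodes each int as 4 big-endian bytes (the same layout HBase's Bytes.toBytes(int) produces), so every pair maps to a distinct 8-byte key. The class and method names are illustrative, not part of the original code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RowKeyDemo {

    // The question's approach: decimal strings concatenated.
    // Different (i, j) pairs can yield identical keys, e.g. (1, 11) and (11, 1) -> "111".
    static byte[] concatKey(int i, int j) {
        String rowkey = Integer.toString(i).concat(Integer.toString(j));
        return rowkey.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical alternative: two 4-byte big-endian ints (ByteBuffer's default order),
    // always 8 bytes long, so distinct (i, j) pairs give distinct keys.
    static byte[] fixedWidthKey(int i, int j) {
        return ByteBuffer.allocate(8).putInt(i).putInt(j).array();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.equals(concatKey(1, 11), concatKey(11, 1)));         // true
        System.out.println(Arrays.equals(fixedWidthKey(1, 11), fixedWidthKey(11, 1))); // false
    }
}
```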

My mapper looks like this:

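import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

public class DatasetMapper extends TableMapper<Text, LongWritable> {

  private static Configuration conf = HBaseConfiguration.create();

  public void map(Text key, LongWritable values, Context context) throws IOException {
    // instantiate an HTable object connected to the "temp" table (already created)
    HTable htable = new HTable(conf, "temp");
    htable.setAutoFlush(false);
    htable.setWriteBufferSize(1024 * 1024 * 12);
    // construct key
    int i = 0, j = 0;
    for (i = 0; i < 400000000; i++) {
      String rowkey = Integer.toString(i).concat(Integer.toString(j));
      long value = Math.abs(System.nanoTime());
      Put put = new Put(Bytes.toBytes(rowkey));
      put.add(Bytes.toBytes("location"), Bytes.toBytes("longlat"), Bytes.toBytes(value));
      htable.put(put);
      j++;
      htable.flushCommits(); // flushing after every put bypasses the 12 MB write buffer
    }
    htable.close();
  }
}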
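The mapper disables auto-flush and sets a 12 MB write buffer, but its loop then calls flushCommits() after every put, so each row still becomes its own round trip. The sketch below is a hypothetical stand-in for a client-side write buffer (it is not HBase's HTable, only a model of the idea): records are held in memory and sent as one batch when the buffered size exceeds the limit or when flush() is called. It counts simulated round trips with and without a flush after each put.

```java
import java.util.ArrayList;
import java.util.List;

public class WriteBufferDemo {

    // Hypothetical model of a client-side write buffer (not the HBase API).
    static class BufferedTable {
        private final long bufferLimitBytes;
        private final List<byte[]> pending = new ArrayList<>();
        private long pendingBytes = 0;
        int roundTrips = 0; // number of simulated batches sent to the server

        BufferedTable(long bufferLimitBytes) {
            this.bufferLimitBytes = bufferLimitBytes;
        }

        void put(byte[] row) {
            pending.add(row);
            pendingBytes += row.length;
            if (pendingBytes > bufferLimitBytes) {
                flush(); // buffer full: send everything buffered so far
            }
        }

        void flush() {
            if (pending.isEmpty()) {
                return; // nothing buffered, so no round trip
            }
            roundTrips++;
            pending.clear();
            pendingBytes = 0;
        }
    }

    // Writes `rows` records of `rowSize` bytes and returns the number of round trips.
    static int simulate(int rows, int rowSize, long limit, boolean flushEveryPut) {
        BufferedTable table = new BufferedTable(limit);
        for (int i = 0; i < rows; i++) {
            table.put(new byte[rowSize]);
            if (flushEveryPut) {
                table.flush(); // mirrors flushCommits() inside the loop
            }
        }
        table.flush(); // final flush for anything still buffered
        return table.roundTrips;
    }

    public static void main(String[] args) {
        long limit = 12L * 1024 * 1024; // same 12 MB as setWriteBufferSize in the question
        System.out.println(simulate(100_000, 100, limit, true));  // 100000
        System.out.println(simulate(100_000, 100, limit, false)); // 1
    }
}
```

With 100-byte rows, 100,000 puts fit in one 12 MB batch, so moving the flush out of the loop cuts the round trips from one per row to one in total.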

My job looks like this:

Configuration config = HBaseConfiguration.create();
Job job = new Job(config, "initdb");
job.setJarByClass(DatasetMapper.class);    // class that contains mapper
TableMapReduceUtil.initTableMapperJob(
    null,                   // input table
    null,                   // scan
    DatabaseMapper.class,   // mapper class
    null,                   // mapper output key
    null,                   // mapper output value
    job);
TableMapReduceUtil.initTableReducerJob(
    "temp",                 // output table
    null,                   // reducer class
    job);
job.setNumReduceTasks(0);
boolean b = job.waitForCompletion(true);
if (!b) {
    throw new IOException("error with job!");
}

The job runs, but inserts 0 records. I know I'm making a mistake somewhere, but I can't spot it because I'm new to HBase. Please help.

Thanks

First, your mapper is named DatasetMapper, but in the job configuration you specified DatabaseMapper. I wonder how it ran without any errors.

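Next, it looks like you have mixed up the usage of TableMapper and Mapper. HBase's TableMapper is an abstract class that extends Hadoop's Mapper and makes it convenient to read from HBase, while TableReducer helps us write back to HBase. You are trying to put data from the Mapper while at the same time using TableReducer. Your mapper will never actually be called.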
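There is also a plain-Java reason the custom map() never runs: TableMapper fixes the map input types to ImmutableBytesWritable and Result, so a map(Text, LongWritable, Context) method in the subclass is an overload, not an override, and the framework never dispatches to it (an @Override annotation on it would fail to compile). The classes below are hypothetical stand-ins, not Hadoop's, that mimic Mapper.run() calling map() once per input record. They show that the overload is skipped and that zero input records means zero map() calls.

```java
import java.util.ArrayList;
import java.util.List;

public class DispatchDemo {

    // Hypothetical stand-in for Hadoop's Mapper: run() feeds each input record
    // to map(KEYIN, VALUEIN), the way Mapper.run() loops over context.nextKeyValue().
    static class BaseMapper<KEYIN, VALUEIN> {
        final List<String> calls = new ArrayList<>();

        void map(KEYIN key, VALUEIN value) {
            calls.add("base map"); // default behaviour (Hadoop's default map is identity)
        }

        void run(List<KEYIN> keys, List<VALUEIN> values) {
            for (int i = 0; i < keys.size(); i++) {
                map(keys.get(i), values.get(i));
            }
        }
    }

    // Plays the role of TableMapper: the input types are fixed by the base class.
    static class RowMapper extends BaseMapper<byte[], String> {
        // Different parameter types: an overload, not an override, so run() never calls it.
        void map(String key, Long value) {
            calls.add("custom map");
        }
    }

    public static void main(String[] args) {
        RowMapper mapper = new RowMapper();
        mapper.run(List.of(new byte[] {1}), List.of("row"));
        System.out.println(mapper.calls); // [base map]

        RowMapper empty = new RowMapper();
        empty.run(List.of(), List.of());
        System.out.println(empty.calls); // []
    }
}
```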

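Either use a TableReducer to put the data, or use just a Mapper. If you really want to do it in the Mapper, you can use the TableOutputFormat class. See the example on page 301 of HBase: The Definitive Guide. Here is the Google Books link.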
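Even on the mapper-only route, map() is invoked once per input record, so the job still needs some input telling each map task which rows to generate. One common pattern, assumed here rather than taken from the book, is to give each task a range of row indices (for example, a small text file with one range per line, read with NLineInputFormat). The hypothetical helper below only computes those ranges; each mapper would then loop over its own range and emit Puts through TableOutputFormat.

```java
import java.util.ArrayList;
import java.util.List;

public class RowRangeDemo {

    // Hypothetical helper: split [0, totalRows) into numTasks contiguous,
    // non-overlapping [start, end) ranges whose sizes differ by at most one row.
    static List<long[]> splitRanges(long totalRows, int numTasks) {
        List<long[]> ranges = new ArrayList<>();
        long base = totalRows / numTasks;
        long remainder = totalRows % numTasks;
        long start = 0;
        for (int t = 0; t < numTasks; t++) {
            long size = base + (t < remainder ? 1 : 0); // first `remainder` tasks get one extra row
            ranges.add(new long[] {start, start + size});
            start += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] r : splitRanges(400_000_000L, 3)) {
            System.out.println(r[0] + " - " + r[1]);
        }
        // 0 - 133333334
        // 133333334 - 266666667
        // 266666667 - 400000000
    }
}
```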

HTH.

P.S. You may find these links helpful for learning HBase + MapReduce integration properly:

Link 1.

Link 2.
