I am currently evaluating Apache Crunch. I followed the simple WordCount MapReduce job example, and afterwards I tried to save the result into a standalone HBase. HBase is up and running (checked with jps and the HBase shell), as described here: http://hbase.apache.org/book/quickstart.html
Now I take the example that writes into HBase:
Pipeline pipeline = new MRPipeline(WordCount.class, getConf());
PCollection<String> lines = pipeline.readTextFile(inputPath);
PTable<String, Long> counts = noStopWords.count();
pipeline.write(counts, new HBaseTarget("wordCountOutTable"));
PipelineResult result = pipeline.done();
I get an exception: "java.lang.IllegalArgumentException: HBaseTarget only supports Put and Delete"
Any clue what is going wrong?
A PTable may be a PCollection, but HBaseTarget can only handle Put or Delete objects. So you have to convert the PTable into a PCollection whose elements are Puts or Deletes. Take a look at the Crunch examples where this is done.
An example conversion could look like this:
public PCollection<Put> createPut(final PTable<String, String> counts) {
  return counts.parallelDo("Convert to puts", new DoFn<Pair<String, String>, Put>() {
    @Override
    public void process(final Pair<String, String> input, final Emitter<Put> emitter) {
      // input.first() is used as the row key
      Put put = new Put(Bytes.toBytes(input.first()));
      // the value (input.second()) is added under its column family and qualifier
      put.add(COLUMN_FAMILY_TARGET, COLUMN_QUALIFIER_TARGET_TEXT, Bytes.toBytes(input.second()));
      emitter.emit(put);
    }
  }, Writables.writables(Put.class));
}
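To tie this back to the pipeline from the question, the wiring could then look roughly like the sketch below. Note this is only an illustration: it reuses the COLUMN_FAMILY_TARGET / COLUMN_QUALIFIER_TARGET_TEXT placeholders from above, and since the counts table in the question holds Long values rather than Strings, the DoFn's type parameters and the Bytes.toBytes call are adjusted accordingly.

// Sketch: convert the PTable<String, Long> word counts into Puts and write them to HBase
PCollection<Put> puts = counts.parallelDo("Convert counts to puts",
    new DoFn<Pair<String, Long>, Put>() {
      @Override
      public void process(final Pair<String, Long> input, final Emitter<Put> emitter) {
        // the word becomes the row key, the count becomes the cell value
        Put put = new Put(Bytes.toBytes(input.first()));
        put.add(COLUMN_FAMILY_TARGET, COLUMN_QUALIFIER_TARGET_TEXT, Bytes.toBytes(input.second()));
        emitter.emit(put);
      }
    }, Writables.writables(Put.class));

pipeline.write(puts, new HBaseTarget("wordCountOutTable"));
PipelineResult result = pipeline.done();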