Hadoop with Phoenix: how to write Phoenix table objects to the HDFS file system

I have a MapReduce job that reads from an HBase table through Phoenix. I want the output of this job to go to HDFS, to then be fed into a second MapReduce job that applies the updates back to the HBase table. Here is what I tried:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.phoenix.mapreduce.PhoenixInputFormat;
import org.apache.phoenix.mapreduce.util.PhoenixMapReduceUtil;

public class Job1Driver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        final Configuration jobConfiguration = super.getConf();
        final Job job1 = Job.getInstance(jobConfiguration, jobConfiguration.get("mapreduce.job.name"));
        final String selectQuery = "SELECT * FROM TABLE1 WHERE IS_SUMMARY_RECORD=false";

        job1.setJarByClass(Job1Driver.class);

        // Phoenix reads TABLE1 from the cluster named by HBASE_URL
        PhoenixMapReduceUtil.setInputCluster(job1, jobConfiguration.get("HBASE_URL"));
        PhoenixMapReduceUtil.setInput(job1, Table1Writable.class, "TABLE1", selectQuery);

        // Constant-first comparison avoids an NPE when the property is unset
        if ("True".equals(jobConfiguration.get("IS_FROZEN_DATA_AVAILABLE"))) {
            MultipleInputs.addInputPath(job1, new Path(args[0]),
                    TextInputFormat.class, FrozenMapper.class);
        }
        MultipleInputs.addInputPath(job1, new Path(args[1]),
                PhoenixInputFormat.class, ActiveMapper.class);
        FileOutputFormat.setOutputPath(job1, new Path(args[2]));

        job1.setMapOutputKeyClass(Text.class);
        job1.setMapOutputValueClass(Table1Writable.class);
        job1.setOutputKeyClass(NullWritable.class);
        job1.setOutputValueClass(Table1Writable.class);
        job1.setReducerClass(Job1Reducer.class);

        boolean st = job1.waitForCompletion(true);
        return st ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        int exitCode = ToolRunner.run(conf, new Job1Driver(), args);
        System.exit(exitCode);
    }
}

When I run this, the output directory contains entries like:
hadoopDir.Table1Writable@5c8eee0f

With the Writable implementation I can write to HDFS from the mapper, but the same approach does not work from the reducer. Is there anything obvious I'm missing?
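For context: unless an output format is set explicitly, the job falls back to TextOutputFormat, which writes each value by calling its toString(). A Writable that does not override toString() inherits Object's version, which prints exactly the ClassName@hashcode form shown above. A minimal sketch of the fix, with hypothetical fields since Table1Writable's real layout isn't shown (the real class would also implement Phoenix's DBWritable, omitted here):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.BooleanWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class Table1Writable implements Writable {
    // Hypothetical fields, for illustration only
    private Text recordId = new Text();
    private BooleanWritable isSummaryRecord = new BooleanWritable();

    @Override
    public void write(DataOutput out) throws IOException {
        recordId.write(out);
        isSummaryRecord.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        recordId.readFields(in);
        isSummaryRecord.readFields(in);
    }

    @Override
    public String toString() {
        // This is the line TextOutputFormat writes to the part files
        return recordId + "\t" + isSummaryRecord.get();
    }
}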

Are you using MapReduce because the Phoenix queries don't scale? We tried benchmarking Phoenix against Splice Machine (open source), but we couldn't get it to scale for large queries/updates.

I think you need to set

job.setOutputFormatClass()

Good luck…
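Building on that: if the second job should consume the Writable objects themselves rather than re-parsing text, a SequenceFile preserves the binary form end to end. A rough sketch, assuming job2 is the follow-up job's Job instance:

import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// In Job1Driver.run(): write <NullWritable, Table1Writable> pairs in
// binary form, bypassing toString() entirely.
job1.setOutputFormatClass(SequenceFileOutputFormat.class);
SequenceFileOutputFormat.setOutputPath(job1, new Path(args[2]));

// In the second job's driver: read the same pairs back unchanged, so
// its mapper receives <NullWritable, Table1Writable> values directly.
job2.setInputFormatClass(SequenceFileInputFormat.class);
SequenceFileInputFormat.addInputPath(job2, new Path(args[2]));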
