我有一个haddop工作,它的输出应该写入HBase。我真的不需要化简器,我想插入的行类型是在映射器中确定的。
如何使用 TableOutputFormat 来实现此目的?从我看到的所有例子中,假设化简器是创建 Put 的那个,而 TableMapper 仅用于从 HBase 表中读取。
在我的情况下,输入是HDFS,输出是放在特定的表中,我在TableMapReduceUtil中也找不到任何可以帮助我的东西。
有什么例子可以帮助我吗?
顺便说一句,我正在使用新的Hadoop API
这是从文件中读取并将所有行放入 Hbase 的示例。此示例来自"Hbase:权威指南",您可以在存储库中找到它。要获取它,只需在您的计算机上克隆存储库:
git clone git://github.com/larsgeorge/hbase-book.git
在本书中,您还可以找到有关代码的所有解释。但是,如果您无法理解某些事情,请随时询问。
` public class ImportFromFile {
public static final String NAME = "ImportFromFile";
public enum Counters { LINES }
static class ImportMapper
extends Mapper<LongWritable, Text, ImmutableBytesWritable, Writable> {
private byte[] family = null;
private byte[] qualifier = null;
@Override
protected void setup(Context context)
throws IOException, InterruptedException {
String column = context.getConfiguration().get("conf.column");
byte[][] colkey = KeyValue.parseColumn(Bytes.toBytes(column));
family = colkey[0];
if (colkey.length > 1) {
qualifier = colkey[1];
}
}
@Override
public void map(LongWritable offset, Text line, Context context)
throws IOException {
try {
String lineString = line.toString();
byte[] rowkey = DigestUtils.md5(lineString);
Put put = new Put(rowkey);
put.add(family, qualifier, Bytes.toBytes(lineString));
context.write(new ImmutableBytesWritable(rowkey), put);
context.getCounter(Counters.LINES).increment(1);
} catch (Exception e) {
e.printStackTrace();
}
}
}
private static CommandLine parseArgs(String[] args) throws ParseException {
Options options = new Options();
Option o = new Option("t", "table", true,
"table to import into (must exist)");
o.setArgName("table-name");
o.setRequired(true);
options.addOption(o);
o = new Option("c", "column", true,
"column to store row data into (must exist)");
o.setArgName("family:qualifier");
o.setRequired(true);
options.addOption(o);
o = new Option("i", "input", true,
"the directory or file to read from");
o.setArgName("path-in-HDFS");
o.setRequired(true);
options.addOption(o);
options.addOption("d", "debug", false, "switch on DEBUG log level");
CommandLineParser parser = new PosixParser();
CommandLine cmd = null;
try {
cmd = parser.parse(options, args);
} catch (Exception e) {
System.err.println("ERROR: " + e.getMessage() + "n");
HelpFormatter formatter = new HelpFormatter();
formatter.printHelp(NAME + " ", options, true);
System.exit(-1);
}
return cmd;
}
public static void main(String[] args) throws Exception {
Configuration conf = HBaseConfiguration.create();
String[] otherArgs =
new GenericOptionsParser(conf, args).getRemainingArgs();
CommandLine cmd = parseArgs(otherArgs);
String table = cmd.getOptionValue("t");
String input = cmd.getOptionValue("i");
String column = cmd.getOptionValue("c");
conf.set("conf.column", column);
Job job = new Job(conf, "Import from file " + input + " into table " + table);
job.setJarByClass(ImportFromFile.class);
job.setMapperClass(ImportMapper.class);
job.setOutputFormatClass(TableOutputFormat.class);
job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, table);
job.setOutputKeyClass(ImmutableBytesWritable.class);
job.setOutputValueClass(Writable.class);
job.setNumReduceTasks(0);
FileInputFormat.addInputPath(job, new Path(input));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}`
你只需要让映射器输出这对。 OutputFormat
仅指定如何保留输出键值。这并不一定意味着关键值来自化简器。您需要在映射器中执行以下操作:
... extends TableMapper<ImmutableBytesWritable, Put>() {
...
...
context.write(<some key>, <some Put or Delete object>);
}