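I am a beginner in big data. As a first step, I want to try out how MapReduce works with HBase. The scenario is to sum the field uas in my HBase table using MapReduce, grouped by the date that forms the first part of the row key. This is my table: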
hbase::table - test
ROW           COLUMN+CELL
 10102010#1   column=cf:nama, timestamp=1418267197429, value=jonru
 10102010#1   column=cf:quiz, timestamp=1418267197429, value=\x00\x00\x00d
 10102010#1   column=cf:uas, timestamp=1418267197429, value=\x00\x00\x00d
 10102010#1   column=cf:uts, timestamp=1418267197429, value=\x00\x00\x00d
 10102010#2   column=cf:nama, timestamp=1418267180874, value=jonru
 10102010#2   column=cf:quiz, timestamp=1418267180874, value=\x00\x00\x00d
 10102010#2   column=cf:uas, timestamp=1418267180874, value=\x00\x00\x00d
 10102010#2   column=cf:uts, timestamp=1418267180874, value=\x00\x00\x00d
 10102012#1   column=cf:nama, timestamp=1418267156542, value=jonru
 10102012#1   column=cf:quiz, timestamp=1418267156542, value=\x00\x00\x00\x00\x0A
 10102012#1   column=cf:uas, timestamp=1418267156542, value=\x00\x00\x00\x00\x0A
 10102012#1   column=cf:uts, timestamp=1418267156542, value=\x00\x00\x00\x00\x0A
 10102012#2   column=cf:nama, timestamp=1418267166524, value=jonru
 10102012#2   column=cf:quiz, timestamp=1418267166524, value=\x00\x00\x00\x00\x0A
 10102012#2   column=cf:uas, timestamp=1418267166524, value=\x00\x00\x00\x00\x0A
 10102012#2   column=cf:uts, timestamp=1418267166524, value=\x00\x00\x00\x0A
My code looks like this:
public class TestMapReduce {

    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        Configuration config = HBaseConfiguration.create();
        Job job = new Job(config, "Test");
        job.setJarByClass(TestMapReduce.TestMapper.class);

        Scan scan = new Scan();
        scan.setCaching(500);
        scan.setCacheBlocks(false);

        TableMapReduceUtil.initTableMapperJob(
                "test",
                scan,
                TestMapReduce.TestMapper.class,
                Text.class,
                IntWritable.class,
                job);

        TableMapReduceUtil.initTableReducerJob(
                "test",
                TestReducer.class,
                job);

        job.waitForCompletion(true);
    }

    public static class TestMapper extends TableMapper<Text, IntWritable> {
        @Override
        protected void map(ImmutableBytesWritable rowKey, Result columns, Mapper.Context context) throws IOException, InterruptedException {
            System.out.println("mulai mapping");
            try {
                //get row key
                String inKey = new String(rowKey.get());
                //get new key having date only
                String onKey = new String(inKey.split("#")[0]);
                //get value s_sent column
                byte[] bUas = columns.getValue(Bytes.toBytes("cf"), Bytes.toBytes("uas"));
                String sUas = new String(bUas);
                Integer uas = new Integer(sUas);
                //emit date and sent values
                context.write(new Text(onKey), new IntWritable(uas));
            } catch (RuntimeException ex) {
                ex.printStackTrace();
            }
        }
    }
}

public class TestReducer extends TableReducer {
    public void reduce(Text key, Iterable values, Reducer.Context context) throws IOException, InterruptedException {
        try {
            int sum = 0;
            for (Object test : values) {
                System.out.println(test.toString());
                sum += Integer.parseInt(test.toString());
            }
            Put inHbase = new Put(key.getBytes());
            inHbase.add(Bytes.toBytes("cf"), Bytes.toBytes("sum"), Bytes.toBytes(sum));
            context.write(null, inHbase);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
I get an error like this:
Exception in thread "main" java.lang.NullPointerException
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1010)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:451)
at org.apache.hadoop.util.Shell.run(Shell.java:424)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:656)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:745)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:728)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:633)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:421)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:281)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:125)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:348)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1313)
at TestMapReduce.main(TestMapReduce.java:97)
Java Result: 1
Please help me :)
Let's look at this part of your code:
byte[] bUas = columns.getValue(Bytes.toBytes("cf"), Bytes.toBytes("uas"));
String sUas = new String(bUas);
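For the current key, you are trying to get the value of column uas from column family cf. HBase is a non-relational database, so it is quite possible that this key has no value for that column. In that case, the getValue method returns null. The String constructor that takes a byte[] cannot handle null, so it throws a NullPointerException. A quick fix would look like this: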
byte[] bUas = columns.getValue(Bytes.toBytes("cf"), Bytes.toBytes("uas"));
String sUas = bUas == null ? "" : new String(bUas);
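Note that with this quick fix an empty string would still reach new Integer(sUas), which throws a NumberFormatException. The scan output also suggests (this is an assumption, not something the question confirms) that the cells were written with Bytes.toBytes(int), i.e. as 4-byte big-endian integers rather than as text. Below is a minimal sketch of null-safe decoding and summing per date, using only the JDK so it runs without a cluster. The class UasSum and its methods decodeUas, dateOf and sumByDate are hypothetical names made up for illustration; the last row in main is invented to show the missing-cell case.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

class UasSum {

    /** Decodes a cell stored as a 4-byte big-endian int; a missing or malformed cell counts as 0. */
    static int decodeUas(byte[] raw) {
        if (raw == null || raw.length != Integer.BYTES) {
            return 0; // assumption: skip missing or malformed cells instead of failing the task
        }
        return ByteBuffer.wrap(raw).getInt(); // ByteBuffer reads big-endian by default
    }

    /** Strips the "#n" suffix from a row key such as "10102010#1". */
    static String dateOf(String rowKey) {
        return rowKey.split("#")[0];
    }

    /** Sums the decoded values per date, mimicking what the mapper and reducer do together. */
    static Map<String, Integer> sumByDate(Map<String, byte[]> uasByRowKey) {
        Map<String, Integer> sums = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> e : uasByRowKey.entrySet()) {
            sums.merge(dateOf(e.getKey()), decodeUas(e.getValue()), Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        Map<String, byte[]> rows = new LinkedHashMap<>();
        rows.put("10102010#1", new byte[] {0, 0, 0, 100}); // shown as \x00\x00\x00d in the shell, i.e. 100
        rows.put("10102010#2", new byte[] {0, 0, 0, 100});
        rows.put("10102012#1", new byte[] {0, 0, 0, 10});  // 10 as a 4-byte int
        rows.put("10102012#2", null);                      // hypothetical row without a cf:uas cell
        System.out.println(sumByDate(rows)); // prints {10102010=200, 10102012=10}
    }
}
```

Inside the mapper, the same idea would be a null check on bUas followed by Bytes.toInt(bUas) in place of new String(bUas) and new Integer(sUas), assuming the values really are 4-byte ints.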