Executing a custom UDF in Hive



I have the following table in Hive:

>describe weblogs;
OK
originatingip           string                                      
clientidentity          string                                      
userid                  string                                      
time                    string                                      
requesttype             string                                      
requestpage             string                                      
httpprotocolversion     string                                      
responsecode            int                                         
responsesize            int                                         
referrer                string                                      
useragent               string                                      
Time taken: 1.065 seconds, Fetched: 11 row(s)

I created a UDF in Java to map IP addresses to geographic locations. Here is my UDF:

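package com.prithvi.hive.logprocessing.udf.ipgeo;

import java.io.File;
import java.io.IOException;
import java.net.URISyntaxException;
import java.net.URL;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Assumption: LookupService and Location come from the MaxMind legacy GeoIP Java API.
import com.maxmind.geoip.Location;
import com.maxmind.geoip.LookupService;

public class IpgeoHive extends UDF {
    Text result = new Text();
    String ipCountry, ipCity;

    public Text evaluate(Text input) throws IOException {
        if (input == null) return null;
        // Locate the GeoLite database bundled on the classpath.
        URL database_path = getClass().getResource("/GeoLiteCity.dat");
        File file;
        try {
            file = new File(database_path.toURI());
        } catch (URISyntaxException e) {
            file = new File(database_path.getPath());
        }
        // Look up the location for the given IP address.
        LookupService cl = new LookupService(file);
        Location location = cl.getLocation(input.toString());
        if (location != null) {
            ipCountry = location.countryName;
            ipCity = location.city;
        } else {
            ipCountry = "Unknown";
            ipCity = "Unknown";
        }
        // Return the result as "country/city".
        result.set(ipCountry + "/" + ipCity);
        return result;
    }
}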

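When tested in Eclipse by passing dummy values, the UDF above returns the expected result. After building the JAR file, I ran it in the sandbox with the following commands: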
ADD JAR MapReduce_Examples-0.0.1-SNAPSHOT-jar-with-dependencies.jar;
CREATE TEMPORARY FUNCTION IP2GEO AS 'com.prithvi.hive.logprocessing.udf.ipgeo.IpgeoHive';
SELECT originatingip, IP2GEO(originatingip) from weblogs limit 10;

But the job fails with the following error, and I don't know how to resolve it.

Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException:    Hive Runtime Error while processing  row{"originatingip":"25.198.250.35","clientidentity":"-","userid":"-","time":"[2014-07-19T16:05:33Z]","requesttype":""GET","requestpage":"/","httpprotocolversion":"HTTP/1.1"","responsecode":404,"responsesize":1081,"referrer":""-"","useragent":""Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)""}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"originatingip":"25.198.250.35","clientidentity":"-","userid":"-","time":"[2014-07-19T16:05:33Z]","requesttype":""GET","requestpage":"/","httpprotocolversion":"HTTP/1.1"","responsecode":404,"responsesize":1081,"referrer":""-"","useragent":""Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)""}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public org.apache.hadoop.io.Text com.prithvi.hive.logprocessing.udf.ipgeo.IpgeoHive.evaluate(org.apache.hadoop.io.Text) throws java.io.IOException  on object com.prithvi.hive.logprocessing.udf.ipgeo.IpgeoHive@63c0b9c3 of class com.prithvi.hive.logprocessing.udf.ipgeo.IpgeoHive with arguments {25.198.250.35:org.apache.hadoop.io.Text} of size 1
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1241)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:182)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:166)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:79)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
... 9 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1217)
... 18 more
Caused by: java.lang.IllegalArgumentException: URI is not hierarchical
at java.io.File.<init>(File.java:418)
at com.prithvi.hive.logprocessing.udf.ipgeo.IpgeoHive.evaluate(IpgeoHive.java:28)
... 23 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: 
Job 0: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
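
Reading the trace from the bottom up, the innermost cause is java.lang.IllegalArgumentException: URI is not hierarchical, raised from the java.io.File constructor called inside evaluate. The JDK's File(URI) constructor rejects opaque URIs, and a resource loaded from inside a JAR typically has a URL of the form jar:file:...!/name, which is opaque. The following is a minimal sketch that reproduces the message with the JDK alone; the class name JarUriDemo and the /tmp paths are made up for illustration.

```java
import java.io.File;
import java.net.URI;

final class JarUriDemo {

    /** Returns the exception message from new File(uri), or null if the File was created. */
    static String tryFile(URI uri) {
        try {
            new File(uri);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Hypothetical locations, used only to show the two URI shapes.
        URI onDisk = URI.create("file:/tmp/GeoLiteCity.dat");
        URI inJar = URI.create("jar:file:/tmp/udf.jar!/GeoLiteCity.dat");

        // A plain file: URI is hierarchical, so File accepts it.
        System.out.println("on disk: " + tryFile(onDisk));
        // A jar: URI is opaque, so File rejects it with the message seen in the trace.
        System.out.println("in jar:  " + tryFile(inJar));
    }
}
```

This would also explain why the same code works in Eclipse, where the resource is usually resolved from a directory on disk rather than from inside a packaged JAR.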

Hive does not know about Java String/Text.

You have to convert the Java String/Text into a Hive string.

Use the code below:

private JavaStringObjectInspector stringInspector;
stringInspector = PrimitiveObjectInspectorFactory.javaStringObjectInspector;
String ip = stringInspector.getPrimitiveJavaObject(input);  
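
Separately from how the argument is typed, the File constructor failure shown in the trace (URI is not hierarchical) can be avoided by not treating a JAR entry as a file at all. One possible approach, sketched here using only the JDK, is to open the resource as a stream and copy it to a temporary file, which can then be handed to any API that needs a File. The names ResourceFiles, copyToTempFile and resourceToTempFile are hypothetical and not part of the original UDF.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

final class ResourceFiles {

    /** Copies the whole stream into a new temporary file and returns that file. */
    static File copyToTempFile(InputStream in, String prefix, String suffix) throws IOException {
        File tmp = File.createTempFile(prefix, suffix);
        tmp.deleteOnExit();
        // createTempFile already created an empty file, so it must be replaced.
        Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    /**
     * Resolves a classpath resource (on disk or inside a JAR) to a real file
     * by streaming its bytes, instead of converting its URL to a File.
     */
    static File resourceToTempFile(Class<?> anchor, String resourcePath) throws IOException {
        try (InputStream in = anchor.getResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new FileNotFoundException("Resource not found on classpath: " + resourcePath);
            }
            return copyToTempFile(in, "resource", ".tmp");
        }
    }
}
```

Under these assumptions, evaluate could obtain its database file from ResourceFiles.resourceToTempFile(getClass(), "/GeoLiteCity.dat") and build the LookupService once, for example lazily on the first call, rather than resolving the file again for every row.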
