"Hive Runtime Error while processing row" when reading an XML file



I am trying to read a simple XML file and extract data from it. Here is the file,

src:

<a>
        <b id="foo">b1</b>
        <b id="bar">b2</b>
</a>

I created the table src in Hive as follows:

Create table src(line string);

Then I loaded the table like this:

load data local inpath '/home/hduser/Desktop/batch/hiveip/src' into table src;

I am trying to extract the data with the following query:

select xpath(line,'//@id') from src;

It fails with the following error:
    Diagnostic Messages for this Task:
    Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"line":"<a>"}
            at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
            at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
            at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
            at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
            at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:415)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
            at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"line":"<a>"}
            at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
            at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
            ... 8 more
    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating array ('line',''//@id'')
            at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
            at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
            at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
            at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
            at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
            ... 9 more
    Caused by: java.lang.RuntimeException: Invalid expression '//@id'
            at org.apache.hadoop.hive.ql.udf.xml.UDFXPathUtil.eval(UDFXPathUtil.java:74)
            at org.apache.hadoop.hive.ql.udf.xml.UDFXPathUtil.evalNodeList(UDFXPathUtil.java:95)
            at org.apache.hadoop.hive.ql.udf.xml.GenericUDFXPath.eval(GenericUDFXPath.java:76)
            at org.apache.hadoop.hive.ql.udf.xml.GenericUDFXPath.evaluate(GenericUDFXPath.java:97)
            at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:166)
            at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
            at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
            at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:79)
            ... 13 more

I get no output.

However, when I run the following query, I do get output:

select xpath('<a><b id="foo">b1</b><b id="bar">b2</b></a>','//@id')

Output:

["foo","bar"]

It would be great if someone could explain what exactly is happening and where I went wrong.

Your table src most likely has 4 rows, like this:

+---------------------+--+
|      src.line       |
+---------------------+--+
| <a>                 |
| <b id="foo">b1</b>  |
| <b id="bar">b2</b>  |
| </a>                |
+---------------------+--+

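Hive's default text table format turns every line of the file into a separate row, so xpath() is called on fragments such as "<a>", which is not a well-formed XML document on its own; that is exactly the row named in the error. Instead, the whole document should sit in a single row, like this: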

+----------------------------------------------+--+
|                   src.line                   |
+----------------------------------------------+--+
| <a><b id="foo">b1</b><b id="bar">b2</b></a>  |
+----------------------------------------------+--+
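To see why the row layout matters, here is a minimal Python sketch. It uses Python's xml.etree.ElementTree as a stand-in for the XML parser behind Hive's xpath(), so it illustrates the idea rather than reproducing Hive's code; the helper name is_well_formed is hypothetical. Splitting the file on newlines mimics how a plain text table produces one row per line.

```python
import xml.etree.ElementTree as ET

# The file from the question, laid out exactly as it is on disk.
FILE_TEXT = """<a>
        <b id="foo">b1</b>
        <b id="bar">b2</b>
</a>
"""

def is_well_formed(fragment):
    """Return True if the string parses as a complete XML document."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True

# A line-oriented text table turns each newline-terminated line into one row.
rows = FILE_TEXT.splitlines()
for row in rows:
    print(repr(row), is_well_formed(row))

# Keeping the whole document in one value gives the parser a complete tree.
print(is_well_formed("".join(rows)))
```

The rows "<a>" and "</a>" cannot be parsed on their own, while the joined document can, which matches the difference between the failing query and the one with the literal string.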

Put the whole XML document on a single line in the file:

[cloudera@quickstart ~]$ cat myxml.xml 
<a><b id="foo">b1</b><b id="bar">b2</b></a>
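If the file is pretty-printed across several lines, a small script can put it on one line before loading. The sketch below is one way to do it in Python, assuming that whitespace between tags carries no meaning (true for this sample, but not for XML whose text content depends on that whitespace); the function name flatten_xml is hypothetical.

```python
import re

def flatten_xml(text):
    """Collapse a pretty-printed XML document onto one line.

    Assumption: whitespace between a '>' and the next '<' is insignificant,
    as in the sample file.
    """
    # Drop runs of whitespace (including newlines) that sit between tags.
    one_line = re.sub(r">\s+<", "><", text)
    # Remove leading/trailing whitespace such as the final newline.
    return one_line.strip()

sample = """<a>
        <b id="foo">b1</b>
        <b id="bar">b2</b>
</a>
"""

print(flatten_xml(sample))  # prints <a><b id="foo">b1</b><b id="bar">b2</b></a>
```

Writing the returned string to the file, followed by a single newline, leaves exactly one line and therefore one row in the table.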

and load it into Hive:

create table src(line string)
location '/your/xml/location';

Then run the query. It should give you the expected result:

+----------------+--+
|      _c0       |
+----------------+--+
| ["foo","bar"]  |
+----------------+--+
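For reference, '//@id' selects the id attribute of every element anywhere in the document, in document order, which is why the result is ["foo","bar"]. A rough Python equivalent is sketched below; Python's ElementTree does not support selecting attribute nodes such as '//@id', so the sketch walks the elements instead. The function name all_id_attributes is hypothetical.

```python
import xml.etree.ElementTree as ET

def all_id_attributes(xml_text):
    """Approximate the XPath expression '//@id' for a single XML document."""
    root = ET.fromstring(xml_text)
    # root.iter() yields the root and all of its descendants in document
    # order, mirroring the descendant-or-self step that '//' stands for.
    return [el.get("id") for el in root.iter() if "id" in el.attrib]

print(all_id_attributes('<a><b id="foo">b1</b><b id="bar">b2</b></a>'))  # prints ['foo', 'bar']
```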
