NullPointerException when loading XML into Hive



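I'm having trouble loading XML files into Hive with hive-serde. I followed the tips here, but I still get a NullPointerException when I try to read the data loaded from the XML file. The SQL below runs without errors; the problem only appears when I try to read from the table.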

Here is the XML with some dummy values:

<?xml version="1.0"?><History-Group-Comm-CommB-DT-RBB-Work 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<item id="HISTORY_6_GMT">
<pxAddedByID>HUBINT</pxAddedByID>
<pxAddedBySystem>CD</pxAddedBySystem>
<pxHistoryForReference>GR</pxHistoryForReference>
<pxInsName>GMT</pxInsName>
<pxObjClass>Work</pxObjClass>
<pxTimeCreated>2017-02-13T13:08:28.776Z</pxTimeCreated>
<pyFlowKey>RULE-OBJ</pyFlowKey>
<pyFlowName>pyStartCase</pyFlowName>
<pyFlowType>pyStartCase</pyFlowType>
<pyMessageKey>ItemCreated</pyMessageKey>
<pyPerformer>HUB</pyPerformer>
<pzInsKey>776 GMT</pzInsKey>
</item>
</History-Group-Comm-CommB-DT-RBB-Work>

Here is the SQL that loads the XML:

add jar hdfs://DEVHDPVM01HA:8020/HADOOP/DASD_ACQ/common/lib/hivexmlserde-1.0.5.3.jar;

DROP TABLE IF EXISTS test.test_tbl_stg;
CREATE EXTERNAL TABLE test.test_tbl_stg  (
ADDED_BY_ID STRING COMMENT 'pxAddedByID',
ADDED_BY_SYSTEM STRING COMMENT 'pxAddedBySystem',
HISTORY_FOR_REFERENCE STRING COMMENT '',
INSERT_NAME STRING COMMENT '',
OBJECT_CLASS STRING COMMENT '',
TIME_CREATED STRING COMMENT '',
FLOW_KEY STRING COMMENT '',
FLOW_NAME STRING COMMENT '',
FLOW_TYPE FLOAT COMMENT '',
MESSAGE STRING COMMENT '',
PERFORMER STRING COMMENT '',
INSERT_KEY STRING COMMENT '' ) COMMENT 'Optional Table Comment'
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.ADDED_BY_ID"="/item/pxAddedByID/text()", 
"column.xpath.ADDED_BY_SYSTEM"="/item/pxAddedBySystem/text()",
"column.xpath.HISTORY_FOR_REFERENCE"="/item/pxHistoryForReference/text()",
"column.xpath.INSERT_NAME"="/item/pxInsName/text()",
"column.xpath.OBJECT_CLASS"="/item/pxObjClass/text()",
"column.xpath.TIME_CREATED"="/item/pxTimeCreated/text()",
"column.xpath.FLOW_KEY"="/item/pyFlowKey/text()",
"column.xpath.FLOW_NAME"="/item/pyFlowName/text()",
"column.xpath.FLOW_TYPE"="/item/pyFlowType/text()",
"column.xpath.MESSAGE"="/item/pyMessageKey/text()",
"column.xpath.PERFORMER"="/item/pyPerformer/text()",
"column.xpath.INSERT_KEY"="/item/pzInsKey/text()")
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '${stagingFolderPath}'
TBLPROPERTIES ("xmlinput.start"="<item id=","xmlinput.end"="</item>");
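To see which values the table definition above should pull out of the sample, here is a minimal Python sketch. It is not the hivexmlserde code path. It assumes records are cut on the literal xmlinput.start / xmlinput.end strings and that each column is read from the child element named in its column.xpath.* property, using ElementTree's findtext as a stand-in for the "/item/<tag>/text()" XPaths. The helper names split_records and record_to_row are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical local emulation of the table definition; NOT the hivexmlserde code.
# Records are cut on the same literal markers as the TBLPROPERTIES above.
XMLINPUT_START = "<item id="
XMLINPUT_END = "</item>"

# Column -> child element of <item>, mirroring the column.xpath.* properties.
# ElementTree's findtext() stands in for the "/item/<tag>/text()" XPaths.
COLUMN_TAGS = {
    "ADDED_BY_ID": "pxAddedByID",
    "ADDED_BY_SYSTEM": "pxAddedBySystem",
    "HISTORY_FOR_REFERENCE": "pxHistoryForReference",
    "INSERT_NAME": "pxInsName",
    "OBJECT_CLASS": "pxObjClass",
    "TIME_CREATED": "pxTimeCreated",
    "FLOW_KEY": "pyFlowKey",
    "FLOW_NAME": "pyFlowName",
    "FLOW_TYPE": "pyFlowType",
    "MESSAGE": "pyMessageKey",
    "PERFORMER": "pyPerformer",
    "INSERT_KEY": "pzInsKey",
}

SAMPLE_XML = """<?xml version="1.0"?><History-Group-Comm-CommB-DT-RBB-Work
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<item id="HISTORY_6_GMT">
<pxAddedByID>HUBINT</pxAddedByID>
<pxAddedBySystem>CD</pxAddedBySystem>
<pxHistoryForReference>GR</pxHistoryForReference>
<pxInsName>GMT</pxInsName>
<pxObjClass>Work</pxObjClass>
<pxTimeCreated>2017-02-13T13:08:28.776Z</pxTimeCreated>
<pyFlowKey>RULE-OBJ</pyFlowKey>
<pyFlowName>pyStartCase</pyFlowName>
<pyFlowType>pyStartCase</pyFlowType>
<pyMessageKey>ItemCreated</pyMessageKey>
<pyPerformer>HUB</pyPerformer>
<pzInsKey>776 GMT</pzInsKey>
</item>
</History-Group-Comm-CommB-DT-RBB-Work>"""


def split_records(text, start=XMLINPUT_START, end=XMLINPUT_END):
    """Yield each fragment from a start marker through the next end marker."""
    pos = 0
    while True:
        s = text.find(start, pos)
        if s == -1:
            return
        e = text.find(end, s)
        if e == -1:
            return  # unterminated record: skipped in this sketch
        e += len(end)
        yield text[s:e]
        pos = e


def record_to_row(fragment):
    """Parse one <item> fragment and pick out each column's text (None if absent)."""
    item = ET.fromstring(fragment)
    return {col: item.findtext(tag) for col, tag in COLUMN_TAGS.items()}


for row in map(record_to_row, split_records(SAMPLE_XML)):
    print(row["FLOW_NAME"], row["FLOW_TYPE"], row["INSERT_KEY"])
```

For the sample item, FLOW_TYPE comes out as the text pyStartCase, i.e. a non-numeric string headed for a column that the DDL declares as FLOAT.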

Any suggestions as to why this NullPointerException occurs would be much appreciated.

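Change the data type of this field from FLOW_TYPE FLOAT COMMENT '' to FLOW_TYPE STRING COMMENT ''.

The pyFlowType element in your XML holds text such as "pyStartCase", which cannot be converted to a FLOAT. I think that conversion is why you are running into this problem.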
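A mismatch like this can be spotted locally before touching Hive. The minimal Python sketch below compares the values from the sample <item> with the types declared in the CREATE TABLE statement above. It uses Python's float() only as a rough stand-in for Hive's numeric conversion (the two are not identical, e.g. float() accepts "nan"), and the helper names parses_as and mismatched_columns are hypothetical.

```python
# Column declarations copied from the CREATE TABLE statement (name -> Hive type).
DECLARED_TYPES = {
    "ADDED_BY_ID": "STRING", "ADDED_BY_SYSTEM": "STRING",
    "HISTORY_FOR_REFERENCE": "STRING", "INSERT_NAME": "STRING",
    "OBJECT_CLASS": "STRING", "TIME_CREATED": "STRING",
    "FLOW_KEY": "STRING", "FLOW_NAME": "STRING",
    "FLOW_TYPE": "FLOAT", "MESSAGE": "STRING",
    "PERFORMER": "STRING", "INSERT_KEY": "STRING",
}

# Values the column XPaths point at in the sample <item>.
SAMPLE_ROW = {
    "ADDED_BY_ID": "HUBINT", "ADDED_BY_SYSTEM": "CD",
    "HISTORY_FOR_REFERENCE": "GR", "INSERT_NAME": "GMT",
    "OBJECT_CLASS": "Work", "TIME_CREATED": "2017-02-13T13:08:28.776Z",
    "FLOW_KEY": "RULE-OBJ", "FLOW_NAME": "pyStartCase",
    "FLOW_TYPE": "pyStartCase", "MESSAGE": "ItemCreated",
    "PERFORMER": "HUB", "INSERT_KEY": "776 GMT",
}


def parses_as(value, hive_type):
    """Rough check only: STRING always fits; FLOAT/DOUBLE use Python's float()."""
    if value is None or hive_type == "STRING":
        return True
    if hive_type in ("FLOAT", "DOUBLE"):
        try:
            float(value)
            return True
        except ValueError:
            return False
    raise ValueError(f"type not handled in this sketch: {hive_type}")


def mismatched_columns(row, types):
    """Return the columns whose sample value does not fit the declared type."""
    return [col for col, t in types.items() if not parses_as(row.get(col), t)]


print(mismatched_columns(SAMPLE_ROW, DECLARED_TYPES))  # ['FLOW_TYPE']
```

With FLOW_TYPE declared as STRING instead, the same check reports no mismatched columns.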
