我的xml看起来像这样
<TAG>
<REQUEST_ID>1</REQUEST_ID>
<APPLICATION_ID>2</APPLICATION_ID>
<EXTERNAL_SYSTEM_CODE>RB</EXTERNAL_SYSTEM_CODE>
<CCM_CHECK>
<CCM_CHECK_ID>101</CCM_CHECK_ID>
<CCM_CHECK_RESULT>10</CCM_CHECK_RESULT>
</CCM_CHECK>
<VERIF_ANSWERS>
<CHECK_CODE>101</CHECK_CODE>
<QUESTION_CODE>1</QUESTION_CODE>
<BOOKMARK_NUMBER>1</BOOKMARK_NUMBER>
<ANSWER_VALUE>NN</ANSWER_VALUE>
</VERIF_ANSWERS>
<VERIF_ANSWERS>
<CHECK_CODE>101</CHECK_CODE>
<QUESTION_CODE>2</QUESTION_CODE>
<BOOKMARK_NUMBER>1</BOOKMARK_NUMBER>
<ANSWER_VALUE>NN</ANSWER_VALUE>
</VERIF_ANSWERS>
</TAG>
这就是我从中创建表的方式
CREATE EXTERNAL TABLE s_sourcedata.evkuzmin_test_xml(
request_id string
, application_id string
, external_system_code string
, ccm_check map<string, string>
, verif_answers array<struct<verif_answer:array<map<string, string>>>>
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.request_id"="/TAG/REQUEST_ID/text()",
"column.xpath.application_id"="/TAG/APPLICATION_ID/text()",
"column.xpath.external_system_code"="/TAG/EXTERNAL_SYSTEM_CODE/text()",
"column.xpath.ccm_check"="/TAG/CCM_CHECK/*",
"column.xpath.verif_answers"="/TAG")
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/storage/s_sourcedata/db/evkuzmin_test_xml'
TBLPROPERTIES (
"xmlinput.start"="<TAG",
"xmlinput.end"="</TAG>"
);
这导致以下结果
1,2,RB,"{""CCM_CHECK_ID"":""101"",""CCM_CHECK_RESULT"":""10""}","[{""verif_answer"":null}]"
我怎样才能像ccm_check
那样将verif_answers
变成一系列关键价值对?
我尝试以与ccm_check
相同的方式执行此操作,但只得到了第一个VERIF_ANSWERS
。
VERIF_ANSWERS
的数量可能会有所不同。在这种情况下有 2 个?但可以有 0 或 10。
CREATE EXTERNAL TABLE myxml(
request_id string
, application_id string
, external_system_code string
, ccm_check map<string, string>
, verif_answers array<map<string, string>>
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.request_id"="/TAG/REQUEST_ID/text()",
"column.xpath.application_id"="/TAG/APPLICATION_ID/text()",
"column.xpath.external_system_code"="/TAG/EXTERNAL_SYSTEM_CODE/text()",
"column.xpath.ccm_check"="/TAG/CCM_CHECK/*",
"column.xpath.verif_answers"="/TAG/VERIF_ANSWERS/*")
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION 'file:///home/cloudera/xmlfiles'
TBLPROPERTIES (
"xmlinput.start"="<TAG",
"xmlinput.end"="</TAG>"
);
select myxml.verif_answers from myxml;
INFO : OK
+----------------------------------------------------+--+
| myxml.verif_answers |
+----------------------------------------------------+--+
| [{"CHECK_CODE":"101"},{"QUESTION_CODE":"1"},{"BOOKMARK_NUMBER":"1"},{"ANSWER_VALUE":"NN"},{"CHECK_CODE":"101"},{"QUESTION_CODE":"2"},{"BOOKMARK_NUMBER":"1"},{"ANSWER_VALUE":"NN"}] |
+----------------------------------------------------+--+
1 row selected (0.306 seconds)