I have a Hive table with the following schema:
CREATE EXTERNAL TABLE db_test.user_arry(
cstid string,
prdctsslctd array<string>,
indvprc array<bigint>,
dscntamt array<bigint>,
prdctsrjctd array<string>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'/location/on/a/hadoop/';
The data in it looks like this:
--------------------------------------------------------
name | prdctsslctd | indvprc | dscntamt | prdctsrjctd
--------------------------------------------------------
cctg65 ["m_jns","cbyht"] ["23","6"] ["1","1"] ["shs","jkt"]
jju89o0 ["top","jeans_wmn"] ["55","45"] [NULL] [NULL]
ju34hd ["laychps","candy","toy"] ["3","5","67"] ["12","8"] ["candy"]
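A rough way to picture how Hive reads this text table: the SerDe splits each line on the field delimiter and then splits each array column on the collection-items delimiter. Because the DDL has no COLLECTION ITEMS TERMINATED BY clause, Hive's default (Ctrl-B, `\002`) applies. The Python sketch below only simulates that two-level split; `parse_row` and the raw sample line are hypothetical, since the actual bytes of the data file are not shown.

```python
FIELD_DELIM = ","          # FIELDS TERMINATED BY ',' in the DDL
COLLECTION_DELIM = "\x02"  # assumed: Hive's default collection-items delimiter (Ctrl-B)

# Column names from the user_arry DDL; True marks an array column.
COLUMNS = [
    ("cstid", False),
    ("prdctsslctd", True),
    ("indvprc", True),
    ("dscntamt", True),
    ("prdctsrjctd", True),
]


def parse_row(line):
    """Simulate the two-level split: fields first, then items inside array columns.

    Items stay strings here; Hive would additionally parse indvprc/dscntamt as bigint.
    """
    fields = line.split(FIELD_DELIM)
    row = {}
    for (name, is_array), raw in zip(COLUMNS, fields):
        row[name] = raw.split(COLLECTION_DELIM) if is_array else raw
    return row


# Hypothetical raw line, assuming the file was written with the default delimiters.
sample = "cctg65,m_jns\x02cbyht,23\x026,1\x021,shs\x02jkt"
print(parse_row(sample))
```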
I am trying to pull this data into a table where every column is of type string:
CREATE EXTERNAL TABLE db_test.user_strng(
cstid string,
prdctsslctd string,
indvprc string,
dscntamt string,
prdctsrjctd string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS textfile
LOCATION
'/location/on/a/hadoop/';
using:
insert into db_test.user_strng select * from db_test.user_arry;
Actual output:
--------------------------------------------------------
name | prdctsslctd | indvprc | dscntamt | prdctsrjctd
--------------------------------------------------------
cctg65 m_jnscbyht 236 11 shsjkt
jju89o0 topjeans_wmn 5545 NULL NULL
ju34hd laychpscandytoy 3567 128 candy
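One plausible reading of this output, assuming the array values were written with Hive's default collection-items delimiter (`\002`, Ctrl-B) and then read back through a string column: the separator byte is still inside each value but does not show up on screen. The Python sketch below only models that round trip; `write_array_as_text` and `as_displayed` are hypothetical helpers.

```python
COLLECTION_DELIM = "\x02"  # assumed: Hive's default collection-items delimiter (Ctrl-B)


def write_array_as_text(items):
    """Simulate an array column written to a text file: items joined by Ctrl-B."""
    return COLLECTION_DELIM.join(items)


def as_displayed(text):
    """Drop non-printable characters, one way a console may render them."""
    return "".join(ch for ch in text if ch.isprintable())


stored = write_array_as_text(["m_jns", "cbyht"])  # value read back through a string column
print(repr(stored))          # 'm_jns\x02cbyht'
print(as_displayed(stored))  # m_jnscbyht
print(as_displayed(write_array_as_text(["23", "6"])))  # 236
```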
Expected output:
--------------------------------------------------------
name | prdctsslctd | indvprc | dscntamt | prdctsrjctd
--------------------------------------------------------
cctg65 "m_jns","cbyht" "23","6" "1","1" "shs","jkt"
jju89o0 "top","jeans_wmn" "55","45" NULL NULL
ju34hd "laychps","candy","toy" "3","5","67" "12","8" "candy"
Not sure what went wrong or what I missed.
Update_1
Output of the table after changing the bigint arrays to array<string>:
ALTER TABLE user_arry CHANGE indvprc indvprc array<string>;
ALTER TABLE user_arry CHANGE dscntamt dscntamt array<string>;
--------------------------------------------------------
name | prdctsslctd | indvprc | dscntamt | prdctsrjctd
--------------------------------------------------------
cctg65 ["m_jns","cbyht"] ["23","6"] ["1","1"] ["shs","jkt"]
jju89o0 ["top","jeans_wmn"] ["55","45"] [] []
ju34hd ["laychps","candy","toy"] ["3","5","67"] ["12","8"] ["candy"]
Final output in the table where every column is of type string:
--------------------------------------------------------
name | prdctsslctd | indvprc | dscntamt | prdctsrjctd
--------------------------------------------------------
cctg65 m_jns cbyht 23 6 1 1 shs jkt
jju89o0 top jeans_wmn 55 45
ju34hd laychps candy toy 3 5 67 12 8 candy
Still not getting the desired output.
Update_2
As suggested, changing FIELDS TERMINATED BY ',' to FIELDS TERMINATED BY '\t' returns the data in the desired format.
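Why the field delimiter matters here: once each array is flattened into a comma-separated string, a table whose fields are terminated by ',' splits that string again when the row is read. A minimal Python sketch, assuming naive splitting with no escape character (none is configured in these DDLs); `split_fields` is a hypothetical helper.

```python
def split_fields(line, delim):
    """Naive field split, as in a delimited text table without an escape character."""
    return line.split(delim)


# One row after flattening the arrays into comma-separated strings.
values = ["cctg65", "m_jns,cbyht", "23,6", "1,1", "shs,jkt"]

# ',' as field delimiter: the embedded commas create extra fields,
# so a five-column table would see misaligned values.
comma_row = ",".join(values)
print(len(split_fields(comma_row, ",")))  # 9

# '\t' as field delimiter: every value comes back intact.
tab_row = "\t".join(values)
print(split_fields(tab_row, "\t") == values)  # True
```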
Change all the array types to array<string>:
ALTER TABLE user_arry CHANGE indvprc indvprc array<string>;
ALTER TABLE user_arry CHANGE dscntamt dscntamt array<string>;
Based on your data sample, array<bigint> is not the right type for values like "23.45"; array<string> should work with your data file.
Use concat_ws to convert the arrays into comma-separated strings:
insert into db_test.user_strng
select cstid,
concat_ws(',',prdctsslctd) as prdctsslctd,
concat_ws(',',indvprc) as indvprc,
concat_ws(',',dscntamt) as dscntamt,
concat_ws(',',prdctsrjctd) as prdctsrjctd
from db_test.user_arry;
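For comparison with the expected output in the question: concat_ws(',', ...) yields plain comma-separated values such as m_jns,cbyht, without the double quotes shown in the expected table. The Python sketch below models both forms for the non-NULL sample rows; `concat_ws_sim` and `quoted` are hypothetical helpers, NULL arrays are deliberately not modeled, and the HiveQL analogue in the comment is an untested suggestion.

```python
def concat_ws_sim(sep, items):
    """Join array items with sep; models concat_ws(sep, array<string>) for non-NULL arrays only."""
    return sep.join(items)


def quoted(items):
    """Wrap each item in double quotes, matching the question's expected output."""
    # Untested HiveQL analogue: concat('"', concat_ws('","', arr), '"')
    return '"' + '","'.join(items) + '"'


# Rows from the question's sample; jju89o0 is left out because NULL handling is not modeled.
rows = [
    ("cctg65", ["m_jns", "cbyht"], ["23", "6"], ["1", "1"], ["shs", "jkt"]),
    ("ju34hd", ["laychps", "candy", "toy"], ["3", "5", "67"], ["12", "8"], ["candy"]),
]

for cstid, *arrays in rows:
    # Tab-separated, matching the '\t' field delimiter from Update_2.
    print("\t".join([cstid] + [concat_ws_sim(",", a) for a in arrays]))
    print("\t".join([cstid] + [quoted(a) for a in arrays]))
```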