我有一个带有结构数组的csv文件,其中所有内容都由","分隔。在 ID 字段之后,数据包含 X、Y 和 Z 坐标的三元组数组。
ID, X1,Y1,Z1,X2,Y2,Z2,X3,Y3,Z3,...
1,1,2,3,4,5,6,7,8,9
2,4,5,6,7,8,9
3,10,11,12
4,15,16,17,18,19,20,25,26,27
我尝试使用以下代码创建 Hive 表,如果我的字段、集合项和映射键使用不同的字符,该表将起作用。但是,由于所有内容都用逗号分隔,因此失败了。想知道是否有针对这种情况的替代解决方案。
CREATE TABLE IF NOT EXISTS Hivetable (
ID INT,
XYZ array<STRUCT<X:DOUBLE, Y:DOUBLE, Z:DOUBLE>>
)
row format delimited
fields terminated by ','
collection items terminated by ','
map keys terminated by ','
stored as textfile
;
LOAD DATA local INPATH 'Path/datafile.csv' OVERWRITE INTO TABLE Hivetable;
SCV 文件输入应为:
1,1;2;3#4;5;6#7;8;9
2,4;5;6#7;8;9
表创建:
CREATE TABLE IF NOT EXISTS Hivetable (
ID INT,
XYZ array<STRUCT<X:DOUBLE, Y:DOUBLE, Z:DOUBLE>>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '#'
MAP KEYS TERMINATED BY ';'
LINES TERMINATED BY 'n'
STORED AS
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
输出:
select * from Hivetable
1 [{"X":1,"Y":2,"Z":3},{"X":4,"Y":5,"Z":6},{"X":7,"Y":8,"Z":9}]