我正在尝试通过组合多个表来创建一个表,但我需要获取输出表上的所有列。
Create table output_table
Select cl1, cl2, cl3, cl4
From table1
Union all
Select cl1, cl2, cl3, cl5
From table2
Union all
Select cl1, cl2, cl3, cl6
From table3
Union all
Select cl1, cl2, cl3, cl7
From table4
在这种情况下,我的输出表应该包含所有列。
有什么办法可以得到这个,请
我会在 Hive 中尝试这样做(我想所有的列都是字符串,如果不是,你可以放另一个值,例如,如果字段是 int,你可以放 0 而不是 '',或者你也可以放 null,最终结果将是相同的(
CREATE TABLE output_table
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
AS
SELECT cl1, cl2, cl3, cl4, '' AS cl5, '' AS cl6, '' AS cl7
FROM table1
UNION ALL
SELECT '' AS cl1, '' AS cl2, '' AS cl3, '' AS cl4, cl5, '' AS cl6, '' AS cl7
FROM table2
UNION ALL
SELECT '' AS cl1, '' AS cl2, '' AS cl3, '' AS cl4, '' AS cl5, cl6, '' AS cl7
FROM table3
UNION ALL
SELECT '' AS cl1, '' AS cl2, '' AS cl3, '' AS cl4, '' AS cl5, '' AS cl6, cl7
FROM table4;
然后,您可以按如下方式清理此表 创建另一个表作为最终表
CREATE TABLE final_table
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
AS
SELECT t.cl1, t1.cl2,t2.cl3,t3.cl4,t4.cl5,t5.cl6,t6.cl7
FROM (SELECT cl1 FROM output_table WHERE cl1 <> '') AS t,
(SELECT cl2 FROM output_table WHERE cl2 <> '') AS t1,
(SELECT cl3 FROM output_table WHERE cl3 <> '') AS t2,
(SELECT cl4 FROM output_table WHERE cl4 <> '') AS t3,
(SELECT cl5 FROM output_table WHERE cl5 <> '') AS t4,
(SELECT cl6 FROM output_table WHERE cl6 <> '') AS t5,
(SELECT cl7 FROM output_table WHERE cl7 <> '') AS t6;