>我目前有一个看起来像这样的表格:
Table_1
ID|NAME|INFO_CODE_1|INFO_CODE_2|INFO_CODE_3|INFO_CODE_4|INFO_TEXT|DESCRIPTION
1 |Test|123 |254 |556 |867 |Test Text|Test Description
我正在尝试将 Table_1 中的数据插入到Table_2中,以便第一个表中的每条记录变成第二个表中的四个单独的记录,如下所示:
Table_2
ID|NAME|INFO_CODE|INFO_TEXT|DESCRIPTION
1 |Test|123 |Test Text|Test Description
1 |Test|254 |Test Text|Test Description
1 |Test|556 |Test Text|Test Description
1 |Test|867 |Test Text|Test Description
Table_1中有数千条记录,还有其他列的数据我们不关心,INFO_CODE_1 到 INFO_CODE_4 中的值都因记录而异(甚至可能为 null(。我正在考虑写这样的东西:
INSERT OVERWRITE TABLE example.TABLE_2
SELECT
A.ID AS ID,
A.NAME AS NAME,
A.INFO_CODE_1 AS INFO_CODE,
A.INFO_TEXT AS INFO_TEXT,
A.DESCRIPTION AS DESCRIPTION
FROM
example.TABLE_1 A
UNION ALL
SELECT
A.ID AS ID,
A.NAME AS NAME,
A.INFO_CODE_2 AS INFO_CODE,
A.INFO_TEXT AS INFO_TEXT,
A.DESCRIPTION AS DESCRIPTION
FROM
example.TABLE_1 A
UNION ALL
SELECT
A.ID AS ID,
A.NAME AS NAME,
A.INFO_CODE_3 AS INFO_CODE,
A.INFO_TEXT AS INFO_TEXT,
A.DESCRIPTION AS DESCRIPTION
FROM
example.TABLE_1 A
UNION ALL
SELECT
A.ID AS ID,
A.NAME AS NAME,
A.INFO_CODE_4 AS INFO_CODE,
A.INFO_TEXT AS INFO_TEXT,
A.DESCRIPTION AS DESCRIPTION
FROM
example.TABLE_1 A;
- 这行得通吗?
- 如果是,有没有更好的方法来编写 陈述?
- 有没有办法确保具有相同ID的所有记录一个接一个地插入?
提前感谢大家提供的任何帮助。
另一种没有很多工会的方式是这样的。它将需要 1 个 mapreduce 作业,而不是几个节省资源的作业。您可以连接代码,拆分它们以生成数组并分解数组。
INSERT OVERWRITE TABLE example.TABLE_2
SELECT
a.id,
a.name,
CODE
a.INFO_TEXT,
A.DESCRIPTION from
(
SELECT
A.ID AS ID,
A.NAME AS NAME,
split(concat(A.INFO_CODE_1, ",",A.INFO_CODE_2,",",A.INFO_CODE_3,",",A.INFO_CODE_4),",") AS INFO_CODE_ARRAY,
A.INFO_TEXT AS INFO_TEXT,
A.DESCRIPTION AS DESCRIPTION
from TABLE_1
) t LATERAL VIEW explode(INFO_CODE_ARRAY) codea AS code;