I have a product table:
Product Number | Type | Code | Date
P1-A           | D    | D1   | 2020-03-18
P1-A           | D    | D2   | 2020-03-20
P1-A           | D    | D4   | 2020-03-29
P1-A           | P    | P1   | 2020-01-20
P1-A           | P    | P3A  | 2020-01-22
P2-A           | D    | D1   | 2020-04-10
P3-A           | ...  | ...  | ...
For each Product Number and Type group, I want to find the maximum date among the codes [D2, D4] or [P3A, P4], and which code that maximum date belongs to.
The following query is wrong, but it shows what I am trying to do:
SELECT Product Number, MAX(D Code), MAX(D Date), MAX(P Code), MAX(P Date)
FROM product_table WHERE code = 'D2' OR code = 'D4' or code = 'P3A' or code = 'P4'
GROUP BY Product Number, Type
The final table should look like this:
Product Number | D Code | D Date     | P Code | P Date
P1-A           | D4     | 2020-03-29 | P3A    | 2020-01-22
...            | ...    | ...        | ...    | ...
....           | ...    | ...        | ...    | ...
How can I do this correctly in Hive?
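Use conditional aggregation. Note that MAX(code) and MAX(date) are computed independently, so the returned code matches the latest date only when the codes sort in the same order as their dates (as D2 < D4 does in the sample data):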
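SELECT Product_Number,
       -- `date` is backtick-quoted because DATE is a reserved keyword in recent Hive versions
       MAX(CASE WHEN type = 'D' THEN code END)   AS D_code,
       MAX(CASE WHEN type = 'D' THEN `date` END) AS D_date,
       MAX(CASE WHEN type = 'P' THEN code END)   AS P_code,
       MAX(CASE WHEN type = 'P' THEN `date` END) AS P_date
FROM product_table
WHERE code IN ('D2', 'D4', 'P3A', 'P4')  -- keep only the codes of interest
GROUP BY Product_Number;                 -- one output row per product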
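
If the codes do not sort in the same order as their dates, the query above can pair a code with a date from a different row. One possible way around that, sketched here rather than taken from the original answer, is to rank the rows per Product_Number and type with ROW_NUMBER() and keep only the latest one before pivoting. This assumes the date column is a DATE or a yyyy-MM-dd string (so it sorts chronologically), that the columns are named as in the query above, and that your Hive version supports window functions; the alias names `latest` and `rn` are made up for illustration.

```sql
-- Sketch: pick the latest row per (Product_Number, type), then pivot D and P into columns.
SELECT Product_Number,
       MAX(CASE WHEN type = 'D' THEN code END)   AS D_code,
       MAX(CASE WHEN type = 'D' THEN `date` END) AS D_date,
       MAX(CASE WHEN type = 'P' THEN code END)   AS P_code,
       MAX(CASE WHEN type = 'P' THEN `date` END) AS P_date
FROM (
    SELECT Product_Number, type, code, `date`,
           -- rn = 1 marks the row with the latest date in each group
           ROW_NUMBER() OVER (PARTITION BY Product_Number, type
                              ORDER BY `date` DESC) AS rn
    FROM product_table
    WHERE code IN ('D2', 'D4', 'P3A', 'P4')
) latest
WHERE rn = 1
GROUP BY Product_Number;
```

After the rn = 1 filter each product has at most one D row and one P row, so the outer MAX calls only collapse those rows into a single line and the code and date always come from the same source row. If two codes share the same latest date, ROW_NUMBER() picks one of them arbitrarily; add code to the ORDER BY if you need a deterministic choice.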