Hive/SQL 捆绑列用于少数列,其余列基于拉取的其他列的最低/最高



我有一个 5 列的蜂巢表

name orderno productcategory amount description
KJFSFKS 1   1   40  D1
KJFSFKS 2   2   50  D2
KJFSFKS 3   2   67  D3
KJFSFKS 4   2   10  D4
KJFSFKS 5   3   2   D5
KJFSFKS 6   3   5   D6
KJFSFKS 7   3   6   D7
KJFSFKS 8   4   8   D8
KJFSFKS 9   5   8   D9
KJFSFKS 10  5   10  D10

基于相同产品类别代码的所需输出,如果产品类别代码在多行中添加金额字段相同,则根据最高订单号选择描述,订单号始终选择最低,输出如下

name    orderno productcategory amount  description
KJFSFKS 1   1   40  D1
KJFSFKS 2   2   127 D4
KJFSFKS 5   3   13  D7
KJFSFKS 8   4   8   D8
KJFSFKS 9   5   18  D10

如上所述,有些字段按某种顺序排列,有些字段按不同的顺序排列

我使用了分组依据,但 sum(amount( 很好,描述字段呢,它基于 orderno 列,我的要求中还有其他列,我应该根据订单号进行选择

select name, orderno,  productcategory,  amount,   description 
from 
(
select name, orderno, productcategory, 
sum(amount) over(partition by name, productcategory) amount, 
first_value(description) over(partition by name, productcategory order by orderno desc) description,
row_number() over (partition by name, productcategory order by orderno) rn
from  your_table
)s where rn=1; --pick lowest orderno 

OK
KJFSFKS 1       1       40      D1
KJFSFKS 2       2       127     D4
KJFSFKS 5       3       13      D7
KJFSFKS 8       4       8       D8
KJFSFKS 9       5       18      D10
Time taken: 12.492 seconds, Fetched: 5 row(s)
select      name
,min(orderno)    as orderno
,productcategory
,sum(amount)     as amount
,max(named_struct('orderno',orderno,'description',description)).description
from        mytable
group by    name
,productcategory
;

最新更新