How to get the maximum of one column per group and keep the information from the other columns



I have data like this:

column1    column2   column3
A          V         10
A          Z         11
A          X         11

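For each value in column1, I want to find the maximum value in column3 and the corresponding value in column2. How can I do this in Hive?

The closest I have got is select column1, max(column3) from table group by column1. However, this does not bring along the corresponding value from column2. How can I get that as well?

If there are ties in column3, I don't really care which value is picked from column2. Thanks.

I would like the result to be: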

column1   column2   column3
A         Z         11

One way is to use row_number. If there are ties, you will get an arbitrary value for column2.

select column1,column2,column3
from (
select t.*,row_number() over(partition by column1 order by column3 desc) as rn
from tablename t
) x
where rn=1
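
If you would rather have the tie resolved deterministically than arbitrarily, a minimal sketch is to add a second sort key to the window. Ordering by column2 desc is just an assumed choice for illustration; with the sample data it picks Z over X on the tie at 11.

```sql
-- Same row_number approach, with column2 added as a tie-breaker
-- (the choice of "column2 desc" is an assumption, not a requirement)
select column1,column2,column3
from (
select t.*,row_number() over(partition by column1 order by column3 desc, column2 desc) as rn
from tablename t
) x
where rn=1
```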

Here are a few variations on the same concept:


select      column1
           ,max(named_struct('column3',column3,'column2',column2)).column2  as column2
           ,max(column3) as column3
from        mytable
group by    column1
;

+---------+---------+---------+
| column1 | column2 | column3 |
+---------+---------+---------+
| A       | Z       |      11 |
+---------+---------+---------+

select      column1
           ,max(struct(column3,column2)).col2  as column2
           ,max(column3)                       as column3
from        mytable
group by    column1
;

+---------+---------+---------+
| column1 | column2 | column3 |
+---------+---------+---------+
| A       | Z       |      11 |
+---------+---------+---------+

select      column1
           ,col.column2
           ,col.column3
from       (select      column1
                       ,max(named_struct('column3',column3,'column2',column2)) as col
            from        mytable
            group by    column1
            ) t
;

+---------+---------+---------+
| column1 | column2 | column3 |
+---------+---------+---------+
| A       | Z       |      11 |
+---------+---------+---------+
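
The struct approach can also carry more than one extra column along. A sketch, assuming a hypothetical additional column column4 in mytable (it is not part of the question): keep the column being maximized as the first struct field, since the struct comparison goes field by field, and list the columns to carry after it.

```sql
-- column4 is a hypothetical extra column, used only for illustration
select      column1
           ,col.column2
           ,col.column4
           ,col.column3
from       (select      column1
                       ,max(named_struct('column3',column3,'column2',column2,'column4',column4)) as col
            from        mytable
            group by    column1
            ) t
;
```

Note that ties on column3 are then broken by column2 and, after that, by column4, rather than arbitrarily.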
