雪花查询中的窗口函数的条件分组依据



我在雪花中有一个表格,格式如下:

create temp_test(name string, split string, value int)
insert into temp_test
values ('A','a', 100), ('A','b', 200), ('A','c',300), ('A', 'd', 400), ('A', 'e',500), ('B', 'a', 1000), ('B','b', 2000), ('B','c', 3000), ('B', 'd',4000), ('B','e', 5000)

第一步,我只需要每个名称的前 2 个值(按值排序(,所以我使用以下查询来获取它:

select name, split, value,
row_number() over (PARTITION BY (name) order by value desc) as row_num 
from temp_test
qualify row_num <= 2

这给了我以下结果集:

NAME    SPLIT   VALUE   ROW_NUM
A       e       500     1
A       d       400     2
B       e       5000    1
B       d       4000    2

现在,我需要对前 2 名以外的值求和,并将其放入名为"其他"的不同拆分中,如下所示:

NAME    SPLIT   VALUE   
A       e       500     
A       d       400     
A       Others  600        
B       e       5000    
B       d       4000
B       Others  6000     

如何在雪花查询或SQL中做到这一点?

with data as (
select name, split, value,
row_number() over (partition by (name) order by value desc) as row_num 
from temp_test
)
select
name,
case when row_num <= 2 then split else 'Others' end as split,
sum(value) as value
from data
group by name, case when row_num <= 2 then row_num else 3 end

Shawnt00 的答案很好,但对于 Snowflake 中的记录,这可以写得更简单:

首先,最后的分组可以按索引或名称引用结果:

GROUP BY 1,2

GROUP BY name, split

此外,由于 CASE 只有太多分支,可以使用 IFF,并且似乎您正在使用 CTE 来添加row_number您也可以将 IFF 推送到 CTE 中

WITH data AS (
SELECT name, value,
ROW_NUMBER() OVER (PARTITION BY name ORDER BY value DESC) AS row_num,
IFF(row_num < 3, split, 'Others') as n_split
FROM VALUES ('A','a', 100), ('A','b', 200), ('A','c',300), ('A', 'd', 400), 
('A', 'e',500), ('B', 'a', 1000), ('B','b', 2000), ('B','c', 3000), 
('B', 'd',4000), ('B','e', 5000) 
v(name, split, value)
)   
SELECT
name,
n_split,
SUM(value) AS value
FROM data
GROUP BY name, n_split;

如果超级热衷于小型 SQL,则将ROW_NUMBER推送到 IFF:

WITH data AS (
SELECT name, value,
IFF(ROW_NUMBER() OVER (PARTITION BY name ORDER BY value DESC) < 3, split, 'Others') as n_split
FROM VALUES ('A','a', 100), ('A','b', 200), ('A','c',300), ('A', 'd', 400), 
('A', 'e',500), ('B', 'a', 1000), ('B','b', 2000), ('B','c', 3000), 
('B', 'd',4000), ('B','e', 5000) 
v(name, split, value)
)   
SELECT
name,
n_split AS split,
SUM(value) AS value
FROM data
GROUP BY name, n_split;

给:

NAME    SPLIT   VALUE
A       e       500
A       d       400
A       Others  600
B       e       5000
B       d       4000
B       Others  6000

相关内容

  • 没有找到相关文章

最新更新