Get the sum of the five most popular names per year | BigQuery



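I'm trying to get the totals for the five most popular names, by year and gender, from a Google Cloud BigQuery public dataset. My data source is bigquery-public-data.usa_names.usa_1910_2013, which has columns such as state, gender, year, name, and number.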

My current query runs, but the results don't look right.

SELECT COUNT(x.number) AS numberOfNames,
       x.name,
       x.year
FROM `bigquery-public-data.usa_names.usa_1910_2013` x
JOIN (SELECT gender, year, name, SUM(number) AS numbers
      FROM `bigquery-public-data.usa_names.usa_1910_2013`
      WHERE gender = 'F'
      GROUP BY year, gender, name
      ORDER BY numbers DESC
      LIMIT 5
     ) y ON y.name = x.name
WHERE x.gender = 'F'
GROUP BY x.name, x.year
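
A likely reason the numbers look off can be checked locally. This is a hedged sketch that uses Python's built-in sqlite3 in place of BigQuery and a tiny hypothetical table `names` with made-up numbers, shaped like usa_1910_2013 (one row per state). It shows two things: the subquery's LIMIT 5 applies to the whole result, so it returns five (year, name) rows overall rather than five per year; and COUNT(number) counts state rows instead of adding them up. Because the join is on name alone, the outer query then pulls in every year for those five names.

```python
import sqlite3

# Hypothetical per-year totals (made-up numbers, not real data).
TOTALS = {
    1910: {"Mary": 500, "Helen": 400, "Dorothy": 300, "Margaret": 200, "Ruth": 100, "Anna": 50},
    1911: {"Mary": 90, "Helen": 80, "Dorothy": 70, "Margaret": 60, "Ruth": 55, "Anna": 40},
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (state TEXT, gender TEXT, year INTEGER, name TEXT, number INTEGER)")
for year, by_name in TOTALS.items():
    for name, total in by_name.items():
        # Like the real table, each total is split across states (here just two).
        conn.execute("INSERT INTO names VALUES ('CA', 'F', ?, ?, ?)", (year, name, total // 2))
        conn.execute("INSERT INTO names VALUES ('NY', 'F', ?, ?, ?)", (year, name, total - total // 2))

# The question's subquery: LIMIT 5 keeps five rows in total, not five per year.
subquery_rows = conn.execute("""
    SELECT gender, year, name, SUM(number) AS numbers
    FROM names
    WHERE gender = 'F'
    GROUP BY year, gender, name
    ORDER BY numbers DESC
    LIMIT 5
""").fetchall()

# COUNT(number) counts the per-state rows; SUM(number) adds them up.
mary_1910 = conn.execute("""
    SELECT COUNT(number), SUM(number) FROM names
    WHERE gender = 'F' AND name = 'Mary' AND year = 1910
""").fetchone()

print(sorted({row[1] for row in subquery_rows}))  # [1910]
print(mary_1910)  # (2, 500)
```

With this data all five "top" rows come from 1910, so 1911 gets no top-five list of its own.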

The expected result looks like this (name, total, year):

Linda 60000 1910
Elizabeth 59000 1910
Jennifer 58000 1910
Mary 57000 1910
Patricia 56000 1910
Elizabeth 62000 1911
Jennifer 58000 1911
Linda 57500 1911
Mary 57000 1911
Patricia 56000 1911

Any help would be appreciated.

Use window functions:

SELECT n.*
FROM (SELECT gender, name, year, SUM(number) AS number,
             ROW_NUMBER() OVER (PARTITION BY gender, year ORDER BY SUM(number) DESC) AS seqnum
      FROM `bigquery-public-data.usa_names.usa_1910_2013` n
      WHERE gender = 'F'
      GROUP BY gender, name, year
     ) n
WHERE seqnum <= 5;
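
A minimal local check of this query, assuming Python's sqlite3 module backed by SQLite 3.25 or later (needed for window functions) and a hypothetical in-memory table `names` with made-up rows instead of the real BigQuery table. PARTITION BY gender, year restarts the numbering for every year, so `seqnum <= 5` keeps five names per year; the single 'M' row is there only to show the gender filter working.

```python
import sqlite3

# Hypothetical rows shaped like usa_1910_2013: (state, gender, year, name, number).
# The numbers are made up for illustration.
ROWS = [
    ("CA", "F", 1910, "Mary", 300), ("NY", "F", 1910, "Mary", 250),
    ("CA", "F", 1910, "Helen", 400),
    ("CA", "F", 1910, "Dorothy", 120), ("NY", "F", 1910, "Dorothy", 100),
    ("CA", "F", 1910, "Ruth", 90),
    ("CA", "F", 1910, "Anna", 80),
    ("CA", "F", 1910, "Edna", 10),
    ("CA", "F", 1911, "Helen", 700),
    ("CA", "F", 1911, "Mary", 600),
    ("CA", "F", 1911, "Ruth", 50), ("NY", "F", 1911, "Ruth", 60),
    ("CA", "F", 1911, "Anna", 100),
    ("CA", "F", 1911, "Edna", 95),
    ("CA", "F", 1911, "Dorothy", 5),
    ("CA", "M", 1911, "John", 9000),  # other gender: must be filtered out
]

# Same shape as the answer's query; ORDER BY is added only for a stable output order.
QUERY = """
SELECT n.*
FROM (SELECT gender, name, year, SUM(number) AS number,
             ROW_NUMBER() OVER (PARTITION BY gender, year ORDER BY SUM(number) DESC) AS seqnum
      FROM names
      WHERE gender = 'F'
      GROUP BY gender, name, year
     ) n
WHERE seqnum <= 5
ORDER BY year, seqnum
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (state TEXT, gender TEXT, year INTEGER, name TEXT, number INTEGER)")
conn.executemany("INSERT INTO names VALUES (?, ?, ?, ?, ?)", ROWS)

top = conn.execute(QUERY).fetchall()
for gender, name, year, number, seqnum in top:
    print(year, seqnum, name, number)
```

Without an ORDER BY, BigQuery does not guarantee the order of the returned rows, which is why one is added in this sketch.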

Alternatively, compute the per-year totals in a CTE, rank them, and filter on the rank:

with data as (
  select year, name, sum(number) as name_count
  from `bigquery-public-data.usa_names.usa_1910_2013`
  where gender = 'F'
  group by 1, 2
),
ordered as (
  select *, row_number() over (partition by year order by name_count desc) as yearly_rank
  from data
)
select * from ordered
where yearly_rank <= 5
order by year, name_count desc
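
If what you need is literally the per-year sum of those five names, as the title says, one more aggregation over the ranked CTE gives it. This is sketched locally with Python's sqlite3 (SQLite 3.25 or later) and a hypothetical `names` table with made-up numbers; in BigQuery you would point `data` at the public table instead.

```python
import sqlite3

# Hypothetical rows shaped like usa_1910_2013: (state, gender, year, name, number).
# The numbers are made up for illustration.
ROWS = [
    ("CA", "F", 1910, "Mary", 300), ("NY", "F", 1910, "Mary", 250),
    ("CA", "F", 1910, "Helen", 400),
    ("CA", "F", 1910, "Dorothy", 120), ("NY", "F", 1910, "Dorothy", 100),
    ("CA", "F", 1910, "Ruth", 90),
    ("CA", "F", 1910, "Anna", 80),
    ("CA", "F", 1910, "Edna", 10),
    ("CA", "F", 1911, "Helen", 700),
    ("CA", "F", 1911, "Mary", 600),
    ("CA", "F", 1911, "Ruth", 50), ("NY", "F", 1911, "Ruth", 60),
    ("CA", "F", 1911, "Anna", 100),
    ("CA", "F", 1911, "Edna", 95),
    ("CA", "F", 1911, "Dorothy", 5),
    ("CA", "M", 1911, "John", 9000),  # other gender: must be filtered out
]

QUERY = """
WITH data AS (
    SELECT year, name, SUM(number) AS name_count
    FROM names
    WHERE gender = 'F'
    GROUP BY 1, 2
),
ordered AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY year ORDER BY name_count DESC) AS yearly_rank
    FROM data
)
-- Extra step: add up the five ranked totals for each year.
SELECT year, SUM(name_count) AS top5_total
FROM ordered
WHERE yearly_rank <= 5
GROUP BY year
ORDER BY year
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (state TEXT, gender TEXT, year INTEGER, name TEXT, number INTEGER)")
conn.executemany("INSERT INTO names VALUES (?, ?, ?, ?, ?)", ROWS)

totals = conn.execute(QUERY).fetchall()
print(totals)  # [(1910, 1340), (1911, 1605)]
```

ROW_NUMBER() breaks ties arbitrarily, so if two names share fifth place only one of them is counted; RANK() would keep both.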
