Using SQLite, how do I calculate the maximum year-on-year growth rate for each year?



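I'm learning SQL and working through a Codecademy exercise called World Populations SQL practice. There is a table with three columns: country, population, and year. I'd like to find, for each year, the country with the highest year-on-year growth rate. (This isn't something Codecademy asks for; I just thought it was an interesting idea.)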

I can calculate all the year-on-year growth rates with this query:

SELECT country,
       100.0 * ((SELECT population FROM population_years AS p2
                 WHERE p2.year = p1.year + 1
                   AND p2.country = p1.country)
                - population) / population AS year_on_year_growth,
       year
FROM population_years AS p1
WHERE year_on_year_growth IS NOT NULL
ORDER BY year_on_year_growth;

I can also calculate the maximum year-on-year growth rate for a specific year (2005, for example) with this query:

SELECT country,
       100.0 * ((SELECT population FROM population_years AS p2
                 WHERE p2.year = p1.year + 1
                   AND p2.country = p1.country)
                - population) / population AS year_on_year_growth,
       year
FROM population_years AS p1
WHERE year = 2005
  AND year_on_year_growth IS NOT NULL
ORDER BY year_on_year_growth DESC
LIMIT 1;

In Python, I can solve the problem using the first query (saved as yoy_query) like this:

yoy_result = c.execute(yoy_query).fetchall()
sorted([record for record in yoy_result if record[1] == max([row[1] for row in yoy_result if row[2] == record[2]])],key=lambda x:x[2])

This gives me the result I want:

[('Montserrat', 7.34177215189872, 2000), ('Montserrat', 13.4433962264151, 2001), ('Afghanistan', 5.803891762260126, 2002), ('Montserrat', 10.467706013363028, 2003), ('Liberia', 4.7976709085316545, 2004), ('Jordan', 7.088496587486171, 2005), ('Jordan', 6.764378108744186, 2006), ('Montserrat', 12.638580931263864, 2007), ('Liberia', 4.157111008408977, 2008), ('Niger', 3.737166190281749, 2009)]
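The list comprehension above rescans the whole result set for every record. As a rough sketch of the same idea in a single pass, the following keeps a dictionary of the best record seen so far for each year. The function name `top_growth_per_year` and the sample rows are made up for illustration, and unlike the one-liner this version keeps only the first record when two countries tie.

```python
def top_growth_per_year(records):
    """Return, per year, the (country, growth, year) record with the highest growth."""
    best = {}
    for record in records:
        _, growth, year = record
        # Replace the stored record only when this one has strictly higher growth.
        if year not in best or growth > best[year][1]:
            best[year] = record
    # Emit the winners in ascending year order.
    return [best[year] for year in sorted(best)]


# Hypothetical sample rows in the same (country, growth, year) shape as yoy_result.
sample = [("X", 1.5, 2000), ("Y", 2.5, 2000), ("X", 3.0, 2001), ("Y", 0.5, 2001)]
top_records = top_growth_per_year(sample)
print(top_records)
```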

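But I can't figure out how to do this in SQL alone. Any ideas? I think it seems much easier in Python because I can store the intermediate result and then run a second calculation on it.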

You can use the window functions LAG() and RANK():

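select country, year_on_year_growth, year
from (
  select *, rank() over (partition by year order by year_on_year_growth desc) as rnk
  from (
    select *,
      -- 1.0 * guards against integer division if population is stored as INTEGER
      100.0 * (1.0 * population / lag(population) over (partition by country order by year) - 1) as year_on_year_growth
    from population_years
  )
  -- drop each country's first year, which has no previous row to compare with
  where year_on_year_growth is not null
)
where rnk = 1  -- keep only the top-ranked country (or tied countries) for each year
order by year;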

The expression:

lag(population) over (partition by country order by year)

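returns the population from that country's previous row when ordered by year, which is the previous year's population (assuming there are no gaps between years).
So I calculate the growth rate as:

(population of the current year) / (population of the previous year) - 1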
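To make the behaviour of LAG() concrete, here is a minimal sketch that runs the expression against an in-memory SQLite database from Python. The countries "A" and "B" and their populations are invented toy data, not the Codecademy table, and the column alias `prev_population` is only for illustration. Window functions require SQLite 3.25.0 or later.

```python
import sqlite3

# Hypothetical toy data: (country, population, year).
rows = [
    ("A", 100.0, 2000), ("A", 110.0, 2001), ("A", 121.0, 2002),
    ("B", 200.0, 2000), ("B", 210.0, 2001), ("B", 252.0, 2002),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE population_years (country TEXT, population REAL, year INTEGER)"
)
conn.executemany("INSERT INTO population_years VALUES (?, ?, ?)", rows)

query = """
SELECT country, year, population, prev_population,
       100.0 * (population / prev_population - 1) AS year_on_year_growth
FROM (
    SELECT *,
           -- previous row's population within the same country, ordered by year
           LAG(population) OVER (PARTITION BY country ORDER BY year) AS prev_population
    FROM population_years
)
ORDER BY country, year
"""
lag_rows = conn.execute(query).fetchall()
for row in lag_rows:
    print(row)
```

Each country's first year has no previous row, so both `prev_population` and the growth come back as NULL (None in Python). Note also that this formulation attaches the growth to the later year of each pair, whereas the query in the question attaches it to the earlier year, so the year labels are shifted by one between the two approaches.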

I think the simplest way is actually to use a view, like this:

CREATE VIEW yoy_growth
AS
SELECT country,
       100.0 * ((SELECT population FROM population_years AS p2
                 WHERE p2.year = p1.year + 1
                   AND p2.country = p1.country)
                - population) / population AS year_on_year_growth,
       year
FROM population_years AS p1
WHERE year_on_year_growth IS NOT NULL
ORDER BY year_on_year_growth;

SELECT * FROM yoy_growth AS y1
WHERE year_on_year_growth = (
    SELECT MAX(year_on_year_growth)
    FROM yoy_growth AS y2
    WHERE y1.year = y2.year
)
ORDER BY year;

This gives me the result I want, although the query does seem a bit slow.
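One possible reason for the slowness is that the correlated subqueries are re-evaluated for every row of the view. As a sketch of an alternative (a different technique from the view above, and not benchmarked here), the growth can be computed with a self-join inside a common table expression, and the per-year maximum found once with GROUP BY and joined back. The CTE names `yoy` and `best` and the toy data are assumptions for illustration; the growth is labelled with the earlier year, as in the question.

```python
import sqlite3

# Hypothetical toy data: (country, population, year).
rows = [
    ("A", 100.0, 2000), ("A", 110.0, 2001), ("A", 121.0, 2002),
    ("B", 200.0, 2000), ("B", 210.0, 2001), ("B", 252.0, 2002),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE population_years (country TEXT, population REAL, year INTEGER)"
)
conn.executemany("INSERT INTO population_years VALUES (?, ?, ?)", rows)

query = """
WITH yoy AS (
    -- pair each row with the same country's row for the following year
    SELECT p1.country, p1.year,
           100.0 * (p2.population - p1.population) / p1.population AS growth
    FROM population_years AS p1
    JOIN population_years AS p2
      ON p2.country = p1.country AND p2.year = p1.year + 1
),
best AS (
    -- one row per year holding that year's maximum growth
    SELECT year, MAX(growth) AS max_growth
    FROM yoy
    GROUP BY year
)
SELECT yoy.country, yoy.growth, yoy.year
FROM yoy
JOIN best ON best.year = yoy.year AND best.max_growth = yoy.growth
ORDER BY yoy.year
"""
top_rows = conn.execute(query).fetchall()
for row in top_rows:
    print(row)
```

The inner join drops the last year of each country automatically, since it has no following year to pair with, so no IS NOT NULL filter is needed. Countries that tie for a year's maximum are all returned.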
