I need to run aggregate statistics on a table at several levels of aggregation.
One inelegant way to do this is to UNION
several SELECT
queries that use the same field names but have a different field in each query's GROUP BY.
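Example: the total population, i.e. the sum of the population
field, for each municipality (= the smallest geographic unit), county, nation, and the world.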
-- aggregate dummy table population_statistics by
-- 1:municipality 2:county 3:nation 4:world
-- single-quoted string literals (double quotes become identifiers under ANSI_QUOTES);
-- UNION ALL, because the branches can never produce duplicate rows
SELECT
'municipality' AS geo_level,
municipality_name AS geo_name,
SUM(population) AS population
FROM population_statistics
GROUP BY municipality_name
UNION ALL
SELECT
'county' AS geo_level,
county_name AS geo_name,
SUM(population) AS population
FROM population_statistics
GROUP BY county_name
UNION ALL
SELECT
'nation' AS geo_level,
country_name AS geo_name,
SUM(population) AS population
FROM population_statistics
GROUP BY country_name
UNION ALL
SELECT
'world' AS geo_level,
'world' AS geo_name,
SUM(population) AS population
FROM population_statistics
;
DB Fiddle with dummy data
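I know that libraries in programming languages, such as R's tidyr
and Python's pandas, have cleaner ways to run multi-level aggregations on a table. But is it possible to run a multi-level aggregation in pure SQL, perhaps using common table expressions (CTEs, recently added in MySQL 8) to make this kind of aggregation more efficient?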
In MySQL 8.0+ you can create a CTE that contains all the levels you want to aggregate, and then apply the aggregation once:
WITH cte AS (
SELECT 1 sort_order, 'municipality' geo_level, municipality_name geo_name, population FROM population_statistics
UNION ALL
SELECT 2, 'county', county_name, population FROM population_statistics
UNION ALL
SELECT 3, 'nation', country_name, population FROM population_statistics
UNION ALL
SELECT 4, 'world', 'world', population FROM population_statistics
)
SELECT geo_level, geo_name, SUM(population) population
FROM cte
GROUP BY sort_order, geo_level, geo_name
ORDER BY sort_order;
See the demo.
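To check that the CTE query returns the same rows as the question's UNION ALL query, here is a minimal sketch that runs both against an in-memory database with Python's built-in sqlite3 module. SQLite stands in for MySQL 8 here (an assumption: both queries use only syntax that SQLite also accepts), and the table columns and the rows in ROWS are hypothetical, since the original DB Fiddle data is not reproduced in this post.

```python
import sqlite3

# Hypothetical dummy data (the original DB Fiddle rows are not available):
# (municipality_name, county_name, country_name, population)
ROWS = [
    ("Alpha", "North", "Aland", 100),
    ("Beta", "North", "Aland", 200),
    ("Gamma", "South", "Aland", 50),
    ("Delta", "East", "Bland", 400),
]

# The question's approach: one GROUP BY per level, glued together with UNION ALL.
UNION_QUERY = """
SELECT 'municipality' AS geo_level, municipality_name AS geo_name, SUM(population) AS population
FROM population_statistics GROUP BY municipality_name
UNION ALL
SELECT 'county', county_name, SUM(population) FROM population_statistics GROUP BY county_name
UNION ALL
SELECT 'nation', country_name, SUM(population) FROM population_statistics GROUP BY country_name
UNION ALL
SELECT 'world', 'world', SUM(population) FROM population_statistics
"""

# The answer's approach: stack every level in a CTE, then aggregate once.
CTE_QUERY = """
WITH cte AS (
  SELECT 1 sort_order, 'municipality' geo_level, municipality_name geo_name, population FROM population_statistics
  UNION ALL
  SELECT 2, 'county', county_name, population FROM population_statistics
  UNION ALL
  SELECT 3, 'nation', country_name, population FROM population_statistics
  UNION ALL
  SELECT 4, 'world', 'world', population FROM population_statistics
)
SELECT geo_level, geo_name, SUM(population) population
FROM cte
GROUP BY sort_order, geo_level, geo_name
ORDER BY sort_order
"""


def run(query):
    """Create the hypothetical table in a fresh in-memory DB and run `query`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE population_statistics ("
        "municipality_name TEXT, county_name TEXT, country_name TEXT, population INTEGER)"
    )
    conn.executemany("INSERT INTO population_statistics VALUES (?, ?, ?, ?)", ROWS)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


union_rows = run(UNION_QUERY)
cte_rows = run(CTE_QUERY)

# Row order within a level is not guaranteed, so compare sorted lists.
assert sorted(union_rows) == sorted(cte_rows)
for row in cte_rows:
    print(row)
```

The sort_order column only controls the order of the levels in the output. Within a level, rows come back in whatever order the engine chooses, which is why the comparison sorts both results first.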