I have a database with the tables Cities, Shops, and Sales, where Cities has many Shops and Shops has many Sales. Shops is basically a join table, so each city has many sales through its shops.
I'm looking for a query that finds the average sale price for each city, along with the total number of employees.
I tried:
SELECT cities.id, AVG(sales.price) AS "avg_price", SUM(shops.employees) AS "tot_employees"
FROM cities
LEFT JOIN shops ON shops.city_id = cities.id
LEFT JOIN sales ON sales.shop_id = shops.id
GROUP BY cities.id
-- 640 ms
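However, this sum returns far too many employees. That makes sense: because of the second join, each Shops row is repeated once for every one of its sales.
Slightly better: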
SELECT cities.id, AVG(sales.price) AS "avg_price", SUM(DISTINCT shops.employees) AS "tot_employees"
FROM cities
LEFT JOIN shops ON shops.city_id = cities.id
LEFT JOIN sales ON sales.shop_id = shops.id
GROUP BY cities.id
-- 660 ms
This sums only the distinct employee counts. But that is of course wrong whenever two shops in a city have the same number of employees.
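To make both failure modes concrete, here is a minimal sketch that runs the two employee sums against a tiny in-memory SQLite database. The schema, the sample rows, and the use of SQLite are assumptions made for illustration; the original post does not say which database engine is involved. One city has two shops with 5 employees each, so the correct total is 10.

```python
import sqlite3

# Hypothetical sample data: one city, two shops with the same head count.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (id INTEGER PRIMARY KEY);
    CREATE TABLE shops  (id INTEGER PRIMARY KEY, city_id INTEGER, employees INTEGER);
    CREATE TABLE sales  (id INTEGER PRIMARY KEY, shop_id INTEGER, price REAL);
    INSERT INTO cities VALUES (1);
    -- correct total for city 1 is 5 + 5 = 10 employees
    INSERT INTO shops VALUES (1, 1, 5), (2, 1, 5);
    -- shop 1 has two sales, shop 2 has one
    INSERT INTO sales VALUES (1, 1, 10.0), (2, 1, 20.0), (3, 2, 30.0);
""")

# Same joins and grouping as the two attempts in the question.
JOINS = """
    FROM cities
    LEFT JOIN shops ON shops.city_id = cities.id
    LEFT JOIN sales ON sales.shop_id = shops.id
    GROUP BY cities.id
"""

# Plain SUM counts each shop once per sale: 5 + 5 + 5.
naive = conn.execute("SELECT SUM(shops.employees)" + JOINS).fetchone()[0]

# SUM(DISTINCT ...) collapses the two equal head counts into a single 5.
distinct = conn.execute("SELECT SUM(DISTINCT shops.employees)" + JOINS).fetchone()[0]

print(naive, distinct)  # prints "15 5"; neither matches the true total of 10
```

The plain sum overcounts because of the join fan-out, while the DISTINCT variant undercounts because it deduplicates values rather than shops.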
Note: the Sales table is quite large, so efficiency matters.
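How can I run an aggregate over each row of this intermediate join table exactly once, with no duplicates, without affecting the aggregate over the target table?
How about two levels of aggregation?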
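SELECT per_shop.city_id,
       -- NULLIF avoids dividing by zero for cities without sales;
       -- cast to a decimal type first if price is an integer column
       SUM(per_shop.sum_price) / NULLIF(SUM(per_shop.num_sales), 0) AS avg_price,
       SUM(per_shop.employees) AS tot_employees
FROM (SELECT c.id AS city_id, sh.id AS shop_id,
             SUM(s.price) AS sum_price,
             COUNT(s.price) AS num_sales,  -- ignores the NULL row a shop without sales gets from the LEFT JOIN
             sh.employees AS employees
      FROM cities c LEFT JOIN
           shops sh
           ON sh.city_id = c.id LEFT JOIN
           sales s
           ON s.shop_id = sh.id
      GROUP BY c.id, sh.id, sh.employees
     ) per_shop
GROUP BY per_shop.city_id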
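As a quick sanity check of the two-level idea, the sketch below runs a version of the query against a small in-memory SQLite database. The schema, the sample rows, the `per_shop` alias, and the choice of SQLite are all assumptions for illustration. This version counts sales with `COUNT(s.price)` so that a shop without sales contributes zero, and uses `NULLIF` to avoid dividing by zero. The data includes a shop with no sales and a city with no shops to exercise the LEFT JOINs.

```python
import sqlite3

# Hypothetical sample data, not taken from the original post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (id INTEGER PRIMARY KEY);
    CREATE TABLE shops  (id INTEGER PRIMARY KEY, city_id INTEGER, employees INTEGER);
    CREATE TABLE sales  (id INTEGER PRIMARY KEY, shop_id INTEGER, price REAL);
    INSERT INTO cities VALUES (1), (2);          -- city 2 has no shops
    INSERT INTO shops VALUES (1, 1, 5), (2, 1, 5), (3, 1, 2);  -- shop 3 has no sales
    INSERT INTO sales VALUES (1, 1, 10.0), (2, 1, 20.0), (3, 2, 30.0);
""")

TWO_LEVEL = """
    SELECT per_shop.city_id,
           SUM(per_shop.sum_price) / NULLIF(SUM(per_shop.num_sales), 0) AS avg_price,
           SUM(per_shop.employees) AS tot_employees
    FROM (SELECT c.id AS city_id, sh.id AS shop_id,
                 SUM(s.price)   AS sum_price,
                 COUNT(s.price) AS num_sales,   -- 0 for a shop without sales
                 sh.employees   AS employees
          FROM cities c
          LEFT JOIN shops sh ON sh.city_id = c.id
          LEFT JOIN sales s  ON s.shop_id = sh.id
          GROUP BY c.id, sh.id, sh.employees    -- inner level: one row per shop
         ) per_shop
    GROUP BY per_shop.city_id                   -- outer level: one row per city
    ORDER BY per_shop.city_id
"""

rows = conn.execute(TWO_LEVEL).fetchall()
print(rows)  # [(1, 20.0, 12), (2, None, None)]
```

City 1 gets an average of (10 + 20 + 30) / 3 = 20.0 and 5 + 5 + 2 = 12 employees, because each shop's head count is read exactly once from its single inner row. Whether this is fast enough on a large Sales table depends on the engine and indexes, which this sketch does not measure.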