假设我有两个MySQL表,purchase_log和game_log,前者是所有客户端支付的记录,后者是所有游戏的记录。
create table purchase_log (
client_id int,
purchase_date date,
amount int
);
insert into purchase_log (client_id, purchase_date, amount) values (9, '2012-01-01', 10);
insert into purchase_log (client_id, purchase_date, amount) values (10, '2012-01-01', 5);
insert into purchase_log (client_id, purchase_date, amount) values (11, '2012-01-01', 10);
create table game_log (
client_id int,
game_id int,
game_date date
);
insert into game_log (client_id, game_id, game_date) values (9, 110, '2012-12-01');
insert into game_log (client_id, game_id, game_date) values (10, 110, '2012-12-01');
insert into game_log (client_id, game_id, game_date) values (11, 110, '2012-12-01');
insert into game_log (client_id, game_id, game_date) values (11, 110, '2012-12-02');
insert into game_log (client_id, game_id, game_date) values (11, 110, '2012-12-03');
按月份分组,但仅针对玩过游戏的客户的平均花费金额是多少?
select avg(amount)
from purchase_log
where client_id in
(select client_id
from game_log
#where...
)
#and...
group by month(purchase_date);
+-------------+
| avg(amount) |
+-------------+
| 8.3333 |
+-------------+
8.3的答案是正确的。((10+10+5)/3=8.3)但是很多SO文章都说联接更高效,所以我重写为联接:
select avg(amount)
from purchase_log p
#where...
join game_log g on p.client_id=g.client_id
#and...
group by month(purchase_date);
+-------------+
| avg(amount) |
+-------------+
| 9.0000 |
+-------------+
但9.0的结果是错误的,应该是8.3。由于一个客户端玩了多个游戏,所以join给出了一个错误的结果,该结果给出了3行。
有没有办法修复这个连接?还是应该回到效率较低的子查询?
尝试此查询的另一种方法是:
select avg(amount)
from purchase_log pl
where exits (select 1
from game_log gl
where gl.client_id = pl.client_id
);
为了使其正常工作,您还需要game_log(client_id)
上的索引。
您的结果不同,因为join
乘以了行数。在这种情况下,在where
子句中保留条件确实更容易。