1) The sample data set looks like this:
create table user(
user_id int,
name varchar(10),
surname varchar(10)
);
insert into user(user_id, name, surname) values
(1, 'a', 'aa'),
(2, 'b', 'bb'),
(3, 'c', 'cc');
create table books(
user_id int,
book_name varchar(10)
);
insert into books(user_id, book_name) values
(1, 'book1'),
(1, 'book2'),
(1, 'book3'),
(2, 'book1');
create table expanses(
id int,
user_id int,
amount_spent int,
date timestamp
);
insert into expanses(id, user_id, amount_spent, date)
values
(1,1,10, '2020-02-03'),
(2,1,10, '2020-02-03'),
(3,1,30, '2020-02-02'),
(4,1,12, '2020-02-01'),
(5,1,15, '2020-01-31'),
(6,1,13, '2020-01-15'),
(7,2,15, '2020-02-01'),
(8,3,20, '2020-02-01');
The result I want:
| CountUsers | amount_spent |
|---------|--------------|
| 2 | 77 |
Explanation: I want to count
a) how many users have book1 or book2, and
b) how much those users spent in total between 2020-02-01 and 2020-02-03.
What should the query look like?
I am using MySQL version 8.
I have tried:
SELECT count(*)
, sum(amount_spent) as total_amount_spent
FROM
( select sum(e.amount_spent) as amount_spent
FROM expanses e
LEFT
JOIN books b
ON b.user_id = e.user_id
WHERE (b.book_name ='book1' or b.book_name ='book2')
and e.date between '2020-02-01' and '2020-02-03'
GROUP
BY e.user_id) src;
The result is wrong, because the inner select (slightly modified here to show the problem more clearly):
select expanses.user_id, expanses.amount_spent, books.book_name
FROM expanses
LEFT JOIN books ON books.user_id = expanses.user_id
WHERE (books.book_name ='book1' or books.book_name ='book2')
and expanses.date between '2020-02-01' and '2020-02-03'
3) will return something like this:
| user_id | amount_spent | book_name |
|---------|--------------|-----------|
| 1 | 10 | book1 |
| 1 | 10 | book1 |
| 1 | 30 | book1 |
| 1 | 12 | book1 |
| 1 | 10 | book2 |
| 1 | 10 | book2 |
| 1 | 30 | book2 |
| 1 | 12 | book2 |
| 2 | 15 | book1 |
So if we add these up, we get
| CountUsers | amount_spent |
|---------|--------------|
| 2 | 139 |
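This is wrong, because every expense of user 1 is counted twice (once per matching book).
If we add DISTINCT, as in sum(DISTINCT amount_spent),
the result is also wrong, because DISTINCT also collapses user 1's two separate payments of 10 into one. It gives the following result: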
| CountUsers | amount_spent |
|---------|--------------|
| 2 | 67 |
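One way to see where the duplicates come from is to count how often each expense row appears after the join. The query below is only a diagnostic sketch against the sample tables above; the alias `copies` is an illustrative name and is not part of the original query.

```sql
-- Count how many joined rows each expense produces.
-- An expense is repeated once per matching book of its user.
SELECT e.id
     , e.amount_spent
     , COUNT(*) AS copies
  FROM expanses e
  JOIN books b
    ON b.user_id = e.user_id
 WHERE b.book_name IN ('book1','book2')
   AND e.date BETWEEN '2020-02-01' AND '2020-02-03'
 GROUP BY e.id, e.amount_spent
 ORDER BY e.id;
```

With the sample data, expenses 1 to 4 (user 1) each appear twice and expense 7 (user 2) appears once, so user 1's 62 is counted as 124 and the total becomes 139.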
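To summarize: as you can see in table 3, each amount_spent is repeated once per matching book_name (a one-to-many relationship).
How can I avoid the duplicated amount_spent while keeping the relation to book_name?
Fiddle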
select count(distinct user_id)
, sum(amount_spent)
from expanses
where expanses.date between '2020-02-01' and '2020-02-03'
and user_id in (select user_id from books where book_name in('book1','book2'))
https://www.db-fiddle.com/f/26ifPWyRRKGp9YVQXg1qje/0
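As a sanity check, the same filter can be grouped per user to see how the total is composed. This is a sketch derived from the query above; only the GROUP BY and the column alias are added.

```sql
-- Per-user breakdown of the filtered expenses.
-- The IN subquery only tests membership, so it never multiplies expense rows.
SELECT e.user_id
     , SUM(e.amount_spent) AS amount_spent
  FROM expanses e
 WHERE e.date BETWEEN '2020-02-01' AND '2020-02-03'
   AND e.user_id IN (SELECT user_id FROM books WHERE book_name IN ('book1','book2'))
 GROUP BY e.user_id;
```

With the sample data this gives 62 for user 1 and 15 for user 2, which add up to the expected 77.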
a)
SELECT COUNT(DISTINCT user_id) total FROM books WHERE book_name IN ('book1','book2');
b)
SELECT SUM(amount_spent) total_spent
FROM expanses e
WHERE e.date BETWEEN '2020-02-01' AND '2020-02-03'
AND EXISTS
( SELECT *
FROM books b
WHERE b.user_id = e.user_id
AND b.book_name IN ('book1','book2')
);
Here is one idea for combining a) and b):
SELECT SUM(amount_spent) total_spent
, (SELECT COUNT(DISTINCT user_id) total FROM books WHERE book_name IN ('book1','book2')) total_customers
FROM expanses e
WHERE e.date BETWEEN '2020-02-01' AND '2020-02-03'
AND EXISTS
( SELECT *
FROM books b
WHERE b.user_id = e.user_id
AND b.book_name IN ('book1','book2')
);
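Another possible variant first reduces books to one row per user and then joins the expenses to that. It is a sketch under the assumption that users who own a matching book but spent nothing in the period should still be counted. The aliases `u`, `CountUsers` and `amount_spent` are illustrative.

```sql
-- The derived table u has at most one row per user,
-- so each expense row is joined exactly once.
SELECT COUNT(DISTINCT u.user_id)        AS CountUsers
     , COALESCE(SUM(e.amount_spent), 0) AS amount_spent
  FROM ( SELECT DISTINCT user_id
           FROM books
          WHERE book_name IN ('book1','book2')
       ) u
  LEFT JOIN expanses e
    ON e.user_id = u.user_id
   AND e.date BETWEEN '2020-02-01' AND '2020-02-03';
```

With the sample data this returns 2 and 77. The date condition sits in the ON clause rather than in WHERE so that the LEFT JOIN keeps book owners who have no expenses in the range.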