我想为每个用户找到最新和第二个最新的不同order_id
之间的日期差异。
预期输出为:
user_id | order_diff
1 | 1
3 | 7
8 | 1
CCD_ 2表示两个不同的CCD_ 3之间的天数差异。如果没有两个不同的order_id
(如用户id 9的情况(,则不返回结果。
在这种情况下,user_id
的1
的order_diff
是1,因为他的两个不同的order_id
之间的日差是1。然而,user_id
9没有order_diff
,因为他没有2个不同的"order_id"。
这是数据集:
user_id order_id order_time
1 208965785 2016-12-15 17:14:13
1 201765785 2016-12-14 17:19:05
1 203932785 2016-12-13 20:41:30
1 209612785 2016-12-14 20:14:32
1 208112785 2016-12-14 20:27:08
1 205525785 2016-12-14 17:01:26
1 208812785 2016-12-14 20:18:23
1 206432785 2016-12-11 20:32:20
1 206698785 2016-12-14 10:50:15
2 209524795 2016-11-26 18:06:21
3 206529925 2016-10-01 10:43:57
3 203729925 2016-10-08 10:43:11
4 204876145 2016-09-24 10:23:49
5 203363157 2016-07-13 23:56:43
6 207784875 2017-01-04 12:21:21
7 206437177 2016-06-25 02:40:33
8 202819645 2016-09-09 11:47:27
8 202819645 2016-09-09 11:47:27
8 202819646 2016-09-08 11:47:27
9 205127187 2016-06-05 22:21:18
9 205127187 2016-06-05 22:21:18
11 207874877 2016-06-17 16:49:44
12 204927595 2016-11-28 23:05:40
这是我目前使用的代码:
SELECT e1.user_id,datediff(e1.order_time,e2.time), e1.order_id FROM
sales e1
JOIN
sales e2
ON
e1.user_id=e2.user_id
AND
e1.order_id = (SELECT distinct order_id FROM sales temp1 WHERE temp1.order_id =e1.order_id ORDER BY order_time DESC LIMIT 1)
AND
e2.order_id = (SELECT distinct order_id FROM sales temp2 WHERE temp2.order_id=e2.order_id ORDER BY order_time DESC LIMIT 1 OFFSET 1)
我的输出没有产生所需的输出,它也忽略了order_ids
相同的情况。
编辑:我还希望将查询扩展到较大的数据集,其中第二个最近的order_time
可能不是min(order_time)
基于小提琴:
select user_id,
datediff(max(order_time),
( -- Scalar Subquery to get the 2nd largest order_time
select max(order_time)
from orders as o2
where o2.user_id = o.user_id -- same user
and o2.order_time < max(o.order_time) -- but not the max time
)
) as diff
from orders as o
group by user_id
having diff is not null -- if there's no 2nd largest time diff will be NULL
以下将起作用:
架构(MySQL v5.7(
CREATE TABLE orders
(`user_id` int, `order_id` int, `order_time` datetime)
;
INSERT INTO orders
(`user_id`, `order_id`, `order_time`)
VALUES
(1,208965785,'2016-12-15 17:14:13'),
(1,201765785,'2016-12-14 17:19:05'),
(1,203932785,'2016-12-13 20:41:30'),
(1,209612785,'2016-12-14 20:14:32'),
(1,208112785,'2016-12-14 20:27:08'),
(1,205525785,'2016-12-14 17:01:26'),
(1,208812785,'2016-12-14 20:18:23'),
(1,206432785,'2016-12-11 20:32:20'),
(1,206698785,'2016-12-14 10:50:15'),
(2,209524795,'2016-11-26 18:06:21'),
(3,206529925,'2016-10-01 10:43:57'),
(3,203729925,'2016-10-08 10:43:11'),
(4,204876145,'2016-09-24 10:23:49'),
(5,203363157,'2016-07-13 23:56:43'),
(6,207784875,'2017-01-04 12:21:21'),
(7,206437177,'2016-06-25 02:40:33'),
(8,202819645,'2016-09-09 11:47:27'),
(8,202819645,'2016-09-09 11:47:27'),
(8,202819646,'2016-09-08 11:47:27'),
(9,205127187,'2016-06-05 22:21:18'),
(9,205127187,'2016-06-05 22:21:18'),
(11,207874877,'2016-06-17 16:49:44'),
(12,204927595,'2016-11-28 23:05:40');
查询#1
SELECT dt2.user_id,
MIN(datediff(dt2.latest_order_time,
dt2.second_latest_order_time)) AS order_diff
FROM (
SELECT o.user_id,
o.order_time AS latest_order_time,
(SELECT o2.order_time
FROM orders AS o2
WHERE o2.user_id = o.user_id AND
o2.order_id <> o.order_id
ORDER BY o2.order_time DESC LIMIT 1) AS second_latest_order_time
FROM orders AS o
JOIN (SELECT user_id, MAX(order_time) AS latest_order_time
FROM orders
GROUP BY user_id) AS dt
ON dt.user_id = o.user_id AND
dt.latest_order_time = o.order_time
) AS dt2
WHERE dt2.second_latest_order_time IS NOT NULL
GROUP BY dt2.user_id;
| user_id | order_diff |
| ------- | ---------- |
| 1 | 1 |
| 3 | 7 |
| 8 | 1 |
查看DB Fiddle
详细信息:
- 我们为子选择查询(派生表(中的
user_id
确定最大order_time
。我们可以将其别名为latest_order_time
- 我们把这个结果集合
Join
放到orders
表中。这将有助于我们仅考虑order_diff
0的最大值为order_time
的行 - 现在,我们使用CorrelatedSubquery来确定同一用户的最大
order_time
值,而不是剩余的order_id
值。我们可以将其别名为second_latest_order_time
- 最后,再次将其用作派生表,并删除
second_latest_order_time
为null
的所有情况,并计算其余情况的datediff()
- 需要最终的
Group By
,因为您的数据有多个条目用于
以下是解决方案:
SELECT user_id,
DATEDIFF(MAX(order_time), MIN(order_time)) as order_diff
FROM orders
GROUP BY user_id
HAVING order_diff > 0;
这里有一个测试它的链接。