I have a table like the following:
+----+----------------+-------------------------+
| id | employeeNumber | transactionTime |
+----+----------------+-------------------------+
| 1 | 1234 | 2016-02-23 15:11:00.000 |
+----+----------------+-------------------------+
| 2 | 1234 | 2016-02-22 11:01:00.000 |
+----+----------------+-------------------------+
| 3 | 1235 | 2016-02-22 07:22:00.000 |
+----+----------------+-------------------------+
| 4 | 1236 | 2016-02-20 09:16:00.000 |
+----+----------------+-------------------------+
| 5 | 1236 | 2016-02-19 11:01:00.000 |
+----+----------------+-------------------------+
| 6 | 1236 | 2016-02-18 11:44:00.000 |
+----+----------------+-------------------------+
| 7 | 1236 | 2016-02-17 12:12:00.000 |
+----+----------------+-------------------------+
| 8 | 1236 | 2016-02-16 11:09:00.000 |
+----+----------------+-------------------------+
| 9 | 1236 | 2016-02-15 11:19:00.000 |
+----+----------------+-------------------------+
| 10 | 1236 | 2016-02-14 09:12:00.000 |
+----+----------------+-------------------------+
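I need a way to return, for each employee, the number of consecutive days on which they recorded transactions within the past two weeks. For example: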
+------+--------------+-------------------------+-------------------------+
| days |employeeNumber| startTime | endTime |
+------+--------------+-------------------------+-------------------------+
| 2 | 1234 | 2016-02-22 11:01:00.000 | 2016-02-23 15:11:00.000 |
+------+--------------+-------------------------+-------------------------+
| 1 | 1235 | 2016-02-22 07:22:00.000 | 2016-02-22 07:22:00.000 |
+------+--------------+-------------------------+-------------------------+
| 7 | 1236 | 2016-02-14 09:12:00.000 | 2016-02-20 09:16:00.000 |
+------+--------------+-------------------------+-------------------------+
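I have been working with the following query, but it only returns results for a single user and does not restrict the data to the past two weeks.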
WITH
dates(date) AS (
SELECT DISTINCT CAST(transactionTime AS DATE)
FROM Fuel.dbo.comdata
WHERE employeeNumber = 123456
),
groups AS (
SELECT ROW_NUMBER() OVER (ORDER BY date) AS rn,
DATEADD(DAY, -ROW_NUMBER() OVER (ORDER BY date), date) AS grp,
date
FROM dates
)
SELECT COUNT(*) AS consecutiveDates,
MIN(date) AS minDate, MAX(date) AS maxDate
FROM groups
GROUP BY grp
ORDER BY 1 DESC, 2 DESC
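Thanks for your help.
Update
Thanks to Gordon Linoff's answer, I found the following query very useful. However, I noticed that the min/max dates do not match the number of consecutive days. Here it is with live data: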
SELECT * FROM (
SELECT employeeNumber, COUNT(*) AS consecutiveDays,
MIN(transactionTime) AS startTime, MAX(transactionTime) AS endTime
FROM (
SELECT cd.*, DATEADD(DAY, -DENSE_RANK() OVER (PARTITION BY
employeeNumber ORDER BY transactionTime), CAST(transactionTime AS
DATE)) AS grp
FROM Fuel.dbo.comdata cd
WHERE transactionTime >= DATEADD(DAY, -14, GETDATE())
) cd
GROUP BY employeeNumber, grp
) AS tbl1
WHERE consecutiveDays >= 7
+--------+------+-------------------------+-------------------------+
| empNum | days | startTime               | endTime                 |
+--------+------+-------------------------+-------------------------+
| 16742  | 7    | 2016-04-28 17:00:00.000 | 2016-05-07 17:04:00.000 |
+--------+------+-------------------------+-------------------------+
| 15056  | 8    | 2016-04-27 13:03:00.000 | 2016-05-08 09:51:00.000 |
+--------+------+-------------------------+-------------------------+
As you can see, the number of consecutive days does not match the start/end times. Any ideas?
I would approach this by subtracting a row number from the date (assuming at most one record per employee per day):
select employeeNumber, count(*) as numdays,
       min(transactionTime) as startTime, max(transactionTime) as endTime
from (select cd.*,
             -- date minus per-employee row number: constant within a run of consecutive days
             dateadd(day,
                     - row_number() over (partition by employeeNumber order by transactionTime),
                     cast(transactionTime as date)
                    ) as grp
      from Fuel.dbo.comdata cd
     ) cd
group by employeeNumber, grp;
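The idea is to generate a sequence of numbers for each employee, ordered by transactionTime. When the transactions fall on consecutive days, the difference between the transaction date and this sequence number is constant, so that difference identifies each run of consecutive days.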
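The arithmetic can be checked outside the database. Below is a minimal Python sketch of the same idea, assuming one row per day; the function name consecutive_runs and the sample dates are illustrative and not part of the original data.

```python
from datetime import date, timedelta
from itertools import groupby

def consecutive_runs(days):
    """Return (length, first_day, last_day) for each run of consecutive dates."""
    ordered = sorted(set(days))
    # row_number() analogue: position in date order, starting at 1.
    # Within a run of consecutive days, (day - position) stays constant.
    keyed = [(d - timedelta(days=i), d) for i, d in enumerate(ordered, start=1)]
    runs = []
    for _, members in groupby(keyed, key=lambda pair: pair[0]):
        run = [d for _, d in members]
        runs.append((len(run), run[0], run[-1]))
    return runs

# Hypothetical dates: a three-day run, a gap, then a two-day run.
sample = [date(2016, 2, 14), date(2016, 2, 15), date(2016, 2, 16),
          date(2016, 2, 18), date(2016, 2, 19)]
runs = consecutive_runs(sample)
```

Here the first three dates all map to the key 2016-02-13 and the last two to 2016-02-14, so runs holds one three-day run and one two-day run.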
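If an employee can have several transactions on the same day, use dense_rank() ordered by the date instead, so that every row from one day gets the same rank, and count distinct dates rather than rows:
select employeeNumber, count(distinct cast(transactionTime as date)) as numdays,
       min(transactionTime) as startTime, max(transactionTime) as endTime
from (select cd.*,
             dateadd(day,
                     - dense_rank() over (partition by employeeNumber order by cast(transactionTime as date)),
                     cast(transactionTime as date)
                    ) as grp
      from Fuel.dbo.comdata cd
     ) cd
group by employeeNumber, grp;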
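A likely reason for the mismatch in the update is that its query applies dense_rank() to the full transactionTime rather than to the date, so two transactions on the same day get different ranks. The Python sketch below mimics both variants on hypothetical timestamps; the function name islands and the data are assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def islands(timestamps, rank_by_date):
    """Group timestamps by (date - dense rank), ranking by date or by full timestamp."""
    def key(t):
        return t.date() if rank_by_date else t
    # dense_rank() analogue: rank the distinct ordering keys, starting at 1
    rank = {k: i for i, k in enumerate(sorted({key(t) for t in timestamps}), start=1)}
    groups = defaultdict(list)
    for t in timestamps:
        grp = t.date() - timedelta(days=rank[key(t)])
        groups[grp].append(t)
    # (start, end, row count, distinct-day count) per group, ordered by start
    return sorted(
        (min(ts), max(ts), len(ts), len({t.date() for t in ts}))
        for ts in groups.values()
    )

# Hypothetical data: two transactions on May 1, none on May 2, one on May 3.
stamps = [
    datetime(2016, 5, 1, 9, 0),
    datetime(2016, 5, 1, 17, 0),
    datetime(2016, 5, 3, 10, 0),
]
by_timestamp = islands(stamps, rank_by_date=False)
by_date = islands(stamps, rank_by_date=True)
```

Ranking by timestamp puts the first May 1 row and the May 3 row in the same group, which yields a "run" that spans the gap on May 2. Ranking by date keeps May 1 and May 3 apart, and the May 1 group shows two rows for a single day, which is why the day count should use distinct dates.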