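I have a table with delivery data for a courier company. One delivery is scheduled every day. I want to count the number of failed deliveries, where a run of consecutive failed days counts as 1. For example, deliveries are scheduled for every day of April 2021. The delivery on April 15 fails. After that, deliveries fail again from April 18 through April 20. Although deliveries failed on 4 days, I want to count 2 failures, because each run of consecutive failed days counts as 1.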
DeliveryId DeliveryDate Status
1 2021-04-14 Success
2 2021-04-15 Failure
3 2021-04-16 Success
4 2021-04-17 Success
5 2021-04-18 Failure
6 2021-04-19 Failure
7 2021-04-20 Failure
8 2021-04-21 Success
I want a new column that shows the DeliveryId of the first failure in each run, like this:
DeliveryId DeliveryDate Status FailedDeliveryId
1 2021-04-14 Success
2 2021-04-15 Failure 2
3 2021-04-16 Success
4 2021-04-17 Success
5 2021-04-18 Failure 5
6 2021-04-19 Failure 5
7 2021-04-20 Failure 5
8 2021-04-21 Success
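I tried a few options but could not get the result above. I used the LAG function in my query to look up the previous delivery's status. The problem is that if deliveries fail for more than 3 or 4 days in a row, I would have to use LAG to look back 3 or 4 rows to check the status. I want the query to be dynamic. Here is what I used: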
SELECT *,
CASE WHEN Status='Failure' AND Prev_Status='Success' THEN DeliveryId
WHEN Status='Failure' AND Prev_Status='Failure' THEN Prev_DeliveryId
END AS FailureInstance
FROM (
SELECT *,
LAG(Status,1) OVER(ORDER BY DeliveryDate ASC) Prev_Status,
LAG(DeliveryId,1) OVER(ORDER BY DeliveryDate ASC) Prev_DeliveryId
FROM table1
) A
…
declare @t table(DeliveryId int, DeliveryDate date, Status varchar(10));
insert into @t(DeliveryId, DeliveryDate, Status)
values
(10, '2021-04-14', 'Success'),
(20, '2021-04-15', 'Failure'),
(30, '2021-04-16', 'Success'),
(40, '2021-04-17', 'Success'),
(70, '2021-04-18', 'Failure'),
(60, '2021-04-19', 'Failure'),
(50, '2021-04-20', 'Failure'),
(80, '2021-04-21', 'Success');
select *,
case when Status='Failure' then min(grpDeliveryId) over(partition by grp) end as FailedDeliveryId,
case when Status='Failure' then datediff(minute, min(DeliveryDate) over(partition by grp), min(grpSuccessDate) over(partition by grp) ) end as MinutesDiffFailSuccess
from
(
select *,
sum(addorcountme) over(order by DeliveryDate) as grp
from
(
select *,
case when Status='Failure' and lag(Status) over(order by DeliveryDate)='Failure' then null else 1 end as addorcountme,
case when Status='Failure' and lag(Status) over(order by DeliveryDate)='Failure' then null else DeliveryId end as grpDeliveryId,
case when Status='Failure' and lead(Status) over(order by DeliveryDate)='Success' then lead(DeliveryDate) over(order by DeliveryDate) end as grpSuccessDate
from @t
) as t
) as g;
/*
select *, case when Status='Failure' then min(DeliveryId) over(partition by grp) end as FailedDeliveryId
from
(
select *,
sum(sumorcountme) over(order by DeliveryDate) as grp
from
(
select *, case when Status='Failure' and lag(Status) over(order by DeliveryDate)='Failure' then null else 1 end as sumorcountme
from @t
) as t
) as g; */
select t.*,
(case when status = 'Failure'
then min(case when status = 'Failure' then DeliveryId end) over (partition by grp)
end) as first_failureId
from (select t.*,
sum(case when status = 'Success' then 1 else 0 end) over (order by DeliveryDate) as grp
from t
) t;
Here is a db<>fiddle.
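To see what grp looks like and to get the count the question asks for, here is a minimal sketch that runs the same query through Python's sqlite3 module. SQLite is only a stand-in for SQL Server here (an assumption; it needs SQLite 3.25+ for window functions), and the table name t and the failure_count variable are illustrative.

```python
import sqlite3

# Sample rows from the question: (DeliveryId, DeliveryDate, Status).
ROWS = [
    (1, "2021-04-14", "Success"), (2, "2021-04-15", "Failure"),
    (3, "2021-04-16", "Success"), (4, "2021-04-17", "Success"),
    (5, "2021-04-18", "Failure"), (6, "2021-04-19", "Failure"),
    (7, "2021-04-20", "Failure"), (8, "2021-04-21", "Success"),
]

# Same idea as the query above: a running count of successes is the group key,
# so every failure shares a key with the success that precedes it.
QUERY = """
select DeliveryId, Status, grp,
       case when Status = 'Failure'
            then min(case when Status = 'Failure' then DeliveryId end)
                 over (partition by grp)
       end as first_failureId
from (select t.*,
             sum(case when Status = 'Success' then 1 else 0 end)
                 over (order by DeliveryDate) as grp
      from t) as g
order by DeliveryDate
"""

con = sqlite3.connect(":memory:")
con.execute("create table t (DeliveryId integer, DeliveryDate text, Status text)")
con.executemany("insert into t values (?, ?, ?)", ROWS)
result = con.execute(QUERY).fetchall()
con.close()

for row in result:
    print(row)

# Each run of failures has exactly one first_failureId, so the number of
# distinct non-null values is the number of failure runs.
failure_count = len({r[3] for r in result if r[3] is not None})
print("failure runs:", failure_count)
```

In SQL the same count could be taken as count(distinct first_failureId) over this result.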
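Actually, a slightly simpler version assigns the groups in reverse order, so that each group starts with its failures instead of a success, and the case inside min() becomes unnecessary: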
select t.*,
(case when status = 'Failure'
then min(DeliveryId) over (partition by grp)
end) as first_failureId
from (select t.*,
sum(case when status = 'Success' then 1 else 0 end) over (order by DeliveryDate desc) as grp
from t
) t
order by DeliveryId;
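One thing worth checking with this version: min(DeliveryId) only finds the first failure if DeliveryId increases with DeliveryDate. The sketch below uses the shuffled ids from the earlier sample (70, 60, 50 for the three-day run) and compares min(DeliveryId) with first_value() ordered by date. It runs on SQLite through Python's sqlite3 as a stand-in for SQL Server (an assumption; SQLite 3.25+), and the column names by_min_id and by_date are made up for the comparison.

```python
import sqlite3

# Ids deliberately do not follow the date order inside the failure run.
ROWS = [
    (10, "2021-04-14", "Success"), (20, "2021-04-15", "Failure"),
    (30, "2021-04-16", "Success"), (40, "2021-04-17", "Success"),
    (70, "2021-04-18", "Failure"), (60, "2021-04-19", "Failure"),
    (50, "2021-04-20", "Failure"), (80, "2021-04-21", "Success"),
]

QUERY = """
select DeliveryId,
       case when Status = 'Failure'
            then min(DeliveryId) over (partition by grp)
       end as by_min_id,
       case when Status = 'Failure'
            then first_value(DeliveryId)
                 over (partition by grp order by DeliveryDate)
       end as by_date
from (select t.*,
             sum(case when Status = 'Success' then 1 else 0 end)
                 over (order by DeliveryDate desc) as grp
      from t) as g
order by DeliveryDate
"""

con = sqlite3.connect(":memory:")
con.execute("create table t (DeliveryId integer, DeliveryDate text, Status text)")
con.executemany("insert into t values (?, ?, ?)", ROWS)
result = con.execute(QUERY).fetchall()
con.close()

for row in result:
    print(row)
```

For the run starting on 2021-04-18, by_min_id reports 50 while by_date reports 70, the delivery that actually failed first. If ids always rise with the date, both columns agree.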
Hmm. Another approach uses lag() to detect when the status changes, and then takes a cumulative max over just the change rows:
select t.*,
(case when status = 'Failure'
then max(case when prev_status is null or prev_status <> status then DeliveryId end) over (order by DeliveryDate)
end) as first_failureId
from (select t.*,
lag(status) over (order by DeliveryDate) as prev_status
from t
) t
order by DeliveryId;
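To make the cumulative max easier to follow, this sketch splits the query above into its intermediate steps. It runs on SQLite through Python's sqlite3 as a stand-in for SQL Server (an assumption; SQLite 3.25+). The columns change_id and run_start are names invented for the illustration, and the approach assumes DeliveryId increases with DeliveryDate.

```python
import sqlite3

ROWS = [
    (1, "2021-04-14", "Success"), (2, "2021-04-15", "Failure"),
    (3, "2021-04-16", "Success"), (4, "2021-04-17", "Success"),
    (5, "2021-04-18", "Failure"), (6, "2021-04-19", "Failure"),
    (7, "2021-04-20", "Failure"), (8, "2021-04-21", "Success"),
]

# change_id: DeliveryId on rows where the status differs from the previous row.
# run_start: running max of change_id, i.e. the id that opened the current run.
QUERY = """
select DeliveryId, Status, change_id,
       max(change_id) over (order by DeliveryDate) as run_start,
       case when Status = 'Failure'
            then max(change_id) over (order by DeliveryDate)
       end as first_failureId
from (select p.*,
             case when prev_status is null or prev_status <> Status
                  then DeliveryId end as change_id
      from (select t.*,
                   lag(Status) over (order by DeliveryDate) as prev_status
            from t) as p) as c
order by DeliveryDate
"""

con = sqlite3.connect(":memory:")
con.execute("create table t (DeliveryId integer, DeliveryDate text, Status text)")
con.executemany("insert into t values (?, ?, ?)", ROWS)
result = con.execute(QUERY).fetchall()
con.close()

for row in result:
    print(row)
```

Rows 6 and 7 have no change_id of their own, so the running max carries 5 forward from row 5.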
This is a "gaps and islands" problem. My favorite article on it is https://blog.jooq.org/2016/04/25/10-sql-tricks-that-you-didnt-think-were-possible/, tip #4.
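The key to solving islands is to take a key (here, the date) minus row_number(), so that all rows of one island get the same value and can be grouped on it. The result is similar to yours, but because it does not use lag it handles any number of consecutive failures. You only want the failures here: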
-- dateadd is used because SQL Server's date type does not support "date - int"
select *,dateadd(day, -row_number() over (order by DeliveryId asc), DeliveryDate) as grp
from table1
where Status='Failure'
Now that you have this, you can add dense_rank() over (order by grp) on top to number the failures, and union the successes back in:
;with cte as
(
-- dateadd is used because SQL Server's date type does not support "date - int"
select *,dateadd(day, -row_number() over (order by DeliveryId asc), DeliveryDate) as grp
from table1
where Status='Failure'
)
select
cte.*,dense_rank() over (order by grp) as FailureNum
from cte
union all
select *,null as grp,null as FailureNum
from table1
where Status='Success'
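A small runnable sketch of the date-minus-row_number idea, using Python's sqlite3 as a stand-in for SQL Server (an assumption; SQLite 3.25+). SQLite has no date type, so the date is turned into a day number with julianday(), which is specific to this sketch. The FailedDeliveryId column is added here to tie the result back to the question.

```python
import sqlite3

ROWS = [
    (1, "2021-04-14", "Success"), (2, "2021-04-15", "Failure"),
    (3, "2021-04-16", "Success"), (4, "2021-04-17", "Success"),
    (5, "2021-04-18", "Failure"), (6, "2021-04-19", "Failure"),
    (7, "2021-04-20", "Failure"), (8, "2021-04-21", "Success"),
]

# Among failures only, consecutive days advance the day number and the row
# number in lockstep, so their difference stays constant within one island.
QUERY = """
with cte as (
    select DeliveryId, DeliveryDate,
           cast(julianday(DeliveryDate) as integer)
             - row_number() over (order by DeliveryDate) as grp
    from table1
    where Status = 'Failure'
)
select DeliveryId,
       dense_rank() over (order by grp) as FailureNum,
       min(DeliveryId) over (partition by grp) as FailedDeliveryId
from cte
order by DeliveryDate
"""

con = sqlite3.connect(":memory:")
con.execute("create table table1 (DeliveryId integer, DeliveryDate text, Status text)")
con.executemany("insert into table1 values (?, ?, ?)", ROWS)
result = con.execute(QUERY).fetchall()
con.close()

for row in result:
    print(row)

# The highest FailureNum is the number of failure runs.
failure_count = max(r[1] for r in result)
print("failure runs:", failure_count)
```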
This is my work in progress. I still need to find a way to restart the count once Status changes back to Success.
Using https://dbfiddle.uk/
CREATE TABLE test
(
DeliveryId int NOT NULL,
DeliveryDate date NOT NULL,
Status varchar(max) NOT NULL);
INSERT INTO test (DeliveryId,DeliveryDate,Status)
values
(1, '2021-04-14', 'Success'),
(2, '2021-04-15', 'Failure'),
(3, '2021-04-16', 'Success'),
(4, '2021-04-17', 'Success'),
(5, '2021-04-18', 'Failure'),
(6, '2021-04-19', 'Failure'),
(7, '2021-04-20', 'Failure'),
(8, '2021-04-21', 'Success');
with cte as
(
SELECT DeliveryId,
DeliveryDate,
Status,
1 as FailedDeliveryId
FROM test
WHERE DeliveryId = 1
UNION all
SELECT t.DeliveryId,
t.DeliveryDate,
t.Status,
cte.FailedDeliveryId + CASE WHEN cte.Status <> t.Status THEN 1 ELSE 0 END
FROM test as t INNER JOIN cte ON t.DeliveryId - 1 = cte.DeliveryId
)
SELECT
DeliveryId,
DeliveryDate,
Status,
FailedDeliveryId
FROM cte
ORDER BY DeliveryId
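One possible way to finish this, sketched under assumptions: instead of a counter, the recursion can carry the id of the first failure forward while the status stays Failure and drop it on Success. The sketch runs on SQLite through Python's sqlite3 as a stand-in for SQL Server (SQLite wants the keyword recursive, T-SQL does not). ChangeCount is the counter from the query above under a new name, and like the original join it assumes DeliveryId values are consecutive and start at 1.

```python
import sqlite3

ROWS = [
    (1, "2021-04-14", "Success"), (2, "2021-04-15", "Failure"),
    (3, "2021-04-16", "Success"), (4, "2021-04-17", "Success"),
    (5, "2021-04-18", "Failure"), (6, "2021-04-19", "Failure"),
    (7, "2021-04-20", "Failure"), (8, "2021-04-21", "Success"),
]

QUERY = """
with recursive cte as (
    select DeliveryId, Status,
           1 as ChangeCount,
           case when Status = 'Failure' then DeliveryId end as FailedDeliveryId
    from test
    where DeliveryId = 1
    union all
    select t.DeliveryId, t.Status,
           -- the work-in-progress counter: +1 on every status change
           cte.ChangeCount + case when cte.Status <> t.Status then 1 else 0 end,
           -- keep the run's first id while failing, start a new run otherwise
           case when t.Status = 'Failure' and cte.Status = 'Failure'
                     then cte.FailedDeliveryId
                when t.Status = 'Failure' then t.DeliveryId
           end
    from test as t
    inner join cte on t.DeliveryId - 1 = cte.DeliveryId
)
select DeliveryId, Status, ChangeCount, FailedDeliveryId
from cte
order by DeliveryId
"""

con = sqlite3.connect(":memory:")
con.execute("create table test (DeliveryId integer, DeliveryDate text, Status text)")
con.executemany("insert into test values (?, ?, ?)", ROWS)
result = con.execute(QUERY).fetchall()
con.close()

for row in result:
    print(row)
```

ChangeCount shows why the counter alone is not enough: it climbs on every change (1, 2, 3, 3, 4, 4, 4, 5), while FailedDeliveryId is empty on successes and gives 2 and 5 for the two failure runs.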