在TSQL中对连续值计数一次

我有一个表，其中有快递公司的快递数据。我每天都安排了一次送货。我想统计一下失败交货的次数。如果连续几天交货失败，则应将其计算为1。例如，交付安排在2021年4月的每一天。4月15日交货失败。之后，从4月18日到4月20日，交货再次失败。虽然交付失败的天数是4天，但我想将失败的次数算作2，因为连续失败的天数将算作1。

DeliveryId     DeliveryDate     Status
1              2021-04-14       Success
2              2021-04-15       Failure
3              2021-04-16       Success
4              2021-04-17       Success
5              2021-04-18       Failure
6              2021-04-19       Failure
7              2021-04-20       Failure
8              2021-04-21       Success

我想要一个新的列，显示第一次失败的deliveryId，如下所示。

DeliveryId     DeliveryDate     Status     FailedDeliveryId
1              2021-04-14       Success     
2              2021-04-15       Failure     2
3              2021-04-16       Success     
4              2021-04-17       Success     
5              2021-04-18       Failure     5
6              2021-04-19       Failure     5
7              2021-04-20       Failure     5
8              2021-04-21       Success

我尝试了几种选择，但未能达到上述结果。我在查询中使用了LAG函数来查找以前的交货状态。但问题是，如果交付失败超过3或4天，那么我将不得不使用LAG功能返回3或4天后检查状态。我想让它成为一个动态查询。以下是我使用的

SELECT *, 
CASE WHEN Status='Failure' AND Prev_Status='Success' THEN DeliveryId 
WHEN Status='Failure' AND Prev_Status='Failure' THEN Prev_DeliveryId 
END AS FailureInstance 
FROM (
SELECT *, 
LAG(Status,1) OVER(ORDER BY DeliveryDate ASC) Prev_Status,
LAG(DeliveryId,1) OVER(ORDER BY DeliveryDate ASC) Prev_DeliveryId
FROM   table1 
) A

…

declare @t table(DeliveryId int, DeliveryDate date, Status varchar(10));
insert into @t(DeliveryId, DeliveryDate, Status)
values
(10, '2021-04-14', 'Success'),
(20, '2021-04-15', 'Failure'),
(30, '2021-04-16', 'Success'),
(40, '2021-04-17', 'Success'),
(70, '2021-04-18', 'Failure'),
(60, '2021-04-19', 'Failure'),
(50, '2021-04-20', 'Failure'),
(80, '2021-04-21', 'Success');

select *, 
case when Status='Failure' then min(grpDeliveryId) over(partition by grp) end as FailedDeliveryId,
case when Status='Failure' then datediff(minute, min(DeliveryDate) over(partition by grp), min(grpSuccessDate) over(partition by grp) ) end as MinutesDiffFailSuccess
from
(
select *, 
sum(addorcountme) over(order by DeliveryDate) as grp
from
(
select *, 
case when Status='Failure' and lag(Status) over(order by DeliveryDate)='Failure' then null else 1 end as addorcountme,
case when Status='Failure' and lag(Status) over(order by DeliveryDate)='Failure' then null else DeliveryId end as grpDeliveryId,
case when Status='Failure' and lead(Status) over(order by DeliveryDate)='Success' then lead(DeliveryDate) over(order by DeliveryDate) end as grpSuccessDate
from @t
) as t
) as g;

/*
select *, case when Status='Failure' then min(DeliveryId) over(partition by grp) end as FailedDeliveryId
from
(
select *, 
sum(sumorcountme) over(order by DeliveryDate) as grp
from
(
select *, case when Status='failure' and lag(Status) over(order by DeliveryDate)='Failure' then null else 1 end as sumorcountme
from @t
) as t
) as g; */

一个方法只使用窗口函数。您可以根据每行的成功次数为每组失败分配一个组。然后，对于每个故障组，只需取故障的最小id：

select t.*,
(case when status = 'Failure'
then min(case when status = 'Failure' then DeliveryId end) over (partition by grp)
end) as first_failureId
from (select t.*,
sum(case when status = 'Success' then 1 else 0 end) over (order by DeliveryDate) as grp
from t
) t;

这里有一个db<gt；不停摆弄

实际上，一个稍微简单一点的版本以相反的顺序分配组，所以第一条记录是失败的，而不是成功的，所以min()中的case是不必要的：

select t.*,
(case when status = 'Failure'
then min(DeliveryId) over (partition by grp)
end) as first_failureId
from (select t.*,
sum(case when status = 'Success' then 1 else 0 end) over (order by DeliveryDate desc) as grp
from t
) t
order by DeliveryId;

嗯。另一种方法使用CCD_ 3来检测状态何时改变。然后只对变化使用累积最大值：

select t.*,
(case when status = 'Failure'
then max(case when prev_status is null or prev_status <> status then DeliveryId end) over (order by DeliveryDate)
end) as first_failureId
from (select t.*,
lag(status) over (order by DeliveryDate) as prev_status
from t
) t
order by DeliveryId;

这是一个"间隙和岛"；问题我最喜欢的文章是https://blog.jooq.org/2016/04/25/10-sql-tricks-that-you-didnt-think-were-possible/，提示#4。

解决孤岛的关键是使用键(即日期(-row_number，这样可以将具有相同编号的孤岛分组。结果与您的类似，但在不使用lag的情况下，它可以处理任意多个故障。你只想在这里使用失败：

select *,DeliveryDate-row_number() over (order by DeliveryId asc) as grp
from table1
where Status='Failure'

既然有了这个，就可以在上面添加dense_rank(((按grp排序(来获得失败次数，也可以合并成功次数：

;with cte as
(
select *,DeliveryDate-row_number() over (order by DeliveryId asc) as grp
from table1
where Status='Failure'
)
select 
cte.*,dense_rank() over (order by grp) as FailureNum
from cte
union all
select *,null as grp,null as FailureNum
from table1
where Status='Success'

这是我正在进行的工作。一旦Status变回Success，需要找到重新启动计数的解决方法

使用https://dbfiddle.uk/

CREATE TABLE test
(
DeliveryId int NOT NULL,
DeliveryDate date NOT NULL,
Status varchar(max) NOT NULL)
INSERT INTO test (DeliveryId,DeliveryDate,Status) 
values 
(1, '2021-04-14', 'Success'),
(2, '2021-04-15', 'Failure'),
(3, '2021-04-16', 'Success'),
(4, '2021-04-17', 'Success'),
(5, '2021-04-18', 'Failure'),
(6, '2021-04-19', 'Failure'),
(7, '2021-04-20', 'Failure'),
(8, '2021-04-21', 'Success')
with cte as
(
SELECT DeliveryId,
DeliveryDate,
Status,
1 as FailedDeliveryId
FROM test
WHERE DeliveryId = 1
UNION all
SELECT t.DeliveryId,
t.DeliveryDate,
t.Status,
cte.FailedDeliveryId + CASE WHEN cte.Status <> t.Status THEN 1 ELSE 0 END
FROM test as t INNER JOIN cte ON t.DeliveryId - 1 = cte.DeliveryId
)

SELECT 
DeliveryId,
DeliveryDate,
Status,
FailedDeliveryId
FROM cte       
ORDER BY DeliveryId

相关内容

最新更新

热门标签：