我有一个任务是解决行业中众所周知的问题,任务是识别那些连续活动的客户,在给定的时间内,我们允许合同之间的小休息。
我做了第一部分填充矩阵表,就像下面的代码片段一样,在整个时间段内填充矩阵表,并设置标志,如果它在这个日期是活跃的,我认为这是唯一可靠的方法,因为合同可以有重叠,等等。
所以现在我需要检查CustID是否为contact activity的1/0,我卡住了如何跟踪这个任务,假设在我的例子中有3天的休息,这是可以的,但我需要确保这些天是一个接一个的。
你有什么好主意,我可以很好地做到这一点,感谢你的帮助和引导。我看过一些例子,但他们是用SAS做的,所以很难理解。
declare @maxBreak int = 3 -- 3 days max allowed for continuse contract
declare @PeriodStart date = '2015-1-11', @PeriodEnd date = '2015-1-19';
;with matrix_dd as
(
select *
from
(select 111 CustID, '2015-1-11' dd, 1 Active union
select 111 CustID, '2015-1-12' dd, 0 Active union
select 111 CustID, '2015-1-13' dd, 0 Active union
select 111 CustID, '2015-1-14' dd, 0 Active union
select 111 CustID, '2015-1-15' dd, 1 Active union
select 111 CustID, '2015-1-16' dd, 1 Active union
select 111 CustID, '2015-1-17' dd, 1 Active union
select 111 CustID, '2015-1-18' dd, 1 Active union
select 111 CustID, '2015-1-19' dd, 0 Active union
select 111 CustID, '2015-1-20' dd, 0 Active) a
)
select *
from matrix_dd
最好M
此解决方案计算活动范围以及自上次间隔结束以来的休息时间:
declare @maxBreak int = 3 -- 3 days max allowed for continuse contract
declare @PeriodStart date = '2015-1-11', @PeriodEnd date = '2015-1-19';
with matrix_dd as
(
select * from ( values
(111, '2015-1-11', 1 ),
(111, '2015-1-12', 0 ),
(111, '2015-1-13', 0 ),
(111, '2015-1-14', 0 ),
(111, '2015-1-15', 1 ),
(111, '2015-1-16', 1 ),
(111, '2015-1-17', 1 ),
(111, '2015-1-18', 1 ),
(111, '2015-1-19', 0 ),
(111, '2015-1-20', 0 )
) as x(CustID, dd, Active)
), active_with_groups as (
select *,
row_number() over (partition by CustID order by dd) -
datediff(day, '2000-01-01', dd) as gid
from matrix_dd
where active = 1
and dd between @PeriodStart and @PeriodEnd
), islands as (
select CustId, min(dd) as islandStart, max(dd) as islandEnd
from active_with_groups
group by CustID, gid
), islands_with_gaps as (
select *,
datediff(
day,
lag(islandEnd, 1, islandStart)
over (partition by CustID order by islandStart),
islandStart
) - 1 as [break]
from islands
)
select *
from islands_with_gaps
where [break] >= @maxBreak
order by islandStart
让我们来分析一下。在"active_with_groups"公共表表达式(CTE)中,我所做的就是通过使用datediff()
将日期转换为具有相同关系的整数。为什么?对于这个问题,整数更容易处理。请注意,我也使用row_number()
来获得一个连续序列,然后获得它与datediff()
值之间的差值。关键的观察结果是,如果天数也不连续上升,那么差异将是不同的。同样地,如果日期连续向上,则差异将相同。因此,可以使用此值作为连续范围内的值的组标识符。
接下来,我们使用该组标识符进行分组(打赌您没有看到这一点!)。这就给出了每个区间的开始和结束。这里没有什么很聪明的事。
下一步是计算上一个间隔结束和当前间隔开始之间经过的时间量。为此,我们使用对lag()
函数的简单调用。这里唯一需要注意的是,我选择让lag()
函数在第一个间隔的情况下发出默认值islandStart
。它也可以很容易地没有默认值(这会导致它发出一个NULL值)。
最后,我们寻找间隔超过指定阈值的间隔。
和Ben的回答差不多。我假设所有的日期都在数据中表示。所以我们只需要确保0的长度不超过3。
with inactive_runs as (
select
CustID,
row_number() over (partition by CustID order by dd)
- datediff(day, min(dd) over (partition by CustID), dd) as grp
from matrix_dd
where Active = 0
)
select distinct CustID from matrix_dd m
where 3 >= all (
select count(*) from inactive_runs ir
where ir.CustID = m.CustID
group by grp
);
http://rextester.com/AHI22250 使用all
并不是特别常见。这是另一个选项:
...
with inactive_runs as (
select
CustID, dd, /* <-- had to add dd */
row_number() over (partition by CustID order by dd)
- datediff(day, min(dd) over (partition by CustID), dd) as grp
from #matrix_dd
where Active = 0
)
select distinct CustID from matrix_dd m
where not exists (
select 1 from inactive_runs ir
where ir.CustID = m.CustID
group by grp
having datediff(day, min(dd), max(dd)) > 2
);
我看了一下你上面的评论。我想这证实了我的怀疑,你每个日期都有一排。如果你有一个新版本的SQL Server,你可以对前三行求和。不幸的是,如果长度是可变的,则不能使用变量来指定窗口大小:
with cust as (
select
CustID,
case when
sum(case when Active = 0 then 1 end) over (
partition by CustID
order by dd
rows between 3 preceding and current row
) = 4 then 1
end as isBrk
from matrix_dd
)
select CustID
from cust
group by CustID
having count(isBrk) = 0;
编辑:根据您对"预矩阵"格式的数据的评论,是的,这是一个更简单的查询。此时,您只需要查看前一个结束日期和当前行的开始日期。
with data as (
select * from (
values (111, 1230, '2014-12-11', '2015-01-11'),
(111, 1231, '2015-01-15', '2015-01-18'),
(111, 1232, '2015-03-22', '2015-04-01')
) as t (CustID, ContractID, StartDD, EndDD)
), gaps as (
select
CustID,
datediff(day,
lag(EndDD, 1, StartDD) over (partition by CustID order by StartDD),
StartDD
) as days
from data
)
select CustID
from gaps
group by CustID;
having max(days) <= 3;