SQL Server:查找Cust与连续注册



我有一个任务是解决行业中众所周知的问题,任务是识别那些连续活动的客户,在给定的时间内,我们允许合同之间的小休息。

我做了第一部分填充矩阵表,就像下面的代码片段一样,在整个时间段内填充矩阵表,并设置标志,如果它在这个日期是活跃的,我认为这是唯一可靠的方法,因为合同可以有重叠,等等。

所以现在我需要检查CustID是否为contact activity的1/0,我卡住了如何跟踪这个任务,假设在我的例子中有3天的休息,这是可以的,但我需要确保这些天是一个接一个的。

你有什么好主意,我可以很好地做到这一点,感谢你的帮助和引导。我看过一些例子,但他们是用SAS做的,所以很难理解。

declare @maxBreak int = 3 -- 3 days max allowed for continuse contract
declare @PeriodStart date =  '2015-1-11', @PeriodEnd date =  '2015-1-19'; 
;with matrix_dd as 
(
    select *  
    from 
        (select    111 CustID, '2015-1-11' dd, 1 Active   union  
         select    111 CustID, '2015-1-12' dd, 0 Active   union
         select    111 CustID, '2015-1-13' dd, 0 Active   union 
         select    111 CustID, '2015-1-14' dd, 0 Active   union
         select    111 CustID, '2015-1-15' dd, 1 Active   union 
         select    111 CustID, '2015-1-16' dd, 1 Active   union
         select    111 CustID, '2015-1-17' dd, 1 Active   union 
         select    111 CustID, '2015-1-18' dd, 1 Active   union
         select    111 CustID, '2015-1-19' dd, 0 Active   union 
         select    111 CustID, '2015-1-20' dd, 0 Active) a
)
select * 
from matrix_dd

最好M

此解决方案计算活动范围以及自上次间隔结束以来的休息时间:

declare @maxBreak int = 3 -- 3 days max allowed for continuse contract
declare @PeriodStart date =  '2015-1-11', @PeriodEnd date =  '2015-1-19'; 
with matrix_dd as 
(
    select *  from ( values
        (111, '2015-1-11', 1 ),
        (111, '2015-1-12', 0 ),
        (111, '2015-1-13', 0 ),
        (111, '2015-1-14', 0 ),
        (111, '2015-1-15', 1 ),
        (111, '2015-1-16', 1 ),
        (111, '2015-1-17', 1 ),
        (111, '2015-1-18', 1 ),
        (111, '2015-1-19', 0 ),
        (111, '2015-1-20', 0 )
    ) as x(CustID, dd, Active)
), active_with_groups as (
    select *,
        row_number() over (partition by CustID order by dd) - 
           datediff(day, '2000-01-01', dd) as gid
    from matrix_dd
    where active = 1
        and dd between @PeriodStart and @PeriodEnd
), islands as (
    select CustId, min(dd) as islandStart, max(dd) as islandEnd
    from active_with_groups
    group by CustID, gid
), islands_with_gaps as (
    select *, 
        datediff(
            day, 
            lag(islandEnd, 1, islandStart) 
               over (partition by CustID order by islandStart), 
            islandStart
        ) - 1 as [break]
    from islands
)
select * 
from islands_with_gaps
where [break] >= @maxBreak
order by islandStart

让我们来分析一下。在"active_with_groups"公共表表达式(CTE)中,我所做的就是通过使用datediff()将日期转换为具有相同关系的整数。为什么?对于这个问题,整数更容易处理。请注意,我也使用row_number()来获得一个连续序列,然后获得它与datediff()值之间的差值。关键的观察结果是,如果天数不连续上升,那么差异将是不同的。同样地,如果日期连续向上,则差异将相同。因此,可以使用此值作为连续范围内的值的组标识符。

接下来,我们使用该组标识符进行分组(打赌您没有看到这一点!)。这就给出了每个区间的开始和结束。这里没有什么很聪明的事。

下一步是计算上一个间隔结束和当前间隔开始之间经过的时间量。为此,我们使用对lag()函数的简单调用。这里唯一需要注意的是,我选择让lag()函数在第一个间隔的情况下发出默认值islandStart。它也可以很容易地没有默认值(这会导致它发出一个NULL值)。

最后,我们寻找间隔超过指定阈值的间隔。

和Ben的回答差不多。我假设所有的日期都在数据中表示。所以我们只需要确保0的长度不超过3。

with inactive_runs as (
    select
        CustID,
        row_number() over (partition by CustID order by dd)
            - datediff(day, min(dd) over (partition by CustID), dd) as grp
    from matrix_dd
    where Active = 0
)
select distinct CustID from matrix_dd m
where 3 >= all (
    select count(*) from inactive_runs ir
    where ir.CustID = m.CustID
    group by grp
);
http://rextester.com/AHI22250

使用all并不是特别常见。这是另一个选项:

...
with inactive_runs as (
    select
        CustID, dd, /* <-- had to add dd */
        row_number() over (partition by CustID order by dd)
            - datediff(day, min(dd) over (partition by CustID), dd) as grp
    from #matrix_dd
    where Active = 0
)
select distinct CustID from matrix_dd m
where not exists (
    select 1 from inactive_runs ir
    where ir.CustID = m.CustID
    group by grp
    having datediff(day, min(dd), max(dd)) > 2
);
我看了一下你上面的评论。我想这证实了我的怀疑,你每个日期都有一排。如果你有一个新版本的SQL Server,你可以对前三行求和。不幸的是,如果长度是可变的,则不能使用变量来指定窗口大小:
with cust as (
    select
        CustID,
        case when
        sum(case when Active = 0 then 1 end) over (
                partition by CustID
                order by dd
                rows between 3 preceding and current row 
            ) = 4 then 1
        end as isBrk
    from matrix_dd
)
select CustID
from cust
group by CustID
having count(isBrk) = 0;
编辑:

根据您对"预矩阵"格式的数据的评论,是的,这是一个更简单的查询。此时,您只需要查看前一个结束日期和当前行的开始日期。

with data as (
    select * from (
        values (111, 1230, '2014-12-11', '2015-01-11'),
               (111, 1231, '2015-01-15', '2015-01-18'),
               (111, 1232, '2015-03-22', '2015-04-01') 
        ) as t (CustID, ContractID, StartDD, EndDD)
), gaps as (
    select
        CustID,
        datediff(day,
            lag(EndDD, 1, StartDD) over (partition by CustID order by StartDD),
            StartDD
        ) as days
    from data
)
select CustID
from gaps
group by CustID;
having max(days) <= 3;

最新更新