OVER子句中的DistinctCount+LastNonEmpty



我过去有以下代码,以便对所有客户执行所有"-"操作:

with
T1 as
(
select
[Contract] = 'Contract1',
[Customer] = 'Customer4',
[Date] = '2017-01-01',
[Action] = '+'
union all 
select
[Contract] = 'Contract1',
[Customer] = 'Customer6',
[Date] = '2017-01-02',
[Action] = '+'
union all 
select
[Contract] = 'Contract1',
[Customer] = 'Customer4',
[Date] = '2017-01-03',
[Action] = '-'
union all 
select
[Contract] = 'Contract1',
[Customer] = 'Customer4',
[Date] = '2017-01-04',
[Action] = '+'
union all 
select
[Contract] = 'Contract1',
[Customer] = 'Customer4',
[Date] = '2017-01-05',
[Action] = '-'
union all 
select
[Contract] = 'Contract1',
[Customer] = 'Customer6',
[Date] = '2017-01-06',
[Action] = '-'
union all 
select
[Contract] = 'Contract1',
[Customer] = 'Customer8',
[Date] = '2017-01-07',
[Action] = '+'
union all 
select
[Contract] = 'Contract1',
[Customer] = 'Customer8',
[Date] = '2017-01-08',
[Action] = '-'
union all 
select
[Contract] = 'Contract1',
[Customer] = 'Customer4',
[Date] = '2017-01-09',
[Action] = '+'
)
select 
[Customer],
[Date]  
from T1
where [Action] = '-'

现在我需要在合同字段上做这件事。这意味着当最后一次操作为"-"时,我必须返回Contract和Date值,而不是在该日期之前执行"+"操作的所有客户。非常期望的输出应该是:

Date         | Contract
------------ | ------ 
2017-01-06   | Contract1   
2017-01-08   | Contract1

预期的算法应该如下所示:

[PlusDC] = count(distinct iif([Action] = '+',Customer,NULL)) over (partition by [Contract] order by [Date])
[MinusDC] = count(distinct iif([Action] = '-',Customer,NULL)) over (partition by [Contract] order by [Date])

但是:

  • 无论如何都不起作用
  • 即使它有效,即使[PlusDC]=[MinusDC],它也会返回值2017-01-09,这是不正确的

粗略地说,我必须针对所有客户检查以下代码:

  1. [Action]='-'用于当前行。

  2. 每个客户的lag([Action],1)='-'(如果客户记录出现在当天晚些时候,则为Null)。

更新:为了让事情变得更加清楚,我对我的数据进行了一个以列为导向的视图:

-----------------------------------------------------------------------
| Date         | Contract  | Customer4  | Customer6 | Customer8 | All |
| ------------ | --------- | ---------  | --------- | --------- | --- |
| 2017-01-01   | Contract1 |     +      |           |           |     |
| 2017-01-02   | Contract1 |            |     +     |           |     |
| 2017-01-03   | Contract1 |            |           |           |     |
| 2017-01-04   | Contract1 |     -      |           |           |     | <-- Customer6 still has a '+'
| 2017-01-05   | Contract1 |     +      |           |           |     | 
| 2017-01-06   | Contract1 |     -      |           |           |     | <-- Customer6 still has a '+'
| 2017-01-07   | Contract1 |            |     -     |           |  -  | <-- All customers has '-' or null as a last action
| 2017-01-08   | Contract1 |            |           |     +     |     |
| 2017-01-09   | Contract1 |            |           |     -     |  -  | <-- All customers has '-' or null as a last action
-----------------------------------------------------------------------

All列表示所有客户(我需要的行)的实际状态。正如您可能注意到的,2017-01-04和2017-01-06在合同字段中不是真实的"-"。Contract1未关闭,它仍有一个Customer6处于打开状态。当每个合同有一定数量的客户时,这很容易。那么无数呢?

有什么实用的建议吗?

好的,我将像您一样首先填写一个表来解决这个问题。我要做的是在每个日期重复客户和合同的每一个组合。

我把这个CTE附加到你的示例代码中:

,
FullTable as
(
select
a.[Contract]
,a.[Customer]
,b.[Date]
,c.[Action]
,count(c.[Action]) over (partition by a.[Contract],a.[Customer] order by b.[Date]) c
from
(select distinct  
[Contract],
[Customer]
from T1) a
inner join 
(select distinct  
[Contract],
[Date]
from T1) b
on a.[Contract]=b.[Contract]
left join t1 c
on c.[Contract]=a.[Contract] and a.[Customer]=c.[Customer] and b.[Date]=c.[Date]
)

现在Fulltable做两件事,它确保每个客户每天都有一行。如果在源数据中没有针对该客户的操作,则action为NULL。我要做的第二件事是使用窗口计数来计数以前的操作次数

count(c.[Action]) over (partition by a.[Contract],a.[Customer] order by b.[Date]) c

计数不计算NULL值,因此这实际上是对数据进行分组,每个客户一组,每个日期都有一个值,并且任何直接在NULL操作之后的行都会获得相同的组

这是客户4 的数据

Contract    Customer    Date        c   Action
Contract1   Customer4   2017-01-01  1   +
Contract1   Customer4   2017-01-02  1   NULL
Contract1   Customer4   2017-01-03  2   -
Contract1   Customer4   2017-01-04  3   +
Contract1   Customer4   2017-01-05  4   -
Contract1   Customer4   2017-01-06  4   NULL
Contract1   Customer4   2017-01-07  4   NULL
Contract1   Customer4   2017-01-08  4   NULL
Contract1   Customer4   2017-01-09  5   +

现在我制作了一个新的CTE,名为DaillyStatus。这个CTE填充NULLS,这样每天都会保存该合同和客户的最新状态,而不是NULL。这意味着,对于表中的每一天,都可以找到每个客户合同组合的状态。要做到这一点,我只需为我刚刚找到的的每个组获取MAX

,DailyStatus as
(
select
[Contract]
,[Customer]
,[Date]
,[Action]
,c
,max([Action]) over (partition by [Contract],[Customer],c) FilledAction
from
FullTable
)
Contract    Customer    Date        c   FilledAction    Action
Contract1   Customer6   2017-01-01  0   NULL    NULL
Contract1   Customer6   2017-01-02  1   +       +
Contract1   Customer6   2017-01-03  1   +       NULL
Contract1   Customer6   2017-01-04  1   +       NULL
Contract1   Customer6   2017-01-05  1   +       NULL
Contract1   Customer6   2017-01-06  2   -       -
Contract1   Customer6   2017-01-07  2   -       NULL
Contract1   Customer6   2017-01-08  2   -       NULL
Contract1   Customer6   2017-01-09  2   -       NULL

使用此表,我们可以获得表中每个日期的每个客户的状态。由于'+'>'-'>NULL,我们可以找到所有客户都有'-'作为最新操作或在该日期或之前没有操作的日期(NULL)

select
[Contract]
,[Date]
,max(FilledAction) 
from  DailyStatus
group by [Contract],[Date]
having max(FilledAction) ='-'

完整的解决方案在这里:

,FullTable as
(
select
a.[Contract]
,a.[Customer]
,b.[Date]
,c.[Action]
,count(c.[Action]) over (partition by a.[Contract],a.[Customer] order by b.[Date]) c
from
(select distinct  
[Contract],
[Customer]
from T1) a
inner join 
(select distinct  
[Contract],
[Date]
from T1) b
on a.[Contract]=b.[Contract]
left join t1 c
on c.[Contract]=a.[Contract] and a.[Customer]=c.[Customer] and b.[Date]=c.[Date]
)
,DailyStatus as
(
select
[Contract]
,[Customer]
,[Date]
,[Action]
,c
,max([Action]) over (partition by [Contract],[Customer],c) FilledAction
from
FullTable
)
select
[Contract]
,[Date]
,max(FilledAction) 
from  DailyStatus
group by [Contract],[Date]
having max(FilledAction) ='-'

我认为您可以这样使用ROW_NUMBER()

;with tt as (
select T1.[Contract], T1.[Date], T1.[Action], t.[Customer], t.[Action] lAction, t.[Date] lDate
-- this `rn` will give me the last action for each other customer older that each Date
, row_number() over (partition by T1.[Contract], T1.[Date], t.[Customer] order by t.[Date] desc) rn
from T1
-- I use this self left join to gather data with:
left join T1 t
on T1.[Contract] = t.[Contract]    -- same Contract
and T1.[Date] > t.[Date]           -- older than current date
and T1.[Customer] != t.[Customer]  -- for other customers
-- So I will have actions of other customer older than each date
)
select [Contract], [Date]
from T1
-- I just check if there is not any data in `tt` with:
where not exists(
select 1
from tt 
where tt.[Contract] = T1.[Contract]   -- same contract
and tt.[Date] = T1.[Date]           -- same date
and rn = 1                          -- only last action
and (T1.[Action] = '+'              -- current customer's action is '+'
or isnull(lAction, '+') = '+')      -- or others last actions is '+'
)   
group by [Contract], [Date];

多亏了这些答案,我想出了自己的解决方案,它和Søren的一样正确,也和shA.t的一样快。谢谢大家!

,[SelfJoinSet] as (
select 
T1.[Contract], -- Contract from the base
T1.[Date], -- Date from the base
T2.[Customer], -- Customers from the current contract
T2.[Action], -- ActionType of the customer
-- RowNumber in order to get the last record for each customer within the current contract:
[RN] = row_number() over (partition by T2.[Customer],T1.[Date] order by T2.[Date] desc) 
from T1
left join T1 as T2 -- SelfJoin 
on T1.[Contract] = T2.[Contract] -- Within the current contract
and T1.[Date] >= T2.[Date] -- Get all records till the current date    
where T1.[Action] = '-' --Filter out all '+' actions, we'll get only '-' records anyway (for performance optimization reasons).
)
select 
[Contract],
[Date]
from [SelfJoinSet]
where [RN] = 1 --Show only last record per each customer
group by [Contract],[Date] -- Collapse all to the base table records
having count(distinct iif([Action] = '-',[Customer],NULL)) = count(distinct [Customer]) -- Leave only records where the last action is  '-' for all customers

最新更新