Finding overlapping events in SQL and keeping the longer one



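I'm trying to filter my events table to compute some statistics. The goal is to exclude any event raised by an object that already has an active event at that moment.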

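For example: ObjectId 4 triggered Event 1, but while that event was still running it also triggered Event 2. Because Event 2 falls within the time range of Event 1, Event 2 needs to be discarded.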

DateIni             DateEnd               ObjectId  Description
2021-04-24 11:59:24 2021-04-24 12:00:24   4         Event 1
2021-04-24 11:59:25 2021-04-24 12:00:23   4         Event 2
2021-04-23 20:23:35 2021-04-23 20:24:35   11        Event 3
2021-04-23 20:23:37 2021-04-23 20:24:32   11        Event 4
2021-04-23 15:17:12 2021-04-23 15:18:12   2         Event 5
2021-04-23 15:17:14 2021-04-23 15:18:09   2         Event 6

This is what I have been trying, but so far without success.

from Table A
INNER join Table B
on (A.DateEnd < B.DateIni AND A.DateIni >= B.DateEnd)
or (A.DateEnd >= B.DateIni AND A.DateIni <= B.DateEnd)
or (A.DateEnd > B.DateIni AND A.DateIni < B.DateEnd)
or (A.DateEnd <= B.DateIni AND A.DateIni > B.DateEnd)
and B.ObjectId = A.ObjectId 
and B.Description != A.Description

Any suggestions on how to do this?

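Is it possible for an object to have events whose dates do not overlap? If not, it would be simpler to just use row_number() over the set and take the first event.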
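
If every event of an object overlaps that object's first event, as in the six sample rows from the question, the row_number() idea could look roughly like the sketch below. It is only an illustration under that assumption; the table variable @events and the column alias rn are names made up for the example.

```sql
-- Hypothetical table variable holding the six rows from the question
Declare @events Table (DateIni datetime, DateEnd datetime, ObjectId int, [Description] varchar(30));

Insert Into @events (DateIni, DateEnd, ObjectId, [Description])
Values ('2021-04-24 11:59:24', '2021-04-24 12:00:24',  4, 'Event 1')
     , ('2021-04-24 11:59:25', '2021-04-24 12:00:23',  4, 'Event 2')
     , ('2021-04-23 20:23:35', '2021-04-23 20:24:35', 11, 'Event 3')
     , ('2021-04-23 20:23:37', '2021-04-23 20:24:32', 11, 'Event 4')
     , ('2021-04-23 15:17:12', '2021-04-23 15:18:12',  2, 'Event 5')
     , ('2021-04-23 15:17:14', '2021-04-23 15:18:09',  2, 'Event 6');

-- Number each object's events by start time (longest first on ties)
-- and keep only the earliest one.
-- Assumption: all events of an object overlap that object's first event.
With numbered As (
    Select e.*
         , rn = row_number() over(Partition By e.ObjectId Order By e.DateIni, e.DateEnd Desc)
      From @events e
)
Select n.DateIni
     , n.DateEnd
     , n.ObjectId
     , n.[Description]
  From numbered n
 Where n.rn = 1;
```

With these rows the query keeps Event 1, Event 3 and Event 5. As soon as an object can have a later, non-overlapping event, this shortcut would drop it, so it only fits the restricted case.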

You could do it like this. There may be better options, but this should work:

--==== Sample data
Declare @testData Table (DateIni datetime, DateEnd datetime, ObjectId int, Description varchar(30));
Insert Into @testData (DateIni, DateEnd, ObjectId, [Description])
Values ('2021-04-24 11:59:24', '2021-04-24 12:00:24',  4, 'Event 1')
, ('2021-04-24 11:59:25', '2021-04-24 12:00:23',  4, 'Event 2')
, ('2021-04-24 12:59:25', '2021-04-24 13:00:23',  4, 'Event 7')
, ('2021-04-23 20:23:35', '2021-04-23 20:24:35', 11, 'Event 3')
, ('2021-04-23 20:23:37', '2021-04-23 20:24:32', 11, 'Event 4')
, ('2021-04-23 21:23:37', '2021-04-23 21:24:32', 11, 'Event 8')
, ('2021-04-23 15:17:12', '2021-04-23 15:18:12',  2, 'Event 5')
, ('2021-04-23 15:17:14', '2021-04-23 15:18:09',  2, 'Event 6')
, ('2021-04-23 16:17:14', '2021-04-23 16:18:09',  2, 'Event 9');
--==== Result using above sample data
  With previousDates
    As (
Select *
     , PreviousEndDate = lag(td.DateEnd, 1) over(Partition By td.ObjectId Order By td.DateIni)
     , PreviousStartDate = lag(td.DateIni, 1) over(Partition By td.ObjectId Order By td.DateIni)
  From @testData td
       )
Select pd.DateIni
     , pd.DateEnd
     , pd.ObjectId
     , pd.Description
  From previousDates pd
 Where 1 = Case When pd.DateIni Between pd.PreviousStartDate And pd.PreviousEndDate Then 0
                When pd.DateEnd Between pd.PreviousStartDate And pd.PreviousEndDate Then 0
                Else 1
            End;

Now, if all we care about is when the next event starts, that is, whether it starts before the previous event has finished:

--==== Result using above sample data
  With previousDates
    As (
Select *
     , PreviousEndDate = lag(td.DateEnd, 1) over(Partition By td.ObjectId Order By td.DateIni)
  From @testData td
       )
Select pd.DateIni
     , pd.DateEnd
     , pd.ObjectId
     , pd.Description
  From previousDates pd
 Where pd.PreviousEndDate Is Null
    Or pd.PreviousEndDate < pd.DateIni;
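
One thing to keep in mind with the lag() queries: each row is compared only with the row immediately before it. If a long event could contain several short, consecutive events, the later short ones would be compared with the previous short event instead of the long one. A possible variation, offered as a sketch and not part of the original answer, compares each row with the latest DateEnd seen so far. It assumes SQL Server 2012 or later for the window frame; the table variable @events, the rows 'Event A' to 'Event D' and the column name MaxPreviousEnd are invented for this illustration.

```sql
-- Hypothetical data: Event A contains both Event B and Event C,
-- but Event C starts after Event B has already ended.
Declare @events Table (DateIni datetime, DateEnd datetime, ObjectId int, [Description] varchar(30));

Insert Into @events (DateIni, DateEnd, ObjectId, [Description])
Values ('2021-04-24 11:59:24', '2021-04-24 12:00:24', 4, 'Event A')
     , ('2021-04-24 11:59:25', '2021-04-24 11:59:40', 4, 'Event B')
     , ('2021-04-24 11:59:50', '2021-04-24 12:00:10', 4, 'Event C')
     , ('2021-04-24 12:59:25', '2021-04-24 13:00:23', 4, 'Event D');

-- Running maximum of DateEnd over all earlier rows of the same object.
-- The frame ends at "1 Preceding", so the current row is not included.
With previousDates As (
    Select e.*
         , MaxPreviousEnd = max(e.DateEnd) over(Partition By e.ObjectId
                                                    Order By e.DateIni
                                                     Rows Between Unbounded Preceding And 1 Preceding)
      From @events e
)
Select pd.DateIni
     , pd.DateEnd
     , pd.ObjectId
     , pd.[Description]
  From previousDates pd
 Where pd.MaxPreviousEnd Is Null          -- first event of the object
    Or pd.MaxPreviousEnd < pd.DateIni;    -- starts after every earlier event has ended
```

On this data the sketch keeps Event A and Event D. A plain lag(DateEnd) comparison would also keep Event C, because Event B had already ended when Event C started.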

Or we can determine whether each event is the first event and filter on that:

--==== Result using above sample data
  With firstEvents
    As (
Select *
     , isFirstEvent = iif(lag(td.DateEnd, 1, td.DateEnd) over(Partition By td.ObjectId Order By td.DateIni) <= td.DateEnd, 1, 0)
  From @testData td
       )
Select fe.DateIni
     , fe.DateEnd
     , fe.ObjectId
     , fe.Description
  From firstEvents fe
 Where fe.isFirstEvent = 1;

You can simply use not exists:

select t.*
from t
where not exists (select 1
                  from t t2
                  where t2.objectid = t.objectid and
                        t2.dateini < t.dateini and
                        t2.dateend > t.dateend
                 );

For performance, you want an index on (objectid, dateini, dateend).
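
As a rough sketch of what that index could look like: the temporary table #t below is a hypothetical stand-in for t, the column types are assumed from the sample data in the question, and the index name is made up.

```sql
-- Hypothetical stand-in for the table "t" used in the not exists query
Create Table #t (dateini datetime, dateend datetime, objectid int, description varchar(30));

-- objectid comes first because the subquery filters on it with equality;
-- dateini and dateend follow because they are compared as ranges.
Create Index ix_t_objectid_dateini_dateend
    On #t (objectid, dateini, dateend);
```

Since the subquery only reads objectid, dateini and dateend, an index with these three key columns contains everything the not exists check needs, without touching the base table.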
