SQL缺口/孤岛问题-确定某人是否已经工作了X年而没有休息Y天



为日本一家公司解决问题。政府有一些规定,比如…如果你持工作签证:

  1. 如果不休息30天,您在公司的工作时间不能超过3年
  2. 你不能在同一家人事公司工作超过5年而不休假6个月

因此,我们想弄清楚在未来30/60/90天内是否有人会违反这两条规则。

样本数据(合同清单(:

if object_id('tempdb..#sampleDates') is not null drop table #sampleDates
create table #sampleDates (UserId int, CompanyID int, WorkPeriodStart datetime, WorkPeriodEnd datetime)
insert #sampleDates (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values (27809, 972, '2019-10-10', '2020-10-10')
insert #sampleDates (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values (27853, 484, '2019-10-10', '2020-10-10')
insert #sampleDates (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values (27856, 172, '2019-10-10', '2020-10-10')
insert #sampleDates (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values (27857, 1234, '2015-01-01', '2015-12-31')
insert #sampleDates (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values (27857, 1234, '2016-01-01', '2017-02-28')
insert #sampleDates (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values (27857, 1234, '2017-01-01', '2017-12-31')
insert #sampleDates (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values (27857, 1234, '2018-01-01', '2018-12-31')
insert #sampleDates (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values (27857, 1234, '2019-01-01', '2020-01-31')
insert #sampleDates (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values (27857, 1234, '2020-01-01', '2020-12-31')
insert #sampleDates (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values (27897, 179, '2019-10-10', '2020-10-10')

我的第一个问题可能是日期重叠。我已经接近这个问题的解决方案了,但在我知道如何解决X年/Y天工作假的问题之前,我不确定我的cte或临时表的输出应该是什么样子。

我不希望有人为我做这项工作,但我想找到一篇可以告诉我的文章:

  • 如何确定某人在这段时间内是否休息过,休息了多长时间(日期范围之间的间隔(
  • 我如何计算他们在接下来的30/60/90天里是否已经工作了3/5年而没有休息30/180天

在我开始对程序进行编码之前,这似乎很简单。

感谢您提前提供的帮助。

编辑:

值得一提的是,这是我第二次尝试消除重叠日期(第一个版本使用了dense_rank方法,它一直有效,直到我搞砸了一些事情,用了一些简单的方法(:


;with CJ as (
select UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd from #sampleDates c 
)
select 
c.CompanyID,
c.WorkPeriodStart,
min(t1.WorkPeriodEnd) as EndDate
from CJ c
inner join CJ t1 on c.WorkPeriodStart <= t1.WorkPeriodEnd and c.UserId = t1.UserId and c.CompanyID = t1.CompanyID
and not exists(select * from CJ t2 where t1.UserId = t2.UserId and t1.CompanyID = t2.CompanyID and t1.WorkPeriodEnd >= t2.WorkPeriodStart AND t1.WorkPeriodEnd < t2.WorkPeriodEnd) 
where not exists(select * from CJ c2 where c.UserId = c2.UserId and c.CompanyID = c2.CompanyID and c.WorkPeriodStart > c2.WorkPeriodStart AND c.WorkPeriodStart <= c2.WorkPeriodEnd) 
group by c.UserId, c.CompanyID, c.WorkPeriodStart 
order by c.UserId, c.WorkPeriodStart 

免责声明:这是一个不完整的答案

我稍后可以继续,但这显示了如何计算岛屿。那么识别罪犯就不应该那么复杂了。

请参见增强示例。我添加了用户27897,它有三个岛:0、1和2。见下文:

create table t (UserId int, CompanyID int, WorkPeriodStart date, WorkPeriodEnd date);
insert t (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values
(27809, 972, '2019-10-10', '2020-10-10'),
(27853, 484, '2019-10-10', '2020-10-10'),
(27856, 172, '2019-10-10', '2020-10-10'),
(27857, 1234, '2015-01-01', '2015-12-31'),
(27857, 1234, '2016-01-01', '2017-02-28'),
(27857, 1234, '2017-01-01', '2017-12-31'),
(27857, 1234, '2018-01-01', '2018-12-31'),
(27857, 1234, '2019-01-01', '2020-01-31'),
(27857, 1234, '2020-01-01', '2020-12-31'),
(27897, 179, '2015-05-28', '2015-09-30'),
(27897, 179, '2017-03-11', '2017-04-30'),
(27897, 188, '2017-02-20', '2017-07-07'),
(27897, 179, '2019-10-10', '2020-10-10');

有了这些数据,计算每行孤岛的查询可以看起来像:

select *,
sum(hop) over(partition by UserId order by WorkPeriodStart) as island
from (
select *,
case when WorkPeriodStart > dateadd(day, 1, max(WorkPeriodEnd) 
over(partition by UserId 
order by WorkPeriodStart 
rows between unbounded preceding and 1 preceding))
then 1 else 0 end as hop
from t
) x
order by UserId, WorkPeriodStart

结果:

UserId  CompanyID  WorkPeriodStart  WorkPeriodEnd  hop  island
------  ---------  ---------------  -------------  ---  ------
27809        972  2019-10-10       2020-10-10       0       0
27853        484  2019-10-10       2020-10-10       0       0
27856        172  2019-10-10       2020-10-10       0       0
27857       1234  2015-01-01       2015-12-31       0       0
27857       1234  2016-01-01       2017-02-28       0       0
27857       1234  2017-01-01       2017-12-31       0       0
27857       1234  2018-01-01       2018-12-31       0       0
27857       1234  2019-01-01       2020-01-31       0       0
27857       1234  2020-01-01       2020-12-31       0       0
27897        179  2015-05-28       2015-09-30       0       0
27897        188  2017-02-20       2017-07-07       1       1
27897        179  2017-03-11       2017-04-30       0       1
27897        179  2019-10-10       2020-10-10       1       2

现在,我们可以扩充这个查询以获得";工作日";对于每个岛;休息日";在每个岛屿之前,通过做:

select *,
datediff(day, s, e) + 1 as worked,
datediff(day, lag(e) over(partition by UserId order by island), s) as prev_days_off
from (
select UserId, island, min(WorkPeriodStart) as s, max(WorkPeriodEnd) as e
from (
select *,
sum(hop) over(partition by UserId order by WorkPeriodStart) as island
from (
select *,
case when WorkPeriodStart > dateadd(day, 1, max(WorkPeriodEnd) 
over(partition by UserId 
order by WorkPeriodStart 
rows between unbounded preceding and 1 preceding))
then 1 else 0 end as hop
from t
) x
) y
group by UserId, island
) x
order by UserId, island

结果:

UserId  island  s           e           worked  prev_days_off
------  ------  ----------  ----------  ------  -------------
27809       0  2019-10-10  2020-10-10     367         <null>
27853       0  2019-10-10  2020-10-10     367         <null>
27856       0  2019-10-10  2020-10-10     367         <null>
27857       0  2015-01-01  2020-12-31    2192         <null>
27897       0  2015-05-28  2015-09-30     126         <null>
27897       1  2017-02-20  2017-07-07     138            509
27897       2  2019-10-10  2020-10-10     367            825

这个结果非常接近你所需要的。这些数据实际上对于根据您的条件筛选行非常有用。

此脚本合并任何重叠的工作周期,然后计算前3年和5年内的总工作天数。然后取此值,并确定这是否超过UserIdCompanyId在3年期限内允许的最大工作日,以及仅超过UserId在5年期限内所允许的最大上班日。(这是对你问题中规则的正确解释吗?(

然后,它只需将306090天添加到总数中,看看这个较大的值是否会超过相应的限制。考虑到不同的分组规则,这将是更干净的2个查询(对于5年规则,没有UserId的重复(,但结果仍然是针对任何违规UserId的标志。

在下面的例子中,你可以看到UserId = 27857目前只违反了5年规则,但如果他们再呆60天,也会违反3年规则。此外,UserId = 27858目前还可以,但在60天内将违反5年规则。

我已经对如何定义一年以及WorkPeriodEnd值是否包含做出了一些假设,所以请检查所需的逻辑是否正确应用。

脚本

if object_id('tempdb..#sampleDates') is not null drop table #sampleDates
create table #sampleDates (UserId int, CompanyId int, WorkPeriodStart datetime, WorkPeriodEnd datetime)
insert #sampleDates values
(27809, 972, '2019-10-10', '2020-10-10')
,(27853, 484, '2019-10-10', '2020-10-10')
,(27856, 172, '2019-10-10', '2020-10-10')
,(27857, 1234, '2015-01-01', '2015-12-31')
,(27857, 1234, '2016-01-01', '2017-02-28')
,(27857, 1234, '2017-01-01', '2017-12-31')
,(27857, 1234, '2018-01-01', '2018-12-31')
,(27857, 1234, '2019-01-01', '2020-01-31')
,(27857, 1234, '2020-01-01', '2020-05-31')
,(27858, 1234, '2015-01-01', '2015-12-31')
,(27858, 1234, '2016-01-01', '2017-02-28')
,(27858, 1234, '2017-01-01', '2017-12-31')
,(27858, 1234, '2018-01-01', '2018-12-31')
,(27858, 1234, '2019-09-01', '2020-01-31')
,(27858, 1234, '2020-01-01', '2020-08-31')
,(27859, 12345, '2015-01-01', '2015-12-31')
,(27859, 12346, '2016-01-01', '2017-02-28')
,(27859, 12347, '2017-01-01', '2017-12-31')
,(27859, 12348, '2018-01-01', '2018-12-31')
,(27859, 12349, '2019-01-01', '2020-01-31')
,(27859, 12340, '2020-01-01', '2020-12-31')
,(27897, 179, '2019-10-10', '2020-10-10')
;
declare @3YearsAgo date = dateadd(year,-3,getdate());
declare @3YearWorkingDays int = (365*3)-30;
declare @5YearsAgo date = dateadd(year,-5,getdate());
declare @5YearWorkingDays int = (365*5)-(365/2);
with p as
(
select UserId
,CompanyId
,min(WorkPeriodStart) as WorkPeriodStart
,max(WorkPeriodEnd) as WorkPeriodEnd
from(select l.*,
sum(case when dateadd(day,1,l.PrevEnd) < l.WorkPeriodStart then 1 else 0 end) over (partition by l.UserId, l.CompanyId order by l.WorkPeriodStart rows unbounded preceding) as grp
from(select d.*,
lag(d.WorkPeriodEnd) over (partition by d.UserId, d.CompanyId order by d.WorkPeriodEnd) as PrevEnd
from #sampleDates as d
) as l
) as g
group by grp
,UserId
,CompanyId
)
,d as
(
select UserId
,CompanyId
,sum(case when @3YearsAgo < WorkPeriodEnd
then datediff(day
,case when @3YearsAgo between WorkPeriodStart and WorkPeriodEnd then @3YearsAgo else WorkPeriodStart end
,WorkPeriodEnd
)
else 0
end
) as WorkingDays3YearsToToday

,sum(case when @5YearsAgo < WorkPeriodEnd
then datediff(day
,case when @5YearsAgo between WorkPeriodStart and WorkPeriodEnd then @5YearsAgo else WorkPeriodStart end
,WorkPeriodEnd
)
else 0
end
) as WorkingDays5YearsToToday
from p
group by UserId
,CompanyId
)
select UserId
,CompanyId
,@3YearWorkingDays as Limit3Year
,@5YearWorkingDays as Limit5Year
,WorkingDays3YearsToToday
,WorkingDays5YearsToToday
,case when WorkingDays3YearsToToday > @3YearWorkingDays then 1 else 0 end as Violation3YearNow
,case when sum(WorkingDays5YearsToToday) over (partition by UserId) > @5YearWorkingDays then 1 else 0 end as Violation5YearNow

,case when WorkingDays3YearsToToday + 30 > @3YearWorkingDays then 1 else 0 end as Violation3Year30Day
,case when sum(WorkingDays5YearsToToday) over (partition by UserId) + 30 > @5YearWorkingDays then 1 else 0 end as Violation5Year30Day
,case when WorkingDays3YearsToToday + 60 > @3YearWorkingDays then 1 else 0 end as Violation3Year60Day
,case when sum(WorkingDays5YearsToToday) over (partition by UserId) + 60 > @5YearWorkingDays then 1 else 0 end as Violation5Year60Day
,case when WorkingDays3YearsToToday + 90 > @3YearWorkingDays then 1 else 0 end as Violation3Year90Day
,case when sum(WorkingDays5YearsToToday) over (partition by UserId) + 90 > @5YearWorkingDays then 1 else 0 end as Violation5Year90Day
from d
order by UserId
,CompanyId;

输出

+--------+-----------+------------+------------+--------------------------+--------------------------+-------------------+-------------------+---------------------+---------------------+---------------------+---------------------+---------------------+---------------------+
| UserId | CompanyId | Limit3Year | Limit5Year | WorkingDays3YearsToToday | WorkingDays5YearsToToday | Violation3YearNow | Violation5YearNow | Violation3Year30Day | Violation5Year30Day | Violation3Year60Day | Violation5Year60Day | Violation3Year90Day | Violation5Year90Day |
+--------+-----------+------------+------------+--------------------------+--------------------------+-------------------+-------------------+---------------------+---------------------+---------------------+---------------------+---------------------+---------------------+
|  27809 |       972 |       1065 |       1643 |                      366 |                      366 |                 0 |                 0 |                   0 |                   0 |                   0 |                   0 |                   0 |                   0 |
|  27853 |       484 |       1065 |       1643 |                      366 |                      366 |                 0 |                 0 |                   0 |                   0 |                   0 |                   0 |                   0 |                   0 |
|  27856 |       172 |       1065 |       1643 |                      366 |                      366 |                 0 |                 0 |                   0 |                   0 |                   0 |                   0 |                   0 |                   0 |
|  27857 |      1234 |       1065 |       1643 |                     1029 |                     1760 |                 0 |                 1 |                   0 |                   1 |                   1 |                   1 |                   1 |                   1 |
|  27858 |      1234 |       1065 |       1643 |                      877 |                     1608 |                 0 |                 0 |                   0 |                   0 |                   0 |                   1 |                   0 |                   1 |
|  27859 |     12340 |       1065 |       1643 |                      365 |                      365 |                 0 |                 1 |                   0 |                   1 |                   0 |                   1 |                   0 |                   1 |
|  27859 |     12345 |       1065 |       1643 |                        0 |                      147 |                 0 |                 1 |                   0 |                   1 |                   0 |                   1 |                   0 |                   1 |
|  27859 |     12346 |       1065 |       1643 |                        0 |                      424 |                 0 |                 1 |                   0 |                   1 |                   0 |                   1 |                   0 |                   1 |
|  27859 |     12347 |       1065 |       1643 |                      147 |                      364 |                 0 |                 1 |                   0 |                   1 |                   0 |                   1 |                   0 |                   1 |
|  27859 |     12348 |       1065 |       1643 |                      364 |                      364 |                 0 |                 1 |                   0 |                   1 |                   0 |                   1 |                   0 |                   1 |
|  27859 |     12349 |       1065 |       1643 |                      395 |                      395 |                 0 |                 1 |                   0 |                   1 |                   0 |                   1 |                   0 |                   1 |
|  27897 |       179 |       1065 |       1643 |                      366 |                      366 |                 0 |                 0 |                   0 |                   0 |                   0 |                   0 |                   0 |                   0 |
+--------+-----------+------------+------------+--------------------------+--------------------------+-------------------+-------------------+---------------------+---------------------+---------------------+---------------------+---------------------+---------------------+

以下是我最终得到的结果。

<无用的解释>

我一直面临的问题是:

  • 如何处理任何和所有日期范围的重叠,并仅确定合同日期范围内的天数
  • 客户端仍然使用SQL 2008,所以我需要一些老式的tsql
  • 确保准确计算中断时间(合同之间的时间(

所以我选择了自己的解决方案,这可能是愚蠢的,因为它需要在内存中为每个Workday/Candidate组合生成一个记录。我看不出合同表超出了5-10k的记录范围。我朝这个方向走的唯一原因。

我创建了一个日历表,其中包含1980年1月1日至2050年12月31日的每个日期然后我根据CandidateId的日历表加入了合同范围。这些将是工作日期。日历表中与合同范围内的日期不匹配的任何日期都是休息日。

<无用的解释>

日历表

if object_id('CalendarTable') is not null drop table CalendarTable
go
create table CalendarTable (pk int identity, CalendarDate date )
declare @StartDate date = cast('1980-01-01' as date)
declare @EndDate date = cast('2050-12-31' as date)
while @StartDate <= @EndDate
begin
insert into CalendarTable ( CalendarDate ) values ( @StartDate )
set @StartDate = dateadd(dd, 1, @StartDate)
end
go

查询5年违规(工作5年,没有6个月的冷静期(

declare @enddate date = dateadd(dd, 30, getdate()) 
declare @beginDate date = dateadd(dd, -180, dateadd(year, -5, getdate()))
select poss.CandidateId, 
min(work.CalendarDate) as FirstWorkDate, 
count(work.CandidateId) as workedDays, 
sum(case when work.CandidateId is null then 1 else 0 end) as breakDays, 
case when count(work.CandidateId) > (365*5) and sum(case when work.CandidateId is null then 1 else 0 end) < (365/2) then 1 else 0 end as Year5Violation,
case when count(work.CandidateId) > (365*5) and sum(case when work.CandidateId is null then 1 else 0 end) < (365/2) then DATEADD(year, 5, min(work.CalendarDate)) else null end as ViolationDate
from 
(
select cand.CandidateId, cal.CalendarDate
from CalendarTable cal
join (select distinct c.CandidateId from contracts c where c.WorkPeriodStart is not null and c.WorkPeriodEnd is not null and c.Deleted = 0) cand on 1 = 1
where cal.CalendarDate between @beginDate and @enddate
) as poss 
left join 
(
select distinct c.CandidateId, cal.CalendarDate
from contracts c
join CalendarTable cal on cal.CalendarDate between c.WorkPeriodStart and c.WorkPeriodEnd 
where c.WorkPeriodStart is not null and c.WorkPeriodEnd is not null and c.Deleted = 0
) as work on work.CandidateId = poss.CandidateId and work.CalendarDate = poss.CalendarDate
group by poss.CandidateId

相关内容

  • 没有找到相关文章

最新更新