如何优化此工作TAT查询



我的任务是创建一个不包括周末和节假日的TAT报告。这篇文章的目的是确定我所做的解决方案是否可以通过改进或使用另一种替代方案来更好地优化。

当我最初开始执行这项任务时,我想加入一个日期表,但这似乎有问题,因为如果我有一个这样的数据集:

create table #example (ID varchar(50), startdate datetime2, enddate datetime2)
insert into #example
values
('4CF662A3-57EE-446B-9FC3-3F105219F253',      '2021-05-07 21:45:00.000',              '2021-05-15 08:46:32.000')
,('8B58294D-C20E-4E87-8B9C-186EFB3DCF19',      '2021-04-29 08:52:00.000',              '2021-05-06 20:02:58.000')
,('6177A53C-B285-4087-9F03-BF3324992DEE',      '2021-05-20 10:47:00.000',              '2021-05-27 22:05:30.000')
,('26E3A73E-9D5A-472C-890A-57D51B038C79',      '2021-05-04 20:24:00.000',              '2021-05-12 09:11:47.000')
,('ECC7DB03-A2D9-4D03-AC8D-47B5204C5512',      '2021-05-13 20:24:00.000',              '2021-05-21 10:29:03.000')
,('0A673471-CA22-4AA5-8024-EEF7CEBA8A0D',      '2021-05-10 20:24:00.000',              '2021-05-18 11:03:41.000')
,('27ADB272-A9E0-42ED-AE3C-E42F051C60DF',      '2021-05-03 20:06:00.000',              '2021-05-11 10:59:29.000')
,('5B9D97EC-A994-45E3-BB18-E583C9702389',      '2021-05-11 18:39:00.000',              '2021-05-19 09:56:18.000')
,('DE88A793-4F1C-479D-BB28-4245D9D30606',      '2021-05-10 20:07:00.000',              '2021-05-18 11:48:17.000')
,('42962CBB-79AF-44CB-8EB9-565A75EEA565',      '2021-04-26 21:35:00.000',              '2021-05-04 13:16:25.000')
,('62A980BF-D6B5-47F3-A671-D7F3DE046747',      '2021-05-20 18:08:00.000',              '2021-05-28 10:06:56.000')
,('E1D89979-7D77-4D63-983E-8127A1CB81FA',      '2021-05-18 20:28:00.000',              '2021-05-26 12:27:32.000')
,('D3D923AC-21E0-401E-A9DB-7BE1BE87A6B1',      '2021-05-13 19:56:00.000',              '2021-05-21 11:55:22.000')
,('364F5BF9-55A3-430E-9F41-F3EB41D30633',      '2021-05-13 19:58:00.000',              '2021-05-21 12:09:43.000')
select * from #example drop table #example

然后加入日期表只会告诉我加入的日期是工作日、假期等。

(注意:我看到一些尝试通过加入日期表来创建业务TAT报告,但它们似乎没有预期的那么准确,或者存在奇怪的问题。出于这些原因,我自己没有深入探讨这个选项。(

接下来,我使用循环研究了一个标量UDF。我的灵感主要来自这篇文章:https://anyonconsulting.com/business_intelligence/how-to-count-the-number-of-business-days-between-two-dates-which-exclude-weekends-and-holidays/.这就是我最终开发的查询:

declare @DateFrom as datetime2 = '2021-05-17 19:06:00.000';
declare @DateTo as datetime2 = '2021-05-31 17:59:51.000';
-- Switches things around
if @DateFrom > @DateTo
begin
declare
@T datetime2 = @DateTo
, @F datetime2 = @DateFrom
select @DateFrom = @T
, @DateTo = @F
end
declare @temptable table (
CalendarDate datetime2 not null primary key
, DayID as (datepart(weekday,[CalendarDate]))
, isHoliday as 
(case  
-- New Year's Day, Observed 1st Day of Year
when month([CalendarDate]) = 1  and day([CalendarDate]) = 1 and datepart(weekday,[CalendarDate]) in (2,3,4,5,6) then 1 --New Year’s Day
when month([CalendarDate]) = 12  and day([CalendarDate]) = 31 and datepart(weekday,[CalendarDate]) =6 then 1 --New Year’s Day
when month([CalendarDate]) = 1  and day([CalendarDate]) = 2 and datepart(weekday,[CalendarDate]) =2 then 1 --New Year’s Day
-- Martin Luther King Jr. Day, Observed 3rd Monday of January
when month([CalendarDate]) = 1  and day([CalendarDate]) >= 15 and day([CalendarDate]) <= 21 and datepart(weekday,[CalendarDate]) = 2 then 1
-- Presidents' Day, Officially Washington’s Birthday, Observed 3rd Monday of February
when month([CalendarDate]) = 2  and day([CalendarDate]) >= 15 and day([CalendarDate]) <= 21 and datepart(weekday,[CalendarDate]) = 2 then 1 
-- Memorial Day, Observed Last Monday of May
when month([CalendarDate]) = 5  and day([CalendarDate]) >= 25 and datepart(weekday,[CalendarDate]) = 2 then 1
-- Independence Day, July 4th
when month([CalendarDate]) = 7  and day([CalendarDate]) = 4 and datepart(weekday,[CalendarDate]) IN (2,3,4,5,6) then 1
when month([CalendarDate]) = 7  and day([CalendarDate]) = 3 and datepart(weekday,[CalendarDate]) =6 then 1
when month([CalendarDate]) = 7  and day([CalendarDate]) = 5 and datepart(weekday,[CalendarDate]) =2 then 1
-- Labor Day, Observed First Monday of September
when month([CalendarDate]) = 9  and day([CalendarDate]) <= 7 and datepart(weekday,[CalendarDate]) = 2 then 1
-- Thanksgiving Day, Observed Fourth Thursday of November
when month([CalendarDate]) = 11 and day([CalendarDate]) >= 22 and day([CalendarDate]) <= 28 AND datepart(weekday,[CalendarDate]) = 5 then 1
-- Christmas Day, Observed 25th of December
when month([CalendarDate]) = 12 and day([CalendarDate]) = 25 and datepart(weekday,[CalendarDate]) IN (2,3,4,5,6) then 1
when month([CalendarDate]) = 12 and day([CalendarDate]) = 24 and datepart(weekday,[CalendarDate]) =6 then 1
when month([CalendarDate]) = 12 and day([CalendarDate]) = 26 and datepart(weekday,[CalendarDate]) =2 then 1
else 0
end))
-- Fill the Calendar table
declare @D datetime2 = @DateFrom
while @D <= @DateTo
begin
insert into @temptable (CalendarDate)
select @D
set @D = dateadd(hour, 1, @D)
end
declare
@Count as int = 0
, @Date as datetime2 = @DateFrom
while 
@Date <= @DateTo
begin
if ((datepart(weekday, @Date) in (1, 7))
or exists
(select *
from @temptable
where CalendarDate = @Date
and isHoliday = 1
and (DayID <> 7 or DayID <> 1)))
begin
select @Count = @Count + 1
end
select @Date = dateadd(hour, 1, @Date)
end
select convert(decimal(38,2) , (datediff(hour, @DateFrom, @DateTo) - (@Count)) / 24.0) as [TAT]

当这个脚本被放入一个函数中时,它非常准确,但性能很慢。在43057行的测试集上,它花了1分19秒(让它表现良好对我来说很重要,因为查询已经花了足够长的时间(。由于表现不佳,我开始考虑其他方法。我做了一段时间,并逐渐意识到这一点:https://www.sqlservercentral.com/articles/calculating-work-days.然而,不幸的是,它对我来说不够准确,因为我想要一个十进制的TAT(所以我想要4.68的TAT,而不是5(。

受此启发,我开始思考某种基于集合的函数https://www.sqlservercentral.com/articles/the-joy-of-numbers以及Itzik Ben Gan的T-SQL查询,关于填充数字表的第216-218页。经过一些实验,我得到了这个查询:

declare 
@TotalTimeDiff int
, @low int
, @DateFrom datetime2
, @DateTo datetime2;
set @DateFrom = '2021-05-17 19:06:00.000';
set @DateTo = '2021-05-31 17:59:51.000';
set @low = 1;
set @TotalTimeDiff = (datediff(hour, @DateFrom, @DateTo))
;with 
L0 as (select C from (values(1),(1)) as D(c)),
L1 as (select 1 as c from L0 as A cross join L0 as B),
L2 as (select 1 as c from L1 as A cross join L1 as B),
L3 as (select 1 as c from L2 as A cross join L2 as B),
L4 as (select 1 as c from L3 as A cross join L3 as B),
L5 as (select 1 as c from L4 as A cross join L4 as B), -- Up to 4,294,967,296 rows

Nums as (select ROW_NUMBER() over(order by (select null)) as rownum from L5),
Times as (select top (@TotalTimeDiff) @low + rownum - 1 as n from Nums)
select
[TAT] = isnull(((count(BusinessHours.n) - sum(BusinessHours.NonBusinessHours)) / 24.0), datediff(minute, @DateFrom, @DateTo) / 1440.0)
from (
select tms.n
, [NonBusinessHours] = 
case 
when datename(dw, dateadd(hour, tms.n, @DateFrom)) in ('Sunday', 'Saturday') then 1

-- New Year's Day, Observed 1st Day of Year
when month(dateadd(hour, tms.n, @DateFrom)) = 1  and day(dateadd(hour, tms.n, @DateFrom)) = 1 and datepart(weekday,dateadd(hour, tms.n, @DateFrom)) in (2,3,4,5,6) then 1 --    New Year’s Day
when month(dateadd(hour, tms.n, @DateFrom)) = 12  and day(dateadd(hour, tms.n, @DateFrom)) = 31 and datepart(weekday,dateadd(hour, tms.n, @DateFrom)) =6 then 1 --New Year’s    Day
when month(dateadd(hour, tms.n, @DateFrom)) = 1  and day(dateadd(hour, tms.n, @DateFrom)) = 2 and datepart(weekday,dateadd(hour, tms.n, @DateFrom)) =2 then 1 --New Year’s  Day
-- Martin Luther King Jr. Day, Observed 3rd Monday of January
when month(dateadd(hour, tms.n, @DateFrom)) = 1  and day(dateadd(hour, tms.n, @DateFrom)) >= 15 and day(dateadd(hour, tms.n, @DateFrom)) <= 21 and datepart(weekday,dateadd (hour, tms.n, @DateFrom)) = 2 then 1
-- Presidents' Day, Officially Washington’s Birthday, Observed 3rd Monday of February
when month(dateadd(hour, tms.n, @DateFrom)) = 2  and day(dateadd(hour, tms.n, @DateFrom)) >= 15 and day(dateadd(hour, tms.n, @DateFrom)) <= 21 and datepart(weekday,dateadd (hour, tms.n, @DateFrom)) = 2 then 1 
-- Memorial Day, Observed Last Monday of May
when month(dateadd(hour, tms.n, @DateFrom)) = 5  and day(dateadd(hour, tms.n, @DateFrom)) >= 25 and datepart(weekday,dateadd(hour, tms.n, @DateFrom)) = 2 then 1
-- Independence Day, July 4th
when month(dateadd(hour, tms.n, @DateFrom)) = 7  and day(dateadd(hour, tms.n, @DateFrom)) = 4 and datepart(weekday,dateadd(hour, tms.n, @DateFrom)) IN (2,3,4,5,6) then 1
when month(dateadd(hour, tms.n, @DateFrom)) = 7  and day(dateadd(hour, tms.n, @DateFrom)) = 3 and datepart(weekday,dateadd(hour, tms.n, @DateFrom)) =6 then 1
when month(dateadd(hour, tms.n, @DateFrom)) = 7  and day(dateadd(hour, tms.n, @DateFrom)) = 5 and datepart(weekday,dateadd(hour, tms.n, @DateFrom)) =2 then 1
-- Labor Day, Observed First Monday of September
when month(dateadd(hour, tms.n, @DateFrom)) = 9  and day(dateadd(hour, tms.n, @DateFrom)) <= 7 and datepart(weekday,dateadd(hour, tms.n, @DateFrom)) = 2 then 1 
-- Thanksgiving Day, Observed Fourth Thursday of November
when month(dateadd(hour, tms.n, @DateFrom)) = 11 and day(dateadd(hour, tms.n, @DateFrom)) >= 22 and day(dateadd(hour, tms.n, @DateFrom)) <= 28 AND datepart(weekday,dateadd (hour, tms.n, @DateFrom)) = 5 then 1
-- Christmas Day, Observed 25th of December
when month(dateadd(hour, tms.n, @DateFrom)) = 12 and day(dateadd(hour, tms.n, @DateFrom)) = 25 and datepart(weekday,dateadd(hour, tms.n, @DateFrom)) IN (2,3,4,5,6) then 1
when month(dateadd(hour, tms.n, @DateFrom)) = 12 and day(dateadd(hour, tms.n, @DateFrom)) = 24 and datepart(weekday,dateadd(hour, tms.n, @DateFrom)) =6 then 1
when month(dateadd(hour, tms.n, @DateFrom)) = 12 and day(dateadd(hour, tms.n, @DateFrom)) = 26 and datepart(weekday,dateadd(hour, tms.n, @DateFrom)) =2 then 1
else 0 end
from Times as tms) as [BusinessHours]

该查询在转换为标量UDF后,在17秒内计算出了这43057行的业务TAT,这与循环查询相比有了明显的改进。(顺便说一句,这都是SQL Server 2017。(

首先,我想知道最后一个解决方案是否可以被认为是";基于集合的";解决方案它似乎比明显的过程/迭代循环更基于集合,但它不是根据数据集本身计算TAT,而是根据表中的逐项输入(@DateFrom和@DateTo(计算TAT。我认为;基于集合的";解决方案意味着使用从数据集本身到Date表的复杂联接逻辑。然而,该函数是基于设置的,因为一旦函数本身运行,它就不会经过一个循环,逐个确定每个小时,而是给定一个数字列表,推送相关日期/时间集中的逐项小时,以及每个小时是否为工作小时。

第二,有人知道可以进一步改进的方法吗?还是另一种方法,达到这种准确度(是否将两个日期之间的每一个小时归类为工作小时(?比这更快的东西?

谢谢。如果这是不恰当的发布(它看起来确实比这里想要的有点不具体(,我会很感激重定向,这样我就可以在那里发布这篇文章了。我主要是想众包这个函数,看看它能走多远,并把这个函数放在那里,因为在研究了多个解决方案后,它很有趣。

多亏了@Charlieface,我只使用了一个Date表就提出了一个相当准确的解决方案。由于我只是在生产数据上制定了这个解决方案,所以我不会在这里发布,但这是可能的。话虽如此,我只能在某种程度上用这种解决方案来改善问题。出现的问题是:

  • 如果某件事是在周日完成的呢
  • 我们怎么知道实际工作了几天
  • 他们星期六也工作吗

这个解决方案确实比我上面的第二个解决方案执行得更快(大约快5-7秒(,但是,考虑到一些不准确的地方,我选择了上面的第三个解决方案。希望这篇文章能帮助指导其他希望根据自己的自定义业务逻辑计算业务TAT的人。

最新更新