我正在处理一个包含预测的SQL
查询。通常,在给定的时间段内,每天都应该有预测。但是,有时在某些情况下,给定间隔内特定日期的预测丢失,当发生这种情况时,我想执行计算以根据同一天属于同一区域的现有预测来估计预测。
我已经整理了一个查询,但它真的很慢并且占用了大量内存。谁能帮助我朝着正确的方向前进?
declare @startDt date = :startDate
declare @endDt date = :endDate;
with AllDates as
(
select @startDt as dt
union all
select dateadd(day, 1, dt)
from AllDates
where dateadd(day, 1, dt) <= @endDt
)
select
dt,
m.date,
p.lp,
p.electricityArea,
maxCapacity,
sum(hour00_01) b,
SUM(maxCapacity) as c,
ISNULL( ISNULL(hour00_01, maxCapacity * ( SELECT sum(hour00_01)/sum(maxCapacity)
FROM tbl_p p2,
tbl_m m2
WHERE netArea = p.netArea
AND plantType = '2'
and date = dt
and m2.lp = p2.lp
AND (inputType = :forecastType) )),
maxCapacity * ( SELECT sum(hour00_01)/sum(maxCapacity)
FROM tbl_p p3,
tbl_m m3
WHERE electricityArea = p.electricityArea
AND plantType = '2'
and date = dt
and m3.lp = p3.lp
AND (inputType = :forecastType))) hour00_01,
ISNULL( ISNULL( hour01_02, maxCapacity * ( SELECT sum(hour01_02)/sum(maxCapacity)
FROM tbl_p p2,
tbl_m m2
WHERE netArea = p.netArea
AND plantType = '2'
and date = dt
and m2.lp = p2.lp
AND (inputType = :forecastType))),
maxCapacity * ( SELECT sum(hour01_02) / sum(maxCapacity)
FROM tbl_p p3,
tbl_m m3
WHERE electricityArea = p.electricityArea
AND plantType = '2'
and date = dt
and m3.lp = p3.lp
AND (inputType = :forecastType))) hour01_02,
**...[all 24 hours]...**
from
AllDates ad
cross join tbl_p p
left join tbl_m m
on p.lp = m.lp
and m.date = ad.dt
and m.inputType = :forecastType
where
p.plantType = '2'
AND agreementStart <= :startDate1
AND agreementEnd >= :endDate1
GROUP BY
dt,
m.date,
p.lp,
p.electricityArea,
maxCapacity,
p.netArea,
p.electricityArea,
hour00_01, hour01_02, hour02_03, hour03_04, hour04_05, hour05_06,
hour06_07, hour07_08, hour08_09, hour09_10, hour10_11, hour11_12,
hour12_13, hour13_14, hour14_15, hour15_16, hour16_17, hour17_18,
hour18_19, hour19_20, hour20_21, hour21_22, hour22_23, hour23_24
ORDER BY
p.lp,
dt option (maxrecursion 0)
知道如何优化吗?
粘贴到注释中的表格结构与原始问题的编辑
tbl_p
COLUMN_NAME DATA_TYPE CHARACTER_MAXIMUM_LENGTH IS_NULLABLE
plantId int NULL NO
lp nchar 45 YES
unitId nchar 45 YES
plantType int NULL YES
electricityArea nchar 45 YES
netArea nchar 45 YES
maxCapacity int NULL YES
yearlyCapacity int NULL YES
numberOfPlants int NULL YES
manufacturer nchar 45 YES
groundLevel nchar 45 YES
altitudeLevel nchar 45 YES
updatedFromIp nchar 45 YES
xCoordinates nchar 45 YES
yCoordinates nchar 45 YES
plantStatus nchar 10 YES
agreementStart datetime NULL YES
agreementEnd datetime NULL YES
tbl_m is (with some removed columns to fit it here):
COLUMN_NAME DATA_TYPE CHARACTER_MAXIMUM_LENGTH IS_NULLABLE
id int NULL NO
lp nchar 45 YES
timeStampReturned datetime NULL YES
date date NULL YES
hour00_01 decimal NULL YES
hour01_02 decimal NULL YES
hour02_03 decimal NULL YES
...
hour21_22 decimal NULL YES
hour22_23 decimal NULL YES
inputType nchar 45 YES
根据您的执行计划,您有一个简单的性能不佳的原因。当您查看每列hour00_01、hour00_02等的执行计划时,您执行 2 次表扫描 .8% + .1%,而不是 3.1% 的哈希匹配 + 索引假脱机的另外 .1%。因此,您占总执行量 4.1% 的成本重复 24 次,因为它是针对每一列完成的。相反,您应该重构代码以生成一个 CTE、临时表或表变量,该变量对您需要的每一列求和。例如,您的代码而不是单个子查询将是这样的。
SELECT SUM(hour00_01) / SUM(maxCapacity) AS hour00_01
,SUM(hour01_02) / SUM(maxCapacity) AS hour01_02
-- Plus other 22 hours --
FROM tbl_p p2
JOIN tbl_m m2
ON m2.lp = p2.lp
AND netArea = p.netArea
WHERE plantType = '2'
AND date = dt
AND ( inputType = 'Type' )
SELECT SUM(hour00_01) / SUM(maxCapacity) AS hour00_01
,SUM(hour01_02) / SUM(maxCapacity) AS hour01_02
-- Plus other 22 hours --
FROM tbl_p p2
JOIN tbl_m m2
ON m2.lp = p2.lp
AND electricityArea = p.electricityArea
WHERE plantType = '2'
AND date = dt
AND ( inputType = 'Type' )
如果您这样做,您将获得所有金额,而不必多次打表。优化查询时,减少表的行程始终很重要。如果您一次完成所有总和,它将与您相同的额外表扫描,并且所有列只有一个哈希连接,而不是每列一个哈希连接
此外,您应该考虑SUM() OVER (PARTITION BY
子句,该子句允许您在不增加表行程的情况下进行行内计算。