使用缺少行的计算优化慢速 sql 查询



我正在处理一个包含预测的SQL查询。通常,在给定的时间段内,每天都应该有预测。但是,有时在某些情况下,给定间隔内特定日期的预测丢失,当发生这种情况时,我想执行计算以根据同一天属于同一区域的现有预测来估计预测。

我已经整理了一个查询,但它真的很慢并且占用了大量内存。谁能帮助我朝着正确的方向前进?

declare @startDt date = :startDate
declare @endDt date = :endDate; 
with  AllDates as
(
   select  @startDt as dt
   union all
   select  dateadd(day, 1, dt)
   from    AllDates
   where   dateadd(day, 1, dt) <= @endDt
)
select  
      dt, 
      m.date, 
      p.lp, 
      p.electricityArea, 
      maxCapacity, 
      sum(hour00_01) b, 
      SUM(maxCapacity) as c, 
      ISNULL( ISNULL(hour00_01, maxCapacity * ( SELECT sum(hour00_01)/sum(maxCapacity) 
                                                   FROM tbl_p p2, 
                                                        tbl_m m2 
                                                   WHERE netArea = p.netArea 
                                                     AND plantType = '2'  
                                                     and date = dt 
                                                     and m2.lp = p2.lp 
                                                     AND (inputType = :forecastType) )), 
                                maxCapacity * ( SELECT sum(hour00_01)/sum(maxCapacity) 
                                                   FROM tbl_p p3, 
                                                        tbl_m m3 
                                                   WHERE electricityArea = p.electricityArea 
                                                     AND plantType = '2'  
                                                     and date = dt 
                                                     and m3.lp = p3.lp 
                                                     AND (inputType = :forecastType))) hour00_01, 
      ISNULL( ISNULL( hour01_02, maxCapacity * ( SELECT sum(hour01_02)/sum(maxCapacity)
                                                    FROM tbl_p p2, 
                                                         tbl_m m2 
                                                    WHERE netArea = p.netArea 
                                                      AND plantType = '2'  
                                                      and date = dt 
                                                      and m2.lp = p2.lp 
                                                      AND (inputType = :forecastType))), 
                                 maxCapacity * ( SELECT sum(hour01_02) / sum(maxCapacity) 
                                                    FROM tbl_p p3, 
                                                         tbl_m m3 
                                                    WHERE electricityArea = p.electricityArea 
                                                      AND plantType = '2'  
                                                      and date = dt 
                                                      and m3.lp = p3.lp 
                                                      AND (inputType = :forecastType))) hour01_02,
      **...[all 24 hours]...**
   from
      AllDates ad
         cross join tbl_p p
         left join tbl_m m
            on p.lp = m.lp
            and m.date = ad.dt 
            and m.inputType = :forecastType
   where 
          p.plantType = '2' 
      AND agreementStart <= :startDate1 
      AND agreementEnd >= :endDate1
   GROUP BY 
      dt, 
      m.date, 
      p.lp, 
      p.electricityArea, 
      maxCapacity, 
      p.netArea, 
      p.electricityArea, 
      hour00_01, hour01_02, hour02_03, hour03_04, hour04_05, hour05_06, 
      hour06_07, hour07_08, hour08_09, hour09_10, hour10_11, hour11_12, 
      hour12_13, hour13_14, hour14_15, hour15_16, hour16_17, hour17_18, 
      hour18_19, hour19_20, hour20_21, hour21_22, hour22_23, hour23_24
   ORDER BY 
      p.lp, 
      dt option  (maxrecursion 0)

知道如何优化吗?

粘贴到注释中的表格结构与原始问题的编辑

tbl_p
COLUMN_NAME        DATA_TYPE   CHARACTER_MAXIMUM_LENGTH   IS_NULLABLE 
plantId            int         NULL                       NO
lp                 nchar       45                         YES
unitId             nchar       45                         YES
plantType          int         NULL                       YES
electricityArea    nchar       45                         YES
netArea            nchar       45                         YES
maxCapacity        int         NULL                       YES
yearlyCapacity     int         NULL                       YES
numberOfPlants     int         NULL                       YES
manufacturer       nchar       45                         YES
groundLevel        nchar       45                         YES
altitudeLevel      nchar       45                         YES
updatedFromIp      nchar       45                         YES
xCoordinates       nchar       45                         YES
yCoordinates       nchar       45                         YES
plantStatus        nchar       10                         YES
agreementStart     datetime    NULL                       YES
agreementEnd       datetime    NULL                       YES 
tbl_m is (with some removed columns to fit it here): 
COLUMN_NAME        DATA_TYPE   CHARACTER_MAXIMUM_LENGTH   IS_NULLABLE 
id                 int         NULL                       NO
lp                 nchar       45                         YES
timeStampReturned  datetime    NULL                       YES
date               date        NULL                       YES
hour00_01          decimal     NULL                       YES
hour01_02          decimal     NULL                       YES
hour02_03          decimal     NULL                       YES
...
hour21_22          decimal     NULL                       YES
hour22_23          decimal     NULL                       YES
inputType          nchar       45                         YES

根据您的执行计划,您有一个简单的性能不佳的原因。当您查看每列hour00_01、hour00_02等的执行计划时,您执行 2 次表扫描 .8% + .1%,而不是 3.1% 的哈希匹配 + 索引假脱机的另外 .1%。因此,您占总执行量 4.1% 的成本重复 24 次,因为它是针对每一列完成的。相反,您应该重构代码以生成一个 CTE、临时表或表变量,该变量对您需要的每一列求和。例如,您的代码而不是单个子查询将是这样的。

SELECT SUM(hour00_01) / SUM(maxCapacity) AS hour00_01
       ,SUM(hour01_02) / SUM(maxCapacity) AS hour01_02
       -- Plus other 22 hours --
    FROM tbl_p p2
    JOIN tbl_m m2
        ON m2.lp = p2.lp
           AND netArea = p.netArea
    WHERE plantType = '2'
        AND date = dt
        AND ( inputType = 'Type' )

SELECT SUM(hour00_01) / SUM(maxCapacity) AS hour00_01
       ,SUM(hour01_02) / SUM(maxCapacity) AS hour01_02
       -- Plus other 22 hours --
    FROM tbl_p p2
    JOIN tbl_m m2
        ON m2.lp = p2.lp
           AND electricityArea = p.electricityArea
    WHERE plantType = '2'
        AND date = dt
        AND ( inputType = 'Type' )

如果您这样做,您将获得所有金额,而不必多次打表。优化查询时,减少表的行程始终很重要。如果您一次完成所有总和,它将与您相同的额外表扫描,并且所有列只有一个哈希连接,而不是每列一个哈希连接

此外,您应该考虑SUM() OVER (PARTITION BY子句,该子句允许您在不增加表行程的情况下进行行内计算。

相关内容

  • 没有找到相关文章

最新更新