以下SQL Server查询:
SELECT DISTINCT NodeID, dateadd(hour, datediff(hour, 0, Timestamp), 0) as dt_hour
,AVG(Availability)
OVER (PARTITION BY NodeID, dateadd(hour, datediff(hour, 0, Timestamp), 0)) AS avg
,PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY Availability)
OVER (PARTITION BY NodeID, dateadd(hour, datediff(hour, 0, Timestamp), 0)) AS perc90
,MAX(Availability)
OVER (PARTITION BY NodeID, dateadd(hour, datediff(hour, 0, Timestamp), 0)) AS max
FROM InterfaceAvailability_CS_Detail_hist
order by NodeID, dt_hour;
运行时间大约是这个的3倍:
SELECT DISTINCT NodeID, dateadd(hour, datediff(hour, 0, Timestamp), 0) as dt_hour
,PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY Availability)
OVER (PARTITION BY NodeID, dateadd(hour, datediff(hour, 0, Timestamp), 0)) AS perc90
FROM InterfaceAvailability_CS_Detail_hist
order by NodeID, dt_hour;
我怀疑它对每个聚合(AVG
、PERCENTILE_CONT
、MAX
(都进行了冗余分区。有没有一种方法可以只写一次OVER
子句并将其应用于每个聚合?
如果没有看到查询计划,很难判断发生了什么,但我的猜测是,通过在parititonby子句中使用标量函数,您限制了sql server优化查询的能力。
我尝试了一个类似的查询,通过使用CTE为dt_hour列生成值,我能够生成一个具有较少排序、嵌套循环和流聚合操作的计划。由于我不知道您原始查询中这些操作的成本,我不知道这是否会产生更好的结果,因为在计划中看到更少的东西并不一定意味着改进。
WITH processed AS (
SELECT *, dateadd(hour, datediff(hour, 0, Timestamp), 0) as dt_hour
FROM InterfaceAvailability_CS_Detail_hist
)
SELECT DISTINCT NodeID, dt_hour
,AVG(Availability)
OVER (PARTITION BY NodeID, dt_hour) AS avg
,PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY Availability)
OVER (PARTITION BY NodeID, dt_hour) AS perc90
,MAX(Availability)
OVER (PARTITION BY NodeID, dt_hour) AS max
FROM processed
order by NodeID, dt_hour;