查询单独运行时速度快,但在连接中添加时速度慢



在SQL Server中,这个查询运行得非常快,不到一秒:

SELECT T1.id
FROM first AS T1
WHERE T1.id = 21

这个查询也非常快,不到1秒,即使它有5300万条记录,但id 21只有大约6条记录:

SELECT TOP 1 T2.value
FROM second AS T2 WITH(INDEX(IX_second))
WHERE T2.id = 21 
AND T2.b = 1 
AND T2.c = 0 
AND T2.d = 0 
AND T2.e = 0
ORDER BY T2.id, T2.b, T2.c, T2.d, T2.e, T2.timestamp DESC

然而,在这个查询中,我将内部SELECT中的21替换为T1。Id,非常非常慢,超过80秒:

SELECT T1.id, T3.value
FROM first AS T1
JOIN second AS T3 ON T3.id IN (SELECT TOP 1 T2.id
FROM second AS T2 WITH(INDEX(IX_second))
WHERE T2.id = T1.id 
AND T2.b = 1 
AND T2.c = 0 
AND T2.d = 0 
AND T2.e = 0
ORDER BY T2.id, T2.b, T2.c, T2.d, T2.e, T2.timestamp DESC)
WHERE T1.id = 21

为什么这个查询需要这么长时间,我如何使它更快?


编辑:下面是计划,更改了一些表和字段名以保护无辜:)brentozar.com/pastetheplan/?id=rJYBSfwws

这看起来像是可以使用CROSS APPLY的情况。这允许TOP 1ORDER BY,但会避免对second表的双重引用。

试题:

SELECT T1.id, T3.value
FROM first AS T1
CROSS APPLY (
SELECT TOP 1 T2.*
FROM second AS T2 --WITH(INDEX(IX_second))
WHERE T2.id = T1.id 
AND T2.b = 1 
AND T2.c = 0 
AND T2.d = 0 
AND T2.e = 0
ORDER BY T2.id, T2.b, T2.c, T2.d, T2.e, T2.timestamp DESC
) T3
WHERE T1.id = 21

如果IX_secondsecond(id)上的索引,那么SQL server查询优化器很可能会选择该索引,而不需要索引提示。

只是检查一下:你确定你指的是T2.id = T1.id而不是T2.first_id = T1.id吗?

还有一个注意事项:由于T2.id, T2.b, T2.c, T2.d, T2.e将在CROSS APPLY结果中全部修复,您可能会从ORDER BY子句中删除它们。

附录:根据您发布的执行计划中的查询,上述内容相当于:

SELECT *
FROM Event_Item AS ei
CROSS APPLY (
SELECT TOP 1 eisp1.*
FROM Event_Item_Spread AS eisp1
-- WITH(INDEX(IX_Event_Item_Spread__event_item__sportsbook__period__ingame__alt__timestamp_desc))
WHERE eisp1.event_item_id = ei.id
AND eisp1.sportsbook_id = 1
AND eisp1.period = 0
AND eisp1.in_game = 0
AND eisp1.alt = 0
ORDER BY eisp1.timestamp DESC
) AS eisp
WHERE ei.id = 39604314

这一切都取决于DBMS计算数据的方式。但在这种情况下,53M将被处理(T1 X T3)次。如果索引不是用值(b,c,d,e…)构建的,那么每次都要进行排序,反向,查询和可能与源表的一些连接。

我不明白你的问题的逻辑。为什么不只用一个连接呢?
SELECT T1.id, T3.value
FROM first AS T1
JOIN second AS T2 ON T1.id = T2.id 
WHERE T1.id = 21 
AND T2.b = 1 AND T2.c = 0 AND T2.d = 0 AND T2.e = 0

这看起来与上面的结果相同。

IN()语句中使用子查询在语法上是可以的,但我不建议这样做,因为它们是管理表之间连接的缓慢方式。

根据其他人的建议,使用临时表。在我看来,即使是CTE()也至少是代码的一个更清晰的版本,并且更容易理解。你还需要研究你的执行计划。

我还没有测试过这个,但试着这样做:

SELECT TOP 1 T2.id
INTO #tblTEMP1
FROM second AS T2 
WHERE T2.b = 1 
AND T2.c = 0 
AND T2.d = 0 
AND T2.e = 0
ORDER BY T2.id, T2.timestamp DESC
..
..
SELECT T1.id, T3.value
FROM first AS T1
INNER JOIN #tblTEMP1 AS t ON t.ID = t1.ID
INNER JOIN second AS T3 ON T3.id = t.ID
WHERE T1.id = 21

我打赌这会更快,但可能没有你期望的那么快。

同样,您需要研究每种情况下的执行计划,确定确切的瓶颈,并在必要的地方放置INDEXes

最新更新