Is there an alternative to UNION that reduces the number of scans?



See the db fiddle.

Given the following table:

CREATE TABLE foo (x INTEGER PRIMARY KEY, y INTEGER);
INSERT INTO foo VALUES (0,41), (1, 23), (2,45), (3,32), ...

For each group of 10 consecutive x values, I need the (x, y) pair that has min(y), and likewise the pair that has max(y):

SELECT x, min(y) FROM foo GROUP BY (x/10)
UNION
SELECT x, max(y) FROM foo GROUP BY (x/10);

The EXPLAIN QUERY PLAN output shows that the table is scanned twice:

`--COMPOUND QUERY
   |--LEFT-MOST SUBQUERY
   |  |--SCAN TABLE foo
   |  `--USE TEMP B-TREE FOR GROUP BY
   `--UNION ALL
      |--SCAN TABLE foo
      `--USE TEMP B-TREE FOR GROUP BY

Is there any way to rewrite the query so that only a single scan is performed?

In the meantime, what I have done is select all rows (SELECT x, y FROM foo;) and aggregate min/max manually as the rows are returned to the host language:

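// Assumes rows arrive in ascending x order, so each group is contiguous.
int lastGroup = -1;  // sentinel: no group seen yet (assumes x >= 0)
int minX = 0, minY = 0, maxX = 0, maxY = 0;
while (sqlite3_step(query) == SQLITE_ROW) {
    int x = sqlite3_column_int(query, 0);
    int y = sqlite3_column_int(query, 1);
    int group = x / 10;
    if (group != lastGroup) {
        if (lastGroup != -1) {
            // save minX, minY, maxX, maxY in a list somewhere
            // ...
        }
        // reset minX, minY, maxX, maxY to the first row of the new group
        minX = maxX = x;
        minY = maxY = y;
        lastGroup = group;
    }
    else {
        if (y < minY) {
            minX = x;
            minY = y;
        }
        else if (y > maxY) {
            maxX = x;
            maxY = y;
        }
    }
}
// after the loop, save the final group's minX, minY, maxX, maxY (if lastGroup != -1)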

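This achieves a single scan, and the whole process is more than twice as fast... but if possible, I would rather express this logic declaratively in SQL.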

Why not just do a single GROUP BY with more columns?

SELECT (x/10) * 10, min(y), max(y)
FROM foo
GROUP BY (x/10)
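
To check the single-scan claim, here is a minimal sketch that runs both queries through Python's standard sqlite3 module and counts the SCAN steps reported by EXPLAIN QUERY PLAN. The helper name count_scans and the sample data are assumptions made for illustration, and the exact plan text varies between SQLite versions, so the helper only looks at the "SCAN" prefix.

```python
import sqlite3

def count_scans(conn, sql):
    """Count the table-scan steps that EXPLAIN QUERY PLAN reports for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable description of each step is the last column.
    return sum(1 for row in rows if row[-1].startswith("SCAN"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (x INTEGER PRIMARY KEY, y INTEGER)")
# Hypothetical sample data: 30 rows, i.e. three groups of ten x values.
conn.executemany("INSERT INTO foo VALUES (?, ?)",
                 [(x, (x * 37) % 101) for x in range(30)])

union_sql = ("SELECT x, min(y) FROM foo GROUP BY (x/10) "
             "UNION SELECT x, max(y) FROM foo GROUP BY (x/10)")
single_sql = "SELECT (x/10) * 10, min(y), max(y) FROM foo GROUP BY (x/10)"

union_scans = count_scans(conn, union_sql)    # one scan per SELECT
single_scans = count_scans(conn, single_sql)  # one scan in total
groups = conn.execute(single_sql + " ORDER BY 1").fetchall()
print(union_scans, single_scans, groups)
```

Note that this form returns min(y) and max(y) per group but not the x values at which they occur.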

If you want multiple rows, you can unpivot afterwards:

SELECT f.x, (CASE WHEN w.which = 1 THEN min_y ELSE max_y END) as min_max_y
FROM (SELECT (x/10) * 10 as x, min(y) as min_y, max(y) as max_y
      FROM foo
      GROUP BY (x/10)
     ) f CROSS JOIN
     (SELECT 1 as which UNION ALL SELECT 2) w;
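
As a rough illustration of what the unpivot produces, the sketch below runs it with Python's sqlite3 module on hypothetical sample data. The extra which column and the ORDER BY are additions made here only so the output is easy to read; they are not part of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (x INTEGER PRIMARY KEY, y INTEGER)")
# Hypothetical sample data: 20 rows, i.e. two groups of ten x values.
conn.executemany("INSERT INTO foo VALUES (?, ?)",
                 [(x, (x * 37) % 101) for x in range(20)])

unpivot_sql = """
SELECT f.x, w.which,
       (CASE WHEN w.which = 1 THEN f.min_y ELSE f.max_y END) AS min_max_y
FROM (SELECT (x/10) * 10 AS x, min(y) AS min_y, max(y) AS max_y
      FROM foo
      GROUP BY (x/10)
     ) f CROSS JOIN
     (SELECT 1 AS which UNION ALL SELECT 2) w
ORDER BY f.x, w.which
"""
# Each group yields two rows: which = 1 carries min(y), which = 2 carries max(y).
rows = conn.execute(unpivot_sql).fetchall()
for row in rows:
    print(row)
```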

EDIT:

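You are using a SQLite extension that is not consistent with the standard or with any other SQL dialect. A better way to write this is with window functions: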

select x, y
from (select f.*,
             row_number() over (partition by (x/10) order by y asc) as seqnum_asc,
             row_number() over (partition by (x/10) order by y desc) as seqnum_desc
      from foo f
     ) f
where 1 in (seqnum_asc, seqnum_desc);
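
A small sketch of the row_number() query run through Python's sqlite3 module, on hypothetical sample data with no tied y values. It assumes the SQLite library linked into Python is version 3.25 or newer, since older versions lack window functions; the ORDER BY is added here only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (x INTEGER PRIMARY KEY, y INTEGER)")
# Hypothetical sample data: 20 rows with distinct y values, two groups of ten.
conn.executemany("INSERT INTO foo VALUES (?, ?)",
                 [(x, (x * 37) % 101) for x in range(20)])

window_sql = """
select x, y
from (select f.*,
             row_number() over (partition by (x/10) order by y asc) as seqnum_asc,
             row_number() over (partition by (x/10) order by y desc) as seqnum_desc
      from foo f
     ) f
where 1 in (seqnum_asc, seqnum_desc)
order by x
"""
# Each group contributes the row with its smallest y and the row with its largest y.
extremes = conn.execute(window_sql).fetchall()
print(extremes)
```

Unlike the plain GROUP BY form, this returns the actual x at which each minimum and maximum occurs.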

Or, if you don't like subqueries, use first_value():

select distinct (x/10)*10,  -- this is not necessary but helps to make the purpose clear
       first_value(x) over (partition by (x/10) order by y asc) as x_at_min_y,
       min(y) over (partition by x/10) as min_y,
       first_value(x) over (partition by (x/10) order by y desc) as x_at_max_y,
       max(y) over (partition by x/10) as max_y
from foo;

Here is a db fiddle.

If you like, you can unpivot afterwards, as shown above.
