Most efficient way to select the duplicate rows with the max timestamp

Suppose I have a table named t that looks like this:

id  content  time
1     'a'     100
1     'a'     101
1     'b'     102
2     'c'     200
2     'c'     201

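The id values are duplicated, and for the same id the content may be duplicated as well. Now I want to select, for each id, the row with the max timestamp, which would be: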

id  content  time
1      'b'    102
2      'c'    201

This is my current solution:

select t1.id, t1.content, t1.time 
from (
select id, content, time from t 
) as t1 
right join (
select id, max(time) as time from t group by id
) as t2 
on t1.id = t2.id and t1.time = t2.time;

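But this looks inefficient to me. In theory, once select id, max(time) as time from t group by id has been executed, the rows I want have already been located. The right join adds an extra O(n^2) time cost, which seems unnecessary.

So is there a more efficient way to do this, or is there something I have misunderstood?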

Use DISTINCT ON:

SELECT DISTINCT ON (id) id, content, time
FROM yourTable
ORDER BY id, time DESC;

On Postgres, this is usually the most performant way to write the query, and it should outperform ROW_NUMBER and other approaches.
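For comparison, the ROW_NUMBER approach mentioned above could look roughly like the sketch below. The inline VALUES list only stands in for the question's table t so the snippet runs on its own; the CTE name ranked and the column rn are made up for illustration.

```sql
-- Sample rows copied from the question, inlined so the query is self-contained.
WITH t (id, content, time) AS (
    VALUES (1, 'a', 100),
           (1, 'a', 101),
           (1, 'b', 102),
           (2, 'c', 200),
           (2, 'c', 201)
),
ranked AS (
    -- Number the rows of each id from newest to oldest.
    SELECT id, content, time,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY time DESC) AS rn
    FROM t
)
SELECT id, content, time
FROM ranked
WHERE rn = 1      -- keep only the newest row per id
ORDER BY id;
-- returns (1, 'b', 102) and (2, 'c', 201)
```

Unlike DISTINCT ON, this form is standard SQL and should also run on other databases that support window functions.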

The following index might speed up this query:

CREATE INDEX idx ON yourTable (id, time DESC, content);

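If this index is used, Postgres can quickly find the record with the latest time for each id. The index also includes the content column, so it covers every column the query selects, which may allow an index-only scan.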
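Whether the planner actually picks the index depends on table size and statistics, so it is worth checking the plan. A minimal sketch, assuming the table yourTable and the index idx from above already exist:

```sql
-- ANALYZE executes the query and reports real timings;
-- BUFFERS adds how many pages were read.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT ON (id) id, content, time
FROM yourTable
ORDER BY id, time DESC;
```

If the index is being used, the plan should show an Index Scan or Index Only Scan on idx instead of a Seq Scan followed by a Sort. On a very small table the planner may reasonably prefer the sequential scan anyway.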

Try this:

SELECT a.id, a.content, a.time FROM t AS a
INNER JOIN (
-- latest time per id; group by id (not content) to get one result per id
SELECT id, MAX(time) AS time FROM t
GROUP BY id
) AS b ON a.id = b.id AND a.time = b.time;
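One thing to keep in mind with any join against MAX(time): if two rows of the same id share the maximum time, both are returned. The sketch below shows this with made-up data (two rows for id 1 with the same time); the VALUES list is hypothetical and only there so the snippet runs on its own.

```sql
-- Hypothetical data: both rows of id 1 tie for the maximum time.
WITH t (id, content, time) AS (
    VALUES (1, 'a', 100),
           (1, 'b', 100)
)
SELECT a.id, a.content, a.time
FROM t AS a
INNER JOIN (
    SELECT id, MAX(time) AS time
    FROM t
    GROUP BY id
) AS b ON a.id = b.id AND a.time = b.time;
-- both rows come back, because both match (id = 1, time = 100)
```

DISTINCT ON (id) would return a single row per id here, though which of the tied rows it picks is not defined unless the ORDER BY includes a tiebreaker column.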
