我目前正在尝试找到在包含近 600M 条记录的表中查找每个组的最后一条记录的最有效方法。
我发现的最快方法是使用子查询,但省略其中的 FROM 子句:
```
EXPLAIN
SELECT customer, server, disk
FROM t1
WHERE timestamp = (SELECT MAX(timestamp))
GROUP BY customer, server, disk;
```
但是EXPLAIN
提供了有关子查询的No tables used
:
```
+------+--------------------+-------+------+---------------+------+---------+------+-----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------------+-------+------+---------------+------+---------+------+-----------+----------------------------------------------+
| 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 185093129 | Using where; Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+------+--------------------+-------+------+---------------+------+---------+------+-----------+----------------------------------------------+
```
唯一的其他选择是将JOINS
或子查询与FROM
子句一起使用;但两者似乎都执行了两次整个表扫描。我在这里描述的方法有什么问题吗?
(SELECT MAX(timestamp))
是一个没有表的子查询。
你可以试试这样的想法
SELECT @var:= MAX(timestamp) from t1;
SELECT customer, server, disk
FROM t1
WHERE timestamp = @var
GROUP BY customer, server, disk;
或(较慢)
SELECT customer, server, disk
FROM t1
WHERE timestamp = (SELECT MAX(timestamp) from t1)
GROUP BY customer, server, disk;