我很好奇在单个查询中使用两次 COUNT(column_name( 的性能。以下是有问题的查询:
SELECT
employee_name,
COUNT(employee_name)
FROM
employee
GROUP BY
employee_name
HAVING
COUNT(employee_name) > 1;
将
COUNT(employee_name)
被处决两次?此外,当我将来有这样的问题时,我如何才能自己检查幕后发生的事情的表现?
谢谢!
可以使用优化程序跟踪来获取有关优化程序如何执行查询以及原因的更多信息。 对于这种特殊情况,跟踪不会显式告知计算计数的次数,但我们可以获取有关用于执行聚合的临时表的信息:
mysql> SET optimizer_trace='enabled=on';
Query OK, 0 rows affected (0,00 sec)
mysql> SELECT c2, COUNT(c2) FROM temp GROUP BY c2 HAVING COUNT(c2) > 1;
+------+-----------+
| c2 | COUNT(c2) |
+------+-----------+
| 1 | 2 |
| 2 | 2 |
+------+-----------+
2 rows in set (0,00 sec)
mysql> SELECT trace->'$.steps[*].join_execution.steps[*].creating_tmp_table'
-> FROM information_schema.optimizer_trace;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| trace->'$.steps[*].join_execution.steps[*].creating_tmp_table' |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [{"tmp_table_info": {"table": "intermediate_tmp_table", "location": "memory (heap)", "key_length": 5, "row_length": 23, "unique_constraint": false, "row_limit_estimate": 729444}}] |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0,01 sec)
mysql> SELECT c2, COUNT(c2) AS c FROM temp GROUP BY c2 HAVING c > 1;
+------+---+
| c2 | c |
+------+---+
| 1 | 2 |
| 2 | 2 |
+------+---+
2 rows in set (0,00 sec)
mysql> SELECT trace->'$.steps[*].join_execution.steps[*].creating_tmp_table' -> FROM information_schema.optimizer_trace;
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| trace->'$.steps[*].join_execution.steps[*].creating_tmp_table' |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [{"tmp_table_info": {"table": "intermediate_tmp_table", "location": "memory (heap)", "key_length": 5, "row_length": 14, "unique_constraint": false, "row_limit_estimate": 1198372}}] |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0,00 sec)
对于上述内容,我们看到当使用别名而不是重复 COUNT 表达式时,临时表的行大小较小(14 与 23 字节(。 这表示对于查询,计数在聚合期间执行两次。
选择任何方便的表格并执行以下操作:
mysql> SELECT RAND() AS r FROM canada HAVING r < 0.1 limit 11;
+-----------------------+
| r |
+-----------------------+
| 0.6982369559800596 |
| 0.33121224616767114 |
| 0.3811396559524719 |
| 0.4718028721136999 |
另请参阅:
将"rand(("与"have"一起使用
在 ORDER BY 子句中使用聚合函数和聚合函数别名是否存在与性能相关的差异?
我认为还有其他涉及非兰德案件的讨论。
原始问题使用COUNT(employee_name)
,在两种情况下提供相同的值。 因此,您无法真正判断它是否被"评估"了两次。 通过使用RAND()
,很明显它被重新评估了。