GROUP BY - How can I optimize this MySQL query?


SELECT results.idpatient AS nhs_number, TIMESTAMPDIFF(YEAR, STR_TO_DATE(results.dob,'%Y-%m-%d'), CURDATE()) AS age, most_overdue.days_overdue,
most_overdue.current_status, most_overdue.action, indications.associated_indications
FROM mytable AS results
INNER JOIN (
SELECT
idpatient,
MAX(days_overdue) as days_overdue,
SUBSTRING(MAX(CONCAT(LPAD(days_overdue, 11, '0'), current_status)), 12) AS current_status,
SUBSTRING(MAX(CONCAT(LPAD(days_overdue, 11, '0'), action)), 12) AS action
FROM
mytable
GROUP BY
idpatient
) AS most_overdue
ON results.idpatient = most_overdue.idpatient
INNER JOIN (
SELECT idpatient, action, GROUP_CONCAT(indication SEPARATOR ',') as associated_indications
FROM mytable
GROUP BY idpatient, action
) AS indications
ON results.idpatient = indications.idpatient AND most_overdue.action = indications.action
GROUP BY results.idpatient
LIMIT 0,100;

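The query above takes ~4s to run on my large MySQL database, and the problem seems to be the GROUP BYs.

See a simplified fiddle here.

I have read the MySQL page on optimizing GROUP BY, but it doesn't seem to cover its use in combination with GROUP_CONCAT.

The schema is as follows: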

CREATE TABLE `mytable` (
`idpatient` varchar(32) NOT NULL,
`indication` varchar(255) NOT NULL,
`action` varchar(255) NOT NULL,
`current_status` varchar(255) NOT NULL,
`query_type` varchar(255) NOT NULL,
`last_date` varchar(255) NOT NULL,
`days_overdue` bigint(20) DEFAULT NULL,
`dob` varchar(255) NOT NULL,
PRIMARY KEY (`idpatient`,`indication`,`action`),
KEY `action_idx` (`action`),
KEY `indication_idx` (`indication`),
KEY `idpatient_action_idx` (`idpatient`,`action`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

EXPLAIN tells me that indication_idx and idpatient_action_idx are being used when the slow query runs.

Any help getting this below 1s would be much appreciated, thanks!

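I'm guessing, without any evidence to back up my guess, that you want your result set to show the 100 lowest idpatient values.

Your query consists of two subqueries and a main query. Let's start with the first subquery.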

SELECT 
idpatient,
MAX(days_overdue) as days_overdue,
SUBSTRING(MAX(CONCAT(LPAD(days_overdue, 11, '0'), current_status)), 12) AS current_status,
SUBSTRING(MAX(CONCAT(LPAD(days_overdue, 11, '0'), action)), 12) AS action
FROM    mytable
GROUP BY idpatient
ORDER BY idpatient
LIMIT    0, 100;

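Notice that I've included your LIMIT clause in this subquery, along with an ORDER BY so that it returns the 100 lowest idpatient values.

This subquery will benefit from the following covering index.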

ALTER TABLE mytable 
ADD INDEX patient_days_status_activity
(idpatient, days_overdue, current_status, action);

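The first two columns of the index support the GROUP BY operation. If your table has fewer than 100k rows, you can omit the last two columns from the index; they only make the lookups of current_status and action a little faster.

With the index in place, check the performance of this subquery and look at its EXPLAIN output. Convince yourself that it is fast enough.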
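
As a rough sketch of that check, assuming the patient_days_status_activity index above has already been created, you could prefix the subquery with EXPLAIN. If the index is being used as a covering index, the key column should name it and the Extra column would typically include "Using index"; the exact plan depends on your MySQL version and data.

```sql
-- Inspect the plan for the first subquery on its own.
-- Assumes the patient_days_status_activity index has been added.
EXPLAIN
SELECT
    idpatient,
    MAX(days_overdue) AS days_overdue,
    SUBSTRING(MAX(CONCAT(LPAD(days_overdue, 11, '0'), current_status)), 12) AS current_status,
    SUBSTRING(MAX(CONCAT(LPAD(days_overdue, 11, '0'), action)), 12) AS action
FROM mytable
GROUP BY idpatient
ORDER BY idpatient
LIMIT 0, 100;
```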

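Now for the second subquery. Your idpatient_action_idx index helps this query. You should add the ORDER BY / LIMIT clauses to it as well. You can also add indication to that index to make it a covering index.
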
ALTER TABLE mytable
DROP INDEX idpatient_action_idx,
ADD INDEX idpatient_action_idx (idpatient, action, indication);

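Change the index this way if you wish. Then check the performance of this subquery. If both subqueries have acceptable performance, the main query will probably have acceptable performance too.

Does this query give the right results (I assume it's still slow)?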
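
One caveat when limiting the second subquery: it groups by (idpatient, action), so a plain LIMIT 0, 100 would keep 100 groups rather than 100 patients. The following is only a sketch of one way around that, not necessarily what was intended above: it restricts the aggregation to the 100 lowest idpatient values through a derived table. The alias first_patients is made up for illustration.

```sql
-- Sketch: aggregate indications only for the 100 lowest idpatient values.
-- first_patients is a hypothetical alias, not part of the original query.
SELECT
    m.idpatient,
    m.action,
    GROUP_CONCAT(m.indication SEPARATOR ',') AS associated_indications
FROM mytable AS m
INNER JOIN (
    SELECT DISTINCT idpatient
    FROM mytable
    ORDER BY idpatient
    LIMIT 0, 100
) AS first_patients
    ON first_patients.idpatient = m.idpatient
GROUP BY m.idpatient, m.action
ORDER BY m.idpatient, m.action;
```

With indication appended to idpatient_action_idx, both the derived table and the outer aggregation should be answerable from indexes alone, but confirm that with EXPLAIN on your own data.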

SELECT results.idpatient AS nhs_number
, most_overdue.days_overdue
, most_overdue.current_status
, most_overdue.action
, indications.associated_indications
FROM (
SELECT distinct idpatient
FROM mytable
) AS results
INNER JOIN (
SELECT
idpatient,
MAX(days_overdue) as days_overdue,
SUBSTRING(MAX(CONCAT(LPAD(days_overdue, 11, '0'), current_status)), 12) AS current_status,
SUBSTRING(MAX(CONCAT(LPAD(days_overdue, 11, '0'), action)), 12) AS action
FROM
mytable
GROUP BY
idpatient
) AS most_overdue
ON results.idpatient = most_overdue.idpatient
INNER JOIN (
SELECT idpatient, action, GROUP_CONCAT(indication SEPARATOR ',') as associated_indications
FROM mytable
GROUP BY idpatient, action
) AS indications
ON results.idpatient = indications.idpatient AND most_overdue.action = indications.action;

This may be a neater way to derive the current_status/action associated with the largest days_overdue:

SELECT results.idpatient AS nhs_number
, (most_overdue.days_overdue)
, (results.current_status)
, results.action
, indications.associated_indications
FROM mytable AS results
JOIN (
SELECT 
idpatient,
MAX(days_overdue) as days_overdue
FROM mytable
GROUP BY idpatient
) AS most_overdue
ON results.idpatient = most_overdue.idpatient
AND results.days_overdue = most_overdue.days_overdue
JOIN (
SELECT idpatient, action, GROUP_CONCAT(indication SEPARATOR ',') as associated_indications
FROM mytable
GROUP BY idpatient, action
) AS indications
ON results.idpatient = indications.idpatient 
AND results.action = indications.action
-- GROUP BY results.idpatient, results.action     
ORDER BY results.idpatient;

I can't help thinking that it should be possible to remove at least one of the accesses to mytable, but right now I can't see how.
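
For what it's worth, here is an untested sketch of one way to get down to two accesses. It swaps in a different technique, window functions, so it assumes MySQL 8.0 or later and will not run on older versions. The names ranked and rn are made up for illustration. ROW_NUMBER() picks one row per patient with the largest days_overdue; ties are broken arbitrarily, which differs from the MAX(CONCAT(...)) trick in the original query.

```sql
-- Sketch only: requires MySQL 8.0+ (CTEs and window functions).
-- ranked and rn are hypothetical names.
WITH ranked AS (
    SELECT
        idpatient,
        action,
        current_status,
        days_overdue,
        ROW_NUMBER() OVER (
            PARTITION BY idpatient
            ORDER BY days_overdue DESC
        ) AS rn
    FROM mytable
)
SELECT
    r.idpatient AS nhs_number,
    r.days_overdue,
    r.current_status,
    r.action,
    GROUP_CONCAT(m.indication SEPARATOR ',') AS associated_indications
FROM ranked AS r
JOIN mytable AS m
    ON m.idpatient = r.idpatient
   AND m.action = r.action
WHERE r.rn = 1
GROUP BY r.idpatient, r.days_overdue, r.current_status, r.action
ORDER BY r.idpatient;
```

Whether this is actually faster than the three-way join would need to be measured; the window function has to sort each patient's rows.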