MySQL query optimization: select with a counting subquery is very slow

I have the following tables:

mysql> show create table rsspodcastitems \G
*************************** 1. row ***************************
Table: rsspodcastitems
Create Table: CREATE TABLE `rsspodcastitems` (
`id` char(20) NOT NULL,
`description` mediumtext,
`duration` int(11) default NULL,
`enclosure` mediumtext NOT NULL,
`guid` varchar(300) NOT NULL,
`indexed` datetime NOT NULL,
`published` datetime default NULL,
`subtitle` varchar(255) default NULL,
`summary` mediumtext,
`title` varchar(255) NOT NULL,
`podcast_id` char(20) NOT NULL,
PRIMARY KEY  (`id`),
UNIQUE KEY `podcast_id` (`podcast_id`,`guid`),
UNIQUE KEY `UKfb6nlyxvxf3i2ibwd8jx6k025` (`podcast_id`,`guid`),
KEY `IDXkcqf7wi47t3epqxlh34538k7c` (`indexed`),
KEY `IDXt2ofice5w51uun6w80g8ou7hc` (`podcast_id`,`published`),
KEY `IDXfb6nlyxvxf3i2ibwd8jx6k025` (`podcast_id`,`guid`),
KEY `published` (`published`),
FULLTEXT KEY `title` (`title`),
FULLTEXT KEY `summary` (`summary`),
FULLTEXT KEY `subtitle` (`subtitle`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
mysql> show create table station_cache \G
*************************** 1. row ***************************
Table: station_cache
Create Table: CREATE TABLE `station_cache` (
`Station_id` char(36) NOT NULL,
`item_id` char(20) NOT NULL,
`item_type` int(11) NOT NULL,
`podcast_id` char(20) NOT NULL,
`published` datetime NOT NULL,
KEY `Station_id` (`Station_id`,`published`),
KEY `IDX12n81jv8irarbtp8h2hl6k4q3` (`Station_id`,`published`),
KEY `item_id` (`item_id`,`item_type`),
KEY `IDXqw9yqpavo9fcduereqqij4c80` (`item_id`,`item_type`),
KEY `podcast_id` (`podcast_id`,`published`),
KEY `IDXkp2ehbpmu41u1vhwt7qdl2fuf` (`podcast_id`,`published`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

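The "item_id" column of the second table refers to the "id" column of the first. There is no foreign key between the two because the relationship is polymorphic: the second table may reference entities that are not in the first table but in other similar, yet distinct, tables.

I am trying to write a query that lists the most recent items in the first table that have no corresponding rows in the second table. The best-performing query I have found so far is: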

select i.*,
       (select count(station_id)
        from station_cache
        where item_id = i.id) as stations
from rsspodcastitems i
having stations = 0
order by published desc

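I also considered using a where not exists (...) subquery to perform the restriction, but that was actually slower than the query above. Even so, the query above still takes a considerable amount of time to complete. MySQL's query plan does not appear to use the available indexes: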

+----+--------------------+---------------+------+---------------+------+---------+------+--------+----------------+
| id | select_type        | table         | type | possible_keys | key  | key_len | ref  | rows   | Extra          |
+----+--------------------+---------------+------+---------------+------+---------+------+--------+----------------+
|  1 | PRIMARY            | i             | ALL  | NULL          | NULL | NULL    | NULL | 106978 | Using filesort |
|  2 | DEPENDENT SUBQUERY | station_cache | ALL  | NULL          | NULL | NULL    | NULL |  44227 | Using where    |
+----+--------------------+---------------+------+---------------+------+---------+------+--------+----------------+

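Note that neither part of the query uses a key, even though it should be able to use KEY published (published) for the main table and KEY item_id (item_id,item_type) for the subquery.

Any suggestions on how to get the right result without waiting several minutes?

I would expect the fastest query to be: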

select i.*
from rsspodcastitems i
where not exists (select 1
                  from station_cache sc
                  where sc.item_id = i.id
                 )
order by published desc;

This would take advantage of an index on station_cache(item_id) and perhaps one on rsspodcastitems(published, id).
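
As a minimal sketch of the indexing suggestion above: station_cache already has KEY item_id (item_id,item_type), whose leading column is item_id, so only the composite index on rsspodcastitems would be new. The index name idx_published_id is a hypothetical name chosen for illustration. Running EXPLAIN afterwards shows whether the optimizer actually picks the indexes up.

```sql
-- Hypothetical index name; the column pair (published, id) is the one suggested above.
ALTER TABLE rsspodcastitems
  ADD INDEX idx_published_id (published, id);

-- Re-check the plan: the key column should no longer be NULL if the indexes are used.
EXPLAIN
SELECT i.*
FROM rsspodcastitems i
WHERE NOT EXISTS (SELECT 1
                  FROM station_cache sc
                  WHERE sc.item_id = i.id)
ORDER BY i.published DESC;
```

If the plan still shows type ALL for station_cache, the index is not being considered for the item_id comparison and the cause lies elsewhere than a missing index.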

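Your query may be the faster one if it returns a large number of rows. The way it is phrased allows the index on rsspodcastitems(published) to be used to avoid the filesort. If you remove the order by, the exists version should be faster.

I should point out that I like your use of the having clause. When I have faced this situation in the past, I have used a subquery: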

select i.*,
       (select count(station_id)
        from station_cache
        where item_id = i.id) as stations
from (select i.*
      from rsspodcastitems i
      order by published desc
     ) i
where not exists (select 1
                  from station_cache sc
                  where sc.item_id = i.id
                 );

This allows the sort to be done on a single index.

I prefer a slight variation on your approach:

select i.*,
       (exists (select 1
                from station_cache sc
                where sc.item_id = i.id
               )
       ) as has_station
from rsspodcastitems i
having has_station = 0
order by published desc;

This should be slightly faster than the version with count().

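You may want to detect and remove redundant indexes from your tables. Looking at the CREATE TABLE output for both tables reveals several, including the indexes on (podcast_id, guid), (Station_id, published), (item_id, item_type) and (podcast_id, published), and possibly more.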
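
A sketch of that cleanup, based only on the SHOW CREATE TABLE output in the question. Which copy of each duplicate to keep is an assumption here: this version keeps the short-named indexes and drops the ones with generated-looking names. If a framework creates those indexes automatically, it may recreate them, so check that before running anything like this.

```sql
-- rsspodcastitems: three indexes cover (podcast_id, guid); keep UNIQUE KEY `podcast_id`.
ALTER TABLE rsspodcastitems
  DROP INDEX `UKfb6nlyxvxf3i2ibwd8jx6k025`,
  DROP INDEX `IDXfb6nlyxvxf3i2ibwd8jx6k025`;

-- station_cache: each column pair is indexed twice; keep the short-named copy.
ALTER TABLE station_cache
  DROP INDEX `IDX12n81jv8irarbtp8h2hl6k4q3`,  -- duplicate of `Station_id` (Station_id, published)
  DROP INDEX `IDXqw9yqpavo9fcduereqqij4c80`,  -- duplicate of `item_id` (item_id, item_type)
  DROP INDEX `IDXkp2ehbpmu41u1vhwt7qdl2fuf`;  -- duplicate of `podcast_id` (podcast_id, published)
```

Dropping duplicates does not change which queries can use an index. It mainly reduces write overhead and storage.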

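My final solution was to drop the full-text indexes and use an externally generated index table (built by iterating over the words in the text, filtering out stop words, and applying a stemming algorithm) to support searching. I don't know why the full-text indexes caused performance problems, but they seemed to slow down every query that touched the table, even when they were not being used.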
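
The post does not show the index table itself, so the following is only a rough sketch of what such a setup could look like. The table item_terms, its columns, and the search term are hypothetical. The three index names in the DROP statement are the FULLTEXT keys from the SHOW CREATE TABLE output above. Tokenizing, stop-word filtering, and stemming are assumed to happen in an external program that writes one row per (term, item) pair.

```sql
-- Remove the three FULLTEXT indexes listed in the table definition.
ALTER TABLE rsspodcastitems
  DROP INDEX `title`,
  DROP INDEX `summary`,
  DROP INDEX `subtitle`;

-- Hypothetical lookup table, filled by the external indexer.
CREATE TABLE item_terms (
  term    VARCHAR(64) NOT NULL,  -- stemmed word, stop words already removed
  item_id CHAR(20)    NOT NULL,  -- refers to rsspodcastitems.id
  PRIMARY KEY (term, item_id)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

-- Search for a single (already stemmed) term, newest items first.
SELECT i.id, i.title
FROM item_terms t
JOIN rsspodcastitems i ON i.id = t.item_id
WHERE t.term = 'podcast'
ORDER BY i.published DESC;
```

The sketch gives item_terms the same character set as rsspodcastitems so that the join compares like with like.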
