如何使此WHERE NOT EXISTS查询更加高效



首先,表

调查

CREATE TABLE `surveys` (
`survey_id` int(11) NOT NULL AUTO_INCREMENT,
`survey_name` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`survey_id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

INSERT INTO `surveys` (`survey_id`, `survey_name`) VALUES
(1, 's1'),
(2, 's2');

调查响应

CREATE TABLE `survey_responses` (
`sr_id` int(10) NOT NULL AUTO_INCREMENT,
`sr_text` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
`sr_tq_id` int(10) NOT NULL,
`sr_st_id` int(10) NOT NULL,
`sr_su_uid` int(10) NOT NULL,
PRIMARY KEY (`sr_id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
INSERT INTO `survey_responses` (`sr_id`, `sr_text`, `sr_tq_id`, `sr_st_id`, `sr_su_uid`) VALUES
(1, 'a', 3, 2, 3),
(2, 'b', 4, 2, 3);

Survey_topics

CREATE TABLE `survey_topics` (
`st_id` int(10) NOT NULL AUTO_INCREMENT,
`st_name` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
`st_survey_id` int(10) NOT NULL,
PRIMARY KEY (`st_id`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
INSERT INTO `survey_topics` (`st_id`, `st_name`, `st_survey_id`) VALUES
(1, 't1', 1),
(2, 't2', 1),
(3, 't3', 1),
(4, 't4', 2),
(5, 't5', 2),
(6, 't6', 2);

Survey_users

CREATE TABLE `survey_users` (
`su_id` int(10) NOT NULL AUTO_INCREMENT,
`su_s_id` int(10) NOT NULL,
`su_uid` int(10) NOT NULL,
PRIMARY KEY (`su_id`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
INSERT INTO `survey_users` (`su_id`, `su_s_id`, `su_uid`) VALUES
(1, 1, 1),
(2, 1, 2),
(3, 2, 2);

主题问题

CREATE TABLE `topic_questions` (
`tq_id` int(11) NOT NULL AUTO_INCREMENT,
`tq_text` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
`tq_st_id` int(10) NOT NULL,
PRIMARY KEY (`tq_id`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
INSERT INTO `topic_questions` (`tq_id`, `tq_text`, `tq_st_id`) VALUES
(1, 'q1', 1),
(2, 'q2', 1),
(3, 'q3', 2),
(4, 'q4', 2);

用户

CREATE TABLE `users` (
`u_id` int(10) NOT NULL AUTO_INCREMENT,
`uname` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`u_id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
INSERT INTO `users` (`u_id`, `uname`) VALUES
(1, 'Bob'),
(2, 'Gary');

现在的查询,

SELECT * FROM `surveys`
INNER JOIN survey_users ON survey_users.su_s_id = surveys.survey_id
INNER JOIN survey_topics ON survey_topics.st_survey_id = surveys.survey_id
INNER JOIN topic_questions ON survey_topics.st_id = topic_questions.tq_st_id
WHERE NOT EXISTS (
SELECT * FROM survey_responses a WHERE a.sr_tq_id = topic_questions.tq_id AND a.sr_st_id = survey_topics.st_id AND a.sr_su_uid = survey_users.su_uid
)

基本上,我试图获得一份调查列表,即使有一个用户错过了一个调查回复,即使这只是针对一个主题的一个问题。

该查询运行良好,但在包含数千个调查、主题、用户、问题和回复的数据库中运行速度极慢。

请有人提供一个更快的查询,我读到使用左联接而不是WHERE NOT EXISTS更有效吗?

感谢大家的帮助,我在实际数据库中有索引,这个例子没有任何索引。

谢谢。

考虑NOT IN与NOT EXISTS与LEFT JOIN/IS NULL的其他变体,其中性能可能有所不同:

LEFT JOIN/NULL

...
LEFT JOIN survey_responses sr 
ON  sr.sr_tq_id  = topic_questions.tq_id 
AND sr.sr_st_id  = survey_topics.st_id
AND sr.sr_su_uid = survey_users.su_uid
WHERE sr.sr_tq_id  IS NULL
AND sr.sr_st_id  IS NULL
AND sr.sr_su_uid IS NULL

NOT IN(MySQL似乎支持多列IN(

...
WHERE (topic_questions.tq_id, survey_topics.st_id, survey_users.su_uid) 
NOT IN (
SELECT a.sr_tq_id​, a.sr_st_id, a.sr_su_uid ​
FROM survey_responses
)

NOT IN(由于多列而使用CTE(

WITH sub AS (
SELECT a.sr_tq_id​, a.sr_st_id, a.sr_su_uid 
​FROM survey_responses
)
SELECT
...
WHERE topic_questions.tq_id NOT IN (SELECT a.str_tq_id FROM sub)
AND survey_topics.st_id   NOT IN (SELECT a.sr_st_id  FROM sub)
AND survey_users.su_uid ​  NOT IN (SELECT a.sr_su_uid FROM sub)

也许可以尝试对topic_tquestions表进行LEFT OUTER JOIN,并在该表中包含问题的列。然后,您可以检查这些列中的NULL,以确定用户尚未回答的问题。

此外,您可能需要尝试将INNER联接切换为LEFT。它应该处理得更快。如果您需要筛选出未回答的项目、空用户等,请尝试在WHERE子句中进行筛选。

需要的综合指数:

survey_users:  (su_s_id, su_uid)
survey_topics:  (st_survey_id, st_id)
topic_questions:  (tq_st_id, tq_id)
a:  (sr_tq_id, sr_su_uid, sr_st_id)

最新更新