MySQL 8.0 performance on a large dataset



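I have a simple schema (below) with a large number of rows, and I want to speed up my queries. I added indexes, but they made no significant difference to performance, and I have no experience handling large datasets in MySQL. It runs on a local MySQL 8.0 server on Windows 10 with 16 GB of RAM.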

DROP TABLE IF EXISTS `drugs`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!50503 SET character_set_client = utf8mb4 */;
CREATE TABLE `drugs` (
`drug_id` int NOT NULL,
`ATC` varchar(15) NOT NULL,
PRIMARY KEY (`drug_id`),
UNIQUE KEY `drug_id_UNIQUE` (`drug_id`),
KEY `drug_ATC` (`ATC`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Table structure for table `events`
--
DROP TABLE IF EXISTS `events`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!50503 SET character_set_client = utf8mb4 */;
CREATE TABLE `events` (
`event_id` int NOT NULL,
`Meddra` varchar(10) NOT NULL,
PRIMARY KEY (`event_id`),
UNIQUE KEY `event_id_UNIQUE` (`event_id`,`Meddra`),
KEY `event_Meddra` (`Meddra`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
/*!40101 SET character_set_client = @saved_cs_client */;
--
-- Table structure for table `reports`
--
DROP TABLE IF EXISTS `reports`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!50503 SET character_set_client = utf8mb4 */;
CREATE TABLE `reports` (
`report_id` int NOT NULL,
`reportDate` date NOT NULL,
PRIMARY KEY (`report_id`),
UNIQUE KEY `report_id_UNIQUE` (`report_id`),
KEY `reportDateIndex` (`reportDate`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
/*!40101 SET character_set_client = @saved_cs_client */;
--
-- Table structure for table `reports_drugs`
--
DROP TABLE IF EXISTS `reports_drugs`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!50503 SET character_set_client = utf8mb4 */;
CREATE TABLE `reports_drugs` (
`id` int NOT NULL AUTO_INCREMENT,
`report_id_FK` int NOT NULL,
`drug_id_FK` int NOT NULL,
`type` int NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`),
KEY `report_id_fk_idx` (`report_id_FK`),
KEY `report_id_fk_idx1` (`drug_id_FK`) /*!80000 INVISIBLE */,
CONSTRAINT `drug_report_id_fk` FOREIGN KEY (`drug_id_FK`) REFERENCES `drugs` (`drug_id`),
CONSTRAINT `report_drug_id_fk` FOREIGN KEY (`report_id_FK`) REFERENCES `reports` (`report_id`)
) ENGINE=InnoDB AUTO_INCREMENT=143260638 DEFAULT CHARSET=utf8;
/*!40101 SET character_set_client = @saved_cs_client */;
--
-- Table structure for table `reports_events`
--
DROP TABLE IF EXISTS `reports_events`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!50503 SET character_set_client = utf8mb4 */;
CREATE TABLE `reports_events` (
`id` int NOT NULL AUTO_INCREMENT,
`report_id_FK` int NOT NULL,
`events_id_FK` int NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`),
KEY `report_event_id_fk_idx` (`report_id_FK`),
KEY `event_report_id_fk_idx` (`events_id_FK`),
CONSTRAINT `event_report_id_fk` FOREIGN KEY (`events_id_FK`) REFERENCES `events` (`event_id`),
CONSTRAINT `report_event_id_fk` FOREIGN KEY (`report_id_FK`) REFERENCES `reports` (`report_id`)
) ENGINE=InnoDB AUTO_INCREMENT=143258062 DEFAULT CHARSET=utf8;

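My artificial data consists of 30,000,000 rows in the reports table. The median number of drugs and of events per report is 5, so the reports_drugs and reports_events tables each contain about 150,000,000 rows.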
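At roughly 150 million rows per mapping table, check how large the tables and their indexes are compared with the memory InnoDB is allowed to use. The sketch below assumes the tables live in the current default database. The first statement reads estimated sizes from information_schema.TABLES. The second shows innodb_buffer_pool_size, which is configured independently of the machine's 16 GB of RAM; its MySQL 8.0 default is 128 MiB.

```sql
-- Estimated size of each table in GiB (clustered data vs. secondary indexes).
-- For InnoDB these figures, including TABLE_ROWS, are estimates and may be cached.
SELECT TABLE_NAME,
       TABLE_ROWS,
       ROUND(DATA_LENGTH  / POWER(1024, 3), 2) AS data_gib,
       ROUND(INDEX_LENGTH / POWER(1024, 3), 2) AS index_gib
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
ORDER BY DATA_LENGTH + INDEX_LENGTH DESC;

-- How much of that InnoDB can keep cached in memory, in GiB.
SELECT @@innodb_buffer_pool_size / POWER(1024, 3) AS buffer_pool_gib;
```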

The two queries in use, with their response times:

SELECT report_id,reportDate,Meddra FROM events
INNER JOIN reports_events ON events.event_id=reports_events.events_id_FK
INNER JOIN reports ON reports_events.report_id_FK=reports.report_id
WHERE Meddra=11125127;

0.36 sec duration / 37.625 sec fetch

SELECT report_id,reportDate,Meddra,ATC,type FROM events
INNER JOIN reports_events ON events.event_id=reports_events.events_id_FK
INNER JOIN reports ON reports_events.report_id_FK=reports.report_id
INNER JOIN reports_drugs ON reports.report_id=reports_drugs.report_id_FK
INNER JOIN drugs ON reports_drugs.drug_id_FK=drugs.drug_id
WHERE Meddra=11125127 AND ATC="ACNKMFRRMVST";

80.296 sec duration / 0 sec fetch

Both of them spend almost all of their time in the "executing" stage:

Query          State       Duration (s)   Source function                  Source file
First query    executing   37.614281      SELECT_LEX_UNIT::ExecuteIterat   sql_union.cc
Second query   executing   80.293151      SELECT_LEX_UNIT::ExecuteIterat   sql_union.cc
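The profile above shows only that the time goes to the executor. It does not show which join step is slow. On MySQL 8.0.18 or later, EXPLAIN ANALYZE runs the query and reports the actual time and row count for each iterator, which should show where the 80 seconds go. The sketch below uses the second query with qualified columns. Note that it really executes the query. The Meddra literal is quoted because the column is a varchar.

```sql
-- Requires MySQL 8.0.18+. This executes the query, so it takes as long as the query does.
EXPLAIN ANALYZE
SELECT r.report_id, r.reportDate, e.Meddra, d.ATC, rd.type
FROM events AS e
INNER JOIN reports_events AS re ON e.event_id = re.events_id_FK
INNER JOIN reports        AS r  ON re.report_id_FK = r.report_id
INNER JOIN reports_drugs  AS rd ON r.report_id = rd.report_id_FK
INNER JOIN drugs          AS d  ON rd.drug_id_FK = d.drug_id
WHERE e.Meddra = '11125127'
  AND d.ATC = 'ACNKMFRRMVST';
```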

Please qualify all columns; that makes it easier for us to help you:

SELECT  r.report_id, r.reportDate, e.Meddra
FROM  events AS e
INNER JOIN  reports_events AS re  ON e.event_id = re.events_id_FK
INNER JOIN  reports AS r  ON re.report_id_FK = r.report_id
WHERE  e.Meddra = '11125127';  -- Meddra is varchar: quote the literal so the event_Meddra index can be used

SELECT  r.report_id, r.reportDate, e.Meddra, d.ATC, rd.type
FROM  events AS e
INNER JOIN  reports_events AS re  ON e.event_id = re.events_id_FK
INNER JOIN  reports AS r  ON re.report_id_FK = r.report_id
INNER JOIN  reports_drugs AS rd  ON r.report_id = rd.report_id_FK
INNER JOIN  drugs AS d  ON rd.drug_id_FK = d.drug_id
WHERE  e.Meddra = '11125127'
AND  d.ATC = 'ACNKMFRRMVST';

I assume the tables have other columns as well?

Your UNIQUE keys are redundant and should be dropped. (A PRIMARY KEY is a UNIQUE key.)
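The corresponding statements are sketched below, using the index names from the dump above. Each of these indexes repeats a uniqueness guarantee that the table's PRIMARY KEY already provides, yet InnoDB still maintains an extra B-tree for it on every insert. For events, event_id alone is the PK, so (event_id, Meddra) is necessarily unique. The foreign keys reference the primary keys, so they do not depend on these indexes.

```sql
-- Each index below duplicates the uniqueness already enforced by the PRIMARY KEY.
ALTER TABLE drugs          DROP INDEX drug_id_UNIQUE;
ALTER TABLE events         DROP INDEX event_id_UNIQUE;   -- (event_id, Meddra); event_id is already the PK
ALTER TABLE reports        DROP INDEX report_id_UNIQUE;
ALTER TABLE reports_drugs  DROP INDEX id_UNIQUE;
ALTER TABLE reports_events DROP INDEX id_UNIQUE;
```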

The many-to-many tables should be indexed differently. See the discussion here: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table. That alone may solve the performance problem.
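Below is a minimal sketch of that recipe applied to these two tables. It assumes nothing else references the surrogate id column. It also assumes a given (report, drug) pair is stored only once; if the same pair can appear with different type values, add `type` to the primary key. The PRIMARY KEY on (report, other) clusters each report's rows together, and the reversed secondary index serves lookups from the other side. The _new/_old table names, index names and constraint names are hypothetical. Copying 150 million rows in a single INSERT ... SELECT will take a long time and generate a large undo log; copying in report_id ranges is an alternative.

```sql
-- ASSUMPTION: each (report_id_FK, drug_id_FK) pair occurs at most once.
CREATE TABLE reports_drugs_new (
  report_id_FK int NOT NULL,
  drug_id_FK   int NOT NULL,
  `type`       int NOT NULL,
  PRIMARY KEY (report_id_FK, drug_id_FK),          -- report -> drugs: one contiguous range
  KEY drug_report_idx (drug_id_FK, report_id_FK),  -- drug -> reports, report_id_FK read from the index
  CONSTRAINT rd_drug_fk   FOREIGN KEY (drug_id_FK)   REFERENCES drugs (drug_id),
  CONSTRAINT rd_report_fk FOREIGN KEY (report_id_FK) REFERENCES reports (report_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- Same layout for reports_events, which has no extra payload column.
CREATE TABLE reports_events_new (
  report_id_FK int NOT NULL,
  events_id_FK int NOT NULL,
  PRIMARY KEY (report_id_FK, events_id_FK),
  KEY event_report_idx (events_id_FK, report_id_FK),
  CONSTRAINT re_event_fk  FOREIGN KEY (events_id_FK) REFERENCES events (event_id),
  CONSTRAINT re_report_fk FOREIGN KEY (report_id_FK) REFERENCES reports (report_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- Copy the data. A duplicate-key error here means the uniqueness assumption is wrong.
INSERT INTO reports_drugs_new (report_id_FK, drug_id_FK, `type`)
  SELECT report_id_FK, drug_id_FK, `type` FROM reports_drugs;
INSERT INTO reports_events_new (report_id_FK, events_id_FK)
  SELECT report_id_FK, events_id_FK FROM reports_events;

-- Swap in one atomic statement; keep the *_old tables until the results are verified.
RENAME TABLE reports_drugs  TO reports_drugs_old,  reports_drugs_new  TO reports_drugs,
             reports_events TO reports_events_old, reports_events_new TO reports_events;
```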

The second query will walk through the 5 tables in one of these ways:

e -> re -> r -> rd -> d
--OR--
d -> rd -> r -> re -> e

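That link gives the optimal indexes for either order to work efficiently (that is, composite indexes, or their equivalent achieved by understanding how InnoDB handles the PK).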
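One consequence of how InnoDB handles the PK is that every secondary index implicitly ends with the primary key columns. In the current reports_drugs, the index on drug_id_FK therefore carries the surrogate id rather than report_id_FK. Going from d to rd to r then costs an extra clustered-index lookup per row. The dump also declares that index INVISIBLE, so the optimizer ignores it unless the use_invisible_indexes optimizer switch is on. As a result, the d -> rd direction currently has no usable index. The sketch below is a stopgap that avoids a full rebuild; the new index name is hypothetical.

```sql
-- report_id_fk_idx1 is the (confusingly named) index on reports_drugs.drug_id_FK.
-- INVISIBLE indexes are still maintained on writes but are not used by the optimizer.
ALTER TABLE reports_drugs ALTER INDEX report_id_fk_idx1 VISIBLE;

-- Optional: a composite index so drug -> report lookups read report_id_FK from the index.
-- Once it exists, report_id_fk_idx1 becomes redundant.
ALTER TABLE reports_drugs ADD INDEX drug_report_idx (drug_id_FK, report_id_FK);
```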

And since there are tests on both e.Meddra and d.ATC, the Optimizer will semi-intelligently pick which one to start with.
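To see which end the Optimizer picked, and to compare it against the other end, the MySQL 8.0 JOIN_ORDER optimizer hint can force a specific order. The sketch below uses EXPLAIN FORMAT=TREE, which needs 8.0.16 or later. Swap the alias list to (e, re, r, rd, d) to force the other direction, and run it without the hint to see the Optimizer's own choice.

```sql
-- Force the d -> rd -> r -> re -> e order and show the resulting plan.
EXPLAIN FORMAT=TREE
SELECT /*+ JOIN_ORDER(d, rd, r, re, e) */
       r.report_id, r.reportDate, e.Meddra, d.ATC, rd.type
FROM events AS e
INNER JOIN reports_events AS re ON e.event_id = re.events_id_FK
INNER JOIN reports        AS r  ON re.report_id_FK = r.report_id
INNER JOIN reports_drugs  AS rd ON r.report_id = rd.report_id_FK
INNER JOIN drugs          AS d  ON rd.drug_id_FK = d.drug_id
WHERE e.Meddra = '11125127'
  AND d.ATC = 'ACNKMFRRMVST';
```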
