How to optimize/refactor MySQL pivot table performance when using a WHERE clause

I have two simple MySQL tables: an index table t_id with a unique primary id, and a pivot table t_data that spreads those ids across various data fields:

CREATE TABLE `t_id` (
  `id` bigint(12) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

CREATE TABLE `t_data` (
  `id` int(11) NOT NULL,
  `field` varchar(50) CHARACTER SET cp1251 NOT NULL,
  `value` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  UNIQUE KEY `idxfield` (`id`,`field`),
  KEY `value` (`value`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Here is some sample data:

+----+--------------+-------------------+  
| id | field        | value             |  
+----+--------------+-------------------+  
| 1  | organization | Apple Inc.        |  
| 1  | state        | CA                |  
| 2  | organization | Adobe Inc.        |  
| 2  | state        | CA                |  
| 3  | organization | Alphabet Inc.     |  
| 3  | state        | CA                |  
| 4  | organization | Rockwell Collins  |  
| 4  | state        | IA                |  
| 5  | organization | GEICO             |  
| 5  | state        | MD                |  
| 6  | organization | Anheuser-Busch    |  
| 6  | state        | MO                |  
| 7  | organization | Bank of America   |  
| 7  | state        | NC                |  
+----+--------------+-------------------+

Reporting works with a standard pivot select query:

select
  i.id,
  ifnull(max(case when d.field = 'organization' then d.value end), '') as organization,
  ifnull(max(case when d.field = 'state' then d.value end), '') as state
from `t_id` i
left join `t_data` d
  on i.id = d.id
group by i.id
limit 0,10
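To check what the pivot query produces, here is a minimal sketch that runs it on the seven sample ids. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in for MySQL; the build_db helper is hypothetical, and the SQL is adapted slightly: an ORDER BY is added so row order is deterministic, and LIMIT 10 OFFSET 0 is written out in place of limit 0,10.

```python
import sqlite3

# The question's sample data: id -> (organization, state).
SAMPLE = {
    1: ("Apple Inc.", "CA"), 2: ("Adobe Inc.", "CA"), 3: ("Alphabet Inc.", "CA"),
    4: ("Rockwell Collins", "IA"), 5: ("GEICO", "MD"),
    6: ("Anheuser-Busch", "MO"), 7: ("Bank of America", "NC"),
}


def build_db() -> sqlite3.Connection:
    """Hypothetical helper: an in-memory SQLite stand-in for t_id / t_data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t_id (id INTEGER PRIMARY KEY);
        CREATE TABLE t_data (
            id    INTEGER NOT NULL,
            field TEXT    NOT NULL,
            value TEXT,
            UNIQUE (id, field)  -- plays the role of UNIQUE KEY idxfield
        );
    """)
    for id_, (org, state) in SAMPLE.items():
        conn.execute("INSERT INTO t_id (id) VALUES (?)", (id_,))
        conn.execute("INSERT INTO t_data VALUES (?, 'organization', ?)", (id_, org))
        conn.execute("INSERT INTO t_data VALUES (?, 'state', ?)", (id_, state))
    return conn


# The question's pivot query; ORDER BY added so the output order is fixed.
PIVOT_SQL = """
SELECT i.id,
       IFNULL(MAX(CASE WHEN d.field = 'organization' THEN d.value END), '') AS organization,
       IFNULL(MAX(CASE WHEN d.field = 'state' THEN d.value END), '') AS state
FROM t_id AS i
LEFT JOIN t_data AS d ON i.id = d.id
GROUP BY i.id
ORDER BY i.id
LIMIT 10 OFFSET 0
"""

conn = build_db()
rows = conn.execute(PIVOT_SQL).fetchall()
for row in rows:
    print(row)  # one tuple per id, e.g. (1, 'Apple Inc.', 'CA')
```

Each id collapses its attribute rows into one output row: MAX picks the single non-NULL value that the CASE produces for each field.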

This simple example shows only two "virtual" fields (organization and state) for 7 unique ids:

+----+------------------+-------+  
| id | organization     | state |  
+----+------------------+-------+  
| 1  | Apple Inc.       |  CA   |  
| 2  | Adobe Inc.       |  CA   |  
| 3  | Alphabet Inc.    |  CA   |  
| 4  | Rockwell Collins |  IA   |  
| 5  | GEICO            |  MD   |  
| 6  | Anheuser-Busch   |  MO   |  
| 7  | Bank of America  |  NC   |  
+----+------------------+-------+

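In our actual production environment we have dozens of "virtual" fields (not just 2) and millions of unique ids (not just 7). The database performs very well for CRUD-type queries on a single id (under a second), and even when listing a limited batch of ids at a time (again, under a second). The problem appears when we try to constrain the selection with a WHERE clause (queries take tens of seconds). For example, to find all organizations in California: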

select
  x.id,
  x.organization,
  x.state
from
(
  select
    i.id,
    ifnull(max(case when d.field = 'organization' then d.value end), '') as organization,
    ifnull(max(case when d.field = 'state' then d.value end), '') as state
  from `t_id` i
  left join `t_data` d
    on i.id = d.id
  group by i.id
) as x
where x.state = 'CA'
limit 0,10

+----+---------------+-------+  
| id | organization  | state |  
+----+---------------+-------+  
| 1  | Apple Inc.    |  CA   |  
| 2  | Adobe Inc.    |  CA   |  
| 3  | Alphabet Inc. |  CA   |  
+----+---------------+-------+
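A likely reason for the slowness: WHERE x.state = 'CA' tests a value that only exists after the derived table has grouped and aggregated every id, so the server generally has to build the pivot row for all ids before it can discard the non-CA ones, and the index on value cannot help with that filter. The sketch below only reproduces the query on the seven-row sample, using SQLite through Python's sqlite3 as a stand-in for MySQL (build_db is a hypothetical helper, ORDER BY added for determinism); it confirms the result, not the timing.

```python
import sqlite3

# The question's sample data: id -> (organization, state).
SAMPLE = {
    1: ("Apple Inc.", "CA"), 2: ("Adobe Inc.", "CA"), 3: ("Alphabet Inc.", "CA"),
    4: ("Rockwell Collins", "IA"), 5: ("GEICO", "MD"),
    6: ("Anheuser-Busch", "MO"), 7: ("Bank of America", "NC"),
}


def build_db() -> sqlite3.Connection:
    """Hypothetical helper: an in-memory SQLite stand-in for t_id / t_data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t_id (id INTEGER PRIMARY KEY);
        CREATE TABLE t_data (
            id    INTEGER NOT NULL,
            field TEXT    NOT NULL,
            value TEXT,
            UNIQUE (id, field)
        );
    """)
    for id_, (org, state) in SAMPLE.items():
        conn.execute("INSERT INTO t_id (id) VALUES (?)", (id_,))
        conn.execute("INSERT INTO t_data VALUES (?, 'organization', ?)", (id_, org))
        conn.execute("INSERT INTO t_data VALUES (?, 'state', ?)", (id_, state))
    return conn


# Pivot every id inside the derived table x, then filter on the aggregated state.
DERIVED_SQL = """
SELECT x.id, x.organization, x.state
FROM (
    SELECT i.id,
           IFNULL(MAX(CASE WHEN d.field = 'organization' THEN d.value END), '') AS organization,
           IFNULL(MAX(CASE WHEN d.field = 'state' THEN d.value END), '') AS state
    FROM t_id AS i
    LEFT JOIN t_data AS d ON i.id = d.id
    GROUP BY i.id
) AS x
WHERE x.state = 'CA'
ORDER BY x.id
LIMIT 10
"""

conn = build_db()
rows = conn.execute(DERIVED_SQL).fetchall()
print(rows)
```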

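This works, but it takes a very long time (again, tens of seconds)! What is the best practice here: is there a better way to write these kinds of queries? How can these pivot queries be optimized for WHERE clauses?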

This should be much faster for large datasets, and it extends easily to any number of "virtual" fields. Put whatever search criteria you have between the %% signs.

select
  i.id,
  coalesce(max(case when d.field = 'organization' then d.value end), '') as organization,
  coalesce(max(case when d.field = 'state' then d.value end), '') as state
from t_id i
left join t_data d
  on i.id = d.id
-- the filter goes in WHERE: inside the LEFT JOIN's ON clause it would not remove any t_id rows
where i.id in (
  select id
  from `t_data`
  where `field` = 'organization'
    and `value` like '%%'
    and id in (
      select id
      from `t_data`
      where `field` = 'state'
        and `value` like '%%'
    )
)
group by i.id
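One detail decides whether the IN-subquery approach filters at all: with a LEFT JOIN, a condition in the ON clause only controls which t_data rows are attached to each t_id row; it never removes t_id rows. The sketch below runs the same subquery filter in both positions on the sample data, assuming SQLite via Python's sqlite3 as a stand-in for MySQL (build_db is a hypothetical helper; the `?` parameters stand in for the %% patterns). With the filter in ON, ids 4 to 7 still come back, just with empty strings.

```python
import sqlite3

# The question's sample data: id -> (organization, state).
SAMPLE = {
    1: ("Apple Inc.", "CA"), 2: ("Adobe Inc.", "CA"), 3: ("Alphabet Inc.", "CA"),
    4: ("Rockwell Collins", "IA"), 5: ("GEICO", "MD"),
    6: ("Anheuser-Busch", "MO"), 7: ("Bank of America", "NC"),
}


def build_db() -> sqlite3.Connection:
    """Hypothetical helper: an in-memory SQLite stand-in for t_id / t_data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t_id (id INTEGER PRIMARY KEY);
        CREATE TABLE t_data (id INTEGER NOT NULL, field TEXT NOT NULL, value TEXT,
                             UNIQUE (id, field));
    """)
    for id_, (org, state) in SAMPLE.items():
        conn.execute("INSERT INTO t_id (id) VALUES (?)", (id_,))
        conn.execute("INSERT INTO t_data VALUES (?, 'organization', ?)", (id_, org))
        conn.execute("INSERT INTO t_data VALUES (?, 'state', ?)", (id_, state))
    return conn


# ids whose organization and state both match the LIKE patterns.
SUBQUERY = """
    SELECT id FROM t_data
    WHERE field = 'organization' AND value LIKE ?
      AND id IN (SELECT id FROM t_data WHERE field = 'state' AND value LIKE ?)
"""

SELECT_PIVOT = """
SELECT i.id,
       COALESCE(MAX(CASE WHEN d.field = 'organization' THEN d.value END), '') AS organization,
       COALESCE(MAX(CASE WHEN d.field = 'state' THEN d.value END), '') AS state
FROM t_id AS i
"""

# Filter in ON: only limits which t_data rows are attached; every t_id row remains.
FILTER_IN_ON = SELECT_PIVOT + f"""
LEFT JOIN t_data AS d ON i.id = d.id AND i.id IN ({SUBQUERY})
GROUP BY i.id ORDER BY i.id
"""

# Filter in WHERE: ids that fail it are dropped before grouping.
FILTER_IN_WHERE = SELECT_PIVOT + f"""
LEFT JOIN t_data AS d ON i.id = d.id
WHERE i.id IN ({SUBQUERY})
GROUP BY i.id ORDER BY i.id
"""

conn = build_db()
params = ("%", "CA")  # any organization, state 'CA'
on_rows = conn.execute(FILTER_IN_ON, params).fetchall()
where_rows = conn.execute(FILTER_IN_WHERE, params).fetchall()
print(len(on_rows), on_rows[-1])  # ids 4-7 survive with '' placeholders
print(len(where_rows), where_rows)
```

On a real dataset it is also worth remembering that a LIKE pattern beginning with % cannot be answered from the value index by a range lookup, so exact matches (or prefix patterns) are the cheaper criteria.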

If you want to find the organizations operating in California, you don't actually need a subquery:

SELECT
i.id,
COALESCE(MAX(CASE WHEN field = 'organization' THEN value END), '') AS organization,
COALESCE(MAX(CASE WHEN field = 'state' THEN value END), '') AS state
FROM t_id i
LEFT JOIN t_data d
ON i.id = d.id
GROUP BY
i.id
HAVING
COUNT(CASE WHEN field = 'state' AND value = 'CA' THEN 1 END) > 0;

The trick here is that the HAVING clause asserts that each matching id group must contain a state record with the value California ('CA').
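HAVING is evaluated after GROUP BY, so this version still builds a group for every id before dropping the non-matching ones: it removes the derived table, but it is not guaranteed to be faster. The sketch below runs the HAVING query with the state passed as a parameter, assuming SQLite via Python's sqlite3 as a stand-in for MySQL (build_db is a hypothetical helper; ORDER BY added for determinism).

```python
import sqlite3

# The question's sample data: id -> (organization, state).
SAMPLE = {
    1: ("Apple Inc.", "CA"), 2: ("Adobe Inc.", "CA"), 3: ("Alphabet Inc.", "CA"),
    4: ("Rockwell Collins", "IA"), 5: ("GEICO", "MD"),
    6: ("Anheuser-Busch", "MO"), 7: ("Bank of America", "NC"),
}


def build_db() -> sqlite3.Connection:
    """Hypothetical helper: an in-memory SQLite stand-in for t_id / t_data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t_id (id INTEGER PRIMARY KEY);
        CREATE TABLE t_data (id INTEGER NOT NULL, field TEXT NOT NULL, value TEXT,
                             UNIQUE (id, field));
    """)
    for id_, (org, state) in SAMPLE.items():
        conn.execute("INSERT INTO t_id (id) VALUES (?)", (id_,))
        conn.execute("INSERT INTO t_data VALUES (?, 'organization', ?)", (id_, org))
        conn.execute("INSERT INTO t_data VALUES (?, 'state', ?)", (id_, state))
    return conn


# Keep only the groups that contain a state row equal to the requested value.
HAVING_SQL = """
SELECT i.id,
       COALESCE(MAX(CASE WHEN d.field = 'organization' THEN d.value END), '') AS organization,
       COALESCE(MAX(CASE WHEN d.field = 'state' THEN d.value END), '') AS state
FROM t_id AS i
LEFT JOIN t_data AS d ON i.id = d.id
GROUP BY i.id
HAVING COUNT(CASE WHEN d.field = 'state' AND d.value = ? THEN 1 END) > 0
ORDER BY i.id
"""

conn = build_db()
ca = conn.execute(HAVING_SQL, ("CA",)).fetchall()
ia = conn.execute(HAVING_SQL, ("IA",)).fetchall()
print(ca)
print(ia)  # [(4, 'Rockwell Collins', 'IA')]
```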

This is EAV (entity-attribute-value), not a pivot, so the solution is a self-join:

SELECT  a.id,
a.value AS organization,
b.value AS state
FROM t_data AS a
JOIN t_data AS b  ON a.id = b.id
WHERE a.field = 'organization'
AND b.field = 'state';
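The self-join joins t_data to itself once per attribute, so each output row pairs an id's organization row with its state row. A minimal sketch on the sample data, assuming SQLite via Python's sqlite3 as a stand-in for MySQL (build_db is a hypothetical helper; ORDER BY added for determinism):

```python
import sqlite3

# The question's sample data: id -> (organization, state).
SAMPLE = {
    1: ("Apple Inc.", "CA"), 2: ("Adobe Inc.", "CA"), 3: ("Alphabet Inc.", "CA"),
    4: ("Rockwell Collins", "IA"), 5: ("GEICO", "MD"),
    6: ("Anheuser-Busch", "MO"), 7: ("Bank of America", "NC"),
}


def build_db() -> sqlite3.Connection:
    """Hypothetical helper: an in-memory SQLite stand-in for t_id / t_data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t_id (id INTEGER PRIMARY KEY);
        CREATE TABLE t_data (id INTEGER NOT NULL, field TEXT NOT NULL, value TEXT,
                             UNIQUE (id, field));
    """)
    for id_, (org, state) in SAMPLE.items():
        conn.execute("INSERT INTO t_id (id) VALUES (?)", (id_,))
        conn.execute("INSERT INTO t_data VALUES (?, 'organization', ?)", (id_, org))
        conn.execute("INSERT INTO t_data VALUES (?, 'state', ?)", (id_, state))
    return conn


# a supplies the organization row, b the state row of the same id.
SELF_JOIN_SQL = """
SELECT a.id, a.value AS organization, b.value AS state
FROM t_data AS a
JOIN t_data AS b ON a.id = b.id
WHERE a.field = 'organization'
  AND b.field = 'state'
ORDER BY a.id
"""

conn = build_db()
rows = conn.execute(SELF_JOIN_SQL).fetchall()
for row in rows:
    print(row)
```

Because these are inner joins, an id that lacks either an organization row or a state row does not appear at all, whereas the LEFT JOIN pivot returns it with '' in the missing column.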

If you need t_id to control which ids are included, add this join:

JOIN t_id AS i  ON i.id = a.id

If you want to restrict the results to CA, append

AND b.value = 'CA'

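and add a composite index on t_data:

ALTER TABLE t_data ADD INDEX (field, value);

so that it doesn't have to scan as many rows to find the CA entries.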
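Putting the pieces together (the t_id join, the CA restriction and the (field, value) index), a minimal sketch assuming SQLite via Python's sqlite3 as a stand-in for MySQL. The build_db helper and the index name idx_field_value are hypothetical. The plan printout is SQLite's EXPLAIN QUERY PLAN; its planner is not MySQL's, so it only illustrates the idea that the CA rows can be located through the index rather than by scanning.

```python
import sqlite3

# The question's sample data: id -> (organization, state).
SAMPLE = {
    1: ("Apple Inc.", "CA"), 2: ("Adobe Inc.", "CA"), 3: ("Alphabet Inc.", "CA"),
    4: ("Rockwell Collins", "IA"), 5: ("GEICO", "MD"),
    6: ("Anheuser-Busch", "MO"), 7: ("Bank of America", "NC"),
}


def build_db() -> sqlite3.Connection:
    """Hypothetical helper: an in-memory SQLite stand-in for t_id / t_data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t_id (id INTEGER PRIMARY KEY);
        CREATE TABLE t_data (id INTEGER NOT NULL, field TEXT NOT NULL, value TEXT,
                             UNIQUE (id, field));
    """)
    for id_, (org, state) in SAMPLE.items():
        conn.execute("INSERT INTO t_id (id) VALUES (?)", (id_,))
        conn.execute("INSERT INTO t_data VALUES (?, 'organization', ?)", (id_, org))
        conn.execute("INSERT INTO t_data VALUES (?, 'state', ?)", (id_, state))
    return conn


conn = build_db()
# Composite index suggested in the answer; the name idx_field_value is hypothetical.
conn.execute("CREATE INDEX idx_field_value ON t_data (field, value)")

CA_SQL = """
SELECT a.id, a.value AS organization, b.value AS state
FROM t_data AS a
JOIN t_data AS b ON a.id = b.id
JOIN t_id AS i ON i.id = a.id
WHERE a.field = 'organization'
  AND b.field = 'state'
  AND b.value = ?
"""

rows = sorted(conn.execute(CA_SQL, ("CA",)).fetchall())  # sorted in Python, no ORDER BY
print(rows)

# SQLite's plan for the same statement; MySQL's EXPLAIN output looks different.
plan = conn.execute("EXPLAIN QUERY PLAN " + CA_SQL, ("CA",)).fetchall()
for step in plan:
    print(step[-1])  # last column is the human-readable detail text
```

On MySQL itself, running EXPLAIN on the equivalent query is the way to confirm that the (field, value) index is actually chosen.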
