I have two simple MySQL tables: an index table, t_id, which has a unique primary id, and a pivot table, t_data, which spreads those ids across various data fields:
CREATE TABLE `t_id` (
`id` bigint(12) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
CREATE TABLE `t_data` (
`id` int(11) NOT NULL,
`field` varchar(50) CHARACTER SET cp1251 NOT NULL,
`value` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
UNIQUE KEY `idxfield` (`id`,`field`),
KEY `value` (`value`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Here is some sample data:
+----+--------------+-------------------+
| id | field | value |
+----+--------------+-------------------+
| 1 | organization | Apple Inc. |
| 1 | state | CA |
| 2 | organization | Adobe Inc. |
| 2 | state | CA |
| 3 | organization | Alphabet Inc. |
| 3 | state | CA |
| 4 | organization | Rockwell Collins |
| 4 | state | IA |
| 5 | organization | GEICO |
| 5 | state | MD |
| 6 | organization | Anheuser-Busch |
| 6 | state | MO |
| 7 | organization | Bank of America |
| 7 | state | NC |
+----+--------------+-------------------+
Reports can be produced with a standard pivot select query:
select
i.id,
ifnull (max(case when d.field = 'organization' then d.value end),'') 'organization',
ifnull (max(case when d.field = 'state' then d.value end),'') 'state'
from `t_id` i
left join `t_data` d
on i.id = d.id
group by i.id
limit 0,10
This simple example shows only two "virtual" fields (organization and state) and 7 unique ids:
+----+------------------+-------+
| id | organization | state |
+----+------------------+-------+
| 1 | Apple Inc. | CA |
| 2 | Adobe Inc. | CA |
| 3 | Alphabet Inc. | CA |
| 4 | Rockwell Collins | IA |
| 5 | GEICO | MD |
| 6 | Anheuser-Busch | MO |
| 7 | Bank of America | NC |
+----+------------------+-------+
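The pivot query and its result can be checked with a small, self-contained script. This is a minimal sketch that assumes Python's built-in sqlite3 module, with an in-memory database, as a stand-in for MySQL; make_db, SAMPLE and PIVOT_SQL are hypothetical names, and the column types are simplified (TEXT instead of varchar, no character sets). It only checks results, not performance.

```python
import sqlite3

# Hypothetical sample data from the question: (id, organization, state).
SAMPLE = [
    (1, "Apple Inc.", "CA"), (2, "Adobe Inc.", "CA"), (3, "Alphabet Inc.", "CA"),
    (4, "Rockwell Collins", "IA"), (5, "GEICO", "MD"),
    (6, "Anheuser-Busch", "MO"), (7, "Bank of America", "NC"),
]


def make_db():
    """Hypothetical helper: in-memory SQLite stand-in for t_id and t_data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t_id (id INTEGER PRIMARY KEY);
        CREATE TABLE t_data (
            id INTEGER NOT NULL,
            field TEXT NOT NULL,
            value TEXT,
            UNIQUE (id, field)
        );
        CREATE INDEX idx_value ON t_data (value);
    """)
    for id_, org, state in SAMPLE:
        conn.execute("INSERT INTO t_id (id) VALUES (?)", (id_,))
        # One EAV row per "virtual" field.
        conn.executemany(
            "INSERT INTO t_data (id, field, value) VALUES (?, ?, ?)",
            [(id_, "organization", org), (id_, "state", state)],
        )
    return conn


# The question's pivot query; ORDER BY is added only to make the output deterministic.
PIVOT_SQL = """
SELECT i.id,
       IFNULL(MAX(CASE WHEN d.field = 'organization' THEN d.value END), '') AS organization,
       IFNULL(MAX(CASE WHEN d.field = 'state' THEN d.value END), '') AS state
FROM t_id i
LEFT JOIN t_data d ON i.id = d.id
GROUP BY i.id
ORDER BY i.id
LIMIT 0, 10
"""

rows = make_db().execute(PIVOT_SQL).fetchall()
for row in rows:
    print(row)
```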
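In our actual production environment we have dozens of "virtual" fields (not just 2) and millions of unique ids (not just 7). The database performs very well for CRUD-type queries on a single id (under a second) and even when listing a limited batch of ids at a time (again, under a second). The problem appears when we try to constrain the select with a WHERE clause: the query then takes tens of seconds. For example, to find all organizations in California: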
select
x.id,
x.organization,
x.state
from
(
select
i.id,
ifnull (max(case when d.field = 'organization' then d.value end),'') 'organization',
ifnull (max(case when d.field = 'state' then d.value end),'') 'state'
from `t_id` i
left join `t_data` d
on i.id = d.id
group by i.id
) as x
where x.state='CA'
limit 0,10
+----+---------------+-------+
| id | organization | state |
+----+---------------+-------+
| 1 | Apple Inc. | CA |
| 2 | Adobe Inc. | CA |
| 3 | Alphabet Inc. | CA |
+----+---------------+-------+
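This works, but it takes a long time (again, tens of seconds)! What is the best practice here? Is there a better way to write this kind of query, and how can these pivot queries be optimized for WHERE clauses?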
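A sketch of why the filtered query has to do so much work, assuming SQLite through Python's sqlite3 module in place of MySQL (make_db, SAMPLE and INNER_SQL are hypothetical names). The condition is on state, which only exists after aggregation, so every id is grouped inside the derived table before the outer WHERE can discard anything; the script counts those grouped rows and then applies the filter. Timings from SQLite on 7 ids say nothing about MySQL on millions.

```python
import sqlite3

# Hypothetical sample data from the question: (id, organization, state).
SAMPLE = [
    (1, "Apple Inc.", "CA"), (2, "Adobe Inc.", "CA"), (3, "Alphabet Inc.", "CA"),
    (4, "Rockwell Collins", "IA"), (5, "GEICO", "MD"),
    (6, "Anheuser-Busch", "MO"), (7, "Bank of America", "NC"),
]


def make_db():
    """Hypothetical helper: in-memory SQLite stand-in for t_id and t_data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t_id (id INTEGER PRIMARY KEY);
        CREATE TABLE t_data (
            id INTEGER NOT NULL,
            field TEXT NOT NULL,
            value TEXT,
            UNIQUE (id, field)
        );
        CREATE INDEX idx_value ON t_data (value);
    """)
    for id_, org, state in SAMPLE:
        conn.execute("INSERT INTO t_id (id) VALUES (?)", (id_,))
        conn.executemany(
            "INSERT INTO t_data (id, field, value) VALUES (?, ?, ?)",
            [(id_, "organization", org), (id_, "state", state)],
        )
    return conn


conn = make_db()

# Derived table from the question: one aggregated row per id.
INNER_SQL = """
SELECT i.id,
       IFNULL(MAX(CASE WHEN d.field = 'organization' THEN d.value END), '') AS organization,
       IFNULL(MAX(CASE WHEN d.field = 'state' THEN d.value END), '') AS state
FROM t_id i
LEFT JOIN t_data d ON i.id = d.id
GROUP BY i.id
"""

# Rows the derived table produces before any filtering: one per id.
grouped = conn.execute(f"SELECT COUNT(*) FROM ({INNER_SQL}) AS x").fetchone()[0]

# The outer WHERE can only test the already aggregated state column.
ca_rows = conn.execute(
    f"SELECT x.id, x.organization, x.state FROM ({INNER_SQL}) AS x "
    "WHERE x.state = 'CA' ORDER BY x.id LIMIT 0, 10"
).fetchall()

print(grouped, ca_rows)
```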
This should be much faster on large data sets, and it extends easily to any number of "virtual" fields. Put whatever search criteria you have between the %% signs.
select
i.id,
coalesce(max(case when field = 'organization' then value end), '') as organization,
coalesce(max(case when field = 'state' then value end), '') as state
from t_id i
left join t_data d
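on i.id = d.id
-- the filters belong in WHERE: inside the LEFT JOIN's ON clause they would
-- only blank out the joined columns instead of removing non-matching ids
where i.id like '%%'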
and i.id in (
select id
from `t_data`
where `field` = 'organization'
and `value` like '%%'
and id in (
select id
from `t_data`
where `field` = 'state'
and `value` like '%%'
)
)
group by i.id
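The same idea can be run against the sample data. This sketch assumes SQLite via Python's sqlite3 module as a stand-in for MySQL, binds the two %% patterns as query parameters (any organization, and state CA), and drops the i.id LIKE placeholder since an empty pattern matches every id. It runs the IN filter once in WHERE and once inside the LEFT JOIN's ON clause: in ON the condition only decides which t_data rows get joined, so ids that fail it still come back with blank columns. make_db, SAMPLE, IN_FILTER and SELECT_HEAD are hypothetical names.

```python
import sqlite3

# Hypothetical sample data from the question: (id, organization, state).
SAMPLE = [
    (1, "Apple Inc.", "CA"), (2, "Adobe Inc.", "CA"), (3, "Alphabet Inc.", "CA"),
    (4, "Rockwell Collins", "IA"), (5, "GEICO", "MD"),
    (6, "Anheuser-Busch", "MO"), (7, "Bank of America", "NC"),
]


def make_db():
    """Hypothetical helper: in-memory SQLite stand-in for t_id and t_data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t_id (id INTEGER PRIMARY KEY);
        CREATE TABLE t_data (
            id INTEGER NOT NULL,
            field TEXT NOT NULL,
            value TEXT,
            UNIQUE (id, field)
        );
        CREATE INDEX idx_value ON t_data (value);
    """)
    for id_, org, state in SAMPLE:
        conn.execute("INSERT INTO t_id (id) VALUES (?)", (id_,))
        conn.executemany(
            "INSERT INTO t_data (id, field, value) VALUES (?, ?, ?)",
            [(id_, "organization", org), (id_, "state", state)],
        )
    return conn


# Ids whose organization and state match the two LIKE patterns.
IN_FILTER = """
    SELECT id FROM t_data
    WHERE field = 'organization' AND value LIKE ?
      AND id IN (SELECT id FROM t_data WHERE field = 'state' AND value LIKE ?)
"""

SELECT_HEAD = """
SELECT i.id,
       COALESCE(MAX(CASE WHEN d.field = 'organization' THEN d.value END), '') AS organization,
       COALESCE(MAX(CASE WHEN d.field = 'state' THEN d.value END), '') AS state
FROM t_id i
LEFT JOIN t_data d ON i.id = d.id
"""

# Filter in WHERE: non-matching ids are removed before grouping.
where_sql = SELECT_HEAD + f"WHERE i.id IN ({IN_FILTER})\nGROUP BY i.id ORDER BY i.id"
# Same test appended to the ON clause: non-matching ids remain, with NULL (then '') columns.
on_sql = SELECT_HEAD + f"AND i.id IN ({IN_FILTER})\nGROUP BY i.id ORDER BY i.id"

conn = make_db()
params = ("%", "CA")
where_rows = conn.execute(where_sql, params).fetchall()
on_rows = conn.execute(on_sql, params).fetchall()
print(where_rows)
print(on_rows)
```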
If you want to find the organizations operating in California, you don't actually need a subquery:
SELECT
i.id,
COALESCE(MAX(CASE WHEN field = 'organization' THEN value END), '') AS organization,
COALESCE(MAX(CASE WHEN field = 'state' THEN value END), '') AS state
FROM t_id i
LEFT JOIN t_data d
ON i.id = d.id
GROUP BY
i.id
HAVING
COUNT(CASE WHEN field = 'state' AND value = 'CA' THEN 1 END) > 0;
The trick here is to assert in the HAVING clause that each matching id group must contain a state record with the value CA.
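A minimal check of the HAVING version, assuming SQLite through Python's sqlite3 module in place of MySQL, with the state value passed as a parameter (make_db, SAMPLE and HAVING_SQL are hypothetical names). HAVING is evaluated per group, after GROUP BY, so this avoids the derived table but still aggregates every id.

```python
import sqlite3

# Hypothetical sample data from the question: (id, organization, state).
SAMPLE = [
    (1, "Apple Inc.", "CA"), (2, "Adobe Inc.", "CA"), (3, "Alphabet Inc.", "CA"),
    (4, "Rockwell Collins", "IA"), (5, "GEICO", "MD"),
    (6, "Anheuser-Busch", "MO"), (7, "Bank of America", "NC"),
]


def make_db():
    """Hypothetical helper: in-memory SQLite stand-in for t_id and t_data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t_id (id INTEGER PRIMARY KEY);
        CREATE TABLE t_data (
            id INTEGER NOT NULL,
            field TEXT NOT NULL,
            value TEXT,
            UNIQUE (id, field)
        );
        CREATE INDEX idx_value ON t_data (value);
    """)
    for id_, org, state in SAMPLE:
        conn.execute("INSERT INTO t_id (id) VALUES (?)", (id_,))
        conn.executemany(
            "INSERT INTO t_data (id, field, value) VALUES (?, ?, ?)",
            [(id_, "organization", org), (id_, "state", state)],
        )
    return conn


HAVING_SQL = """
SELECT i.id,
       COALESCE(MAX(CASE WHEN field = 'organization' THEN value END), '') AS organization,
       COALESCE(MAX(CASE WHEN field = 'state' THEN value END), '') AS state
FROM t_id i
LEFT JOIN t_data d ON i.id = d.id
GROUP BY i.id
-- keep only groups that contain a state row equal to the requested value
HAVING COUNT(CASE WHEN field = 'state' AND value = ? THEN 1 END) > 0
ORDER BY i.id
"""

having_rows = make_db().execute(HAVING_SQL, ("CA",)).fetchall()
print(having_rows)
```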
This is EAV (entity-attribute-value), not a pivot, so the solution is a self-join:
SELECT a.id,
a.value AS organization,
b.value AS state
FROM t_data AS a
JOIN t_data AS b ON a.id = b.id
WHERE a.field = 'organization'
AND b.field = 'state';
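A sketch of the self-join on the sample data, assuming SQLite via Python's sqlite3 module as a stand-in for MySQL (make_db, SAMPLE and SELF_JOIN_SQL are hypothetical names). Because these are inner joins, an id missing either its organization or its state row would drop out of the result, unlike the LEFT JOIN pivot.

```python
import sqlite3

# Hypothetical sample data from the question: (id, organization, state).
SAMPLE = [
    (1, "Apple Inc.", "CA"), (2, "Adobe Inc.", "CA"), (3, "Alphabet Inc.", "CA"),
    (4, "Rockwell Collins", "IA"), (5, "GEICO", "MD"),
    (6, "Anheuser-Busch", "MO"), (7, "Bank of America", "NC"),
]


def make_db():
    """Hypothetical helper: in-memory SQLite stand-in for t_id and t_data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t_id (id INTEGER PRIMARY KEY);
        CREATE TABLE t_data (
            id INTEGER NOT NULL,
            field TEXT NOT NULL,
            value TEXT,
            UNIQUE (id, field)
        );
        CREATE INDEX idx_value ON t_data (value);
    """)
    for id_, org, state in SAMPLE:
        conn.execute("INSERT INTO t_id (id) VALUES (?)", (id_,))
        conn.executemany(
            "INSERT INTO t_data (id, field, value) VALUES (?, ?, ?)",
            [(id_, "organization", org), (id_, "state", state)],
        )
    return conn


# a supplies the organization row, b the state row of the same id.
SELF_JOIN_SQL = """
SELECT a.id, a.value AS organization, b.value AS state
FROM t_data AS a
JOIN t_data AS b ON a.id = b.id
WHERE a.field = 'organization'
  AND b.field = 'state'
ORDER BY a.id
"""

pairs = make_db().execute(SELF_JOIN_SQL).fetchall()
print(pairs)
```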
If you need t_id to control which ids are included, slip in:
JOIN t_id AS i ON i.id = a.id
If you want to restrict the results to CA, append:
AND b.value = 'CA'
and add
INDEX(field, value)
so that it does not have to scan as many rows to find the CA entries.
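Putting the pieces together (the t_id join, the CA restriction and the (field, value) index) gives the following sketch. It assumes SQLite via Python's sqlite3 module in place of MySQL, where the equivalent would be INDEX(field, value) in the table definition or ALTER TABLE t_data ADD INDEX idx_field_value (field, value); idx_field_value, make_db, SAMPLE and CA_SQL are hypothetical names. The script prints SQLite's EXPLAIN QUERY PLAN output so you can see whether the index is used, but with 14 rows the SQLite planner's choice is no guide to what MySQL will do with millions; run EXPLAIN in MySQL for that.

```python
import sqlite3

# Hypothetical sample data from the question: (id, organization, state).
SAMPLE = [
    (1, "Apple Inc.", "CA"), (2, "Adobe Inc.", "CA"), (3, "Alphabet Inc.", "CA"),
    (4, "Rockwell Collins", "IA"), (5, "GEICO", "MD"),
    (6, "Anheuser-Busch", "MO"), (7, "Bank of America", "NC"),
]


def make_db():
    """Hypothetical helper: in-memory SQLite stand-in for t_id and t_data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t_id (id INTEGER PRIMARY KEY);
        CREATE TABLE t_data (
            id INTEGER NOT NULL,
            field TEXT NOT NULL,
            value TEXT,
            UNIQUE (id, field)
        );
        CREATE INDEX idx_value ON t_data (value);
    """)
    for id_, org, state in SAMPLE:
        conn.execute("INSERT INTO t_id (id) VALUES (?)", (id_,))
        conn.executemany(
            "INSERT INTO t_data (id, field, value) VALUES (?, ?, ?)",
            [(id_, "organization", org), (id_, "state", state)],
        )
    return conn


conn = make_db()
# Composite index so the state = 'CA' rows can be located without a full scan.
conn.execute("CREATE INDEX idx_field_value ON t_data (field, value)")

CA_SQL = """
SELECT a.id, a.value AS organization, b.value AS state
FROM t_data AS a
JOIN t_data AS b ON a.id = b.id
JOIN t_id AS i ON i.id = a.id      -- only ids listed in t_id
WHERE a.field = 'organization'
  AND b.field = 'state'
  AND b.value = ?                  -- restrict to one state
ORDER BY a.id
"""

ca_rows = conn.execute(CA_SQL, ("CA",)).fetchall()
print(ca_rows)

# The last column of each plan row is the human-readable step description.
for plan_row in conn.execute("EXPLAIN QUERY PLAN " + CA_SQL, ("CA",)):
    print(plan_row[-1])
```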