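Need MySQL optimization for complex search on EAV structured data

I have a large database with EAV structured data that has to be searchable and pageable. I have tried every trick in the book to make it fast enough, but in some cases it still fails to complete in a reasonable time.

This is my table structure (relevant parts only, ask if you need more):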

CREATE TABLE IF NOT EXISTS `object` (
  `object_id` bigint(20) NOT NULL AUTO_INCREMENT,
  `oid` varchar(32) CHARACTER SET utf8 NOT NULL,
  `status` varchar(100) CHARACTER SET utf8 DEFAULT NULL,
  `created` datetime NOT NULL,
  `updated` datetime NOT NULL,
  PRIMARY KEY (`object_id`),
  UNIQUE KEY `oid` (`oid`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `version` (
  `version_id` bigint(20) NOT NULL AUTO_INCREMENT,
  `type_id` bigint(20) NOT NULL,
  `object_id` bigint(20) NOT NULL,
  `created` datetime NOT NULL,
  `status` varchar(100) CHARACTER SET utf8 DEFAULT NULL,
  PRIMARY KEY (`version_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `value` (
  `value_id` bigint(20) NOT NULL AUTO_INCREMENT,
  `object_id` int(11) NOT NULL,
  `attribute_id` int(11) NOT NULL,
  `version_id` bigint(20) NOT NULL,
  `type_id` bigint(20) NOT NULL,
  `value` text NOT NULL,
  PRIMARY KEY (`value_id`),
  KEY `field_id` (`attribute_id`),
  KEY `action_id` (`version_id`),
  KEY `form_id` (`type_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8;

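This is a sample object. I have about 1 million of them in my database. Each object may have a different number of attributes, with different attribute_ids.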

INSERT INTO `object` (`object_id`, `oid`, `status`, `created`, `updated`) VALUES (1, 'cwnzrdxs4dzxns47xs4tx', 'Green', NOW(), NOW());
INSERT INTO `version` (`version_id`, `type_id`, `object_id`, `created`, `status`) VALUES (1, 1, 1, NOW(), NOW());
INSERT INTO `value` (`value_id`, `object_id`, `attribute_id`, `version_id`, `type_id`, `value`) VALUES (1, 1, 1, 1, 1, 'Munich');
INSERT INTO `value` (`value_id`, `object_id`, `attribute_id`, `version_id`, `type_id`, `value`) VALUES (2, 1, 2, 1, 1, 'Germany');
INSERT INTO `value` (`value_id`, `object_id`, `attribute_id`, `version_id`, `type_id`, `value`) VALUES (3, 1, 3, 1, 1, '123');
INSERT INTO `value` (`value_id`, `object_id`, `attribute_id`, `version_id`, `type_id`, `value`) VALUES (4, 1, 4, 1, 1, '2012-01-13');
INSERT INTO `value` (`value_id`, `object_id`, `attribute_id`, `version_id`, `type_id`, `value`) VALUES (5, 1, 5, 1, 1, 'A cake!');

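Now on to my current mechanism. My first attempt was the typical MySQL approach: one huge SQL query with a join for everything I needed. A complete disaster! It took far too long to load, and it even crashed the PHP and MySQL servers by exhausting the RAM.

So I split the query into several steps:

1. Determine all needed attribute_ids

I can look them up in another table that references the type_id of an object. The result is a list of attribute_ids. (This table is not very relevant to performance, so it is not included in my sample.)

:type_id contains all the type_ids of any objects I want to include in the search. I already have this information in my application, so this is cheap.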

SELECT * FROM attribute WHERE form_id IN (:type_id)

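The result is an array of attribute_id integers.

2. Search for matching objects

A big SQL query is compiled that adds one INNER JOIN for every condition I want. This sounds horrible, but in the end it was the fastest method :(

A typical generated query might look like this. Sadly, the LIMIT is necessary; otherwise I could get so many IDs that the resulting array makes PHP explode or breaks the IN statement in the next query: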

SELECT DISTINCT `version`.object_id FROM `version`
INNER JOIN `version` AS condition1 
        ON `version`.version_id = condition1.version_id 
       AND condition1.created = '2012-03-04' -- Filter by version date
INNER JOIN `value` AS condition2 
        ON `version`.version_id = condition2.version_id
       AND condition2.type_id IN (:type_id) -- try to limit joins to object types we need
       AND condition2.attribute_id = :field_id2 -- searching for a value in a specific attribute
       AND condition2.value = 'Munich' -- searching for the value 'Munich'
INNER JOIN `value` AS condition3 
        ON `version`.version_id = condition3.version_id
       AND condition3.type_id IN (:type_id) -- try to limit joins to object types we need
       AND condition3.attribute_id = :field_id3 -- searching for a value in a specific attribute
       AND condition3.value = 'Green' -- searching for the value 'Green'
WHERE `version`.type_id IN (:type_id) ORDER BY `version`.version_id DESC LIMIT 10000

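The result will contain the object_ids of all objects I might need. I select object_ids rather than version_ids because I need all versions of a matching object, regardless of which version matched.

3. Sort and page results

Next I create a query that sorts the objects by a certain attribute and then pages the result.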

SELECT DISTINCT object_id
FROM value
WHERE object_id IN (:foundObjects)
AND attribute_id = :attribute_id_to_sort
AND value > ''
ORDER BY value ASC LIMIT :limit OFFSET :offset

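The result is a sorted and paged list of the object IDs from the previous search.

4. Get our complete objects, versions and attributes

In the last step, I select all values for any objects and versions that were found.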

SELECT `value`.*, `object`.*, `version`.*, `type`.*,
`object`.status AS `object.status`,
`object`.flag AS `object.flag`,
`version`.created AS `version.created`,
`version`.status AS `version.status`
FROM version
INNER JOIN `type` ON `version`.type_id = `type`.type_id
INNER JOIN `object` ON `version`.object_id = `object`.object_id
LEFT JOIN value ON `version`.version_id = `value`.version_id
WHERE version.object_id IN (:sortedObjectIds) AND `version`.type_id IN (:typeIds)
ORDER BY version.created DESC

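The result is then compiled by PHP into nice object -> version -> value array structures.


Now the questions

  • Can this whole mess be sped up in any way?
  • Can I somehow remove the LIMIT 10000 restriction from my search query?

If all else fails, maybe switch database technology? See my other question: Database optimized for searching in a large number of objects with different attributes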


Real-life sample

Table sizes: object - 193801 rows, version - 193841 rows, value - 1053928 rows

SELECT * FROM attribute WHERE attribute_id IN (30)
SELECT DISTINCT `version`.object_id
FROM version  
INNER JOIN value AS condition_d4e328e33813 
     ON version.version_id = condition_d4e328e33813.version_id
    AND condition_d4e328e33813.type_id IN (30)
    AND condition_d4e328e33813.attribute_id IN (377) 
    AND condition_d4e328e33813.value LIKE '%e%'  
INNER JOIN value AS condition_2c870b0a429f 
     ON version.version_id = condition_2c870b0a429f.version_id
    AND condition_2c870b0a429f.type_id IN (30)
    AND condition_2c870b0a429f.attribute_id IN (376) 
    AND condition_2c870b0a429f.value LIKE '%s%' 
WHERE version.type_id IN (30) 
ORDER BY version.version_id DESC LIMIT 10000 -- limit to 10000 or it breaks!

EXPLAIN:

id  select_type  table                   type      possible_keys                key         key_len ref                               rows      Extra   
1   SIMPLE       condition_2c870b0a429f  ref       field_id,action_id,form_id   field_id    4       const                             178639    Using where; Using temporary; Using filesort
1   SIMPLE       action                  eq_ref    PRIMARY                      PRIMARY     8       condition_2c870b0a429f.action_id  1         Using where
1   SIMPLE       condition_d4e328e33813  ref       field_id,action_id,form_id   action_id   8       action.action_id                  11        Using where; Distinct

Object search finished (peak RAM: 5.91MB, time: 4.64s)

SELECT DISTINCT object_id
FROM version
WHERE object_id IN (193793,193789, ... ,135326,135324) -- 10000 ids in here!
ORDER BY created ASC
LIMIT 50 OFFSET 0                                                  

Object sorting finished (peak RAM: 6.68MB, time: 0.352s)

SELECT `value`.*, object.*, version.*, type.*,
    object.status AS `object.status`,
    object.flag AS `object.flag`,
    version.created AS `version.created`,
    version.status AS `version.status`,
    version.flag AS `version.flag`
FROM version
INNER JOIN type ON version.type_id = type.type_id
INNER JOIN object ON version.object_id = object.object_id
LEFT JOIN value ON version.version_id = `value`.version_id
WHERE version.object_id IN (135324,135326,...,135658,135661) AND version.type_id IN (30)
ORDER BY quality DESC, version.created DESC 

Object load query finished (peak RAM: 6.68MB, time: 0.083s)
Object assembly into arrays finished (peak RAM: 6.68MB, time: 0.007s)

Just try adding EXPLAIN in front of your search query:

EXPLAIN SELECT DISTINCT `version`.object_id FROM `version`, etc ...

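Then check the results in the "Extra" column. It will give you some clues for speeding up the query, such as adding an index on the right fields.

Also, you could remove the joins, fetch more results in the MySQL response, and process the large array with a PHP loop.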
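
One possible reading of that suggestion, as a rough sketch only: run one plain query per condition, intersect the returned version_id sets in the application, and resolve them to objects afterwards. The literal values come from the real-life sample in the question; the :matchingVersionIds placeholder is hypothetical and stands for the intersected list built in PHP.

```sql
-- One query per search condition, no joins.
SELECT version_id FROM value
WHERE type_id IN (30) AND attribute_id = 377 AND value LIKE '%e%';

SELECT version_id FROM value
WHERE type_id IN (30) AND attribute_id = 376 AND value LIKE '%s%';

-- The application keeps only the version_ids present in BOTH result sets,
-- then resolves them to objects. :matchingVersionIds is a hypothetical
-- placeholder for that intersected list.
SELECT DISTINCT object_id FROM version
WHERE version_id IN (:matchingVersionIds) AND type_id IN (30);
```

This trades join work in MySQL for more data transferred to PHP. With patterns as unselective as '%e%', each intermediate set can be large, so it is worth measuring before adopting.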

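I would first try covering indexes (i.e. all the columns that match the criteria you are querying on, and even those being pulled out as the result). This way the engine does not have to go back to the raw page data.

You need "object_id" from version and use "version_id" as the join basis to the other tables. Your version table also has a WHERE clause on type_id, so I would have an index on

version table - (object_id, version_id, type_id)

For your "value" table, match the criteria there as well

value table - (version_id, type_id, attribute_id, value, created)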
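
As a sketch, those two suggestions could be written as the following DDL. The index names are made up for illustration. Two adjustments are assumptions forced by the schema posted in the question: `value`.`value` is a TEXT column, which MySQL only indexes with a prefix length (64 characters is an arbitrary choice here), and the posted `value` table has no `created` column, so it is left out.

```sql
-- Hypothetical index names; column order follows the suggestion above.
ALTER TABLE `version`
  ADD INDEX `idx_version_cover` (`object_id`, `version_id`, `type_id`);

-- TEXT columns need a prefix length in an index; 64 is an assumed value.
-- `created` is omitted because the posted `value` table has no such column.
ALTER TABLE `value`
  ADD INDEX `idx_value_search` (`version_id`, `type_id`, `attribute_id`, `value`(64));
```

A prefix index cannot serve the full `value` column from the index alone, so the second index narrows the rows to read but is not strictly covering. Re-running EXPLAIN after adding the indexes shows whether the optimizer actually picks them.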
