在大型关联上高效选择和区分



我有三个模型:CatalogProductValueValue表有一个characteristic_id列,我想在一组values上获取不同characteristic_id的列表。

这些关系是:

  • 一个catalog有很多products
  • 一个product有很多values

这是我提出的查询:

Value.joins(:product).select(:characteristic_id).distinct.where(products: {catalog_id: catalog.id}).pluck(:characteristic_id)
=> [441, 2582, 3133]

返回正确的结果,但在包含一百万种产品(约 50 秒(的大型目录上非常慢。 我找不到更有效的方法来做到这一点。

下面是查询的EXPLAIN

=> EXPLAIN for: SELECT DISTINCT "values"."characteristic_id" FROM "values" INNER JOIN "products" ON "products"."id" = "values"."product_id" WHERE "products"."catalog_id" = $1 [["catalog_id", 1767]]
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
HashAggregate  (cost=1515106.82..1515109.15 rows=233 width=4)
Group Key: "values".characteristic_id
->  Hash Join  (cost=124703.76..1492245.65 rows=9144469 width=4)
Hash Cond: ("values".product_id = products.id)
->  Seq Scan on "values"  (cost=0.00..1002863.07 rows=34695107 width=8)
->  Hash  (cost=114002.20..114002.20 rows=652285 width=4)
->  Bitmap Heap Scan on products  (cost=12311.64..114002.20 rows=652285 width=4)
Recheck Cond: (catalog_id = 1767)
->  Bitmap Index Scan on index_products_on_catalog_id  (cost=0.00..12148.57 rows=652285 width=0)
Index Cond: (catalog_id = 1767)
(10 rows)

关于如何更快地运行此查询的任何想法?

确保两个外键都有索引:

  • 索引"values"."product_id"
  • 索引"products"."catalog_id"
  1. 尝试在values.characteristic_id上添加索引。
  2. 通常GROUP BY可能比DISTINCT快:

    Value.joins(:p roduct(.where(products: {catalog_id: catalog.id}

    (.select(:characteristic_id(.group(:characteristic_id(.pluck(:characteristic_id(

最新更新