我正在运行postgresql 14.4.
我有两个表,stocks_buys
和stocks_quotes
。报价包含证券的每日收盘金额,而买入则包含总金额和股票数量的特定交易/合同。它们的模式如下:
create_table "stocks_buys", force: :cascade do |t|
t.bigint "security_id", null: false
t.integer "total_cents", default: 0, null: false
t.decimal "quantity", precision: 8, scale: 2, default: "0.0"
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
t.date "date"
t.bigint "account_id"
t.decimal "shares_not_sold", precision: 8, scale: 2, default: "0.0"
t.index ["account_id"], name: "index_stocks_buys_on_account_id"
t.index ["date"], name: "index_stocks_buys_on_date"
t.index ["security_id"], name: "index_stocks_buys_on_security_id"
end
create_table "stocks_quotes", force: :cascade do |t|
t.bigint "security_id"
t.date "date"
t.datetime "created_at", default: -> { "CURRENT_TIMESTAMP" }, null: false
t.datetime "updated_at", default: -> { "CURRENT_TIMESTAMP" }, null: false
t.bigint "close_cents"
t.index ["date"], name: "index_stocks_quotes_on_date"
t.index ["security_id", "date"], name: "index_stocks_quotes_on_security_id_and_date", unique: true
t.index ["security_id"], name: "index_stocks_quotes_on_security_id"
end
我试图创建的查询是将这两个表连接在一起,并显示从购买日期到现在每个合同的所有报价金额。
我现在查询的样子如下:
SELECT
"stocks_buys"."id" as "buy_id",
"stocks_buys"."security_id",
"stocks_buys"."account_id",
"stocks_buys"."shares_not_sold" as "quantity",
"stocks_quotes"."date" as "date",
"stocks_buys"."total_cents" as "cost_basis_total_cents",
"stocks_quotes"."close_cents",
"stocks_buys"."quantity" as "cost_basis_quantity"
FROM "stocks_buys"
INNER JOIN "stocks_quotes"
on "stocks_quotes"."security_id" = "stocks_buys"."security_id"
AND "stocks_quotes"."date" >= "stocks_buys"."date"
WHERE "stocks_buys"."shares_not_sold" > 0
对于大约43k个报价和430个购买,这个查询大约需要60-70ms。
运行explain查询返回:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=2060.54..6186.27 rows=88844 width=48) (actual time=25.060..54.465 rows=29726 loops=1)
Hash Cond: (stocks_buys.security_id = stocks_quotes.security_id)
Join Filter: (stocks_quotes.date >= stocks_buys.date)
Rows Removed by Join Filter: 344962
Buffers: shared hit=1130
-> Seq Scan on stocks_buys (cost=0.00..36.36 rows=115 width=40) (actual time=0.047..0.198 rows=115 loops=1)
Filter: (shares_not_sold > '0'::numeric)
Rows Removed by Filter: 314 Buffers: shared hit=31 -> Hash (cost=1526.35..1526.35 rows=42735 width=20) (actual time=24.983..24.984 rows=42735 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 2850kB
Buffers: shared hit=1099
-> Seq Scan on stocks_quotes (cost=0.00..1526.35 rows=42735 width=20) (actual time=0.007..13.217 rows=42735 loops=1)
Buffers: shared hit=1099
Planning Time: 0.443 ms
Execution Time: 56.382 ms
(16 rows)
Time: 57.790 ms
然而,我遇到的问题是,如果我从选择中删除close_cents
列,那么运行所需的时间将被削减到几乎是原始查询计划的三分之一。当我解释没有close_cents
的新查询时,我得到:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.41..5527.21 rows=88302 width=40) (actual time=0.119..15.750 rows=29726 loops=1)
Buffers: shared hit=568
-> Seq Scan on stocks_buys (cost=0.00..36.36 rows=114 width=40) (actual time=0.067..0.355 rows=115 loops=1)
Filter: (shares_not_sold > '0'::numeric)
Rows Removed by Filter: 314
Buffers: shared hit=31
-> Index Only Scan using index_stocks_quotes_on_security_id_and_date on stocks_quotes (cost=0.41..36.30 rows=1187 width=12) (actual time=0.008..0.068 rows=258 loops=115)
Index Cond: ((security_id = stocks_buys.security_id) AND (date >= stocks_buys.date))
Heap Fetches: 0
Buffers: shared hit=537
Planning:
Buffers: shared hit=3
Planning Time: 0.822 ms
Execution Time: 18.706 ms
(14 rows)
如果我尝试SET enable_seqscan = OFF;
,试图强迫Postgres使用索引时,包括close_cents
,它仍然不想使用index_stocks_quotes_on_security_id_and_date
索引。
Merge Join (cost=0.56..6958.49 rows=88912 width=48) (actual time=0.281..76.936 rows=29726 loops=1)
Merge Cond: (stocks_buys.security_id = stocks_quotes.security_id)
Join Filter: (stocks_quotes.date >= stocks_buys.date)
Rows Removed by Join Filter: 344962
Buffers: shared hit=732
-> Index Scan using index_stocks_buys_on_security_id on stocks_buys (cost=0.27..148.00 rows=115 width=40) (actual time=0.180..0.427 rows=115 loops=1) Filter: (shares_not_sold > '0'::numeric)
Rows Removed by Filter: 314 Buffers: shared hit=78
-> Materialize (cost=0.29..2249.14 rows=42735 width=20) (actual time=0.087..34.198 rows=383051 loops=1)
Buffers: shared hit=654
-> Index Scan using index_stocks_quotes_on_security_id on stocks_quotes (cost=0.29..2142.31 rows=42735 width=20) (actual time=0.076..9.197 rows=42735 loops=1)
Buffers: shared hit=654
Planning Time: 0.708 ms
Execution Time: 78.437 ms
我最初将close_cents作为十进制(我仍然将其存储为cents,但只是想要更精确,冷静),并将其更改为bigint,看看是否有帮助。
怎么了?为什么要将这个列添加到select中,而不是在join或其他地方使用它,从而导致如此严重的减速呢?有什么事我能做得不那么慢吗?
谢谢!
表stock_quotes
有索引t.index ["security_id", "date"]
,它恰好用于搜索,还包括选择列表中的列。这就是为什么当你删除close_cents
列时,它使用这个索引而不是Seq Scan
。
你可以尝试创建一个"覆盖索引"。对于原始查询:
create index ix1 on stock_quotes (security_id, date) include (close_cents);
尝试创建这个索引并检索原始查询的新计划。