为什么添加一个bigint列到一个选择加倍我的查询执行时间?

我正在运行postgresql 14.4.

我有两个表，stocks_buys和stocks_quotes。报价包含证券的每日收盘金额，而买入则包含总金额和股票数量的特定交易/合同。它们的模式如下:

create_table "stocks_buys", force: :cascade do |t|
t.bigint "security_id", null: false                                                                                                                        
t.integer "total_cents", default: 0, null: false                                                                                                           
t.decimal "quantity", precision: 8, scale: 2, default: "0.0"                                                                                               
t.datetime "created_at", null: false                                                                                                                       
t.datetime "updated_at", null: false                                                                                                                       
t.date "date"
t.bigint "account_id"                                                                                                                                      
t.decimal "shares_not_sold", precision: 8, scale: 2, default: "0.0"                                                                                        
t.index ["account_id"], name: "index_stocks_buys_on_account_id"                                                                                            
t.index ["date"], name: "index_stocks_buys_on_date"                                                                                                        
t.index ["security_id"], name: "index_stocks_buys_on_security_id"                                                                                          
end  
create_table "stocks_quotes", force: :cascade do |t|                                                                                                         
t.bigint "security_id"                                                                                                                                     
t.date "date"                                                                                                                                              
t.datetime "created_at", default: -> { "CURRENT_TIMESTAMP" }, null: false                                                                                  
t.datetime "updated_at", default: -> { "CURRENT_TIMESTAMP" }, null: false                                                                                  
t.bigint "close_cents"                                                                                                                                           
t.index ["date"], name: "index_stocks_quotes_on_date"                                                                                                      
t.index ["security_id", "date"], name: "index_stocks_quotes_on_security_id_and_date", unique: true                                                         
t.index ["security_id"], name: "index_stocks_quotes_on_security_id"                                                                                        
end

我试图创建的查询是将这两个表连接在一起，并显示从购买日期到现在每个合同的所有报价金额。

我现在查询的样子如下:

SELECT 
"stocks_buys"."id" as "buy_id",                                                                                                                              
"stocks_buys"."security_id",                                                                                                                                 
"stocks_buys"."account_id",                                                                                                                                  
"stocks_buys"."shares_not_sold" as "quantity",                                                                                                               
"stocks_quotes"."date" as "date",                                                                                                                            
"stocks_buys"."total_cents" as "cost_basis_total_cents",                                                                                                     
"stocks_quotes"."close_cents",
"stocks_buys"."quantity" as "cost_basis_quantity"                                                                                                            
FROM "stocks_buys"                                                                                                                                             
INNER JOIN "stocks_quotes" 
on "stocks_quotes"."security_id" = "stocks_buys"."security_id"
AND "stocks_quotes"."date" >= "stocks_buys"."date"                 
WHERE "stocks_buys"."shares_not_sold" > 0

对于大约43k个报价和430个购买，这个查询大约需要60-70ms。

运行explain查询返回:

QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
Hash Join  (cost=2060.54..6186.27 rows=88844 width=48) (actual time=25.060..54.465 rows=29726 loops=1)
Hash Cond: (stocks_buys.security_id = stocks_quotes.security_id)
Join Filter: (stocks_quotes.date >= stocks_buys.date)
Rows Removed by Join Filter: 344962
Buffers: shared hit=1130
->  Seq Scan on stocks_buys  (cost=0.00..36.36 rows=115 width=40) (actual time=0.047..0.198 rows=115 loops=1)
Filter: (shares_not_sold > '0'::numeric)
Rows Removed by Filter: 314         Buffers: shared hit=31   ->  Hash  (cost=1526.35..1526.35 rows=42735 width=20) (actual time=24.983..24.984 rows=42735 loops=1)
Buckets: 65536  Batches: 1  Memory Usage: 2850kB
Buffers: shared hit=1099
->  Seq Scan on stocks_quotes  (cost=0.00..1526.35 rows=42735 width=20) (actual time=0.007..13.217 rows=42735 loops=1)
Buffers: shared hit=1099
Planning Time: 0.443 ms
Execution Time: 56.382 ms
(16 rows)
Time: 57.790 ms

然而，我遇到的问题是，如果我从选择中删除close_cents列，那么运行所需的时间将被削减到几乎是原始查询计划的三分之一。当我解释没有close_cents的新查询时，我得到:

            QUERY PLAN                                                                                   
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop  (cost=0.41..5527.21 rows=88302 width=40) (actual time=0.119..15.750 rows=29726 loops=1)
Buffers: shared hit=568
->  Seq Scan on stocks_buys  (cost=0.00..36.36 rows=114 width=40) (actual time=0.067..0.355 rows=115 loops=1)
Filter: (shares_not_sold > '0'::numeric)
Rows Removed by Filter: 314
Buffers: shared hit=31
->  Index Only Scan using index_stocks_quotes_on_security_id_and_date on stocks_quotes  (cost=0.41..36.30 rows=1187 width=12) (actual time=0.008..0.068 rows=258 loops=115)
Index Cond: ((security_id = stocks_buys.security_id) AND (date >= stocks_buys.date))
Heap Fetches: 0
Buffers: shared hit=537
Planning:
Buffers: shared hit=3
Planning Time: 0.822 ms
Execution Time: 18.706 ms
(14 rows)

如果我尝试SET enable_seqscan = OFF;，试图强迫Postgres使用索引时，包括close_cents，它仍然不想使用index_stocks_quotes_on_security_id_and_date索引。

Merge Join  (cost=0.56..6958.49 rows=88912 width=48) (actual time=0.281..76.936 rows=29726 loops=1)
Merge Cond: (stocks_buys.security_id = stocks_quotes.security_id)
Join Filter: (stocks_quotes.date >= stocks_buys.date)
Rows Removed by Join Filter: 344962
Buffers: shared hit=732
->  Index Scan using index_stocks_buys_on_security_id on stocks_buys  (cost=0.27..148.00 rows=115 width=40) (actual time=0.180..0.427 rows=115 loops=1)         Filter: (shares_not_sold > '0'::numeric)
Rows Removed by Filter: 314         Buffers: shared hit=78
->  Materialize  (cost=0.29..2249.14 rows=42735 width=20) (actual time=0.087..34.198 rows=383051 loops=1)
Buffers: shared hit=654
->  Index Scan using index_stocks_quotes_on_security_id on stocks_quotes  (cost=0.29..2142.31 rows=42735 width=20) (actual time=0.076..9.197 rows=42735 loops=1)
Buffers: shared hit=654
Planning Time: 0.708 ms
Execution Time: 78.437 ms

我最初将close_cents作为十进制(我仍然将其存储为cents，但只是想要更精确，冷静)，并将其更改为bigint，看看是否有帮助。

怎么了?为什么要将这个列添加到select中，而不是在join或其他地方使用它，从而导致如此严重的减速呢?有什么事我能做得不那么慢吗?

谢谢!

表stock_quotes有索引t.index ["security_id", "date"]，它恰好用于搜索，还包括选择列表中的列。这就是为什么当你删除close_cents列时，它使用这个索引而不是Seq Scan。

你可以尝试创建一个"覆盖索引"。对于原始查询:

create index ix1 on stock_quotes (security_id, date) include (close_cents);

尝试创建这个索引并检索原始查询的新计划。

相关内容

最新更新

热门标签：