ActiveRecord `in_batches` times out after paging through a few batches

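I have a purge job that runs daily and cleans up some old records. I'm using PostgreSQL 10.21. The table it operates on has the following properties (only the ones relevant to the question):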

```sql
create table example_table
(
    id         bigserial primary key,
    created_at timestamp not null
);
create index index_example_table_on_created_at on example_table (created_at desc);
```

The job that runs daily is:

```ruby
ExampleTable.expired.in_batches(of: 1000, &:delete_all)
```

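The table gets a lot of inserts (more than reads) and currently has 6.5 million rows. The timeout always happens while fetching the ids for one of the batches. Here is an example from the ActiveRecord log: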

```
ExampleTable Pluck (2.7ms)  SELECT "example_table"."id" FROM "example_table" WHERE (example_table.created_at < '2022-04-23 19:49:08.020992') ORDER BY "example_table"."id" ASC LIMIT $1  [["LIMIT", 1000]]
ExampleTable Pluck (2.0ms)  SELECT "example_table"."id" FROM "example_table" WHERE (example_table.created_at < '2022-04-23 19:49:08.020992') AND "example_table"."id" > $1 ORDER BY "example_table"."id" ASC LIMIT $2  [["id", 13534069], ["LIMIT", 1000]]
ExampleTable Pluck (2.0ms)  SELECT "example_table"."id" FROM "example_table" WHERE (example_table.created_at < '2022-04-23 19:49:08.020992') AND "example_table"."id" > $1 ORDER BY "example_table"."id" ASC LIMIT $2  [["id", 13535069], ["LIMIT", 1000]]
ExampleTable Pluck (1.8ms)  SELECT "example_table"."id" FROM "example_table" WHERE (example_table.created_at < '2022-04-23 19:49:08.020992') AND "example_table"."id" > $1 ORDER BY "example_table"."id" ASC LIMIT $2  [["id", 13536069], ["LIMIT", 1000]]
ExampleTable Pluck (1.8ms)  SELECT "example_table"."id" FROM "example_table" WHERE (example_table.created_at < '2022-04-23 19:49:08.020992') AND "example_table"."id" > $1 ORDER BY "example_table"."id" ASC LIMIT $2  [["id", 13537069], ["LIMIT", 1000]]
ExampleTable Pluck (2.2ms)  SELECT "example_table"."id" FROM "example_table" WHERE (example_table.created_at < '2022-04-23 19:49:08.020992') AND "example_table"."id" > $1 ORDER BY "example_table"."id" ASC LIMIT $2  [["id", 13538069], ["LIMIT", 1000]]
ExampleTable Pluck (2.1ms)  SELECT "example_table"."id" FROM "example_table" WHERE (example_table.created_at < '2022-04-23 19:49:08.020992') AND "example_table"."id" > $1 ORDER BY "example_table"."id" ASC LIMIT $2  [["id", 13539069], ["LIMIT", 1000]]
ExampleTable Pluck (1.9ms)  SELECT "example_table"."id" FROM "example_table" WHERE (example_table.created_at < '2022-04-23 19:49:08.020992') AND "example_table"."id" > $1 ORDER BY "example_table"."id" ASC LIMIT $2  [["id", 13540069], ["LIMIT", 1000]]
ExampleTable Pluck (5001.9ms)  SELECT "example_table"."id" FROM "example_table" WHERE (example_table.created_at < '2022-04-23 19:49:08.020992') AND "example_table"."id" > $1 ORDER BY "example_table"."id" ASC LIMIT $2  [["id", 13541069], ["LIMIT", 1000]]
```
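The log shows the loop `in_batches` runs here: pluck up to 1000 expired ids in `id` order, delete them, then resume with `id > last_id`. The plain-Ruby sketch below models that loop over an in-memory array so the query sequence is easy to follow. It is an illustration, not ActiveRecord's actual implementation; `Row` and `purge_in_batches` are hypothetical names, and stopping as soon as a batch comes back short is an assumption about how the real loop ends.

```ruby
# Hypothetical in-memory stand-in for a row of example_table.
Row = Struct.new(:id, :created_at)

# Models the keyset pagination visible in the log:
#   SELECT id FROM example_table
#   WHERE created_at < cutoff [AND id > last_id]
#   ORDER BY id LIMIT batch_size
# followed by a delete of that batch.
def purge_in_batches(table, cutoff, batch_size)
  deleted = []
  last_id = nil
  loop do
    ids = table
            .select { |r| r.created_at < cutoff && (last_id.nil? || r.id > last_id) }
            .map(&:id)
            .sort                              # ORDER BY id
            .first(batch_size)                 # LIMIT batch_size
    break if ids.empty?
    table.reject! { |r| ids.include?(r.id) }   # the per-batch delete_all
    deleted << ids
    break if ids.size < batch_size             # assumed: a short batch ends the loop
    last_id = ids.last                         # next batch resumes after this id
  end
  deleted
end

now = Time.at(1_650_000_000)
# Hypothetical data: ids 1..12 expired a month ago, ids 13..25 are fresh.
table = (1..25).map { |i| Row.new(i, i <= 12 ? now - 30 * 86_400 : now) }
batches = purge_in_batches(table, now - 86_400, 5)
p batches.map(&:size) # => [5, 5, 2]
p table.size          # => 13
```

Paging on `id` rather than an offset keeps each batch's starting point stable even though earlier batches have already been deleted.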

I ran EXPLAIN on the last two queries and got the following plans:

For the query that takes only a few milliseconds:

```
Limit  (cost=0.09..4343.25 rows=1000 width=8) (actual time=0.022..0.496 rows=1000 loops=1)
  Output: id
  Buffers: shared hit=318
  ->  Index Scan using example_table_pkey on public.example_table  (cost=0.09..363301.29 rows=83649 width=8) (actual time=0.021..0.430 rows=1000 loops=1)
        Output: id
        Index Cond: (example_table.id > 13540069)
        Filter: (example_table.created_at < '2022-04-23 19:49:08.020992'::timestamp without time zone)
        Buffers: shared hit=318
Planning time: 0.073 ms
Execution time: 0.545 ms
```

For the one that times out:

```
Limit  (cost=0.09..4343.93 rows=1000 width=8) (actual time=0.015..2702.674 rows=941 loops=1)
  Output: id
  Buffers: shared hit=2794464
  ->  Index Scan using example_table_pkey on public.example_table  (cost=0.09..363297.11 rows=83635 width=8) (actual time=0.015..2702.614 rows=941 loops=1)
        Output: id
        Index Cond: (example_table.id > 13541069)
        Filter: (example_table.created_at < '2022-04-23 19:49:08.020992'::timestamp without time zone)
        Rows Removed by Filter: 6577891
        Buffers: shared hit=2794464
Planning time: 0.086 ms
Execution time: 2702.721 ms
```

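The only difference between the two plans is that the slow one removes a lot of rows with the filter, but I don't understand why they differ... The slow one also has a huge number of shared buffer hits compared with the other. I tested removing the `created_at` condition and that works, so it must be something to do with the WHERE clause.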

What causes the spike on that one page? Does Postgres not like paging through records with a condition that isn't on the PK?

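Part of this is easy to explain. Eventually the job runs out of expired records, and when it does, the query reads the rest of the table (via the index) looking in vain for more. The last batch found only 941 rows because that's all there were, but it read through 6577891 rows looking for another 59 and didn't find them.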
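A toy model of the plan makes the spike visible. The plain-Ruby sketch below assumes, hypothetically, that all expired rows sit at the low ids with fresh rows above them. It walks the ids upward from `last_id` (the `Index Cond`), checks the expiry flag on every row (the `Filter`), and stops once `limit` rows match (the `Limit`), counting how many rows each batch examines. `scan_by_id` and `rows_examined_per_batch` are hypothetical names.

```ruby
# Hypothetical model of "Index Scan using example_table_pkey" + Filter + Limit.
# rows: [id, expired?] pairs sorted by id (the primary-key index order).
# Returns [matching_ids, rows_examined] for one batch.
def scan_by_id(rows, last_id, limit)
  matched = []
  examined = 0
  rows.each do |id, expired|
    next if last_id && id <= last_id # Index Cond: start after last_id
    examined += 1
    matched << id if expired         # Filter: created_at < cutoff
    break if matched.size == limit   # Limit: stop once enough rows match
  end
  [matched, examined]
end

# Rows examined by each batch of a keyset-paginated purge.
def rows_examined_per_batch(rows, limit)
  last_id = nil
  counts = []
  loop do
    ids, examined = scan_by_id(rows, last_id, limit)
    counts << examined
    break if ids.size < limit        # short batch: nothing expired is left
    last_id = ids.last
  end
  counts
end

# Hypothetical data: 2_500 expired rows at ids 1..2_500, then 100_000 fresh rows.
rows = (1..102_500).map { |id| [id, id <= 2_500] }
p rows_examined_per_batch(rows, 1_000) # => [1000, 1000, 100500]
```

The early batches stop after about `limit` rows, while the last one has to examine every fresh row above the final expired id. That is the same shape as the 941 matches and 6577891 `Rows Removed by Filter` in the slow plan.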

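Clearly it would be better to use the index on `created_at` for the last batch, because a scan of that index would know when it had run out of matching rows and stop. But the planner doesn't do that, because it can't tell which batch is the last one: it thinks there are 83635 rows left to find, not 941. Statistics are only recomputed occasionally, and in this case not often enough. If you could ANALYZE the table between batches, that might fix the problem, but analyzing that often would be a significant cost in itself.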
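The planner's arithmetic can be reproduced from the two plans. When a `Limit` sits on a node expected to return more rows than the limit, PostgreSQL charges roughly `startup + (total - startup) * limit / estimated_rows`, i.e. only the fraction of the scan it expects to need; if the node is expected to return no more rows than the limit, the whole scan is charged. The sketch below (the helper `limit_cost` is a hypothetical name, and the formula is a simplification that ignores OFFSET) plugs in the estimates from the EXPLAIN output.

```ruby
# Hypothetical helper: approximate cost of a Limit node over an input whose
# startup cost, total cost and estimated row count come from EXPLAIN.
def limit_cost(startup, total, est_rows, limit)
  # If the input is not expected to produce more than `limit` rows,
  # it has to run to completion, so its full cost is charged.
  return total if limit >= est_rows

  # Otherwise only the fraction of the run cost needed for `limit` rows.
  startup + (total - startup) * limit / est_rows.to_f
end

puts limit_cost(0.09, 363_301.29, 83_649, 1000).round(2) # fast plan: 4343.25
puts limit_cost(0.09, 363_297.11, 83_635, 1000).round(2) # slow plan: 4343.93
puts limit_cost(0.09, 363_297.11, 941, 1000).round(2)    # if 941 were known: 363297.11
```

With the 83635-row estimate, the id-index plan looks like it costs about 4344; with an accurate estimate of 941 rows it would be charged its full cost of about 363297, which is why fresher statistics could change the plan for the last batch.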

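It prefers the index on "id" so that it can satisfy the ORDER BY without having to read and sort the expected 83635 rows. But why is the ORDER BY there at all? The code you show doesn't obviously call for it. Maybe ActiveRecord adds it just to reduce the risk of deadlocks? (The log does show one reason it needs it: each batch resumes with `"id" > $1`, which only works if the ids come back in order.)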

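Why is this subject to a timeout at all? If it had taken a few seconds but succeeded instead of timing out, you would probably never have noticed. I see no reason to enforce a strict timeout on a background housekeeping job; perhaps you can arrange for it to be exempt. And why use small batches? If you had no LIMIT at all, you probably wouldn't have this problem.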
