Solving a slow query in a Django ListView that uses isnull=False and order_by on a foreign key

I have a Django ListView that paginates through the "active" persons.

The (simplified) models:

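from django.db import models


class Person(models.Model):
    name = models.CharField()
    # ...
    active_schedule = models.ForeignKey('Schedule', related_name='+', null=True, on_delete=models.SET_NULL)


class Schedule(models.Model):
    field = models.PositiveIntegerField(default=0)
    # ...
    person = models.ForeignKey(Person, related_name='schedules', on_delete=models.CASCADE)

The Person table contains almost 700,000 rows and the Schedule table a little over 2,000,000 rows (on average every person has 2-3 schedule records, although many have none and many have more). For "active" persons the active_schedule foreign key is set; there are about 5,000 of these at any time.

The ListView is supposed to show all active persons, sorted by field on the schedule (and by some other conditions that don't seem to matter for this case).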

The query then becomes:

(
    Person.objects
    .filter(active_schedule__isnull=False)
    .select_related('active_schedule')
    .order_by('active_schedule__field')
)

Specifically, the order_by on the related field makes this query very slow (that is, it takes about a second, which is too slow for a web application).

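I was hoping the filter condition would select the 5,000 records, which would then be relatively easy to sort. But when I run explain on this query, it shows that the (Postgres) database is working with far more rows: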

Gather Merge  (cost=224316.51..290280.48 rows=565366 width=227)
  Workers Planned: 2
  ->  Sort  (cost=223316.49..224023.19 rows=282683 width=227)
        Sort Key: exampledb_schedule.field
        ->  Parallel Hash Join  (cost=89795.12..135883.20 rows=282683 width=227)
              Hash Cond: (exampledb_person.active_schedule_id = exampledb_schedule.id)
              ->  Parallel Seq Scan on exampledb_person  (cost=0.00..21263.03 rows=282683 width=161)
                    Filter: (active_schedule_id IS NOT NULL)
              ->  Parallel Hash  (cost=67411.27..67411.27 rows=924228 width=66)
                    ->  Parallel Seq Scan on exampledb_schedule  (cost=0.00..67411.27 rows=924228 width=66)
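To put a number on the mismatch, the row estimates can be pulled out of the plan text and compared with the roughly 5,000 active persons mentioned earlier. This is only a minimal sketch: the `estimated_rows` helper and the regular expression are hypothetical names introduced for illustration, and the two plan lines are copied from the output above.

```python
import re

# Two lines copied verbatim from the EXPLAIN output in the question.
PLAN = """\
Gather Merge  (cost=224316.51..290280.48 rows=565366 width=227)
->  Parallel Seq Scan on exampledb_person  (cost=0.00..21263.03 rows=282683 width=161)
"""


def estimated_rows(plan_text):
    """Return the planner's rows= estimate of every plan node, in order."""
    return [int(n) for n in re.findall(r"rows=(\d+)", plan_text)]


estimates = estimated_rows(PLAN)
top_level = estimates[0]   # estimate for the final result set
expected_active = 5_000    # approximate figure from the question

ratio = top_level / expected_active
print(f"planner expects {top_level} rows, roughly {ratio:.0f} times the ~5,000 active persons")
```

An estimate this far from the real count suggests the planner does not know how sparse active_schedule_id is. Whether that comes from stale statistics or something else is an assumption; the plan alone does not confirm it.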

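I recently changed the models to be this way. In a previous version I had a separate model containing only the ~5,000 active persons. Doing the order_by on that small table was considerably faster! I am hoping to reach the same speed with the current models.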

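I tried retrieving only the fields the ListView needs (using values), which does help, but not much. I also tried setting a related_name on active_schedule and approaching the problem from Schedule, but that made no difference. I tried putting a db_index on Schedule.field, but that only seemed to make things slower. Conditional queries didn't help either (although I probably haven't tried every possibility). I'm at a loss.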

The SQL statement generated by the ORM query:

SELECT
    "exampledb_person"."id",
    "exampledb_person"."name",
    ...
    "exampledb_person"."active_schedule_id",
    "exampledb_person"."created",
    "exampledb_person"."updated",
    "exampledb_schedule"."id",
    "exampledb_schedule"."person_id",
    "exampledb_schedule"."field",
    ...
    "exampledb_schedule"."created",
    "exampledb_schedule"."updated"
FROM
    "exampledb_person"
INNER JOIN
    "exampledb_schedule"
    ON ("exampledb_person"."active_schedule_id" = "exampledb_schedule"."id")
WHERE
    "exampledb_person"."active_schedule_id" IS NOT NULL
ORDER BY
    "exampledb_schedule"."field" ASC

(Some fields are left out for simplicity.)

Is it possible to speed up this query, or should I go back to using a special model for the active persons?

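EDIT: When I change the query, just for comparison/testing, to sort on an UNindexed field on Person, the query is equally slow. However, if I then add an index to that field, the query is fast! I had to try this, since the SQL statement does show that it orders on "exampledb_schedule"."field", a field without an index. But as I said, adding an index on that field makes no difference.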

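EDIT: I guess it's also worth noting that a much simpler sort query directly on Schedule, whether on an indexed or an unindexed field, is MUCH faster. For example, for this test I added an index to Schedule.field, and then the following query is very fast: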

Schedule.objects.order_by('field')
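The effect of an index on a plain ORDER BY can be illustrated outside Django. The sketch below uses SQLite from the Python standard library purely as a stand-in; it is not Postgres, its planner differs, and the table, index name, and `plan` helper are hypothetical. It only shows the general idea that a sort step can disappear once the sort column is indexed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE schedule (id INTEGER PRIMARY KEY, field INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO schedule (field) VALUES (?)", [(n % 7,) for n in range(1000)]
)


def plan(sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


query = "SELECT id, field FROM schedule ORDER BY field"

plan_without_index = plan(query)  # sorts through a temporary B-tree
conn.execute("CREATE INDEX schedule_field_idx ON schedule (field)")
plan_with_index = plan(query)     # reads rows in index order instead

print(plan_without_index)
print(plan_with_index)
```

In Postgres the equivalent check would be running explain on the query before and after adding the index.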

The solution must be in here somewhere...

The comment by @guarav and my own edits pointed me toward the solution, which had been staring me in the face for a while...

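The filter clause in my question, filter(active_schedule__isnull=False), seems to keep the database from using its indexes. I wasn't aware of this, and had hoped a database expert would point me in this direction.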

The solution is to filter on Schedule.field instead, which is 0 for inactive Person records and >0 for active ones:

(
    Person.objects
    .select_related('active_schedule')
    .filter(active_schedule__field__gte=1)
    .order_by('active_schedule__field')
)

This query uses the index correctly and is fast (20 ms instead of ~1000 ms).
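The rewritten filter only returns the same rows as the original one if every active schedule really has field >= 1, as stated above. A small sketch of that equivalence follows, again using SQLite from the standard library as a stand-in for Postgres; the toy rows and names are hypothetical, and the example checks results only, not performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE schedule (
    id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL,
    field INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE person (
    id INTEGER PRIMARY KEY,
    name TEXT,
    active_schedule_id INTEGER REFERENCES schedule (id)
);
-- Persons 1 and 3 are active; person 2 has a schedule but no active one.
INSERT INTO schedule (id, person_id, field)
    VALUES (1, 1, 30), (2, 1, 0), (3, 2, 0), (4, 3, 10);
INSERT INTO person (id, name, active_schedule_id)
    VALUES (1, 'Ann', 1), (2, 'Bob', NULL), (3, 'Cid', 4);
""")

SELECT_JOIN = (
    "SELECT p.name, s.field FROM person p "
    "INNER JOIN schedule s ON p.active_schedule_id = s.id "
)

# Original filter: active_schedule__isnull=False
by_isnull = conn.execute(
    SELECT_JOIN + "WHERE p.active_schedule_id IS NOT NULL ORDER BY s.field"
).fetchall()

# Rewritten filter: active_schedule__field__gte=1
by_field = conn.execute(
    SELECT_JOIN + "WHERE s.field >= 1 ORDER BY s.field"
).fetchall()

print(by_isnull == by_field)
```

If an active schedule could ever have field = 0, the second query would silently drop that person, so the rewrite depends on that invariant being enforced elsewhere.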
