How to query the Django ORM on a model joined with its latest record



Hi, I have a Django project with the following models:

from django.db import models

class Organization(models.Model):
    name = models.CharField("Name", max_length=128, unique=True)
    description = models.CharField("Description", max_length=256)

class Scholar(models.Model):
    organization = models.ForeignKey(Organization, on_delete=models.CASCADE)
    name = models.CharField("Name", max_length=128)
    title = models.CharField("Title", max_length=256)

    def get_latest_snapshot(self):
        return self.snapshotscholar_set.latest('date_crawled')

class SnapshotScholar(models.Model):
    scholar = models.ForeignKey(Scholar, on_delete=models.CASCADE)
    date_crawled = models.DateTimeField("Date Crawled", auto_now_add=True, db_index=True)
    title = models.CharField("Title", max_length=256)

class SnapshotScholarPublication(models.Model):
    snapshot_scholar = models.ForeignKey(SnapshotScholar, on_delete=models.CASCADE)
    title = models.CharField("Title", max_length=256)
    citation_count = models.IntegerField()

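Now, when I want to count the scholars in each organization, I can do Organization.objects.annotate(num_scholars=Count('scholar')). But how do I get the number of publications per organization when I only want to count the publications of the latest snapshot? That is, I want the snapshots ordered by SnapshotScholar.date_crawled, and I want all publications of the latest SnapshotScholar (per scholar) in the database.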
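
To pin down exactly which rows should be counted, here is a minimal pure-Python sketch of the intended semantics. It assumes small hypothetical in-memory rows (not the pastebin sample data): keep each scholar's newest snapshot, count only the publications attached to it, and sum per organization. An ORM or SQL version can be checked against a reference like this.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows mirroring the models (NOT the pastebin data).
# scholars: scholar_id -> organization_id
scholars = {1: 10, 2: 10, 3: 20}
# snapshots: (snapshot_id, scholar_id, date_crawled)
snapshots = [
    (100, 1, datetime(2020, 1, 1)),
    (101, 1, datetime(2020, 2, 1)),   # latest for scholar 1
    (102, 2, datetime(2020, 1, 15)),  # only snapshot of scholar 2
    (103, 3, datetime(2020, 3, 1)),   # latest for scholar 3
    (104, 3, datetime(2020, 1, 1)),
]
# publications: (publication_id, snapshot_id)
publications = [
    (1, 100), (2, 100), (3, 100),  # older snapshot of scholar 1: ignored
    (4, 101), (5, 101),
    (6, 102),
    (7, 103),
    (8, 104), (9, 104),            # older snapshot of scholar 3: ignored
]


def latest_publication_counts(scholars, snapshots, publications):
    """Return {organization_id: publications in each scholar's latest snapshot}."""
    # Step 1: find the newest snapshot of every scholar.
    latest = {}
    for snap_id, scholar_id, crawled in snapshots:
        if scholar_id not in latest or crawled > latest[scholar_id][1]:
            latest[scholar_id] = (snap_id, crawled)
    latest_ids = {snap_id for snap_id, _ in latest.values()}
    snap_owner = {snap_id: scholar_id for snap_id, scholar_id, _ in snapshots}

    # Step 2: count only publications of those snapshots, grouped by organization.
    counts = defaultdict(int)
    for _, snap_id in publications:
        if snap_id in latest_ids:
            counts[scholars[snap_owner[snap_id]]] += 1
    return dict(counts)


print(latest_publication_counts(scholars, snapshots, publications))  # {10: 3, 20: 1}
```

Organizations whose scholars have no publications in their latest snapshots simply do not appear in the returned dict.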

Using some questions here, I managed to put together this SQL:

SELECT COUNT(pub.id) as publications, org.id
FROM (main_snapshotscholarpublication pub, main_snapshotscholar snap, main_scholar scholar, main_organization org)
INNER JOIN (
    SELECT MAX(main_snapshotscholar.date_crawled) as latest_date, main_snapshotscholar.scholar_id as 'id'
    FROM main_snapshotscholar
    GROUP BY main_snapshotscholar.scholar_id
) as latest_snap ON (latest_snap.id = snap.id)
WHERE pub.snapshot_scholar_id = snap.id
AND snap.scholar_id = scholar.id
AND scholar.organization_id = org.id
GROUP BY org.id

The results I get from this raw SQL are off by 1-5% compared with the actual counts. Can someone help me figure out how to get the correct result?
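
One likely source of the discrepancy: the derived table aliases scholar_id as id, and the join then compares that scholar id with snap.id (a snapshot id), while latest_date is never compared with snap.date_crawled. The sketch below joins on both the scholar and the date instead. It assumes an in-memory SQLite database with trimmed tables and hypothetical sample rows, and it has not been run against the original database.

```python
import sqlite3

# Trimmed schema mirroring the Django tables (names follow the question's SQL).
SCHEMA = """
CREATE TABLE main_organization (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE main_scholar (id INTEGER PRIMARY KEY, organization_id INTEGER);
CREATE TABLE main_snapshotscholar (id INTEGER PRIMARY KEY, scholar_id INTEGER, date_crawled TEXT);
CREATE TABLE main_snapshotscholarpublication (id INTEGER PRIMARY KEY, snapshot_scholar_id INTEGER);
"""

# Match each snapshot against its scholar's MAX(date_crawled) on BOTH the
# scholar_id and the date, so only the latest snapshot per scholar survives.
LATEST_COUNTS = """
SELECT org.id, COUNT(pub.id) AS publications
FROM main_organization org
JOIN main_scholar scholar ON scholar.organization_id = org.id
JOIN main_snapshotscholar snap ON snap.scholar_id = scholar.id
JOIN (
    SELECT scholar_id, MAX(date_crawled) AS latest_date
    FROM main_snapshotscholar
    GROUP BY scholar_id
) latest_snap
  ON latest_snap.scholar_id = snap.scholar_id
 AND latest_snap.latest_date = snap.date_crawled
JOIN main_snapshotscholarpublication pub ON pub.snapshot_scholar_id = snap.id
GROUP BY org.id
ORDER BY org.id
"""


def build_demo_db():
    """Create an in-memory database filled with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO main_organization VALUES (?, ?)", [(10, "A"), (20, "B")])
    conn.executemany("INSERT INTO main_scholar VALUES (?, ?)", [(1, 10), (2, 10), (3, 20)])
    conn.executemany(
        "INSERT INTO main_snapshotscholar VALUES (?, ?, ?)",
        [(100, 1, "2020-01-01"), (101, 1, "2020-02-01"), (102, 2, "2020-01-15"),
         (103, 3, "2020-03-01"), (104, 3, "2020-01-01")],
    )
    conn.executemany(
        "INSERT INTO main_snapshotscholarpublication VALUES (?, ?)",
        [(1, 100), (2, 100), (3, 100), (4, 101), (5, 101),
         (6, 102), (7, 103), (8, 104), (9, 104)],
    )
    return conn


conn = build_demo_db()
print(conn.execute(LATEST_COUNTS).fetchall())  # [(10, 3), (20, 1)]
```

If two snapshots of the same scholar share exactly the same date_crawled, both match the MAX join and both are counted. Because every join is an inner join, organizations without matching publications are left out of the result.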

Thanks

Edit: As requested by one of the people answering, here is some sample data: https://pastebin.com/4ZJkymeb. Just load it with python manage.py loaddata data.json

You can use a Subquery to get the result. For example:

from django.db.models import Count, F, OuterRef, Q, Subquery
# Each scholar's snapshots, newest first; [:1] below keeps only the latest pk
subquery = SnapshotScholar.objects.filter(scholar=OuterRef('pk')).order_by('-date_crawled')
queryset = Scholar.objects.annotate(max_scholar_snapshot=Subquery(subquery.values('pk')[:1])).annotate(
    # Count only the publications attached to that latest snapshot
    publication_count=Count('snapshotscholar__snapshotscholarpublication',
                            filter=Q(snapshotscholar=F('max_scholar_snapshot'))))
queryset.values('organization', 'pk', 'publication_count')

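Here I first annotate each Scholar with its latest SnapshotScholar using a Subquery: the subquery orders that scholar's SnapshotScholar rows by date_crawled (newest first) and attaches the id of the first one to the queryset. Then I run Count over SnapshotScholarPublication, filtered to the SnapshotScholar found in the previous step.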
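
The same two steps can be sketched in plain SQL with a correlated subquery, assuming an in-memory SQLite database and hypothetical sample rows. ORDER BY ... DESC LIMIT 1 plays the role of order_by('-date_crawled') plus the [:1] slice; this illustrates the idea only and is not the literal SQL Django generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main_scholar (id INTEGER PRIMARY KEY, organization_id INTEGER);
CREATE TABLE main_snapshotscholar (id INTEGER PRIMARY KEY, scholar_id INTEGER, date_crawled TEXT);
CREATE TABLE main_snapshotscholarpublication (id INTEGER PRIMARY KEY, snapshot_scholar_id INTEGER);
INSERT INTO main_scholar VALUES (1, 10), (2, 10), (3, 20);
INSERT INTO main_snapshotscholar VALUES
    (100, 1, '2020-01-01'), (101, 1, '2020-02-01'), (102, 2, '2020-01-15'),
    (103, 3, '2020-03-01'), (104, 3, '2020-01-01');
INSERT INTO main_snapshotscholarpublication VALUES
    (1, 100), (2, 100), (3, 100), (4, 101), (5, 101), (6, 102), (7, 103), (8, 104), (9, 104);
""")

# Step 1: for every scholar, the innermost subquery picks the id of the newest
# snapshot (ORDER BY ... DESC LIMIT 1).
# Step 2: count only the publications attached to that snapshot.
PER_SCHOLAR = """
SELECT s.organization_id, s.id,
       (SELECT COUNT(*)
          FROM main_snapshotscholarpublication pub
         WHERE pub.snapshot_scholar_id = (
               SELECT snap.id
                 FROM main_snapshotscholar snap
                WHERE snap.scholar_id = s.id
                ORDER BY snap.date_crawled DESC
                LIMIT 1)) AS publication_count
FROM main_scholar s
ORDER BY s.id
"""
rows = conn.execute(PER_SCHOLAR).fetchall()
print(rows)  # [(10, 1, 2), (10, 2, 1), (20, 3, 1)]

# Per-organization totals, summed from the per-scholar rows.
totals = {}
for org_id, _, count in rows:
    totals[org_id] = totals.get(org_id, 0) + count
print(totals)  # {10: 3, 20: 1}
```

A scholar without any snapshot gets NULL from the innermost subquery, so its publication_count comes out as 0 rather than raising an error.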

Building on @ruddra's answer, try this:

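from django.db.models import Count, F, OuterRef, Q, Subquery

# Snapshots of all scholars in the outer organization, newest first.
# Note: this correlates on the organization, so [:1] yields ONE snapshot
# per organization rather than one per scholar.
snapshot_subquery = SnapshotScholar.objects.filter(
    scholar__organization=OuterRef('pk'),
).order_by('-date_crawled')
organizations = Organization.objects.annotate(
    latest_snapshot=Subquery(snapshot_subquery.values('pk')[:1]),
).annotate(
    # Count publications that belong to the snapshot selected above
    latest_snapshot_publication_count=Count(
        'scholar__snapshotscholar__snapshotscholarpublication',
        filter=Q(scholar__snapshotscholar=F('latest_snapshot')),
    ),
)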

The important part is to first order the snapshot scholars by date_crawled in descending order, and then limit the subquery rows to one row [Django-doc] using slice notation.

I would change your models and add a property to the Organization model to get it:

from django.db import models

class Organization(models.Model):
    name = models.CharField("Name", max_length=128, unique=True)
    description = models.CharField("Description", max_length=256)

    @property
    def last_publications_number(self):
        # Sums the latest-snapshot publication count of every scholar;
        # note that this issues separate queries for each scholar
        total = 0
        for s in self.scholar_set.all():
            total += s.get_latest_snapshot_publication_number
        return total

class Scholar(models.Model):
    organization = models.ForeignKey(Organization, on_delete=models.CASCADE)
    name = models.CharField("Name", max_length=128)
    title = models.CharField("Title", max_length=256)

    @property
    def get_latest_snapshot_publication_number(self):
        latest = self.snapshotscholar_set.order_by('-date_crawled').first()
        # first() returns None for a scholar that has no snapshot yet
        return latest.publication_number if latest is not None else 0

class SnapshotScholar(models.Model):
    scholar = models.ForeignKey(Scholar, on_delete=models.CASCADE)
    date_crawled = models.DateTimeField("Date Crawled", auto_now_add=True, db_index=True)
    title = models.CharField("Title", max_length=256)

    @property
    def publication_number(self):
        return self.snapshotscholarpublication_set.count()

class SnapshotScholarPublication(models.Model):
    snapshot_scholar = models.ForeignKey(SnapshotScholar, on_delete=models.CASCADE)
    title = models.CharField("Title", max_length=256)
    citation_count = models.IntegerField()

So you can access it like this:

publication_numbers = Organization.objects.get(<whatever>).last_publications_number
