Django database access optimization: efficiently creating many-to-many relationships (between existing objects)

I am using Django 2.2 with a PostgreSQL database.

I have two models, Gene and Annotation, and I need to create and link (many-to-many) thousands of genes and annotations.

from django.db import models

class Gene(models.Model):
    identifier = models.CharField(max_length=50, primary_key=True)
    # String reference, because Annotation is defined further down.
    annotation = models.ManyToManyField('Annotation')

class Annotation(models.Model):
    name = models.CharField(max_length=120, unique=True, primary_key=True)

I have already found a way to create the objects very efficiently:

Gene.objects.bulk_create([Gene(identifier=identifier) for identifier in gene_id_set])

This is my way of creating the relationships, inspired by the Django docs:

relationships = {
    'gene1': ['anno1', 'anno2'],
    'gene2': ['anno3'],
    ...
}
for gene_id in relationships:
    # One query to fetch the gene, plus one query per annotation.
    gene = Gene.objects.get(pk=gene_id)
    gene.annotation.set([Annotation.objects.get(pk=anno) for anno in relationships[gene_id]])

This is very clumsy: it hits the database 4 times! Isn't there a better way, using Django's built-in tools or raw SQL queries?

The many-to-many table (myapp_gene_annotation) looks like this:

id  gene_id  annotation_id
1   gene1    anno1
2   gene1    anno2
3   gene2    anno3
...
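For the raw SQL route the question asks about, the following is a minimal sketch of a bulk insert into a join table of that shape. It uses Python's standard-library sqlite3 with an in-memory database purely as a stand-in for PostgreSQL, and the column types and the UNIQUE constraint are assumptions rather than what Django actually generates.

```python
import sqlite3

# Relationship data in the same shape as the question's dictionary.
relationships = {
    "gene1": ["anno1", "anno2"],
    "gene2": ["anno3"],
}

conn = sqlite3.connect(":memory:")  # stand-in for the real PostgreSQL database
conn.execute(
    "CREATE TABLE myapp_gene_annotation ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "gene_id TEXT NOT NULL, "
    "annotation_id TEXT NOT NULL, "
    "UNIQUE (gene_id, annotation_id))"  # assumed constraint, for illustration
)

# Flatten the dictionary into one (gene_id, annotation_id) row per link.
rows = [
    (gene_id, anno_id)
    for gene_id, anno_ids in relationships.items()
    for anno_id in anno_ids
]

# A single executemany call runs the INSERT for every row in one transaction.
with conn:
    conn.executemany(
        "INSERT INTO myapp_gene_annotation (gene_id, annotation_id) VALUES (?, ?)",
        rows,
    )

stored = conn.execute(
    "SELECT id, gene_id, annotation_id FROM myapp_gene_annotation ORDER BY id"
).fetchall()
```

Against PostgreSQL the same idea would go through a Django database cursor, with the placeholder style of that driver instead of sqlite3's `?`.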

Now we can create Gene_annotation objects, that is, instances of the implicit model that Django has built for the ManyToMany table, like this:

through_model = Gene.annotation.through
objs = [
    through_model(gene_id=gene_id, annotation_id=anno_id)
    for gene_id, rels in relationships.items()
    for anno_id in rels
]

Now we can perform a bulk insert into the table of the through_model:

through_model.objects.bulk_create(objs)

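Of course, you should only add the relationships after you have added the Genes and Annotations, since otherwise the foreign key constraints on the database side will raise an error.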
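One way to guard against such foreign key errors is to check the pairs against the primary keys that already exist before building the through objects. The sketch below is plain Python; the helper name `split_linkable` is hypothetical, and in a Django project the known keys could, for instance, be loaded with `Gene.objects.values_list('pk', flat=True)`.

```python
def split_linkable(relationships, known_gene_ids, known_annotation_ids):
    """Return (valid_pairs, rejected_pairs) of (gene_id, annotation_id) tuples.

    A pair is valid only if both ends already exist, so inserting it
    would not violate a foreign key constraint.
    """
    known_gene_ids = set(known_gene_ids)
    known_annotation_ids = set(known_annotation_ids)
    valid, rejected = [], []
    for gene_id, anno_ids in relationships.items():
        for anno_id in anno_ids:
            pair = (gene_id, anno_id)
            if gene_id in known_gene_ids and anno_id in known_annotation_ids:
                valid.append(pair)
            else:
                rejected.append(pair)
    return valid, rejected


# Example data: 'anno2' is deliberately missing from the known annotations.
valid, rejected = split_linkable(
    {"gene1": ["anno1", "anno2"], "gene2": ["anno3"]},
    known_gene_ids={"gene1", "gene2"},
    known_annotation_ids={"anno1", "anno3"},
)
```

Only the `valid` pairs would then be turned into through_model instances; the `rejected` list can be logged or used to create the missing objects first.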

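Here we insert all the relationships in one go. If the table is huge, this might result in multiple queries, but it is still more efficient than running one query per relationship.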
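If a single huge insert is a concern, `bulk_create` accepts a `batch_size` argument that caps how many objects go into each query. The sketch below wraps that in a hypothetical helper, `link_in_batches`, which takes the through model as a parameter. The two `Fake...` classes are stand-ins that only record calls so the sketch runs without a Django project; with real models one would pass `Gene.annotation.through` instead.

```python
def link_in_batches(through_model, relationships, batch_size=1000):
    """Build one through-model row per (gene, annotation) link and bulk insert."""
    objs = [
        through_model(gene_id=gene_id, annotation_id=anno_id)
        for gene_id, anno_ids in relationships.items()
        for anno_id in anno_ids
    ]
    # batch_size limits the number of rows sent per INSERT statement.
    return through_model.objects.bulk_create(objs, batch_size=batch_size)


class _FakeManager:
    """Hypothetical stand-in that records calls instead of touching a database."""

    def __init__(self):
        self.calls = []

    def bulk_create(self, objs, batch_size=None):
        self.calls.append((len(objs), batch_size))
        return objs


class FakeThrough:
    """Hypothetical stand-in for Gene.annotation.through."""

    objects = _FakeManager()

    def __init__(self, gene_id, annotation_id):
        self.gene_id = gene_id
        self.annotation_id = annotation_id


created = link_in_batches(
    FakeThrough,
    {"gene1": ["anno1", "anno2"], "gene2": ["anno3"]},
    batch_size=2,
)
```

The default of 1000 is an arbitrary assumption; a suitable value depends on the database and the row size.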
