如何使数据迁移功能更加高效



我正在清理一个旧的PostgreSQL数据库,这样null就不会用于基于字符串的字段。为了做到这一点,我使用了一个Django数据迁移,它在我的本地机器上运行(尽管有点慢(,但迁移在生产中被扼杀了。

所有数据库连接在迁移过程中都是关闭的,所以我猜它会占用大量内存。我已经尝试过在shell中手动执行此操作,但这个过程仍然被扼杀。与增加批量大小相同。

有没有一种更有效的方法来执行这个函数,而不会占用大量资源?

from django.db import models
def set_null_char_field_to_empty_string(apps, schema_editor):
model_classes = [
apps.get_model('order', 'Order'),
]
for model in model_classes:
char_fields = [f.name for f in model._meta.get_fields()
if isinstance(f, models.CharField)]
if char_fields:
filter_args = models.Q()
for field in char_fields:
filter_args |= models.Q(**{f'{field}__isnull': True})
objs = model.objects.filter(filter_args)
objs_to_update = []
for obj in objs:
for field in char_fields:
setattr(obj, field, getattr(obj, field) or '')
objs_to_update.append(obj)
model.objects.bulk_update(
objs_to_update,
fields=char_fields,
batch_size=500
)

您应该能够完全在数据库上完成这项工作,而无需获取任何数据。只需一次获取一个CharField,然后构造一个查询来更新现有的值。我还没有测试过,但类似于:

from django.db import models
def set_null_char_field_to_empty_string(apps, schema_editor):
model_classes = [
apps.get_model('order', 'Order'),
]
for model in model_classes:
char_fields = [f.name for f in model._meta.get_fields()
if isinstance(f, models.CharField)]
for field in char_fields:
model.objects.filter(**{f'{field}__isnull': True})
.update(**{field: ''})

我所知道的避免在内存中存储更新对象的唯一方法是使用select_for_update((,尽管我不知道它在处理大量数据时的行为。

def set_null_char_field_to_empty_string(apps, schema_editor):
model_classes = [
apps.get_model('order', 'Order'),
]
for model in model_classes:
char_fields = [f.name for f in model._meta.get_fields() if isinstance(f, models.CharField)]
if char_fields:
filter_args = models.Q()
for field in char_fields:
filter_args |= models.Q(**{f'{field}__isnull': True})
objs = model.objects.select_for_update().filter(filter_args)
with transaction.atomic():
for obj in objs:
for field in char_fields:
setattr(obj, field, getattr(obj, field) or '')

最新更新