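I have a PostgreSQL table, user_book_details, with 451007 records. About 1K new records are added to it every day.
I have the following query, which takes a very long time (13 hours) to complete every time it runs.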
update user_book_details as A1 set min_date=
(select min(A2.acc_date) as min_date from user_book_details A2 where A2.user_id=A1.user_id
and A2.book_id=A1.book_id) where A1.min_date is null;
How can I rewrite the query to improve performance? FYI, there are no indexes on the user_id and book_id columns.
Your query is fine:
update user_book_details ubd
set min_date = (select min(ubd2.acc_date)
from user_book_details ubd2
where ubd2.user_id = ubd.user_id and
ubd2.book_id = ubd.book_id
)
where ubd.min_date is null;
For performance, you want an index on user_book_details(user_id, book_id). I also think it would be faster written like this:
update user_book_details ubd
set min_date = min_acc_date
from (select ubd2.user_id, ubd2.book_id, min(ubd2.acc_date) as min_acc_date
from user_book_details ubd2
group by ubd2.user_id, ubd2.book_id
) ubd2
where ubd2.user_id = ubd.user_id and
ubd2.book_id = ubd.book_id and
ubd.min_date is null;
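The first method uses the index to look up the value for each row (which may be a bit more complicated because the query updates the same table it reads from). The second method aggregates the data once and then joins the values in.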
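As a minimal sketch, the index that the first method relies on could be created as follows. The index name idx_ubd_user_book is made up for illustration, and the commented-out three-column variant is only an assumption worth testing, not something measured on this table.

```sql
-- Hypothetical index name; the column list is the one recommended above.
create index idx_ubd_user_book
    on user_book_details (user_id, book_id);

-- Possible variant (an assumption, not benchmarked here): including acc_date
-- may let the min(acc_date) lookup be served from the index itself.
-- create index idx_ubd_user_book_date
--     on user_book_details (user_id, book_id, acc_date);

-- Refresh planner statistics after building the index.
analyze user_book_details;
```

Running explain on the update before and after creating the index is a reasonable way to check whether the planner actually uses it.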
I should note that this value is easy to calculate on the fly:
select ubd.*,
min(acc_date) over (partition by user_id, book_id) as min_acc_date
from user_book_details ubd;
That might be preferable to trying to keep the value up to date in the table.
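If the on-the-fly approach fits, one way to package it is a view, sketched below. The view name user_book_details_with_min is hypothetical, and whether this performs well enough at this table size is something to verify rather than assume.

```sql
-- Hypothetical view name; it wraps the window-function query shown above
-- so callers do not have to repeat it.
create view user_book_details_with_min as
select ubd.*,
       min(ubd.acc_date) over (partition by ubd.user_id, ubd.book_id) as min_acc_date
from user_book_details ubd;

-- Example usage: read the computed value instead of the stored min_date column.
select user_id, book_id, acc_date, min_acc_date
from user_book_details_with_min;
```

With a view like this, new rows need no follow-up update, since min_acc_date is computed each time the view is queried.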