我目前正试图删除一个表中大约100多万行(实际上是3000多万行,但由于这方面出现的问题,我已经制作了一个子集(,其中的条件是该行在其他表中不能作为外键有任何引用。。我一次批量删除30000行。
所以查询看起来像:
DELETE FROM table_name tn WHERE tn.id IN (
SELECT tn2.id FROM table_name as tn2
LEFT JOIN table_name_join_1 ON table_name_join_1.table_id = tn2.id
LEFT JOIN table_name_join_2 ON table_name_join_2.table_id = tn2.id
...
LEFT JOIN table_name_join_19 ON table_name_join_19.table_id = tn2.id
WHERE table_name_join_1.table_id IS NULL
AND table_name_join_2.table_id IS NULL
...
AND table_name_join_19.table_id IS NULL
LIMIT 30000 OFFSET x
)
该表在19个不同的表中被引用,因此子查询中有很多左联接,并且在没有LIMIT
&OFFSET
,用于计算将受到影响的行总数。
问题是,当在delete语句中执行查询时,查询只是挂起,但当仅使用COUNT(1)
计数时可以工作。。我不确定是否有更好的方法可以删除表中的许多行。。还是检查引用有问题的表的表,看看一些索引是否在某种程度上是古怪的。
希望有人能帮忙:D看到一个查询工作,然后在用作子查询时直接挂起/失败,这很烦人。
我在Python 2.7.17上使用psycopg2(一种工作方式(。。我还推测了何时从psycopg2连接关闭光标以提高速度。。目前,我在运行delete的循环之外创建光标,并在脚本完成后将其与数据库连接一起关闭。。。上一页。每次提交delete语句后,光标都会关闭,但对我来说似乎有点过分……我不知道?当前循环看起来像:
cursor = conn.cursor()
while count >= offset:
...
delete(cursor, batch_size, offset)
...
offset += batch_size
此外,在执行完每个delete语句后commit()
是个坏主意吗?还是应该等到循环执行完所有delete语句后再提交。。如果是这样,我不应该考虑使用事务吗?
基本上,我希望有人能告诉我,尽管一个无限制的计数和偏移"只"需要60秒,但为什么事情如此缓慢/失败??
DELETE FROM xxx
具有与SELECT COUNT(*) FROM xxx
几乎相同的子语法;因此,为了测试该计划,您可以运行下面的片段,并检查是否获得了索引计划:
EXPLAIN
SELECT COUNT(*)
FROM table_name tn
WHERE NOT EXISTS ( SELECT *
FROM table_name_join_1 x1 WHERE x1.table_id = tn.id
)
--
AND NOT EXISTS ( SELECT *
FROM table_name_join_2 x2 WHERE x2.table_id = tn.id
)
--
AND NOT EXISTS ( SELECT *
FROM table_name_join_3 x3 WHERE x3.table_id = tn.id
)
--
-- et cetera
--
;
创建一些数据,因为很难对伪代码进行基准测试:
SELECT version();
CREATE TABLE table_name
( id serial NOT NULL PRIMARY KEY
, name text
);
INSERT INTO table_name ( name )
SELECT 'Name_' || gs::text
FROM generate_series(1,100000) gs;
--
CREATE TABLE table_name_join_2
( id serial NOT NULL PRIMARY KEY
, table_id INTEGER REFERENCES table_name(id)
, name text
);
INSERT INTO table_name_join_2(table_id,name)
SELECT src.id , 'Name_' || src.id :: text
FROM table_name src
WHERE src.id % 2 = 0
;
--
CREATE TABLE table_name_join_3
( id serial NOT NULL PRIMARY KEY
, table_id INTEGER REFERENCES table_name(id)
, name text
);
INSERT INTO table_name_join_3(table_id,name)
SELECT src.id , 'Name_' || src.id :: text
FROM table_name src
WHERE src.id % 3 = 0
;
--
CREATE TABLE table_name_join_5
( id serial NOT NULL PRIMARY KEY
, table_id INTEGER REFERENCES table_name(id)
, name text
);
INSERT INTO table_name_join_5(table_id,name)
SELECT src.id , 'Name_' || src.id :: text
FROM table_name src
WHERE src.id % 5 = 0
;
--
CREATE TABLE table_name_join_7
( id serial NOT NULL PRIMARY KEY
, table_id INTEGER REFERENCES table_name(id)
, name text
);
INSERT INTO table_name_join_7(table_id,name)
SELECT src.id , 'Name_' || src.id :: text
FROM table_name src
WHERE src.id % 7 = 0
;
--
CREATE TABLE table_name_join_11
( id serial NOT NULL PRIMARY KEY
, table_id INTEGER REFERENCES table_name(id)
, name text
);
INSERT INTO table_name_join_11(table_id,name)
SELECT src.id , 'Name_' || src.id :: text
FROM table_name src
WHERE src.id % 11 = 0
;
现在,运行DELETE查询:
VACUUM ANALYZE table_name;
VACUUM ANALYZE table_name_join_2;
VACUUM ANALYZE table_name_join_3;
VACUUM ANALYZE table_name_join_5;
VACUUM ANALYZE table_name_join_7;
EXPLAIN ANALYZE
DELETE
FROM table_name tn
WHERE 1=1
AND NOT EXISTS ( SELECT * FROM table_name_join_2 x2 WHERE x2.table_id = tn.id)
--
AND NOT EXISTS ( SELECT * FROM table_name_join_3 x3 WHERE x3.table_id = tn.id)
--
AND NOT EXISTS ( SELECT * FROM table_name_join_5 x5 WHERE x5.table_id = tn.id)
--
AND NOT EXISTS ( SELECT * FROM table_name_join_7 x7 WHERE x7.table_id = tn.id)
--
AND NOT EXISTS ( SELECT * FROM table_name_join_11 x11 WHERE x11.table_id = tn.id)
--
-- et cetera
--
;
SELECT count(*) FROM table_name;
现在,完全相同,但FKs上有支持索引:
CREATE INDEX table_name_join_2_2 ON table_name_join_2( table_id);
CREATE INDEX table_name_join_3_3 ON table_name_join_3( table_id);
CREATE INDEX table_name_join_5_5 ON table_name_join_5( table_id);
CREATE INDEX table_name_join_7_7 ON table_name_join_7( table_id);
CREATE INDEX table_name_join_11_11 ON table_name_join_11( table_id);
VACUUM ANALYZE table_name;
VACUUM ANALYZE table_name_join_2;
VACUUM ANALYZE table_name_join_3;
VACUUM ANALYZE table_name_join_5;
VACUUM ANALYZE table_name_join_7;
EXPLAIN ANALYZE
DELETE
FROM table_name tn
WHERE 1=1
...
;
----------
Query plan#1:
----------
DROP SCHEMA
CREATE SCHEMA
SET
version
----------------------------------------------------------------------------------------------------------
PostgreSQL 11.6 on armv7l-unknown-linux-gnueabihf, compiled by gcc (Raspbian 8.3.0-6+rpi1) 8.3.0, 32-bit
(1 row)
CREATE TABLE
INSERT 0 100000
CREATE TABLE
INSERT 0 50000
CREATE TABLE
INSERT 0 33333
CREATE TABLE
INSERT 0 20000
CREATE TABLE
INSERT 0 14285
CREATE TABLE
INSERT 0 9090
SET
SET
VACUUM
VACUUM
VACUUM
VACUUM
VACUUM QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Delete on table_name tn (cost=3969.52..7651.94 rows=11429 width=36) (actual time=812.010..812.011 rows=0 loops=1)
-> Hash Anti Join (cost=3969.52..7651.94 rows=11429 width=36) (actual time=206.775..712.982 rows=20779 loops=1)
Hash Cond: (tn.id = x7.table_id)
-> Hash Anti Join (cost=3557.10..7088.09 rows=13334 width=34) (actual time=183.070..654.030 rows=24242 loops=1)
Hash Cond: (tn.id = x5.table_id)
-> Hash Anti Join (cost=2979.10..6329.25 rows=16667 width=28) (actual time=149.870..578.173 rows=30303 loops=1)
Hash Cond: (tn.id = x3.table_id)
-> Hash Anti Join (cost=2016.11..5124.59 rows=25000 width=22) (actual time=95.589..461.053 rows=45455 loops=1)
Hash Cond: (tn.id = x2.table_id)
-> Merge Anti Join (cost=572.11..3271.21 rows=50000 width=16) (actual time=14.486..261.955 rows=90910 loops=1)
Merge Cond: (tn.id = x11.table_id)
-> Index Scan using table_name_pkey on table_name tn (cost=0.29..2344.99 rows=100000 width=10) (actual time=0.031..118.968 rows=100000 loops=1)
-> Sort (cost=571.82..589.22 rows=6960 width=10) (actual time=14.446..20.365 rows=9090 loops=1)
Sort Key: x11.table_id
Sort Method: quicksort Memory: 612kB
-> Seq Scan on table_name_join_11 x11 (cost=0.00..127.60 rows=6960 width=10) (actual time=0.029..6.939 rows=9090 loops=1)
-> Hash (cost=819.00..819.00 rows=50000 width=10) (actual time=80.439..80.440 rows=50000 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 2014kB
-> Seq Scan on table_name_join_2 x2 (cost=0.00..819.00 rows=50000 width=10) (actual time=0.019..36.848 rows=50000 loops=1)
-> Hash (cost=546.33..546.33 rows=33333 width=10) (actual time=53.678..53.678 rows=33333 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 1428kB
-> Seq Scan on table_name_join_3 x3 (cost=0.00..546.33 rows=33333 width=10) (actual time=0.027..24.132 rows=33333 loops=1)
-> Hash (cost=328.00..328.00 rows=20000 width=10) (actual time=32.884..32.885 rows=20000 loops=1)
Buckets: 32768 Batches: 1 Memory Usage: 832kB
-> Seq Scan on table_name_join_5 x5 (cost=0.00..328.00 rows=20000 width=10) (actual time=0.017..15.135 rows=20000 loops=1)
-> Hash (cost=233.85..233.85 rows=14285 width=10) (actual time=23.542..23.542 rows=14285 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 567kB
-> Seq Scan on table_name_join_7 x7 (cost=0.00..233.85 rows=14285 width=10) (actual time=0.016..10.742 rows=14285 loops=1)
Planning Time: 4.470 ms
Trigger for constraint table_name_join_2_table_id_fkey: time=172949.350 calls=20779
Trigger for constraint table_name_join_3_table_id_fkey: time=116772.757 calls=20779
Trigger for constraint table_name_join_5_table_id_fkey: time=71218.348 calls=20779
Trigger for constraint table_name_join_7_table_id_fkey: time=51760.503 calls=20779
Trigger for constraint table_name_join_11_table_id_fkey: time=36120.128 calls=20779
Execution Time: 449783.490 ms
(35 rows)
count
-------
79221
(1 row)
查询计划#2:
SET
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
SET
VACUUM
VACUUM
VACUUM
VACUUM
VACUUM
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Delete on table_name tn (cost=1.73..6762.95 rows=11429 width=36) (actual time=776.987..776.988 rows=0 loops=1)
-> Merge Anti Join (cost=1.73..6762.95 rows=11429 width=36) (actual time=0.212..676.794 rows=20779 loops=1)
Merge Cond: (tn.id = x7.table_id)
-> Merge Anti Join (cost=1.44..6322.99 rows=13334 width=34) (actual time=0.191..621.986 rows=24242 loops=1)
Merge Cond: (tn.id = x5.table_id)
-> Merge Anti Join (cost=1.16..5706.94 rows=16667 width=28) (actual time=0.172..550.669 rows=30303 loops=1)
Merge Cond: (tn.id = x3.table_id)
-> Merge Anti Join (cost=0.87..4661.02 rows=25000 width=22) (actual time=0.147..438.036 rows=45455 loops=1)
Merge Cond: (tn.id = x2.table_id)
-> Merge Anti Join (cost=0.58..2938.75 rows=50000 width=16) (actual time=0.125..250.082 rows=90910 loops=1)
Merge Cond: (tn.id = x11.table_id)
-> Index Scan using table_name_pkey on table_name tn (cost=0.29..2344.99 rows=100000 width=10) (actual time=0.031..116.630 rows=100000 loops=1)
-> Index Scan using table_name_join_11_11 on table_name_join_11 x11 (cost=0.29..230.14 rows=9090 width=10) (actual time=0.090..11.228 rows=9090 loops=1)
-> Index Scan using table_name_join_2_2 on table_name_join_2 x2 (cost=0.29..1222.29 rows=50000 width=10) (actual time=0.019..59.500 rows=50000 loops=1)
-> Index Scan using table_name_join_3_3 on table_name_join_3 x3 (cost=0.29..816.78 rows=33333 width=10) (actual time=0.022..40.473 rows=33333 loops=1)
-> Index Scan using table_name_join_5_5 on table_name_join_5 x5 (cost=0.29..491.09 rows=20000 width=10) (actual time=0.016..23.105 rows=20000 loops=1)
-> Index Scan using table_name_join_7_7 on table_name_join_7 x7 (cost=0.29..351.86 rows=14285 width=10) (actual time=0.017..16.903 rows=14285 loops=1)
Planning Time: 4.737 ms
Trigger for constraint table_name_join_2_table_id_fkey: time=1114.497 calls=20779
Trigger for constraint table_name_join_3_table_id_fkey: time=1096.065 calls=20779
Trigger for constraint table_name_join_5_table_id_fkey: time=1094.951 calls=20779
Trigger for constraint table_name_join_7_table_id_fkey: time=1090.509 calls=20779
Trigger for constraint table_name_join_11_table_id_fkey: time=1173.987 calls=20779
Execution Time: 6426.626 ms
(24 rows)
count
-------
79221
(1 row)
因此,查询速度从450秒提高到7秒。大部分时间似乎都花在检查FK约束上,在基表中实际删除之后。[这些约束在Postgres中被实现为不可见的触发器]
汇总表:
query type | indexes on all 5 FKs | workmem | total time(ms) | time for triggers
----------------+-----------------------+---------------+-----------------------+-------------------
NOT EXISTS() | No | 4Mb | 449783.490 | 448821.083
NOT EXISTS() | Yes | 4Mb | 6426.626 | 5570.009
NOT EXISTS() | Yes | 64Kb | 6405.273 | 5545.352
NOT IN() | No | 4Mb | 449435.530 | 448829.179
NOT IN() | Yes | 4Mb | 6113.690 | 5443.505
NOT IN() | Yes | 64Kb | 8595341.467 | 5545.796
结论:是否需要外键索引由您决定。