Postgres死锁带有(选择用于共享+插入)和(选择用于更新+更新)



我有下表(所有示例都使用psycopg2python库(:

CREATE TABLE IF NOT EXISTS TeamTasks
(
id              SERIAL PRIMARY KEY,
round           INTEGER,
task_id         INTEGER,
team_id         INTEGER,
score           FLOAT         DEFAULT 0,
UNIQUE (round, task_id, team_id)
)

我有两个功能:

  1. 从上一轮中获取TeamTasks并将其复制到新一轮,SELECTINSERT之间没有更新,该功能实现如下:
query = """
WITH prev_table AS (
SELECT score FROM teamtasks 
WHERE task_id = %(task_id)s AND team_id = %(team_id)s AND round <= %(round)s - 1
ORDER BY round DESC LIMIT 1 
FOR SHARE
)
INSERT INTO TeamTasks (task_id, team_id, round, score) 
SELECT %(task_id)s, %(team_id)s, %(round)s, score
FROM prev_table;
"""
with aux.db_cursor() as (conn, curs):    
for team_id in range(team_count):
for task_id in range(task_count): 
curs.execute(
query, 
{
'task_id': task_id,
'team_id': team_id,
'round': cur_round + 1,
},
)
conn.commit()

aux.db_cursor只是获取psycopg2连接和游标的一个方便的包装器。

  1. 更新TeamTasks中特定团队对、特定任务和多轮的行。它是这样实现的:
# I have team1_id, team2_id and task_id
query1 = "SELECT score from teamtasks WHERE team_id=%s AND task_id=%s AND round=%s FOR NO KEY UPDATE"
query2 = "UPDATE teamtasks SET score = %s WHERE team_id=%s AND task_id=%s AND round >= %s"
with aux.db_cursor() as (conn, curs):
curs.execute(query1, (team1_id, task_id, cur_round))
score1, = curs.fetchone()
curs.execute(query1, (team2_id, task_id, cur_round))
score2, = curs.fetchone()
sleep(0.1)  # Here happens something time-consuming
curs.execute(query2, (score1 + 0.1, team1_id, task_id, cur_round))
curs.execute(query2, (score2 - 0.1, team2_id, task_id, cur_round))
conn.commit()

我可以保证,在第二个函数中,每个团队只能是一个更新的主体,因此所有同时更新的团队总是不同的。

此外,第一个函数很少运行,除了这两个函数之外,没有其他人更新此表,因此第一个函数中的锁正是为了在TeamTasks复制期间不会更改表。

在上述环境中,我遇到了很多死锁,如下所示:

postgres_1  | 2019-11-17 20:43:08.510 UTC [49] ERROR:  deadlock detected
postgres_1  | 2019-11-17 20:43:08.510 UTC [49] DETAIL:  Process 49 waits for ShareLock on transaction 685; blocked by process 65.
postgres_1  |   Process 65 waits for ShareLock on transaction 658; blocked by process 49.
postgres_1  |   Process 49:
postgres_1  |           WITH prev_table AS (
postgres_1  |               SELECT score FROM teamtasks
postgres_1  |               WHERE task_id = 8 AND team_id = 6 AND round <= 1 - 1
postgres_1  |               ORDER BY round DESC LIMIT 1
postgres_1  |               FOR SHARE
postgres_1  |           )
postgres_1  |           INSERT INTO TeamTasks (task_id, team_id, round, score)
postgres_1  |           SELECT 8, 6, 1, score
postgres_1  |           FROM prev_table;
postgres_1  |
postgres_1  |   Process 65: SELECT score from teamtasks WHERE team_id=0 AND task_id=8 AND round=0 FOR NO KEY UPDATE
postgres_1  | 2019-11-17 20:43:08.510 UTC [49] HINT:  See server log for query details.
postgres_1  | 2019-11-17 20:43:08.510 UTC [49] CONTEXT:  while locking tuple (0,69) in relation "teamtasks"
postgres_1  | 2019-11-17 20:43:08.510 UTC [49] STATEMENT:
postgres_1  |           WITH prev_table AS (
postgres_1  |               SELECT score FROM teamtasks
postgres_1  |               WHERE task_id = 8 AND team_id = 6 AND round <= 1 - 1
postgres_1  |               ORDER BY round DESC LIMIT 1
postgres_1  |               FOR SHARE
postgres_1  |           )
postgres_1  |           INSERT INTO TeamTasks (task_id, team_id, round, score)
postgres_1  |           SELECT 8, 6, 1, score
postgres_1  |           FROM prev_table;

如何修复这些死锁?有没有一个我看不到的巧妙的解决方案?

select for share在这里似乎没有必要。这种语法是为了保持引用的完整性。在您的情况下,您从同一个teamtasks表中进行选择并插入到该表中,因此您不必要地在表上持有锁,这会导致您的两个连接相互阻塞(最终,重构代码使您只使用一个连接会很好,但我不知道这对您来说有多可行(。据我所知,select for share语法更多地与更新和对其他表的引用完整性有关,而不是与对同一表的插入有关。

问题在于,在第一个aux_db_cursor()调用中,当您循环通过range(team_count)range(task_count)时,您要将FOR SHARE锁带到teamtasks中的几行——然后在第二个aux_db_cursor()调用中,在对某些行执行UPDATE之前,您要执行一项耗时的任务——这些UPDATE锁请求将与这些FOR SHARE锁发生冲突。我会去掉FOR SHARE锁,除非你真的需要它们(在这一点上,如果可能的话,我会寻找将所有锁合并到一个DB连接中的方法(。

披露:我在EnterpriseDB(EDB(工作

最新更新