我有一个包含 3 个 percona xtradb 5.5.34-55 服务器的集群,由于它们都是可写的,因此在任何重大负载下都会出现死锁错误。增加变量wsrep_retry_autocommit
在一定程度上有所帮助,但ER_LOCK_DEADLOCK
并没有完全消失。所以我尝试将wsrep_retry_autocommit
设置为 10000(似乎是最大值(,认为这会让一些查询非常慢,但没有一个会失败ER_LOCK_DEADLOCK
:
mysql-shm -ss -e 'show global variables like "%wsrep_retry_auto%"'
wsrep_retry_autocommit 10000
------------------------
LATEST DETECTED DEADLOCK
------------------------
140414 10:29:23
*** (1) TRANSACTION:
TRANSACTION 72D8, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 376, 1 row lock(s), undo log entries 1
MySQL thread id 34, OS thread handle 0x7f11840d4700, query id 982 localhost shm update
REPLACE INTO metric(host, name, userid, sampleid, type, priority) VALUES
('localhost','cpu-3/cpu-nice',8,0,0,0),('localhost','cpu-3/cpu-system',8,0,0,0),
('localhost','cpu-3/cpu-idle',8,0,0,0),('localhost','cpu-3/cpu-wait',8,0,0,0),
('localhost','cpu-3/cpu-interrupt',8,0,0,0),('localhost','cpu-3/cpu-softirq',8,0,0,0),
('localhost','cpu-3/cpu-steal',8,0,0,0),('localhost','cpu-4/cpu-user',8,0,0,0),
('localhost','cpu-4/cpu-nice',8,0,0,0),('localhost','cpu-4/cpu-system',8,0,0,0),
('localhost','cpu-4/cpu-idle',8,0,0,0),('localhost','cpu-4/cpu-wait',8,0,0,0),
('localhost','cpu-4/cpu-interrupt',8,0,0,0),('localhost','cpu-4/cpu-softirq',8,0,0,0),
('localhost','cpu-4/cpu-steal',8,0,0,0)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 344 n bits 488 index `unique-metric` of
table `shm`.`metric` trx id 72D8 lock_mode X waiting
*** (2) TRANSACTION:
TRANSACTION 72D7, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
7 lock struct(s), heap size 3112, 141 row lock(s), undo log entries 40
MySQL thread id 50, OS thread handle 0x7f1184115700, query id 980 localhost shm update
REPLACE INTO metric(host, name, userid, sampleid, type, priority) VALUES
('localhost','cpu-3/cpu-nice',8,0,0,0),('localhost','cpu-3/cpu-system',8,0,0,0),
('localhost','cpu-3/cpu-idle',8,0,0,0),('localhost','cpu-3/cpu-wait',8,0,0,0),
('localhost','cpu-3/cpu-interrupt',8,0,0,0),('localhost','cpu-3/cpu-softirq',8,0,0,0),
('localhost','cpu-3/cpu-steal',8,0,0,0),('localhost','cpu-4/cpu-user',8,0,0,0),
('localhost','cpu-4/cpu-nice',8,0,0,0),('localhost','cpu-4/cpu-system',8,0,0,0),
('localhost','cpu-4/cpu-idle',8,0,0,0),('localhost','cpu-4/cpu-wait',8,0,0,0),
('localhost','cpu-4/cpu-interrupt',8,0,0,0),('localhost','cpu-4/cpu-softirq',8,0,0,0),
('localhost','cpu-4/cpu-steal',8,0,0,0),('localhost','cpu-3/cpu-nice',8,0,0,0),
('localhost','cpu-3/cpu-system',8,0,0,0),('localhost','cpu-3/cpu-idle',8,0,0,0),
('localhost','cpu-3/cpu-wait',8,0,0,0),('localhost','cpu-3/cpu-interrupt',8,0,0,0),
('localhost','cpu-3/cpu-softirq',8,0,0,0),('localhost'
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 0 page no 344 n bits 488 index `unique-metric` of table
`shm`.`metric` trx id 72D7 lock_mode X
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 344 n bits 504 index `unique-metric` of table
`shm`.`metric` trx id 72D7 lock_mode X locks gap before rec insert intention waiting
*** WE ROLL BACK TRANSACTION (1)
难道不应该重试吗?有没有办法验证percona是否真的重试了查询10000次?
对您的问题没有确切的答案,但是对于任何写入密集型负载(如果您尝试插入与该死的Drupal相同的数据(,则会发生死锁,对我来说唯一的解决方案(仍在等待确认这是100%确定的解决方案(- 是在Galera节点前面使用haproxy,并定义要使用的第一个节点(haproxy后端定义(, 和其他 2 个节点用作备份。
这样,所有 mysql 流量都将从客户端通过 haproxy 流向单个 galera 节点,如果该节点发生故障,将使用其他节点。
希望有帮助...安德里亚
在您的答案中,可扩展性是一个问题,因为我们在集群中,但只使用一个节点确实是资源的糟糕使用。所以替代方案是,您可以使用任何负载均衡器,如果它的haproxy您可以在两个端口上创建2个侦听器,例如3306和3305; 然后说
Lister bind to 3306 获取来自应用程序的所有写入请求,其后端将有节点 1 然后节点 2 和节点 3 作为备份;Lister 绑定到 3305 将具有来自应用程序的所有读取请求,其后端将具有指定为正常的所有节点。因此,它的读取可扩展和写入具有有限的可扩展性,其中死锁可以减少到非常扩展。