I am running Celery workers connected to RabbitMQ and I hit broken-pipe errors when the workers run in gevent mode. There is no problem when the workers run with the prefork process pool (no gevent, no monkey patching).
After that happens, the Celery workers no longer receive task messages from RabbitMQ until they are restarted.
The problem occurs when the Celery workers consume task messages more slowly than the Django application produces them, with roughly 3,000 messages piled up in RabbitMQ.
Gevent version 1.1.0
Celery version 3.1.22
====== Celery log ======
[2016-08-08 13:52:06,913: CRITICAL/MainProcess] Couldn't ack 293, reason:error(32, 'Broken pipe')
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/kombu/message.py", line 93, in ack_log_error
self.ack()
File "/usr/local/lib/python2.7/site-packages/kombu/message.py", line 88, in ack
self.channel.basic_ack(self.delivery_tag)
File "/usr/local/lib/python2.7/site-packages/amqp/channel.py", line 1584, in basic_ack
self._send_method((60, 80), args)
File "/usr/local/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method
self.channel_id, method_sig, args, content,
File "/usr/local/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method
write_frame(1, channel, payload)
File "/usr/local/lib/python2.7/site-packages/amqp/transport.py", line 182, in write_frame
frame_type, channel, size, payload, 0xce,
File "/usr/local/lib/python2.7/site-packages/gevent/_socket2.py", line 412, in sendall
timeleft = self.__send_chunk(chunk, flags, timeleft, end)
File "/usr/local/lib/python2.7/site-packages/gevent/_socket2.py", line 351, in __send_chunk
data_sent += self.send(chunk, flags)
File "/usr/local/lib/python2.7/site-packages/gevent/_socket2.py", line 320, in send
return sock.send(data, flags)
error: [Errno 32] Broken pipe
======= Rabbitmq log ==================
=ERROR REPORT==== 8-Aug-2016::14:28:33 ===
closing AMQP connection <0.15928.4> (10.26.39.183:60732 -> 10.26.39.183:5672):
{writer,send_failed,{error,enotconn}}
=ERROR REPORT==== 8-Aug-2016::14:29:03 ===
closing AMQP connection <0.15981.4> (10.26.39.183:60736 -> 10.26.39.183:5672):
{writer,send_failed,{error,enotconn}}
=ERROR REPORT==== 8-Aug-2016::14:29:03 ===
closing AMQP connection <0.15955.4> (10.26.39.183:60734 -> 10.26.39.183:5672):
{writer,send_failed,{error,enotconn}}
A similar problem occurs when the Celery workers use eventlet.
[2016-08-09 17:41:37,952: CRITICAL/MainProcess] Couldn't ack 583, reason:error(32, 'Broken pipe')
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/kombu/message.py", line 93, in ack_log_error
self.ack()
File "/usr/local/lib/python2.7/site-packages/kombu/message.py", line 88, in ack
self.channel.basic_ack(self.delivery_tag)
File "/usr/local/lib/python2.7/site-packages/amqp/channel.py", line 1584, in basic_ack
self._send_method((60, 80), args)
File "/usr/local/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method
self.channel_id, method_sig, args, content,
File "/usr/local/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method
write_frame(1, channel, payload)
File "/usr/local/lib/python2.7/site-packages/amqp/transport.py", line 182, in write_frame
frame_type, channel, size, payload, 0xce,
File "/usr/local/lib/python2.7/site-packages/eventlet/greenio/base.py", line 385, in sendall
tail = self.send(data, flags)
File "/usr/local/lib/python2.7/site-packages/eventlet/greenio/base.py", line 379, in send
return self._send_loop(self.fd.send, data, flags)
File "/usr/local/lib/python2.7/site-packages/eventlet/greenio/base.py", line 366, in _send_loop
return send_method(data, *args)
error: [Errno 32] Broken pipe
Added setup and load-test information
We start Celery with the following options:
celery worker -A celerytasks.celery_worker_init -Q default -P gevent -c 1000 --loglevel=info
Celery uses RabbitMQ as the broker.
We run 4 Celery worker processes by specifying "numprocs=4" in the supervisor configuration.
We use JMeter to simulate web traffic; the Django application generates tasks for the Celery workers to consume. These tasks basically access a MySQL database to fetch/update some data.
According to the RabbitMQ web management page, tasks are produced at about 50/s while they are consumed at about 20/s. After roughly one minute of load testing, the log files show that many connections between RabbitMQ and Celery hit broken-pipe errors.
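For reference, the app module passed to -A is essentially just a Celery app pointing at RabbitMQ; a simplified sketch follows (the broker URL and the default-queue wiring are illustrative, not copied verbatim from our project):

# celerytasks/celery_worker_init.py -- simplified sketch of the module that
# "-A celerytasks.celery_worker_init" points at; broker URL is an assumption.
from celery import Celery

app = Celery(
    "celerytasks",
    broker="amqp://guest:guest@localhost:5672//",  # RabbitMQ broker (assumed URL)
)
# Tasks go to the "default" queue, which the workers consume via "-Q default".
app.conf.CELERY_DEFAULT_QUEUE = "default"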
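The tasks themselves are simple; a sketch of the pattern (model and field names are made up for illustration, the real tasks just fetch and update rows through the Django ORM):

# Hypothetical example of one of the load-test tasks.
from celerytasks.celery_worker_init import app  # the app module started with -A
from myapp.models import Counter                # assumed Django model

@app.task
def update_counter(counter_id):
    # Read one row from MySQL and write it back -- roughly the
    # fetch/update pattern the tasks perform.
    counter = Counter.objects.get(pk=counter_id)
    counter.hits += 1
    counter.save(update_fields=["hits"])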
We noticed that this problem is also caused by a combination of a high prefetch count and high concurrency.
We had concurrency set to 500 and the prefetch multiplier set to 100, which means the effective prefetch per worker is 500 * 100 = 50,000. We had a backlog of about 100k tasks, and with this configuration a single worker reserved all of them for itself while the other workers were not used at all. That worker kept hitting the Broken pipe error and never acknowledged any task, so the tasks were never cleared from the queue.
We then changed the prefetch multiplier to 3 and restarted all the workers, which fixed the problem. Since lowering the prefetch value we have seen zero instances of the broken-pipe error, whereas we used to see it regularly.
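The change itself is a one-line setting; a sketch of the relevant Celery 3.1 configuration (the config module name and broker URL are illustrative):

# celeryconfig.py -- the prefetch change that resolved the backlog.
BROKER_URL = "amqp://guest:guest@localhost:5672//"  # assumed broker URL
# Effective prefetch per worker = concurrency * multiplier,
# e.g. 500 * 3 = 1,500 instead of 500 * 100 = 50,000 with the old value.
CELERYD_PREFETCH_MULTIPLIER = 3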