Celery worker connections to RabbitMQ hit broken pipe errors in gevent or eventlet mode



The Celery worker connections to RabbitMQ hit broken pipe errors when the worker runs in gevent mode. There is no problem when the Celery worker runs with the prefork process pool (no gevent, no monkey patching).

After that, the Celery workers no longer receive task messages from RabbitMQ until they are restarted.

The problem occurs when the Celery workers consume task messages more slowly than the Django application produces them, with roughly 3,000 messages piled up in RabbitMQ.

Gevent version 1.1.0

Celery version 3.1.22

====== Celery log ======

[2016-08-08 13:52:06,913: CRITICAL/MainProcess] Couldn't ack 293, reason:error(32, 'Broken pipe')
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/kombu/message.py", line 93, in ack_log_error
    self.ack()
  File "/usr/local/lib/python2.7/site-packages/kombu/message.py", line 88, in ack
    self.channel.basic_ack(self.delivery_tag)
  File "/usr/local/lib/python2.7/site-packages/amqp/channel.py", line 1584, in basic_ack
    self._send_method((60, 80), args)
  File "/usr/local/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method
    self.channel_id, method_sig, args, content,
  File "/usr/local/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method
    write_frame(1, channel, payload)
  File "/usr/local/lib/python2.7/site-packages/amqp/transport.py", line 182, in write_frame
    frame_type, channel, size, payload, 0xce,
  File "/usr/local/lib/python2.7/site-packages/gevent/_socket2.py", line 412, in sendall
    timeleft = self.__send_chunk(chunk, flags, timeleft, end)
  File "/usr/local/lib/python2.7/site-packages/gevent/_socket2.py", line 351, in __send_chunk
    data_sent += self.send(chunk, flags)
  File "/usr/local/lib/python2.7/site-packages/gevent/_socket2.py", line 320, in send
    return sock.send(data, flags)
error: [Errno 32] Broken pipe

======= Rabbitmq log ==================

=ERROR REPORT==== 8-Aug-2016::14:28:33 ===
closing AMQP connection <0.15928.4> (10.26.39.183:60732 -> 10.26.39.183:5672):
{writer,send_failed,{error,enotconn}}
=ERROR REPORT==== 8-Aug-2016::14:29:03 ===
closing AMQP connection <0.15981.4> (10.26.39.183:60736 -> 10.26.39.183:5672):
{writer,send_failed,{error,enotconn}}
=ERROR REPORT==== 8-Aug-2016::14:29:03 ===
closing AMQP connection <0.15955.4> (10.26.39.183:60734 -> 10.26.39.183:5672):
{writer,send_failed,{error,enotconn}}

A similar problem occurs when the Celery worker uses eventlet.

[2016-08-09 17:41:37,952: CRITICAL/MainProcess] Couldn't ack 583, reason:error(32, 'Broken pipe')
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/kombu/message.py", line 93, in ack_log_error
    self.ack()
  File "/usr/local/lib/python2.7/site-packages/kombu/message.py", line 88, in ack
    self.channel.basic_ack(self.delivery_tag)
  File "/usr/local/lib/python2.7/site-packages/amqp/channel.py", line 1584, in basic_ack
    self._send_method((60, 80), args)
  File "/usr/local/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method
    self.channel_id, method_sig, args, content,
  File "/usr/local/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method
    write_frame(1, channel, payload)
  File "/usr/local/lib/python2.7/site-packages/amqp/transport.py", line 182, in write_frame
    frame_type, channel, size, payload, 0xce,
  File "/usr/local/lib/python2.7/site-packages/eventlet/greenio/base.py", line 385, in sendall
    tail = self.send(data, flags)
  File "/usr/local/lib/python2.7/site-packages/eventlet/greenio/base.py", line 379, in send
    return self._send_loop(self.fd.send, data, flags)
  File "/usr/local/lib/python2.7/site-packages/eventlet/greenio/base.py", line 366, in _send_loop
    return send_method(data, *args)
error: [Errno 32] Broken pipe

Update: setup and load-test information

We start Celery with the following options:

celery worker -A celerytasks.celery_worker_init -Q default -P gevent -c 1000 --loglevel=info

Celery uses RabbitMQ as the broker.

We run 4 Celery worker processes by specifying numprocs=4 in the supervisor configuration.
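For reference, the supervisor program section looks roughly like the sketch below; the program name and extra directives are illustrative, not our exact config:

[program:celery_default]
command=celery worker -A celerytasks.celery_worker_init -Q default -P gevent -c 1000 --loglevel=info
numprocs=4
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true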

We use JMeter to simulate web access load, and the Django application generates tasks for the Celery workers to consume. The tasks basically access a MySQL database to fetch/update some data, roughly like the sketch below.
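The task shape is a short ORM read-then-update; the module, app object, and model names in this sketch are placeholders rather than our real code:

# Rough sketch of one of our tasks (names are placeholders).
from celerytasks.celery_worker_init import app   # the app module passed via -A

@app.task
def refresh_item(item_id):
    # Hypothetical Django model backed by MySQL; the real tasks do a
    # similarly small fetch-then-update against the database.
    from myapp.models import Item
    item = Item.objects.get(pk=item_id)
    item.view_count += 1
    item.save(update_fields=['view_count'])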

According to the RabbitMQ web management page, the task production rate is 50/s while the consumption rate is 20/s. After about 1 minute of load testing, the log files show that many connections between RabbitMQ and Celery hit broken pipe errors.

We noticed that this problem is also caused by the combination of a high prefetch count and high concurrency.

We had concurrency set to 500 and the prefetch multiplier set to 100, which means the effective prefetch per worker is 500 * 100 = 50,000. With about 100k tasks piled up, this configuration let one worker reserve all of the tasks for itself while the other workers sat idle; that worker kept getting broken pipe errors and never acknowledged any task, so tasks were never cleared from the queue.

We then changed the prefetch multiplier to 3 and restarted all the workers, which fixed the problem. Since lowering the prefetch we have seen zero instances of the broken pipe error, whereas we used to see it regularly.
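For anyone hitting the same thing, this is a minimal sketch of the change, assuming a Celery 3.1-style config module (the setting name below is the 3.1 one; Celery 4 renamed it to worker_prefetch_multiplier, and the broker URL here is illustrative):

from celery import Celery

app = Celery('celerytasks', broker='amqp://guest@localhost//')

# Effective prefetch per worker = concurrency * prefetch multiplier,
# so with concurrency 500 this drops the per-worker reservation from
# 500 * 100 = 50,000 messages to 500 * 3 = 1,500 messages.
app.conf.CELERYD_PREFETCH_MULTIPLIER = 3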
