redis.exceptions.ConnectionError occurs after Celery has been running for about a day



Here is my full traceback:

Traceback (most recent call last):
  File "/home/server/backend/venv/lib/python3.4/site-packages/celery/app/trace.py", line 283, in trace_task
    uuid, retval, SUCCESS, request=task_request,
  File "/home/server/backend/venv/lib/python3.4/site-packages/celery/backends/base.py", line 256, in store_result
    request=request, **kwargs)
  File "/home/server/backend/venv/lib/python3.4/site-packages/celery/backends/base.py", line 490, in _store_result
    self.set(self.get_key_for_task(task_id), self.encode(meta))
  File "/home/server/backend/venv/lib/python3.4/site-packages/celery/backends/redis.py", line 160, in set
    return self.ensure(self._set, (key, value), **retry_policy)
  File "/home/server/backend/venv/lib/python3.4/site-packages/celery/backends/redis.py", line 149, in ensure
    **retry_policy
  File "/home/server/backend/venv/lib/python3.4/site-packages/kombu/utils/__init__.py", line 243, in retry_over_time
    return fun(*args, **kwargs)
  File "/home/server/backend/venv/lib/python3.4/site-packages/celery/backends/redis.py", line 169, in _set
    pipe.execute()
  File "/home/server/backend/venv/lib/python3.4/site-packages/redis/client.py", line 2593, in execute
    return execute(conn, stack, raise_on_error)
  File "/home/server/backend/venv/lib/python3.4/site-packages/redis/client.py", line 2447, in _execute_transaction
    connection.send_packed_command(all_cmds)
  File "/home/server/backend/venv/lib/python3.4/site-packages/redis/connection.py", line 532, in send_packed_command
    self.connect()
  File "/home/pserver/backend/venv/lib/python3.4/site-packages/redis/connection.py", line 436, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 0 connecting to localhost:6379. Error.
[2016-09-21 10:47:18,814: WARNING/Worker-747] Data collector is not contactable. This can be because of a network issue or because of the data collector being restarted. In the event that contact cannot be made after a period of time then please report this problem to New Relic support for further investigation. The error raised was ConnectionError(ProtocolError('Connection aborted.', BlockingIOError(11, 'Resource temporarily unavailable')),).

I did search for ConnectionError, but none of the issues I found matches mine.

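My platform is Ubuntu 14.04. Here is part of my redis configuration. (I can share the whole redis.conf file if you need it. By the way, all parameters in the LIMITS section are disabled.)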

# By default Redis listens for connections from all the network interfaces
# available on the server. It is possible to listen to just one or multiple
# interfaces using the "bind" configuration directive, followed by one or
# more IP addresses.
#
# Examples:
#
# bind 192.168.1.100 10.0.0.1
bind 127.0.0.1
# Specify the path for the unix socket that will be used to listen for
# incoming connections. There is no default, so Redis will not listen
# on a unix socket when not specified.
#
# unixsocket /var/run/redis/redis.sock
# unixsocketperm 755
# Close the connection after a client is idle for N seconds (0 to disable)
timeout 0
# TCP keepalive.
#
# If non-zero, use SO_KEEPALIVE to send TCP ACKs to clients in absence
# of communication. This is useful for two reasons:
#
# 1) Detect dead peers.
# 2) Take the connection alive from the point of view of network
#    equipment in the middle.
#
# On Linux, the specified value (in seconds) is the period used to send ACKs.
# Note that to close the connection the double of the time is needed.
# On other kernels the period depends on the kernel configuration.
#
# A reasonable value for this option is 60 seconds.
tcp-keepalive 60

Here is my mini redis wrapper:

import redis
from django.conf import settings

REDIS_POOL = redis.ConnectionPool(host=settings.REDIS_HOST, port=settings.REDIS_PORT)

def get_redis_server():
    return redis.Redis(connection_pool=REDIS_POOL)

This is how I use it:

from celery import shared_task
from redis_wrapper import get_redis_server

# The view and the task run in different, independent processes
def sample_view(request):
    rs = get_redis_server()
    # some get-set stuff with redis

@shared_task
def sample_celery_task():
    rs = get_redis_server()
    # some get-set stuff with redis

Package versions:

celery==3.1.18
django-celery==3.1.16
kombu==3.0.26
redis==2.10.3

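So the problem is this: the connection error appears some time after the Celery workers are started. After its first occurrence, every task ends with the same error until I restart all of my Celery workers. (Interestingly, Celery Flower also fails during that problematic period.)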

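I suspect the way I use the redis connection pool, or the redis configuration, or (less likely) a network issue. Any idea what the cause is? What am I doing wrong?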

(P.S.: I will add the output of redis-cli info when I see this error again today.)
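
The same information can be collected from Python while the error is happening. The sketch below assumes a redis-py client such as the one returned by `get_redis_server()`, whose `info()` method returns the INFO output as a dict. The helper names `summarize_info` and `log_redis_health`, and the choice of fields, are illustrative assumptions.

```python
# Fields of the Redis INFO output that relate to connection and persistence health
FIELDS = (
    "connected_clients",
    "blocked_clients",
    "rejected_connections",
    "used_memory_human",
    "rdb_last_bgsave_status",
)


def summarize_info(info):
    """Pick the fields of interest from an INFO dict; missing fields map to None."""
    return {name: info.get(name) for name in FIELDS}


def log_redis_health(rs):
    """Print the summary for a redis-py client, one field per line."""
    for name, value in sorted(summarize_info(rs.info()).items()):
        print("%s=%s" % (name, value))
```

Calling `log_redis_health(get_redis_server())` from a periodic task would show whether the number of connected clients keeps growing before the failure.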

UPDATE:

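I have temporarily worked around this problem by adding the --maxtasksperchild parameter to my worker start command. I set it to 200. Of course this is not the proper way to solve the problem; it only treats the symptom. It basically refreshes the worker instances periodically (it closes the old process and creates a new one once the old process has handled 200 tasks), which also refreshes my global redis pool and its connections. So I think I should focus on how the global redis connection pool is used. I am still waiting for new ideas and comments.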
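
For reference, the same limit can be set in configuration rather than on the command line. This fragment assumes the Celery 3.1 setting name `CELERYD_MAX_TASKS_PER_CHILD` placed in the Django settings module that django-celery reads; it is the same workaround, not a fix.

```python
# settings.py (fragment)
# Recycle each worker child process after it has executed 200 tasks,
# matching the --maxtasksperchild=200 command-line option.
CELERYD_MAX_TASKS_PER_CHILD = 200
```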

Sorry for my poor English, and thanks in advance.

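Have you enabled the RDB background save method in redis?
If so, check the size of the CCD_ 1 file in CCD_ 2.
Sometimes the file grows until it fills the root directory, and the redis instance can no longer save to it.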

You can stop Redis from rejecting writes when a background save fails by issuing the following command in redis-cli:

    config set stop-writes-on-bgsave-error no
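
The disk check described above can also be scripted. This is a minimal sketch using only the Python standard library; the function name `rdb_disk_report` is hypothetical, and the path to the RDB file must be supplied by the caller because it depends on the `dir` and `dbfilename` values in redis.conf.

```python
import os
import shutil


def rdb_disk_report(rdb_path):
    """Return the RDB file size and the free/total space of its filesystem, in bytes."""
    directory = os.path.dirname(os.path.abspath(rdb_path))
    usage = shutil.disk_usage(directory)
    return {
        "rdb_bytes": os.path.getsize(rdb_path),
        "free_bytes": usage.free,
        "total_bytes": usage.total,
    }
```

If `free_bytes` is smaller than `rdb_bytes`, the next background save is likely to fail, because Redis writes the new snapshot to a temporary file before replacing the old one.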
