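A fully loaded multi-tenant Django application with about 1,000 WebSockets on Daphne/Channels had been running fine for several months when, suddenly, tenants all started calling the support line to say the application was slow or hanging outright. I narrowed it down to WebSockets, because the HTTP REST API responds quickly and without errors.
Neither the application logs nor the OS logs point to a problem, so the only thing to go on is the exception below. Over the past two days it has occurred again and again, here and there.
I'm not expecting in-depth debugging help, just some off-the-cuff suggestions about what the cause might be.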
AWS Linux 1
Python 3.6.4
Elasticache Redis 5.0
channels==2.4.0
channels-redis==2.4.2
daphne==2.5.0
Django==2.2.13
Split configuration behind Nginx: uwsgi serves HTTP, daphne serves ASGI
May 10 08:08:16 prod-b-web1: [pid 15053] [version 119.5.10.5086] [tenant_id -] [domain_name -] [pathname /opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/daphne/server.py] [lineno 288] [priority ERROR] [funcname application_checker] [request_path -] [request_method -] [request_data -] [request_user -] [request_stack -] Exception inside application: Lock is not acquired.
Traceback (most recent call last):
File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels_redis/core.py", line 435, in receive
real_channel
File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels_redis/core.py", line 484, in receive_single
await self.receive_clean_locks.acquire(channel_key)
File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels_redis/core.py", line 152, in acquire
return await self.locks[channel].acquire()
File "/opt/python3.6/lib/python3.6/asyncio/locks.py", line 176, in acquire
yield from fut
concurrent.futures._base.CancelledError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels/sessions.py", line 183, in __call__
return await self.inner(receive, self.send)
File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels/middleware.py", line 41, in coroutine_call
await inner_instance(receive, send)
File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels/consumer.py", line 59, in __call__
[receive, self.channel_receive], self.dispatch
File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels/utils.py", line 58, in await_many_dispatch
await task
File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels_redis/core.py", line 447, in receive
self.receive_lock.release()
File "/opt/python3.6/lib/python3.6/asyncio/locks.py", line 201, in release
raise RuntimeError('Lock is not acquired.')
RuntimeError: Lock is not acquired.
First, let's look at where the error RuntimeError: Lock is not acquired. comes from. As the traceback shows, the release() method in /opt/python3.6/lib/python3.6/asyncio/locks.py is defined as follows:
    def release(self):
        """Release a lock.

        When the lock is locked, reset it to unlocked, and return.
        If any other coroutines are blocked waiting for the lock to become
        unlocked, allow exactly one of them to proceed.

        When invoked on an unlocked lock, a RuntimeError is raised.

        There is no return value.
        """
        if self._locked:
            self._locked = False
            self._wake_up_first()
        else:
            raise RuntimeError('Lock is not acquired.')
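A primitive lock is a synchronization primitive that is not owned by a particular thread when locked (for asyncio.Lock, read "task" for "thread": any task may release it).
Calling release() on an unlocked lock raises RuntimeError, because the method may only be called while the lock is in the locked state. When it is called in the locked state, the state changes to unlocked.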
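To see this rule in isolation, here is a minimal sketch that releases a lock nobody acquired. The function name release_unlocked_lock is made up for the illustration, and asyncio.run needs Python 3.7 or newer (on 3.6 you would use loop.run_until_complete instead).

```python
import asyncio


async def release_unlocked_lock() -> str:
    """Call release() on a lock that was never acquired (hypothetical helper)."""
    lock = asyncio.Lock()
    try:
        # The lock is in the unlocked state, so release() must raise.
        lock.release()
    except RuntimeError as exc:
        return str(exc)
    return "no error"


# asyncio.run needs Python 3.7+; on 3.6 use loop.run_until_complete(...)
message = asyncio.run(release_unlocked_lock())
print(message)
```

The message is the same one that appears at the bottom of the traceback.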
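Now for the earlier error in the traceback, which was raised inside the acquire() method of the same file. acquire() is defined as follows (this listing appears to come from a newer CPython release; in the Python 3.6 file from the traceback the same wait is written yield from fut):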
    async def acquire(self):
        """Acquire a lock.

        This method blocks until the lock is unlocked, then sets it to
        locked and returns True.
        """
        if (not self._locked and (self._waiters is None or
                all(w.cancelled() for w in self._waiters))):
            self._locked = True
            return True

        if self._waiters is None:
            self._waiters = collections.deque()
        fut = self._loop.create_future()
        self._waiters.append(fut)

        # Finally block should be called before the CancelledError
        # handling as we don't want CancelledError to call
        # _wake_up_first() and attempt to wake up itself.
        try:
            try:
                await fut
            finally:
                self._waiters.remove(fut)
        except exceptions.CancelledError:
            if not self._locked:
                self._wake_up_first()
            raise

        self._locked = True
        return True
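So for concurrent.futures._base.CancelledError to be raised, the problem must have come from await fut: the code waiting in acquire() was cancelled while it was suspended there.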
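The cancellation path can be reproduced in a few lines. This is only a sketch under my own assumptions: the name cancel_waiting_acquire is hypothetical, and it uses a bare asyncio.Lock rather than anything from channels_redis.

```python
import asyncio


async def cancel_waiting_acquire() -> str:
    """Cancel a task that is blocked inside Lock.acquire() (hypothetical helper)."""
    lock = asyncio.Lock()
    await lock.acquire()  # hold the lock so the next acquire() has to wait

    # This task suspends inside acquire(), at the internal "await fut".
    waiter = asyncio.ensure_future(lock.acquire())
    await asyncio.sleep(0)  # let the waiter run until it blocks

    waiter.cancel()  # the cancellation is delivered at the "await fut" point
    try:
        await waiter
    except asyncio.CancelledError:
        outcome = "acquire() was cancelled"
    else:
        outcome = "acquire() succeeded"

    lock.release()  # still held by this coroutine, so this release is valid
    return outcome


# asyncio.run needs Python 3.7+; on 3.6 use loop.run_until_complete(...)
outcome = asyncio.run(cancel_waiting_acquire())
print(outcome)
```

In the traceback above, release() at core.py line 447 ran while that CancelledError was being handled, which is consistent with a release on a lock the cancelled code path did not hold at that moment. That reading is an inference from the traceback, not something I verified against the channels_redis source.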
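To fix it, you can have a look at "Awaiting an asyncio.Future raises concurrent.futures._base.CancelledError instead of waiting for a value/exception to be set".
Basically, you may have an awaitable somewhere in your code that is never awaited. By not awaiting it, you never hand control back to the event loop and you never store the awaitable, so it gets cleaned up immediately, which cancels it completely (along with all the awaitables it controlled).
Just make sure you await the results of the awaitables in your code, and track down any you missed.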
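As a rough illustration of what "awaiting your awaitables" means in practice, the sketch below contrasts a coroutine that is created but never awaited with one that is awaited and with a task that is kept in a variable and awaited. The names record and results are invented for the example; nothing here comes from your code base.

```python
import asyncio

results = []


async def record(value: str) -> None:
    """Hypothetical unit of work: yield to the loop once, then record a value."""
    await asyncio.sleep(0)
    results.append(value)


async def main() -> None:
    # Calling a coroutine function only creates a coroutine object;
    # its body does not run until it is awaited or scheduled.
    forgotten = record("never awaited")
    forgotten.close()  # closed explicitly only to avoid the "never awaited" warning

    # Awaiting runs the coroutine to completion.
    await record("awaited")

    # A scheduled task should be kept in a variable and awaited,
    # so it is neither dropped nor left unfinished.
    task = asyncio.ensure_future(record("task awaited"))
    await task


# asyncio.run needs Python 3.7+; on 3.6 use loop.run_until_complete(...)
asyncio.run(main())
print(results)
```

Only the two awaited calls leave a trace in results; the first coroutine never ran at all. Running with PYTHONASYNCIODEBUG=1 makes Python report coroutines that were never awaited, which can help when hunting for the one you missed.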