当 gunicorn worker 遇到超时时,是否有可能获得堆栈跟踪?



具体来说,我正在运行一个Flask应用程序,其中包含gunicorn中的默认工人。我正在尝试弄清楚如何调试/跟踪当工人在服务请求时因超时而被杀死时会发生什么。有没有办法获取堆栈跟踪或分析请求以调试它?

老实说,这是一个非常好的问题,我不确定是否有可能达到相同的目标。但是当我开始挖掘时,我发现了很多有趣的线索

显示正在运行的 Python 应用程序的堆栈跟踪

从卡住的python进程中获取堆栈跟踪

https://github.com/khamidou/lptrace

https://gist.github.com/reywood/e221c4061bbf2eccea885c9b2e4ef496

所以首先我用下面的代码创建了一个简单的烧瓶应用程序

app.py

from flask import Flask
app = Flask(__name__)
import time
def a():
b()
def b():
c ()
def c():
i = 0
while i < 900:
time.sleep(1)
i += 1
@app.route('/', defaults={'path': ''})
@app.route('/<path:path>')
def catch_all(path):
a()
return 'You want path: %s' % path
if __name__ == '__main__':
app.run()

wsgi.py

from app import app
if __name__ == "__main__":
app.run()

现在运行如下所示的应用程序并执行curl localhost:8000/abc

$ gunicorn wsgi:app
[2019-08-01 08:19:06 +0000] [26825] [INFO] Starting gunicorn 19.9.0
[2019-08-01 08:19:06 +0000] [26825] [INFO] Listening at: http://127.0.0.1:8000 (26825)
[2019-08-01 08:19:06 +0000] [26825] [INFO] Using worker: sync
[2019-08-01 08:19:06 +0000] [26828] [INFO] Booting worker with pid: 26828
[2019-08-01 08:19:40 +0000] [26825] [CRITICAL] WORKER TIMEOUT (pid:26828)
[2019-08-01 08:19:40 +0000] [26828] [INFO] Worker exiting (pid: 26828)
[2019-08-01 08:19:40 +0000] [26832] [INFO] Booting worker with pid: 26832

现在我们需要的是一个钩子,可以在工作被杀死之前调用。gunicorn支持配置文件中的服务器事件

所以现在我们创建一个配置文件

gunicorn_config.py

timeout = 3
def worker_abort(worker):
pid = worker.pid
print("worker is being killed - {}".format(pid))

我们的产出现在是

$ gunicorn -c gunicorn_config.py wsgi:app
[2019-08-01 08:22:17 +0000] [26837] [INFO] Starting gunicorn 19.9.0
[2019-08-01 08:22:17 +0000] [26837] [INFO] Listening at: http://127.0.0.1:8000 (26837)
[2019-08-01 08:22:17 +0000] [26837] [INFO] Using worker: sync
[2019-08-01 08:22:17 +0000] [26840] [INFO] Booting worker with pid: 26840
[2019-08-01 08:22:22 +0000] [26837] [CRITICAL] WORKER TIMEOUT (pid:26840)
worker is being killed - 26840
[2019-08-01 08:22:22 +0000] [26840] [INFO] Worker exiting (pid: 26840)
[2019-08-01 08:22:22 +0000] [26844] [INFO] Booting worker with pid: 26844

这很好,现在我们需要混合我们以前关于pyrasite的知识,以获得堆栈。所以我们更新了配置文件,如下所示

gunicorn_config.py

timeout = 3
__code_dump_stack__ = """
import sys, traceback
for thread, frame in sys._current_frames().items():
print('Thread 0x%x' % thread)
traceback.print_stack(frame)
print()
"""
def dump_stack_for_process(pid):
import pyrasite
ipc = pyrasite.PyrasiteIPC(pid)
ipc.connect()
print(ipc.cmd(__code_dump_stack__))
ipc.close()
def worker_abort(worker):
pid = worker.pid
print("worker is being killed - {}".format(pid))
dump_stack_for_process(pid)

现在我们的输出是

$ [2019-08-01 08:25:29 +0000] [26848] [INFO] Starting gunicorn 19.9.0
[2019-08-01 08:25:29 +0000] [26848] [INFO] Listening at: http://127.0.0.1:8000 (26848)
[2019-08-01 08:25:29 +0000] [26848] [INFO] Using worker: sync
[2019-08-01 08:25:29 +0000] [26851] [INFO] Booting worker with pid: 26851
[2019-08-01 08:25:38 +0000] [26848] [CRITICAL] WORKER TIMEOUT (pid:26851)
worker is being killed - 26851
Thread 0x7ff0a7a4b700
File "/usr/lib/python3.5/threading.py", line 882, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File "<string>", line 72, in run
File "<string>", line 92, in on_command
File "<string>", line 6, in <module>
Thread 0x7ff0ac512700
File "/home/vagrant/.local/bin/gunicorn", line 11, in <module>
sys.exit(run())
File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/app/wsgiapp.py", line 61, in run
WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()
File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/app/base.py", line 223, in run
super(Application, self).run()
File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/app/base.py", line 72, in run
Arbiter(self).run()
File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/arbiter.py", line 203, in run
self.manage_workers()
File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/arbiter.py", line 545, in manage_workers
self.spawn_workers()
File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/arbiter.py", line 616, in spawn_workers
self.spawn_worker()
File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
worker.init_process()
File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/base.py", line 134, in init_process
self.run()
File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 124, in run
self.run_for_one(timeout)
File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 68, in run_for_one
self.accept(listener)
File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 30, in accept
self.handle(listener, client, addr)
File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 135, in handle
self.handle_request(listener, req, client, addr)
File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 176, in handle_request
respiter = self.wsgi(environ, resp.start_response)
File "/home/vagrant/.local/lib/python3.5/site-packages/flask/app.py", line 2463, in __call__
return self.wsgi_app(environ, start_response)
File "/home/vagrant/.local/lib/python3.5/site-packages/flask/app.py", line 2446, in wsgi_app
response = self.full_dispatch_request()
File "/home/vagrant/.local/lib/python3.5/site-packages/flask/app.py", line 1949, in full_dispatch_request
rv = self.dispatch_request()
File "/home/vagrant/.local/lib/python3.5/site-packages/flask/app.py", line 1935, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/home/vagrant/remotedebug/app.py", line 20, in catch_all
a()
File "/home/vagrant/remotedebug/app.py", line 6, in a
b()
File "/home/vagrant/remotedebug/app.py", line 9, in b
c ()
File "/home/vagrant/remotedebug/app.py", line 14, in c
time.sleep(1)
File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/base.py", line 195, in handle_abort
self.cfg.worker_abort(self)
File "gunicorn_config.py", line 23, in worker_abort
dump_stack_for_process(pid)
File "gunicorn_config.py", line 17, in dump_stack_for_process
print(ipc.cmd(__code_dump_stack__))
File "/home/vagrant/.local/lib/python3.5/site-packages/pyrasite/ipc.py", line 161, in cmd
return self.recv()
File "/home/vagrant/.local/lib/python3.5/site-packages/pyrasite/ipc.py", line 174, in recv
header_data = self.recv_bytes(4)
File "/home/vagrant/.local/lib/python3.5/site-packages/pyrasite/ipc.py", line 187, in recv_bytes
chunk = self.sock.recv(n - len(data))

[2019-08-01 08:25:38 +0000] [26851] [INFO] Worker exiting (pid: 26851)
[2019-08-01 08:25:38 +0000] [26862] [INFO] Booting worker with pid: 26862

堆栈跟踪很大,但它为我们提供了所需的内容

只是想在公认的答案中添加另一种方法,还有另一种方法可以使用 v3.3 中引入的内置 Python 3 模块获得类似的结果faulthandlerhttps://docs.python.org/dev/library/faulthandler.html

您可以将以下两行添加到接受的答案中提到的wsgi.py

import faulthandler
faulthandler.enable()

当发生超时时,以下输出将打印到 stderr:

[2020-12-24 13:38:32 +0000] [31304] [INFO] Starting gunicorn 20.0.4
[2020-12-24 13:38:32 +0000] [31304] [INFO] Listening at: http://0.0.0.0:8888 (31304)
[2020-12-24 13:38:32 +0000] [31304] [INFO] Using worker: sync
[2020-12-24 13:38:32 +0000] [31307] [INFO] Booting worker with pid: 31307
[2020-12-24 13:38:55 +0000] [31304] [CRITICAL] WORKER TIMEOUT (pid:31307)
Fatal Python error: Aborted
Current thread 0x00007f411d781700 (most recent call first):
File "/tmp/app.py", line 14 in c
File "/tmp/app.py", line 9 in b
File "/tmp/app.py", line 6 in a
File "/tmp/app.py", line 20 in catch_all
File "/tmp/venv/lib/python3.5/site-packages/flask/app.py", line 1936 in dispatch_request
File "/tmp/venv/lib/python3.5/site-packages/flask/app.py", line 1950 in full_dispatch_request
File "/tmp/venv/lib/python3.5/site-packages/flask/app.py", line 2447 in wsgi_app
File "/tmp/venv/lib/python3.5/site-packages/flask/app.py", line 2464 in __call__
File "/tmp/venv/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 175 in handle_request
File "/tmp/venv/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 134 in handle
File "/tmp/venv/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 29 in accept
File "/tmp/venv/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 67 in run_for_one
File "/tmp/venv/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 123 in run
File "/tmp/venv/lib/python3.5/site-packages/gunicorn/workers/base.py", line 140 in init_process
File "/tmp/venv/lib/python3.5/site-packages/gunicorn/arbiter.py", line 583 in spawn_worker
File "/tmp/venv/lib/python3.5/site-packages/gunicorn/arbiter.py", line 616 in spawn_workers
File "/tmp/venv/lib/python3.5/site-packages/gunicorn/arbiter.py", line 545 in manage_workers
File "/tmp/venv/lib/python3.5/site-packages/gunicorn/arbiter.py", line 202 in run
File "/tmp/venv/lib/python3.5/site-packages/gunicorn/app/base.py", line 72 in run
File "/tmp/venv/lib/python3.5/site-packages/gunicorn/app/base.py", line 228 in run
File "/tmp/venv/lib/python3.5/site-packages/gunicorn/app/wsgiapp.py", line 58 in run
File "/tmp/venv/bin/gunicorn", line 8 in <module>
[2020-12-24 13:38:55 +0000] [31307] [INFO] Worker exiting (pid: 31307)
[2020-12-24 13:38:55 +0000] [31503] [INFO] Booting worker with pid: 31503

另外,除了接受的答案之外,我发现import traceback; traceback.print_stack()对我有用,因为worker_abort在工作进程的上下文中运行。所以,只需要将gunicorn_config.py更改为这样的东西

import traceback
timeout = 3
def worker_abort(worker):
pid = worker.pid
print("worker is being killed - {}".format(pid))
traceback.print_stack()

相关内容

  • 没有找到相关文章

最新更新