Airflow webserver suddenly stops after running fine for a long time: "No response from gunicorn"



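I have been running the airflow webserver -D daemon process (v1.10.7) on a machine (CentOS 7) for a long time. I suddenly noticed that the webserver was no longer reachable, and checking airflow-webserver.log shows...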

[airflow@airflowetl airflow]$ cat airflow-webserver.log
2020-10-23 00:57:15,648 ERROR - No response from gunicorn master within 120 seconds
2020-10-23 00:57:15,649 ERROR - Shutting down webserver

(nothing of note in airflow-webserver.err)

[airflow@airflowetl airflow]$ cat airflow-webserver.err
/home/airflow/.local/lib/python3.6/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
""")

The [webserver] section of airflow.cfg looks like...

[webserver]
# The base url of your website as airflow cannot guess what domain or
# cname you are using. This is used in automated emails that
# airflow sends to point links to the right web server
#base_url = http://localhost:8080
base_url = http://airflowetl.co.local:8080
# The ip specified when starting the web server
web_server_host = 0.0.0.0
# The port on which to run the web server
web_server_port = 8080
# Paths to the SSL certificate and key for the web server. When both are
# provided SSL will be enabled. This does not change the web server port.
web_server_ssl_cert =
web_server_ssl_key =
# Number of seconds the webserver waits before killing gunicorn master that doesn't respond
web_server_master_timeout = 120
# Number of seconds the gunicorn webserver waits before timing out on a worker
#web_server_worker_timeout = 120
web_server_worker_timeout = 300
# Number of workers to refresh at a time. When set to 0, worker refresh is
# disabled. When nonzero, airflow periodically refreshes webserver workers by
# bringing up new ones and killing old ones.
worker_refresh_batch_size = 1
# Number of seconds to wait before refreshing a batch of workers.
worker_refresh_interval = 30
# Secret key used to run your flask app
secret_key = my_key
# Number of workers to run the Gunicorn web server
workers = 4
# The worker class gunicorn should use. Choices include
# sync (default), eventlet, gevent
worker_class = sync
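
The 120 seconds in the log message matches web_server_master_timeout = 120 in the section above. As a minimal sketch (not part of the original setup), the timeout and refresh settings can be pulled out of the [webserver] section with Python's standard configparser so they are easy to inspect side by side. The SAMPLE text below only repeats values shown above; the function name webserver_settings is hypothetical.

```python
import configparser

# Values copied from the [webserver] section shown above.
SAMPLE = """
[webserver]
web_server_master_timeout = 120
web_server_worker_timeout = 300
worker_refresh_batch_size = 1
worker_refresh_interval = 30
workers = 4
worker_class = sync
"""

# Settings that govern gunicorn timeouts and worker refreshes.
KEYS = (
    "web_server_master_timeout",
    "web_server_worker_timeout",
    "worker_refresh_batch_size",
    "worker_refresh_interval",
    "workers",
    "worker_class",
)


def webserver_settings(cfg_text):
    """Return the timeout/refresh settings found in the [webserver] section."""
    # interpolation=None: treat '%' characters in the file as literal text.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    section = parser["webserver"]
    return {key: section[key] for key in KEYS if key in section}


settings = webserver_settings(SAMPLE)
for name, value in settings.items():
    print(f"{name} = {value}")
```

To run this against a real file, read its contents (for example with open(path).read()) and pass the text to webserver_settings; the path to airflow.cfg depends on the installation.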

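In the end, I just restarted the process as a daemon again (airflow webserver -D; should I have deleted the old airflow-webserver .log and .err files first?), but I am not sure what caused this, since it had been running for months without problems before.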

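Can anyone with more experience explain what could have happened after such a long time, and how I can prevent it in the future? Are there any problems with running DAGs, or anything else I should check, that could have been caused by the webserver's temporary unexpected shutdown?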
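
One possible stopgap, sketched here under stated assumptions, is a small check that could be run periodically (for example from cron): it probes the webserver URL and, if nothing answers, reruns the same airflow webserver -D command used above. It does not address the root cause. The function names webserver_is_up and restart_if_down are hypothetical, and the URL in the usage comment is the base_url from the configuration above.

```python
import subprocess
import urllib.error
import urllib.request


def webserver_is_up(url, timeout=10.0):
    """Return True if `url` answers with an HTTP status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx or 5xx).
        return exc.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False


def restart_if_down(url, restart_cmd=("airflow", "webserver", "-D")):
    """Rerun the daemon command if the webserver does not respond.

    Returns True if a restart was attempted, False if the server was up.
    """
    if webserver_is_up(url):
        return False
    subprocess.run(list(restart_cmd), check=True)
    return True


# Example call (assumption: run as the airflow user on the webserver host):
# restart_if_down("http://airflowetl.co.local:8080")
```

The probe treats any response below 500 as "up", so a login redirect or a 404 still counts as a live server; only a missing or failing server triggers the restart.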

I ran into the same problem. It only started (and occurs very rarely) after I changed the following two webserver configuration parameters.

worker_refresh_interval = 120
workers = 2

However, my other settings also differ from yours, so I will share them here.

rbac = True
web_server_host = 0.0.0.0
web_server_port = 8080
web_server_master_timeout = 600
web_server_worker_timeout = 600
default_ui_timezone = Europe/Amsterdam
reload_on_plugin_change = True

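After comparing the two configurations: the two parameters I changed are set to their default values in yours (the same values I had before the change), so this seems to be caused by a combination of more parameters.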
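
To make the comparison concrete, here is a minimal sketch that lists which settings differ between the two configurations. The dictionaries only repeat values quoted in this thread; the name diff_settings is hypothetical, and a setting that appears on one side only is reported with None for the other.

```python
# Values from the question's [webserver] section.
question_cfg = {
    "web_server_host": "0.0.0.0",
    "web_server_port": "8080",
    "web_server_master_timeout": "120",
    "web_server_worker_timeout": "300",
    "worker_refresh_interval": "30",
    "workers": "4",
}

# Values from this answer (the two changed parameters plus the shared ones).
answer_cfg = {
    "web_server_host": "0.0.0.0",
    "web_server_port": "8080",
    "web_server_master_timeout": "600",
    "web_server_worker_timeout": "600",
    "worker_refresh_interval": "120",
    "workers": "2",
    "rbac": "True",
    "default_ui_timezone": "Europe/Amsterdam",
    "reload_on_plugin_change": "True",
}


def diff_settings(left, right):
    """Map each differing key to a (left_value, right_value) pair."""
    keys = sorted(set(left) | set(right))
    return {
        key: (left.get(key), right.get(key))
        for key in keys
        if left.get(key) != right.get(key)
    }


differences = diff_settings(question_cfg, answer_cfg)
for key, (q_value, a_value) in differences.items():
    print(f"{key}: question={q_value} answer={a_value}")
```

Both setups hit the same error even though the timeouts, worker count, and refresh interval all differ, which is consistent with the observation that no single one of these values explains it.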
