How to run the Airflow scheduler and webserver at the same time



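I'm trying to run Airflow 2 locally with a Postgres database (on localhost). I can get the webserver running, but I can't get the scheduler to run at the same time as the webserver. Running airflow scheduler gives: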

____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ _ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
_/_/  |_/_/  /_/    /_/    /_/  ____/____/|__/
[2022-08-27 16:10:50,543] {scheduler_job.py:709} INFO - Starting the scheduler
[2022-08-27 16:10:50,544] {scheduler_job.py:714} INFO - Processing each file at most -1 times
[2022-08-27 16:10:50 -0500] [48113] [INFO] Starting gunicorn 20.1.0
[2022-08-27 16:10:50,546] {executor_loader.py:105} INFO - Loaded executor: SequentialExecutor
[2022-08-27 16:10:50 -0500] [48113] [INFO] Listening at: http://[::]:8793 (48113)
[2022-08-27 16:10:50 -0500] [48113] [INFO] Using worker: sync
[2022-08-27 16:10:50,550] {manager.py:160} INFO - Launched DagFileProcessorManager with pid: 48114
[2022-08-27 16:10:50,552] {scheduler_job.py:1231} INFO - Resetting orphaned tasks for active dag runs
[2022-08-27 16:10:50 -0500] [48115] [INFO] Booting worker with pid: 48115
[2022-08-27 16:10:50,556] {settings.py:55} INFO - Configured default timezone Timezone('UTC')
[2022-08-27T16:10:50.567-0500] {manager.py:406} WARNING - Because we cannot use more than 1 thread (parsing_processes = 2) when using sqlite. So we set parallelism to 1.
[2022-08-27 16:10:50 -0500] [48116] [INFO] Booting worker with pid: 48116
[2022-08-27 16:15:50,663] {scheduler_job.py:1231} INFO - Resetting orphaned tasks for active dag runs
[2022-08-27 16:20:50,749] {scheduler_job.py:1231} INFO - Resetting orphaned tasks for active dag runs
[2022-08-27 16:25:50,834] {scheduler_job.py:1231} INFO - Resetting orphaned tasks for active dag runs
[2022-08-27 16:30:50,911] {scheduler_job.py:1231} INFO - Resetting orphaned tasks for active dag runs
[2022-08-27 16:35:50,991] {scheduler_job.py:1231} INFO - Resetting orphaned tasks for active dag runs
[2022-08-27 16:40:51,064] {scheduler_job.py:1231} INFO - Resetting orphaned tasks for active dag runs

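I can run the database, scheduler, and webserver together with airflow standalone, but my understanding is that this is meant only for development, not production, so I'd like to avoid it. I have no problems initializing the database. However, when I open the webserver UI, it reports that no scheduler is running. I then have to kill the webserver in order to run airflow scheduler from the CLI. As the output above shows, the scheduler never returns control to my terminal unless I kill it, which means I can't get back to the webserver UI. So how do I run the scheduler and the webserver at the same time without killing one to start the other?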

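Airflow has several core components, such as the webserver and the scheduler, and each runs in its own process. When you run airflow standalone, Airflow starts the webserver, the scheduler, and the triggerer (a process that supports deferrable operators) as 3 separate processes (see the source code).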

If you want to run them manually, run each service in its own terminal, or run them in the background:

airflow scheduler &
airflow webserver &
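Running them with & gives the prompt back immediately, but the output of both processes then interleaves in the same terminal, and you need their PIDs to stop them later. Below is a minimal sketch of one way to handle that, assuming a bash shell, that `airflow` is on your PATH, and that the database is already initialized. The helper names `start_component`/`stop_component` and the `AIRFLOW_BIN`/`RUN_DIR` variables are hypothetical, not part of Airflow. Each component writes its output to its own log file and its PID to a file.

```shell
#!/usr/bin/env bash
# Sketch: start/stop Airflow components in the background from one terminal.
# Hypothetical helpers and variables, not part of Airflow itself.

AIRFLOW_BIN="${AIRFLOW_BIN:-airflow}"   # airflow CLI to use (e.g. a virtualenv path)
RUN_DIR="${RUN_DIR:-./airflow-run}"     # where per-component logs and PID files go

# Start one component ("scheduler", "webserver", "triggerer") in the background.
# nohup keeps it running if the terminal closes; output goes to <name>.log
# and the background PID is recorded in <name>.pid.
start_component() {
  local name="$1"
  mkdir -p "$RUN_DIR"
  nohup "$AIRFLOW_BIN" "$name" >"$RUN_DIR/$name.log" 2>&1 &
  echo "$!" >"$RUN_DIR/$name.pid"
  echo "started $name with pid $!"
}

# Stop a component started by start_component by sending SIGTERM to the
# recorded PID, then remove its PID file.
stop_component() {
  local name="$1"
  local pidfile="$RUN_DIR/$name.pid"
  if [ -f "$pidfile" ]; then
    kill "$(cat "$pidfile")" 2>/dev/null || true
    rm -f "$pidfile"
    echo "stopped $name"
  fi
}
```

Usage, after saving the block as, for example, airflow-bg.sh (hypothetical file name): run `source airflow-bg.sh`, then `start_component scheduler` and `start_component webserver` (plus `start_component triggerer` if you use deferrable operators). Later, stop them with `stop_component webserver` and `stop_component scheduler`. Airflow 2's CLI also offers a `-D`/`--daemon` option on `airflow scheduler` and `airflow webserver` that detaches the process itself; check `airflow scheduler --help` for the exact options in your version.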
