Different ways to set Airflow DAG fields



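I want to create an Airflow DAG and would like to know which parameters should be set as a direct keyword argument (field_1), which in default_args, and which in args.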

my_dag = DAG(
    "my_dag",
    field_1="xxx",  # placeholder for any direct DAG keyword argument
    default_args=default_args,
    **args
)

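I checked the documentation, and I understand that some parameters, such as "owner", must be set through default_args and cannot be set as field_1. But for most parameters there seems to be no difference. I tested a few fields, such as "catchup" and "on_failure_callback", and they all worked in all three places.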

So I would like to know: what is the best practice for setting parameters when creating a DAG?

The best practice is to follow something like the Airflow tutorial:

from datetime import datetime, timedelta

from airflow import DAG

with DAG(
    'tutorial',
    # These args will get passed on to each operator
    # You can override them on a per-task basis during operator initialization
    default_args={
        'depends_on_past': False,
        'email': ['airflow@example.com'],
        'email_on_failure': False,
        'email_on_retry': False,
        'retries': 1,
        'retry_delay': timedelta(minutes=5),
        # 'queue': 'bash_queue',
        # 'pool': 'backfill',
        # 'priority_weight': 10,
        # 'end_date': datetime(2016, 1, 1),
        # 'wait_for_downstream': False,
        # 'sla': timedelta(hours=2),
        # 'execution_timeout': timedelta(seconds=300),
        # 'on_failure_callback': some_function,
        # 'on_success_callback': some_other_function,
        # 'on_retry_callback': another_function,
        # 'sla_miss_callback': yet_another_function,
        # 'trigger_rule': 'all_success'
    },
    description='A simple tutorial DAG',
    schedule_interval=timedelta(days=1),
    start_date=datetime(2021, 1, 1),
    catchup=False,
    tags=['example'],
) as dag:
    ...

Reference: https://airflow.apache.org/docs/apache-airflow/stable/tutorial.html#example-pipeline-definition
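
The comment in the tutorial code says that default_args are passed on to each operator and can be overridden per task. Here is a minimal sketch of that precedence in plain Python, with no Airflow import. `resolve_task_args` is a hypothetical helper for illustration only, not an Airflow function; the real merge happens inside Airflow when an operator is instantiated.

```python
from datetime import timedelta


def resolve_task_args(default_args, **task_kwargs):
    """Hypothetical model: start from default_args, then let explicit task arguments win."""
    resolved = dict(default_args)  # copy, so the shared dict is never mutated
    resolved.update(task_kwargs)   # values given on the task override the defaults
    return resolved


default_args = {
    "owner": "airflow",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

# A task that sets nothing itself inherits every default.
extract_args = resolve_task_args(default_args)

# A task that passes retries=3 overrides only that one value.
load_args = resolve_task_args(default_args, retries=3)

print(extract_args["retries"], load_args["retries"], load_args["owner"])
```

Under this model, anything that describes how an individual task behaves (owner, retries, retry_delay, callbacks) fits naturally in default_args, because every task picks it up unless it says otherwise.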

But I use something like the following, and it is enough:

from datetime import timedelta

import pendulum
from airflow import DAG

# Task-level defaults, applied to every operator in the DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1)
}

with DAG(
    default_args=default_args,
    dag_id='dag_etl',
    catchup=False,
    start_date=pendulum.datetime(year=2022, month=1, day=1, tz='America/Chicago'),
    schedule_interval='0 8 * * *',  # https://crontab.guru/#0_8_*_*_*
    description='DAG Extract Transform Load'
) as dag:
    ...
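
If the settings arrive as one flat dict, one way to keep the two groups apart is to split them before calling DAG. The sketch below assumes the same split as the examples above: dag_id, description, schedule_interval, start_date, catchup, and tags go to the DAG constructor, and everything else goes into default_args. `DAG_LEVEL_KEYS` and `split_settings` are hypothetical names, not part of Airflow.

```python
from datetime import datetime, timedelta

# Assumption: these are the DAG-level keys, taken from the examples above.
DAG_LEVEL_KEYS = {
    "dag_id", "description", "schedule_interval",
    "start_date", "catchup", "tags",
}


def split_settings(settings):
    """Split a flat dict into (DAG keyword arguments, default_args)."""
    dag_kwargs = {k: v for k, v in settings.items() if k in DAG_LEVEL_KEYS}
    default_args = {k: v for k, v in settings.items() if k not in DAG_LEVEL_KEYS}
    return dag_kwargs, default_args


settings = {
    "dag_id": "dag_etl",
    "catchup": False,
    "start_date": datetime(2022, 1, 1),
    "schedule_interval": "0 8 * * *",
    "owner": "airflow",
    "retries": 1,
    "retry_delay": timedelta(minutes=1),
}

dag_kwargs, default_args = split_settings(settings)

# The result could then be passed on as:
#     with DAG(default_args=default_args, **dag_kwargs) as dag: ...
print(sorted(dag_kwargs))
print(sorted(default_args))
```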
