I'm new to Airflow. I was able to follow a video and create the docker-compose.yml file, the Dockerfile, and the DAG file. I can see my DAG and run it. In my script, I try to open a text file (.txt), but I get the following error: FileNotFoundError: [Errno 2] No such file or directory.
I put the text file in the right location, and the script runs in my local Python environment, so I don't understand why it fails with this error when I run it in Airflow.
My docker-compose.yml, Dockerfile, and DAG file are shown below. I would appreciate any help! Thank you very much.
docker-compose.yml
version: '3.7'
services:
  postgres:
    image: postgres:9.6
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
    logging:
      options:
        max-size: 10m
        max-file: "3"
  webserver:
    build: ./dockerfiles
    restart: always
    depends_on:
      - postgres
    environment:
      - LOAD_EX=n
      - EXECUTOR=Local
    logging:
      options:
        max-size: 10m
        max-file: "3"
    volumes:
      - ./dags:/usr/local/airflow/dags
      # - ./plugins:/usr/local/airflow/plugins
    ports:
      - "8080:8080"
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
Dockerfile
FROM puckel/docker-airflow:1.10.9
RUN pip install requests
RUN pip install bs4
RUN pip install pandas
RUN pip install xlrd
RUN pip install openpyxl
DAG file
try:
    from datetime import timedelta
    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator
    from datetime import datetime
    import requests
    from bs4 import BeautifulSoup
    import pandas as pd
    import smtplib
    from email.message import EmailMessage
    import os
    import sys
    import xlrd
    from datetime import datetime
    from openpyxl import load_workbook
    print("All Dag modules are ok.........")
except Exception as e:
    print("Error {} ".format(e))

def craigslist_search_function():
    ***PYTHON CODE***

with DAG(
    dag_id="craigslist_dag",
    schedule_interval="*/30 * * * *",
    default_args={
        "owner": "airflow",
        "retries": 1,
        "retry_delay": timedelta(minutes=5),
        "start_date": datetime(2022, 1, 1),
    },
    catchup=False) as f:

    craigslist_search_function = PythonOperator(
        task_id="craigslist_search_function",
        python_callable=craigslist_search_function)
I expected it to run the script without any problem. The script runs perfectly well in my local Python environment, and I don't understand why it doesn't work in Airflow.
A container cannot access files that are not mounted into it. Airflow can see your DAG because you mounted it under the volumes key. Try adding the directory that contains the text file as a volume on the webserver service:
volumes:
  - ./dags:/usr/local/airflow/dags
  - local_directory_path:container_directory_path
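For instance, if the text file lives in a ./data directory next to docker-compose.yml (the directory name here is just an assumption for illustration), the mount could look like:

```yaml
volumes:
  - ./dags:/usr/local/airflow/dags
  # host ./data becomes visible inside the container at /usr/local/airflow/data
  - ./data:/usr/local/airflow/data
```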
When you read this file from a DAG task, make sure you read it from container_directory_path rather than the local path.
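A minimal sketch of what the task-side read could look like, assuming the directory was mounted at /usr/local/airflow/data and the file is named search_terms.txt (both names are assumptions for illustration, not from the question):

```python
import os

# Hypothetical mount target inside the container; make it match whatever
# container_directory_path you chose in docker-compose.yml.
DATA_DIR = "/usr/local/airflow/data"

def read_text_file(data_dir, filename):
    """Open a text file via the *container* path, not the host path."""
    path = os.path.join(data_dir, filename)
    with open(path) as fh:
        return fh.read().splitlines()

def craigslist_search_function():
    # Inside the task, resolve files against the container path only.
    lines = read_text_file(DATA_DIR, "search_terms.txt")  # hypothetical file name
    print("Loaded {} lines".format(len(lines)))
```

If the mount is missing, open() raises exactly the FileNotFoundError from the question, because the path exists on the host but not inside the container's filesystem.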