I have a folder tree like this in my project:

- project
  - dags
  - python_scripts
  - libraries
  - docker-compose.yml
  - Dockerfile
  - docker_resources
I create an Airflow service in a Docker container with the following Dockerfile:

```dockerfile
# Base image
FROM puckel/docker-airflow:1.10.1

# Impersonate
USER root

# Logs are automatically thrown to the I/O stream and not buffered
ENV PYTHONUNBUFFERED 1
ENV AIRFLOW_HOME=/usr/local/airflow
ENV PYTHONPATH "${PYTHONPATH}:/libraries"

WORKDIR /

# Add docker source files to the docker machine
ADD ./docker_resources ./docker_resources

# Install libraries and dependencies
RUN apt-get update && apt-get install -y vim
RUN pip install --user psycopg2-binary
RUN pip install -r docker_resources/requirements.pip
```

docker-compose.yml:

```yaml
version: '3'
services:
  postgres:
    image: postgres:9.6
    container_name: "postgres"
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
    ports:
      - "5432:5432"

  webserver:
    build: .
    restart: always
    depends_on:
      - postgres
    volumes:
      - ./dags:/usr/local/airflow/dags
      - ./libraries:/libraries
      - ./python_scripts:/python_scripts
    ports:
      - "8080:8080"
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3

  scheduler:
    build: .
    restart: always
    depends_on:
      - postgres
    volumes:
      - ./dags:/usr/local/airflow/dags
      - ./logs:/usr/local/airflow/logs
    ports:
      - "8793:8793"
    command: scheduler
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-scheduler.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
```

My dags folder has a tutorial DAG with:
```python
from datetime import timedelta

# The DAG object; we'll need this to instantiate a DAG
from airflow import DAG
# Operators; we need this to operate!
from airflow.operators.bash_operator import BashOperator
from airflow.utils.dates import days_ago

# These args will get passed on to each operator
# You can override them on a per-task basis during operator initialization
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': days_ago(2),
    'email': ['[email protected]'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 0,
    'retry_delay': timedelta(minutes=5),
    'schedule_interval': '@daily',
}

dag = DAG(
    'Tutorial',
    default_args=default_args,
    description='A simple tutorial DAG with production tables',
    catchup=False,
)

task_1 = BashOperator(
    task_id='my_task',
    bash_command='python /python_scripts/my_script.py',
    dag=dag,
)
```

I tried changing `bash_command='python /python_scripts/my_script.py'` to:
```python
bash_command='python python_scripts/my_script.py'
bash_command='python ~/../python_scripts/my_script.py'
bash_command='python ~/python_scripts/my_script.py'
```
All of them fail. I tried these because BashOperator runs the command in a temporary folder. If I get into the container and run `ls`, I find the file under `/python_scripts`. Even running `python /python_scripts/my_script.py` from `/usr/local/airflow` works.
The error is always:
```
INFO - python: can't open file
```
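To see what the task actually observes at run time, I can drop a throwaway diagnostic task into the same DAG and read its log (a sketch; `debug_paths` is a name I made up):

```python
# Hypothetical diagnostic task, added to the tutorial DAG above: logs the
# working directory the command actually runs in and the mount's contents.
debug_task = BashOperator(
    task_id='debug_paths',
    bash_command='pwd && ls -la /python_scripts',
    dag=dag,
)
```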
I searched around, and other people solved this issue with absolute paths, but I can't get it fixed.
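As I understand those answers, since BashOperator runs from a temporary cwd, a cwd-independent variant would be to `cd` into the scripts folder explicitly before calling Python (a sketch of the same task; it assumes `/python_scripts` really is populated inside the container):

```python
# Sketch: make the working directory explicit instead of relying on the
# temporary cwd that BashOperator creates for the command.
task_1 = BashOperator(
    task_id='my_task',
    bash_command='cd /python_scripts && python my_script.py',
    dag=dag,
)
```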
Edit: If I add `ADD ./ ./` below `WORKDIR /` in the Dockerfile and delete these volumes from docker-compose.yml:
1. `./libraries:/libraries`
2. `./python_scripts:/python_scripts`

then the error is no longer "file not found" but "libraries not found" (a module import error). That is an improvement, but it doesn't make sense, because PYTHONPATH is defined to include the `/libraries` folder.
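To check whether `/libraries` actually ends up on the interpreter's path when the task runs, the script itself can print it (a minimal sketch for the top of `my_script.py`):

```python
# Diagnostic sketch: if the PYTHONPATH set in the Dockerfile survived into
# the task's environment, '/libraries' should show up in this output.
import sys
print(sys.path)
```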
The volumes make more sense than the `ADD` statement, because I need code changes to be reflected inside the container instantly.
Edit 2: The volumes are mounted, but no files are inside the container folders, which is why it is not able to find the files. When I use `ADD ./ ./`, the folders do contain the files, because everything inside the project folder is copied into the image. Even so, it doesn't work, because the libraries are not found either.
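A quick way to confirm from inside either container whether the mounts are actually populated (a sketch; run it with `python` in the webserver or scheduler container):

```python
# Diagnostic sketch: list each folder the DAG and the imports depend on,
# so empty bind mounts show up immediately.
import os

for path in ('/python_scripts', '/libraries', '/usr/local/airflow/dags'):
    print(path, os.listdir(path) if os.path.isdir(path) else '** missing **')
```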