
I have a folder tree like this in my project

  • project
    • dags
    • python_scripts
    • libraries
    • docker-compose.yml
    • Dockerfile
    • docker_resources

I create an Airflow service in a Docker container with the following Dockerfile:

```dockerfile
# Base image
FROM puckel/docker-airflow:1.10.1

# Impersonate
USER root

# Logs are automatically thrown to the I/O stream and not buffered.
ENV PYTHONUNBUFFERED 1

ENV AIRFLOW_HOME=/usr/local/airflow
ENV PYTHONPATH "${PYTHONPATH}:/libraries"

WORKDIR /

# Add docker source files to the docker machine
ADD ./docker_resources ./docker_resources

# Install libraries and dependencies
RUN apt-get update && apt-get install -y vim
RUN pip install --user psycopg2-binary
RUN pip install -r docker_resources/requirements.pip
```

And this docker-compose.yml:

```yaml
version: '3'
services:
  postgres:
    image: postgres:9.6
    container_name: "postgres"
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
    ports:
      - "5432:5432"
  webserver:
    build: .
    restart: always
    depends_on:
      - postgres
    volumes:
      - ./dags:/usr/local/airflow/dags
      - ./libraries:/libraries
      - ./python_scripts:/python_scripts
    ports:
      - "8080:8080"
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
  scheduler:
    build: .
    restart: always
    depends_on:
      - postgres
    volumes:
      - ./dags:/usr/local/airflow/dags
      - ./logs:/usr/local/airflow/logs
    ports:
      - "8793:8793"
    command: scheduler
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-scheduler.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
```
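To sanity-check the setup, I rebuild and confirm that the environment variables defined in the Dockerfile survive into a running container (a minimal sketch, using the service names from the compose file above):

```bash
# Rebuild the images and start the stack in the background.
docker-compose build
docker-compose up -d

# Print the environment variables inside the webserver container.
docker-compose exec webserver printenv PYTHONPATH AIRFLOW_HOME
```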

My dags folder has a tutorial DAG with:

```python
from datetime import timedelta

# The DAG object; we'll need this to instantiate a DAG
from airflow import DAG
# Operators; we need this to operate!
from airflow.operators.bash_operator import BashOperator
from airflow.utils.dates import days_ago

# These args will get passed on to each operator
# You can override them on a per-task basis during operator initialization
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': days_ago(2),
    'email': ['[email protected]'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 0,
    'retry_delay': timedelta(minutes=5),
    'schedule_interval': '@daily',
}

dag = DAG(
    'Tutorial',
    default_args=default_args,
    description='A simple tutorial DAG with production tables',
    catchup=False,
)

task_1 = BashOperator(
    task_id='my_task',
    bash_command='python /python_scripts/my_script.py',
    dag=dag,
)
```
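To run the task by hand without waiting for the scheduler, something like this works in Airflow 1.10 (a sketch; `airflow test` executes a single task instance locally):

```bash
# Run the single task inside the webserver container for a given execution date.
docker-compose exec webserver airflow test Tutorial my_task 2019-01-01
```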

I tried replacing bash_command='python /python_scripts/my_script.py' with:

  • bash_command='python python_scripts/my_script.py',
  • bash_command='python ~/../python_scripts/my_script.py',
  • bash_command='python ~/python_scripts/my_script.py',

All of them fail. I tried them because BashOperator runs the command in a temporary folder. If I get into the machine and run the ls command, I find the file under python_scripts. Even if I run python /python_scripts/my_script.py from /usr/local/airflow, it works.
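For reference, these are the kinds of checks I run (a sketch, using the service names from the docker-compose.yml above). Note that in that file only the webserver mounts ./python_scripts; the scheduler, which is the container that actually executes the BashOperator command with the default executor, does not:

```bash
# The webserver mounts ./python_scripts, so this lists the script.
docker-compose exec webserver ls -la /python_scripts

# The scheduler does not mount ./python_scripts in the compose file above,
# so this is likely to fail, which would explain the error.
docker-compose exec scheduler ls -la /python_scripts

# Running the script directly inside the webserver works for me.
docker-compose exec webserver python /python_scripts/my_script.py
```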

The error is always:

INFO - python: can't open file

I searched and people solved the issue with absolute paths, but I can't fix it.

Edit: If in the Dockerfile I add ADD ./ ./ below WORKDIR /, and I delete these volumes from docker-compose.yml:

 1. ./libraries:/libraries
 2. ./python_scripts:/python_scripts

The error is no longer "file not found" but "libraries not found": an import module error. That is an improvement, but it doesn't make sense, because PYTHONPATH is defined to include the /libraries folder.
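A quick way to check whether PYTHONPATH actually reaches the Python process inside the container (a sketch, assuming the service names above):

```bash
# Confirm the variable is set in the container's environment...
docker-compose exec webserver printenv PYTHONPATH

# ...and that Python actually picks it up in sys.path.
docker-compose exec webserver python -c "import sys; print(sys.path)"
```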

The volumes make more sense than the ADD statement, because I need code changes to be reflected inside the Docker container instantly.

Edit 2: The volumes are mounted, but no files appear inside the container folders; this is why it cannot find the files. When I use ADD ./ ./, the folders do have the files, because that copies everything into the image. Even then it doesn't work, because the libraries are still not found either.
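The mounts themselves can be inspected from the host (a sketch; `docker-compose ps -q` resolves the service name to a container ID):

```bash
# Show the volumes Docker has attached to the webserver container.
docker inspect --format '{{ json .Mounts }}' $(docker-compose ps -q webserver)
```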

2 Answers


Did you try

bash_command='python /usr/local/airflow/python_scripts/my_script.py' 

And you have to check that the folder has the right permissions (read and execute for your user).
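Something along these lines can be used to check and, if necessary, open up the permissions (a sketch; the paths are taken from the question):

```bash
# Inspect ownership and permissions of the mounted script folder.
docker-compose exec webserver ls -ld /python_scripts
docker-compose exec webserver ls -l /python_scripts

# If the airflow user cannot traverse or read the folder, loosen it.
docker-compose exec webserver chmod -R 755 /python_scripts
```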


2 Comments

Thanks for your answer. I tried it, and it doesn't work. That makes sense, because the volume for the python_scripts folder is not under /usr/local/airflow. Now I am running a recursive permission change to test whether it's something related to permissions.
Just finished the recursive permission change, and it doesn't work. I ran 'chmod -R 700 /' inside the Docker machine.

Finally I solved the issue. I discarded all previous work and restarted the Dockerfile using an Ubuntu base image, instead of the puckel/docker-airflow image, which is based on python:3.7-slim-buster.

I don't use any user other than root now.

